Created attachment 287271 [details]
kernel log with error
iwlwifi crashes with BAD_COMMAND error on the AC3168NGW card shipped with the Motile m142. This is a separate issue from bug #206329, although the motile also suffers from that bug. The first bad commit is 39c1a9728f93: refactor the SAR tables from mvm to acpi. I haven't been able to revert the commit cleanly.
My work-around is to use the iwlwifi driver from v5.4.
Completely a shot in the dark since I'm not familiar with the driver code and I can't reproduce the problem but...
Something that looks a bit suspicious in that "iwlwifi: refactor the SAR tables from mvm to acpi" commit is that whereas before iwl_mvm_sar_geo_init used to return early without sending a command if iwl_mvm_sar_geo_support or iwl_mvm_sar_get_wgds_table failed, with that commit now that those calls are split into a separate function iwl_sar_geo_init, it will go ahead anyway and send the command anyway. Unless I am missing something this seems like a behaviour change in what should be a refactor.
Any chance it does work if you apply the following patch (in addition to the one from #206329)? Again, I'm just guessing from the code without being able to reproduce, so sorry if this doesn't work.
drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index c09624d8d..8a7f12498 100644
@@ -753,6 +753,7 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
cmd.geo_cmd.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
+ return 0;
cmd.geo_cmd.table_revision = cpu_to_le32(mvm->fwrt.geo_rev);
Nice! That seems to work, so far, in both the vanilla kernel and gentoo 5.5.2-r1. I see no errors in dmesg or signs of any problems on the network. I'm not sure what would be required to fix this in the general case.
Created attachment 287283 [details]
Fix error validation in iwl_mvm_sar_geo_init / iwl_sar_geo_init after 39c1a9728f93
Here's a patch that should (hopefully) fix the problem correctly by not sending the GEO_TX_POWER_LIMIT command in the same cases it was doing before 39c1a9728f93.
I haven't been able to test this on a 100% real case but I've found a way to hack up the driver to get a trace like Duane's. Applying this patch fixes that.
Attachment 287283 [details] works on the Motile.
Great, this confirms that the cause of the problem is the change in the error handling logic in 39c1a9728f93. If so, and since this change was likely accidental, there should hopefully be relatively little problem getting this patch accepted.
(In reply to Duane Robertson from comment #5)
> Attachment 287283 [details] works on the Motile.
Works for me too
could this be a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=206395? The upstream fix for that bug landed in 5.6:
Yes, this and that bug are the same.
5.6 appears to have fixed the issue, at least in the gentoo sources.