Created attachment 287271 [details]
kernel log with error
iwlwifi crashes with BAD_COMMAND error on the AC3168NGW card shipped with the Motile m142. This is a separate issue from bug #206329, although the motile also suffers from that bug. The first bad commit is 39c1a9728f93: refactor the SAR tables from mvm to acpi. I haven't been able to revert the commit cleanly.
My work-around is to use the iwlwifi driver from v5.4.
Completely a shot in the dark since I'm not familiar with the driver code and I can't reproduce the problem but...
Something that looks a bit suspicious in that "iwlwifi: refactor the SAR tables from mvm to acpi" commit is that whereas before iwl_mvm_sar_geo_init used to return early without sending a command if iwl_mvm_sar_geo_support or iwl_mvm_sar_get_wgds_table failed, with that commit now that those calls are split into a separate function iwl_sar_geo_init, it will go ahead anyway and send the command anyway. Unless I am missing something this seems like a behaviour change in what should be a refactor.
Any chance it does work if you apply the following patch (in addition to the one from #206329)? Again, I'm just guessing from the code without being able to reproduce, so sorry if this doesn't work.
drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index c09624d8d..8a7f12498 100644
@@ -753,6 +753,7 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
cmd.geo_cmd.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
+ return 0;
cmd.geo_cmd.table_revision = cpu_to_le32(mvm->fwrt.geo_rev);
Nice! That seems to work, so far, in both the vanilla kernel and gentoo 5.5.2-r1. I see no errors in dmesg or signs of any problems on the network. I'm not sure what would be required to fix this in the general case.
Created attachment 287283 [details]
Fix error validation in iwl_mvm_sar_geo_init / iwl_sar_geo_init after 39c1a9728f93
Here's a patch that should (hopefully) fix the problem correctly by not sending the GEO_TX_POWER_LIMIT command in the same cases it was doing before 39c1a9728f93.
I haven't been able to test this on a 100% real case but I've found a way to hack up the driver to get a trace like Duane's. Applying this patch fixes that.
Attachment 287283 [details] works on the Motile.
Great, this confirms that the cause of the problem is the change in the error handling logic in 39c1a9728f93. If so, and since this change was likely accidental, there should hopefully be relatively little problem getting this patch accepted.
(In reply to Duane Robertson from comment #5)
> Attachment 287283 [details] works on the Motile.
Works for me too
could this be a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=206395? The upstream fix for that bug landed in 5.6:
Yes, this and that bug are the same.
5.6 appears to have fixed the issue, at least in the gentoo sources.
Just for the record here, bugzilla.kernel.org is really not the preferred way to reach mainline Linux developers. Some people may make best efforts to watch this bug tracker, but especially in cases where you have a bisection, it's preferred to reach out to email@example.com (perhaps including the bugzilla link, for more details).
The icing on the cake here is that the patch provided at comment #4 is effectively identical to the patch developed and accepted independently at comment #8, just with a month delay.
There's a little more info on bug reporting here:
Regardless, thanks for reporting.
@Brian Norris: Thanks, this is good to know. I assumed that the devs would take a look at the Bugzilla, even if sparingly. Maybe they generally do, but for some reason iwlwifi seemed to be at a standstill at that time (c.f. bug 206329). Anyway, I guess this is an indication I should get more familiar with the kernel dev. procedures, so I'll check your link out (I often run mainline RCs or nigthlies so I may be able to catch some issues early now and then).