Bug 206479

Summary: [iwlwifi, v5.5 regression, bisected] Wireless-AC 3168NGW no longer initializes
Product: Networking Reporter: Duane Robertson (duane)
Component: WirelessAssignee: networking_wireless (networking_wireless)
Severity: normal CC: belegdol, briannorris, bugzilla, grawcho, janek+kernel, joanbrugueram, kernel, me, stiletto
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.5 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: kernel log with error
Fix error validation in iwl_mvm_sar_geo_init / iwl_sar_geo_init after 39c1a9728f93

Description Duane Robertson 2020-02-10 01:59:01 UTC
Created attachment 287271 [details]
kernel log with error

iwlwifi crashes with BAD_COMMAND error on the AC3168NGW card shipped with the Motile m142. This is a separate issue from bug #206329, although the motile also suffers from that bug. The first bad commit is 39c1a9728f93: refactor the SAR tables from mvm to acpi. I haven't been able to revert the commit cleanly.
Comment 1 Duane Robertson 2020-02-10 02:14:50 UTC
My work-around is to use the iwlwifi driver from v5.4.
Comment 2 joanbrugueram 2020-02-10 13:25:36 UTC
Completely a shot in the dark since I'm not familiar with the driver code and I can't reproduce the problem but...

Something that looks a bit suspicious in that "iwlwifi: refactor the SAR tables from mvm to acpi" commit is that whereas before iwl_mvm_sar_geo_init used to return early without sending a command if iwl_mvm_sar_geo_support or iwl_mvm_sar_get_wgds_table failed, with that commit now that those calls are split into a separate function iwl_sar_geo_init, it will go ahead anyway and send the command anyway. Unless I am missing something this seems like a behaviour change in what should be a refactor.

Any chance it does work if you apply the following patch (in addition to the one from #206329)? Again, I'm just guessing from the code without being able to reproduce, so sorry if this doesn't work.

 drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index c09624d8d..8a7f12498 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -753,6 +753,7 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
 	cmd.geo_cmd.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
 	iwl_sar_geo_init(&mvm->fwrt, cmd.geo_cmd.table);
+	return 0;
 	cmd.geo_cmd.table_revision = cpu_to_le32(mvm->fwrt.geo_rev);
Comment 3 Duane Robertson 2020-02-10 20:19:21 UTC
Nice! That seems to work, so far, in both the vanilla kernel and gentoo 5.5.2-r1. I see no errors in dmesg or signs of any problems on the network. I'm not sure what would be required to fix this in the general case.
Comment 4 joanbrugueram 2020-02-11 01:08:52 UTC
Created attachment 287283 [details]
Fix error validation in iwl_mvm_sar_geo_init / iwl_sar_geo_init after 39c1a9728f93

Here's a patch that should (hopefully) fix the problem correctly by not sending the GEO_TX_POWER_LIMIT command in the same cases it was doing before 39c1a9728f93.

I haven't been able to test this on a 100% real case but I've found a way to hack up the driver to get a trace like Duane's. Applying this patch fixes that.
Comment 5 Duane Robertson 2020-02-11 23:59:59 UTC
Attachment 287283 [details] works on the Motile.
Comment 6 joanbrugueram 2020-02-12 10:14:11 UTC
Great, this confirms that the cause of the problem is the change in the error handling logic in 39c1a9728f93. If so, and since this change was likely accidental, there should hopefully be relatively little problem getting this patch accepted.
Comment 7 Shani Feldman 2020-03-25 14:39:40 UTC
(In reply to Duane Robertson from comment #5)
> Attachment 287283 [details] works on the Motile.

Works for me too
Comment 8 Julian Sikorski 2020-03-29 13:10:25 UTC
could this be a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=206395? The upstream fix for that bug landed in 5.6:
Comment 9 joanbrugueram 2020-03-29 15:36:00 UTC
Yes, this and that bug are the same.
Comment 10 Duane Robertson 2020-04-01 04:24:29 UTC
5.6 appears to have fixed the issue, at least in the gentoo sources.
Comment 11 Brian Norris 2020-04-28 22:18:27 UTC
Just for the record here, bugzilla.kernel.org is really not the preferred way to reach mainline Linux developers. Some people may make best efforts to watch this bug tracker, but especially in cases where you have a bisection, it's preferred to reach out to linux-wireless@vger.kernel.org (perhaps including the bugzilla link, for more details).

The icing on the cake here is that the patch provided at comment #4 is effectively identical to the patch developed and accepted independently at comment #8, just with a month delay.

There's a little more info on bug reporting here:

Regardless, thanks for reporting.
Comment 12 joanbrugueram 2020-04-29 00:38:09 UTC
@Brian Norris: Thanks, this is good to know. I assumed that the devs would take a look at the Bugzilla, even if sparingly. Maybe they generally do, but for some reason iwlwifi seemed to be at a standstill at that time (c.f. bug 206329). Anyway, I guess this is an indication I should get more familiar with the kernel dev. procedures, so I'll check your link out (I often run mainline RCs or nigthlies so I may be able to catch some issues early now and then).