Bug 206479 - [iwlwifi, v5.5 regression, bisected] Wireless-AC 3168NGW no longer initializes
Summary: [iwlwifi, v5.5 regression, bisected] Wireless-AC 3168NGW no longer initializes
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: networking_wireless@kernel-bugs.osdl.org
Depends on:
Reported: 2020-02-10 01:59 UTC by Duane Robertson
Modified: 2020-04-01 04:24 UTC (History)
8 users (show)

See Also:
Kernel Version: 5.5
Tree: Mainline
Regression: Yes

kernel log with error (89.53 KB, text/plain)
2020-02-10 01:59 UTC, Duane Robertson
Fix error validation in iwl_mvm_sar_geo_init / iwl_sar_geo_init after 39c1a9728f93 (2.89 KB, patch)
2020-02-11 01:08 UTC, joanbrugueram
Details | Diff

Description Duane Robertson 2020-02-10 01:59:01 UTC
Created attachment 287271 [details]
kernel log with error

iwlwifi crashes with BAD_COMMAND error on the AC3168NGW card shipped with the Motile m142. This is a separate issue from bug #206329, although the motile also suffers from that bug. The first bad commit is 39c1a9728f93: refactor the SAR tables from mvm to acpi. I haven't been able to revert the commit cleanly.
Comment 1 Duane Robertson 2020-02-10 02:14:50 UTC
My work-around is to use the iwlwifi driver from v5.4.
Comment 2 joanbrugueram 2020-02-10 13:25:36 UTC
Completely a shot in the dark since I'm not familiar with the driver code and I can't reproduce the problem but...

Something that looks a bit suspicious in that "iwlwifi: refactor the SAR tables from mvm to acpi" commit is that whereas before iwl_mvm_sar_geo_init used to return early without sending a command if iwl_mvm_sar_geo_support or iwl_mvm_sar_get_wgds_table failed, with that commit now that those calls are split into a separate function iwl_sar_geo_init, it will go ahead anyway and send the command anyway. Unless I am missing something this seems like a behaviour change in what should be a refactor.

Any chance it does work if you apply the following patch (in addition to the one from #206329)? Again, I'm just guessing from the code without being able to reproduce, so sorry if this doesn't work.

 drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index c09624d8d..8a7f12498 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -753,6 +753,7 @@ static int iwl_mvm_sar_geo_init(struct iwl_mvm *mvm)
 	cmd.geo_cmd.ops = cpu_to_le32(IWL_PER_CHAIN_OFFSET_SET_TABLES);
 	iwl_sar_geo_init(&mvm->fwrt, cmd.geo_cmd.table);
+	return 0;
 	cmd.geo_cmd.table_revision = cpu_to_le32(mvm->fwrt.geo_rev);
Comment 3 Duane Robertson 2020-02-10 20:19:21 UTC
Nice! That seems to work, so far, in both the vanilla kernel and gentoo 5.5.2-r1. I see no errors in dmesg or signs of any problems on the network. I'm not sure what would be required to fix this in the general case.
Comment 4 joanbrugueram 2020-02-11 01:08:52 UTC
Created attachment 287283 [details]
Fix error validation in iwl_mvm_sar_geo_init / iwl_sar_geo_init after 39c1a9728f93

Here's a patch that should (hopefully) fix the problem correctly by not sending the GEO_TX_POWER_LIMIT command in the same cases it was doing before 39c1a9728f93.

I haven't been able to test this on a 100% real case but I've found a way to hack up the driver to get a trace like Duane's. Applying this patch fixes that.
Comment 5 Duane Robertson 2020-02-11 23:59:59 UTC
Attachment 287283 [details] works on the Motile.
Comment 6 joanbrugueram 2020-02-12 10:14:11 UTC
Great, this confirms that the cause of the problem is the change in the error handling logic in 39c1a9728f93. If so, and since this change was likely accidental, there should hopefully be relatively little problem getting this patch accepted.
Comment 7 Shani Feldman 2020-03-25 14:39:40 UTC
(In reply to Duane Robertson from comment #5)
> Attachment 287283 [details] works on the Motile.

Works for me too
Comment 8 Julian Sikorski 2020-03-29 13:10:25 UTC
could this be a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=206395? The upstream fix for that bug landed in 5.6:
Comment 9 joanbrugueram 2020-03-29 15:36:00 UTC
Yes, this and that bug are the same.
Comment 10 Duane Robertson 2020-04-01 04:24:29 UTC
5.6 appears to have fixed the issue, at least in the gentoo sources.

Note You need to log in before you can comment on or make changes to this bug.