Bug 153381 - iwlwifi: 7260: Sporadic SYSASSERT. 0x000019C2 - WIFILNX-41
Summary: iwlwifi: 7260: Sporadic SYSASSERT. 0x000019C2 - WIFILNX-41
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-19 07:32 UTC by Samantha McVey
Modified: 2017-01-15 11:05 UTC
CC List: 6 users

See Also:
Kernel Version: 4.7.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel log (123.67 KB, text/plain)
2016-08-19 07:32 UTC, Samantha McVey
kernel log (136.61 KB, text/plain)
2016-08-25 03:47 UTC, Samantha McVey
IWL crash on X1 Yoga (139.16 KB, text/plain)
2016-08-26 07:02 UTC, Svenne Krap
iwlwifi-7260-17.ucode firmware with debugging for sysassert 0x19C2 (768.66 KB, application/octet-stream)
2016-10-11 12:50 UTC, Luca Coelho
Firmware Dump (79.33 KB, application/pgp-encrypted)
2016-10-20 02:01 UTC, Samantha McVey
Kernel log for the Firmware Dump (60.89 KB, application/pgp-encrypted)
2016-10-20 02:01 UTC, Samantha McVey
Queue firmware dump (75.22 KB, application/pgp-encrypted)
2016-10-21 21:12 UTC, Samantha McVey
Queue logs (30.75 KB, application/pgp-encrypted)
2016-10-21 21:12 UTC, Samantha McVey

Description Samantha McVey 2016-08-19 07:32:18 UTC
Created attachment 229321
kernel log

Firmware is version 17.352738.0, running a 4.7.1 kernel.  I think I saw this problem on at least 4.7.0.

I have attached the log; I have anonymized all but the last two digits of the MAC address.  All of the actual digits were the same except the last digit, so the MAC addresses were consecutive numbers.  It was a wifi access point with both 2.4 GHz and 5 GHz signals, so I am guessing that is what is triggering this bug.

I am pasting the following lines to make this easier to search for:


iwlwifi 0000:03:00.0: loaded firmware version 17.352738.0 op_mode iwlmvm

iwlwifi 0000:03:00.0: Microcode SW error detected.  Restarting 0x2000000.

iwlwifi 0000:03:00.0: FW error in SYNC CMD TIME_EVENT_CMD

WARNING: CPU: 0 PID: 31459 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1377 iwl_mvm_rx_tx_cmd+0x662/0x870 [iwlmvm]

WARNING: CPU: 0 PID: 28749 at drivers/net/wireless/intel/iwlwifi/mvm/utils.c:679 iwl_mvm_enable_txq+0x228/0x2a0 [iwlmvm]
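
The lines above can double as search patterns.  What follows is only a minimal Python sketch, assuming the kernel log is available as a list of text lines; the labels and the find_signatures helper are hypothetical names made up for illustration, not part of iwlwifi or any existing tool.

```python
import re

# Signatures copied from the log excerpts in this report. The label names
# are hypothetical and exist only for this sketch.
SIGNATURES = {
    "fw_loaded": re.compile(r"loaded firmware version (\S+)"),
    "sw_error": re.compile(r"Microcode SW error detected"),
    "sync_cmd_error": re.compile(r"FW error in SYNC CMD (\S+)"),
    "driver_warning": re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+)"),
}


def find_signatures(lines):
    """Return (line_number, label, detail) for every line matching a signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                # Patterns without a capture group carry no extra detail.
                detail = match.group(1) if pattern.groups else ""
                hits.append((number, label, detail))
    return hits


sample = [
    "iwlwifi 0000:03:00.0: loaded firmware version 17.352738.0 op_mode iwlmvm",
    "iwlwifi 0000:03:00.0: Microcode SW error detected.  Restarting 0x2000000.",
    "iwlwifi 0000:03:00.0: FW error in SYNC CMD TIME_EVENT_CMD",
    "WARNING: CPU: 0 PID: 31459 at drivers/net/wireless/intel/iwlwifi/mvm/tx.c:1377 "
    "iwl_mvm_rx_tx_cmd+0x662/0x870 [iwlmvm]",
]

for hit in find_signatures(sample):
    print(hit)
```

Reporting the line number alongside the label makes it easy to point at the exact place in an attached log where each event occurs.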
Comment 1 Luca Coelho 2016-08-24 05:26:06 UTC
Thanks for reporting! We have started to investigate this internally.  I may ask you for more logs and tests with a different firmware that has more debugging capabilities.  I'll let you know soon.
Comment 2 Samantha McVey 2016-08-25 03:47:21 UTC
Created attachment 230101
kernel log

Here's another crash log from today.  The crash is on line 57.
Comment 3 Luca Coelho 2016-08-25 05:52:44 UTC
Thanks.  Did you have the same "0x000019C2 | ADVANCED_SYSASSERT" this time? I can't see it in the log, I can only see the same WARNINGs on tx.c:1377, that you got after the initial SYSASSERT in the previous log.

Also, why do we keep bouncing from :c6 to :c7?
Comment 4 Samantha McVey 2016-08-25 18:07:27 UTC
No SYSASSERT.  I don't know why it bounces back and forth, but it does maintain the connection when switching (well, until the drivers end up crashing).  The wifi access point has 5 GHz and 2.4 GHz networks that both have the same name, but I am not sure why it keeps switching between them.
Comment 5 Svenne Krap 2016-08-26 07:02:19 UTC
Created attachment 230281
IWL crash on X1 Yoga

I am seeing the same issue. 

It seems to have gotten worse with 4.7.2.  Before, only one specific network triggered it; now multiple networks do.

Svenne
Comment 6 Luca Coelho 2016-08-29 07:16:00 UTC
Samantha, we will investigate the tx.c:1377 warnings.  Regarding the SYSASSERT, please let us know if you see it again.  If you do, we will try to investigate and maybe send you a modified FW to extract more logs.
Comment 7 Luca Coelho 2016-08-29 07:25:24 UTC
Svenne, the crash you are seeing seems to be unrelated to the problems reported in this bugzilla entry.  Do you mind creating a new bug for us to handle it separately? Additionally, it would be great if you could provide the complete dmesg and, if possible, trace-cmd logs, as described in our wiki:

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging
Comment 8 Luca Coelho 2016-08-29 07:30:17 UTC
Samantha, actually the tx.c:1377 bug is known and has been fixed for 4.8:

https://bugzilla.kernel.org/show_bug.cgi?id=153061

I'll send the patch that fixes it for 4.7-stable.  Leaving this bug open until I do so.
Comment 9 Samantha McVey 2016-08-29 11:57:39 UTC
Luca,
I am still seeing the SYSASSERT, and looking back in my logs it's been happening at least .  If you could send a debug firmware, that would be great.  Searching my logs, I've gotten it 42 times in the last 28 days.  Is the tx.c:1377 bug unrelated to the SYSASSERT one?
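
A count like the one above can be obtained by tallying assert lines per day.  This is a rough Python sketch, assuming syslog-style lines in the "Oct 20 01:50:05 kernel: ..." format quoted elsewhere in this report; asserts_per_day is a hypothetical helper, and the second assert line in the sample is invented for illustration.

```python
import re
from collections import Counter

# Assumes syslog-style lines such as
# "Oct 20 01:50:05 kernel: iwlwifi ...: 0x00000000 | ADVANCED_SYSASSERT".
# Other log formats (e.g. raw dmesg timestamps) would need a different regex.
ASSERT_LINE = re.compile(
    r"^(\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} .*\| ADVANCED_SYSASSERT"
)


def asserts_per_day(lines):
    """Count ADVANCED_SYSASSERT lines per day, keyed by 'Mon DD'."""
    counts = Counter()
    for line in lines:
        match = ASSERT_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Oct 20 01:49:32 kernel: iwlwifi 0000:03:00.0: Microcode SW error detected.  Restarting 0x2000000.",
    "Oct 20 01:50:05 kernel: iwlwifi 0000:03:00.0: 0x00000000 | ADVANCED_SYSASSERT",
    # Hypothetical line, invented for this example:
    "Oct 21 09:00:00 kernel: iwlwifi 0000:03:00.0: 0x000019C2 | ADVANCED_SYSASSERT",
]

for day, count in sorted(asserts_per_day(sample).items()):
    print(day, count)
```

Because syslog timestamps carry no year, the tally is only meaningful for logs that span less than twelve months.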
Comment 10 Luca Coelho 2016-08-29 21:03:58 UTC
I don't think the SYSASSERT has anything to do with the tx.c:1377 bug.

It's good to know that you can reproduce the SYSASSERT relatively easily, so we can take some more logs to investigate.  I'll discuss this with our firmware team and come back to you for more debugging.

Since the tx.c:1377 bug is already fixed, I'm renaming this bug to better match the SYSASSERT problem we're having.
Comment 11 Luca Coelho 2016-10-11 12:50:03 UTC
Created attachment 241501
iwlwifi-7260-17.ucode firmware with debugging for sysassert 0x19C2

Samantha,

Our firmware team prepared a firmware to debug this issue.  Could you take traces using the attached firmware, following the instructions here?

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#firmware_debugging

This will allow us to further debug the issues you are experiencing.

And please make sure you read and understand the privacy aspects of sending such logs to us:

https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects
Comment 12 Samantha McVey 2016-10-20 02:00:36 UTC
Ok, finally got a dump :)
Had to move the wifi router further away from the laptop to trigger it (I had moved the access point closer to have the bug trigger less often).

Here is the firmware dump and kernel logs.
Comment 13 Samantha McVey 2016-10-20 02:01:09 UTC
Created attachment 242011
Firmware Dump
Comment 14 Samantha McVey 2016-10-20 02:01:45 UTC
Created attachment 242021
Kernel log for the Firmware Dump
Comment 15 Luca Coelho 2016-10-20 04:07:37 UTC
Thanks, Samantha!

I'll forward your dump to the firmware team to continue investigation.
Comment 16 Luca Coelho 2016-10-20 04:13:40 UTC
Actually, I see from the logs that you did not get the SYSASSERT error.  This is a queue stuck problem.

Did you ever see the SYSASSERT problem again?

In any case, this dump is valuable to us (to investigate the queue stuck issue), so I'll forward that to the firmware team anyway.  Thanks!
Comment 17 Samantha McVey 2016-10-20 04:17:40 UTC
Line 10610: Oct 20 01:49:32 kernel: iwlwifi 0000:03:00.0: Microcode SW error detected.  Restarting 0x2000000.

Line 10694: Oct 20 01:50:05 kernel: iwlwifi 0000:03:00.0: 0x00000000 | ADVANCED_SYSASSERT

I saw the queue stuck issue too.  I thought it crashed because of the SYSASSERT?  Did you see that line?
Comment 18 Luca Coelho 2016-10-20 05:39:49 UTC
Yes, I see that line, but it is a side-effect of the queue stuck problem (notice the 0x00000000).

The other SYSASSERT you were seeing is 0x000019C2, which is a different thing.

Did you see the 0x19C2 again with the firmware I provided? That firmware has modifications that are supposed to solve that (or give us more information in case it still happens).
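
The distinction between the two assert codes can be checked mechanically.  This is a small Python sketch, assuming log lines in the "<code> | ADVANCED_SYSASSERT" form shown in this thread; classify_assert and TRACKED_ASSERT are hypothetical names introduced only for this example.

```python
import re

# Matches the hexadecimal code printed in front of "| ADVANCED_SYSASSERT".
ASSERT_CODE = re.compile(r"(0x[0-9A-Fa-f]+) \| ADVANCED_SYSASSERT")

# The SYSASSERT this bug report tracks; 0x00000000 is described in this
# thread as a side effect of the queue stuck problem.
TRACKED_ASSERT = 0x19C2


def classify_assert(line):
    """Return None if the line has no assert, else (code, is_tracked)."""
    match = ASSERT_CODE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return code, code == TRACKED_ASSERT


lines = [
    "Oct 20 01:50:05 kernel: iwlwifi 0000:03:00.0: 0x00000000 | ADVANCED_SYSASSERT",
    "iwlwifi 0000:03:00.0: 0x000019C2 | ADVANCED_SYSASSERT",
    "iwlwifi 0000:03:00.0: Microcode SW error detected.  Restarting 0x2000000.",
]

for line in lines:
    print(classify_assert(line))
```

Comparing the parsed integer rather than the raw string means 0x19C2 and 0x000019C2 are treated as the same code.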
Comment 19 Samantha McVey 2016-10-20 06:02:53 UTC
So far I have not, but I tried going back to the previous firmware and still wasn't seeing the bug.  But I moved my wifi access point further away (the way it was when I started seeing about 3 SYSASSERTs a day).  So we will see in the next few days if it pops up.  I'll make sure to check the hex code of the SYSASSERT.
Comment 20 Samantha McVey 2016-10-21 21:12:01 UTC
Ok, so far I have not gotten the SYSASSERT 0x000019C2.

I got another firmware crash that seems to be related to the queues, though it wasn't a SYSASSERT.  Attaching it here, but let me know if there's an associated bug report for the queues or if you want me to make one.  See line 1399 for the firmware crash; around line 300 there are also queue-related errors, though the firmware doesn't crash there.
Comment 21 Samantha McVey 2016-10-21 21:12:31 UTC
Created attachment 242161
Queue firmware dump
Comment 22 Samantha McVey 2016-10-21 21:12:54 UTC
Created attachment 242171
Queue logs
Comment 23 Luca Coelho 2016-11-28 13:12:42 UTC
Sorry for the delay, I was waiting for input from the firmware team.

The crash you captured is caused by a TX queue hang... we have a bunch of bugs for that already, but we haven't found a proper way to fix it yet. :(

About the SYSASSERT, since you can't reproduce it anymore, I've asked the firmware team to release the fix you tested (with the firmware I provided).  I'll let you know once we release it officially.
Comment 24 Luca Coelho 2016-11-29 07:25:14 UTC
Samantha, if it's not too much to ask, could you try the original firmware again and see if you can still reproduce the issue? We just want to double-check that there was a real issue before issuing a new official version of the 7260 firmware.
Comment 25 Samantha McVey 2016-12-01 00:13:34 UTC
Luca,
I was able to reproduce the problem with the original firmware. Thank you for helping to get this fixed.
Comment 26 Luca Coelho 2016-12-01 05:49:50 UTC
Thanks Samantha!

I'll report this to our firmware team and we will soon release a new official firmware version with the fix.
Comment 27 Luca Coelho 2017-01-15 11:05:23 UTC
We have published the new firmware that includes the fix for this bug.

Samantha, could you replace the one you tested before with the officially published one? And, of course, report in the unlikely case that it doesn't work. ;)

It can be found here (this will be pushed to the mainline linux-firmware.git soon):

http://git.kernel.org/cgit/linux/kernel/git/iwlwifi/linux-firmware.git/tree/iwlwifi-7260-17.ucode

Closing this bug.