Bug 195865 - iwlwifi: 3165: NMI_INTERRUPT_WDG in INIT image during recovery - WIFILNX-940
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: DO NOT USE - assign "network-wireless-intel" component instead
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-23 23:30 UTC by Paul
Modified: 2017-07-16 11:47 UTC
CC List: 2 users

See Also:
Kernel Version: 4.10
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Firmware crash when disabled for many hours (1.71 MB, text/plain)
2017-05-23 23:30 UTC, Paul
iwlwifi failure when in use (1.76 MB, text/plain)
2017-05-23 23:31 UTC, Paul
When disabled for many hours (31.73 KB, text/plain)
2017-05-23 23:37 UTC, Paul
When in use for a few minutes (46.85 KB, text/plain)
2017-05-23 23:38 UTC, Paul
firmware panic with ucode version 27 (73.29 KB, text/plain)
2017-05-31 22:43 UTC, Paul
Fix candidate (1008.29 KB, application/octet-stream)
2017-06-04 12:09 UTC, Emmanuel Grumbach
remaining minor asserts (32.67 KB, text/plain)
2017-06-13 21:58 UTC, Paul
panic after long idle run (81.61 KB, text/plain)
2017-06-21 07:55 UTC, Paul
Fix candidate (1008.54 KB, application/octet-stream)
2017-06-29 17:11 UTC, Emmanuel Grumbach
extra messages logged (2.62 KB, text/plain)
2017-07-06 06:27 UTC, Paul

Description Paul 2017-05-23 23:30:07 UTC
Created attachment 256689
Firmware crash when disabled for many hours

The crash occurs completely at random: with or without wifi in use, on different bands, across system sleep and resume, with different access points, at different times of day, and so on.

I have two log dumps. The first is from when the wifi was turned off and it randomly 'failed' after many hours; I didn't notice until I tried to use it an hour later. The second is from when it was working normally while attached to a mobile hotspot but suddenly failed after a few minutes.

I'm fairly certain you'd find this particular adapter in many light laptops that you can buy off the shelf, as that's exactly the kind of laptop I'm using. It seems unrelated to similar iwlwifi bug reports...

This bug also causes a kernel hang/panic when shutting down: it either waits for a ~10 minute timeout or never succeeds in stopping. I often end up waiting a few minutes and then doing a hard power-off...
Comment 1 Paul 2017-05-23 23:31:07 UTC
Created attachment 256691
iwlwifi failure when in use
Comment 2 Paul 2017-05-23 23:37:27 UTC
Created attachment 256693
When disabled for many hours
Comment 3 Paul 2017-05-23 23:38:33 UTC
Created attachment 256695
When in use for a few minutes
Comment 4 Emmanuel Grumbach 2017-05-25 09:28:27 UTC
Can you please switch to 4.11 and test -27.ucode?

You can find -27.ucode here https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/linux-firmware.git/plain/iwlwifi-7265D-27.ucode?id=0d92f2c95196d3d1dc7b5e25a4bc4a798601adcb

thanks.
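
As a rough illustration of how the file name relates to the version being asked for: the number before `.ucode` (the `27` in `iwlwifi-7265D-27.ucode`) is the firmware API version. The sketch below lists which versions are present in a firmware directory. It assumes the files sit uncompressed in `/lib/firmware`, which is the usual location but depends on the distribution; the function names are made up for this example and are not part of any iwlwifi tooling.

```python
import re
from pathlib import Path

# Matches names such as "iwlwifi-7265D-27.ucode" -> ("7265D", 27).
UCODE_RE = re.compile(r"^iwlwifi-(?P<chip>.+)-(?P<api>\d+)\.ucode$")


def parse_ucode_name(name):
    """Return (chip, api_version) for a firmware file name, or None."""
    m = UCODE_RE.match(name)
    if m is None:
        return None
    return m.group("chip"), int(m.group("api"))


def installed_versions(chip, firmware_dir="/lib/firmware"):
    """List the API versions found for `chip`, sorted ascending.

    Assumption: firmware files are uncompressed and stored directly in
    `firmware_dir`; compressed variants are not handled here.
    """
    versions = []
    for path in Path(firmware_dir).glob(f"iwlwifi-{chip}-*.ucode"):
        parsed = parse_ucode_name(path.name)
        if parsed is not None and parsed[0] == chip:
            versions.append(parsed[1])
    return sorted(versions)


if __name__ == "__main__":
    print(installed_versions("7265D"))
```

If 27 shows up in the printed list, the file from the link above is in place.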
Comment 5 Paul 2017-05-30 06:01:09 UTC
Nothing bad so far, and it seems pretty stable. However, I've only tested with wifi left enabled and in use. It works fine with sleeping as well.

Going to try testing with wifi off when possible...
Comment 6 Paul 2017-05-30 06:03:11 UTC
Are there plans to backport this to 4.10, or does it rely on 4.11?
Comment 7 Emmanuel Grumbach 2017-05-30 06:29:36 UTC
No plan to backport. What you can do is use our backport tree (https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git/) to run our latest driver on 4.10.

I'll close the bug. You can still add comments here and we will get notified.
Comment 8 Paul 2017-05-31 22:42:03 UTC
Nope, the firmware crashed while wifi was disabled, and the driver can't recover...
Comment 9 Paul 2017-05-31 22:43:21 UTC
Created attachment 256819
firmware panic with ucode version 27
Comment 10 Emmanuel Grumbach 2017-06-01 08:12:35 UTC
I created an internal ticket for the recovery failure (failure to load the INIT image).

The ASSERT 19C2 will be treated separately. I am asking the firmware team what to do with it.
Comment 11 Emmanuel Grumbach 2017-06-04 12:09:18 UTC
Created attachment 256857
Fix candidate

Please test the firmware attached. This includes a fix for the INIT firmware failure, not for the 19C2.

Let me know how it goes.

Thanks.
Comment 12 Emmanuel Grumbach 2017-06-08 09:35:21 UTC
Hi,

did you have a chance to test it?

thanks.
Comment 13 Paul 2017-06-08 20:38:14 UTC
(In reply to Emmanuel Grumbach from comment #12)
> Hi,
> 
> did you have a chance to test it?
> 
> thanks.

It's been working perfectly for the past few days and I haven't had it fail to re-enable, so it seems that the INIT bug is fixed.
Comment 14 Emmanuel Grumbach 2017-06-08 20:49:39 UTC
great - can you check if you have SYSASSERT in your kernel log?

I informed the firmware team and they will merge the fix into the official stream.
I'll keep this bug open until this happens.
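
One possible way to do this check, sketched in Python: save the kernel log to a file (for example with `dmesg > dmesg.txt`) and count the lines that contain the strings mentioned in this report. The pattern list and the function name are assumptions made for the example; the matching is a plain case-insensitive substring search and does not try to understand the driver's log format.

```python
import sys

# Substrings mentioned in this report; adjust as needed.
PATTERNS = ("SYSASSERT", "NMI_INTERRUPT_WDG", "Failed to find station")


def count_matches(log_text, patterns=PATTERNS):
    """Count the log lines containing each pattern (case-insensitive)."""
    counts = {p: 0 for p in patterns}
    for line in log_text.splitlines():
        lowered = line.lower()
        for p in patterns:
            if p.lower() in lowered:
                counts[p] += 1
    return counts


# Usage: python3 scan_log.py dmesg.txt  (the script name is hypothetical)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for pattern, n in count_matches(fh.read()).items():
            print(f"{pattern}: {n}")
```

A count of zero for SYSASSERT would mean no assert was logged in the captured part of the kernel log.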
Comment 15 Emmanuel Grumbach 2017-06-13 17:04:53 UTC
So, no assert at all? :)
Comment 16 Paul 2017-06-13 21:58:12 UTC
Created attachment 256991
remaining minor asserts

Sorry, I hadn't been using the laptop much lately, so I half forgot about this. As of the latest patch it seems perfectly reliable: only a few asserts have been logged, without causing any noticeable issues that I'm aware of.

I know there's still one reliability issue that can occur if I put my laptop in a certain location, but it's rare and might have more to do with the router and having both the 2.4 and 5 GHz bands on the same wifi name (auto-switching). When it occurs there is a lot of packet loss until you force the wifi to re-connect; then it works normally again.

Is there a way to work out which band is currently in use for the connection?
Comment 17 Emmanuel Grumbach 2017-06-14 06:07:08 UTC
so this dmesg output doesn't look pretty.

The 19C2 is tracked here: https://bugzilla.kernel.org/show_bug.cgi?id=194951

The "Failed to find station" part is tracked here: https://bugzilla.kernel.org/show_bug.cgi?id=195957
Comment 18 Emmanuel Grumbach 2017-06-14 07:37:41 UTC
I forgot to reply to your question.
You can see which band you are connected to with iw <iface name> link.
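
A small sketch of reading the band from that output, assuming the link information contains a `freq:` line with the frequency in MHz and that the interface is called `wlan0` (both are assumptions to check on your system). The helper names and the frequency ranges used to tell the bands apart are illustrative.

```python
import re
import subprocess
import sys


def band_from_freq(freq_mhz):
    """Map a channel frequency in MHz to a band name."""
    if 2400 <= freq_mhz < 2500:
        return "2.4 GHz"
    if 5000 <= freq_mhz < 5900:
        return "5 GHz"
    return "unknown"


def parse_link_freq(link_output):
    """Extract the frequency from `iw dev <iface> link` output, or None."""
    m = re.search(r"^\s*freq:\s*(\d+(?:\.\d+)?)", link_output, re.MULTILINE)
    return float(m.group(1)) if m else None


def current_band(iface):
    """Run `iw dev <iface> link` and report the band in use."""
    out = subprocess.run(
        ["iw", "dev", iface, "link"],
        capture_output=True, text=True, check=True,
    ).stdout
    freq = parse_link_freq(out)
    return band_from_freq(freq) if freq is not None else "not connected"


# Usage: python3 band.py wlan0  (script and interface names are hypothetical)
if __name__ == "__main__" and len(sys.argv) > 1:
    print(current_band(sys.argv[1]))
```

Running it before and after the packet loss appears would show whether the connection moved between the 2.4 and 5 GHz bands.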
Comment 19 Emmanuel Grumbach 2017-06-16 07:23:06 UTC
I pushed the official delivery of the firmware to our firmware git tree:

https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/linux-firmware.git

I am closing this bug now.

Thanks for your report and your help!
Comment 20 Emmanuel Grumbach 2017-06-19 16:48:07 UTC
Do you want to add yourself to the other bug for the 19C2 assert, or do you not care anymore?
Comment 21 Paul 2017-06-21 07:55:26 UTC
Created attachment 257115
panic after long idle run

It seems that the firmware failed again after a long idle run; I came back to find no wifi adapter at all...
Comment 22 Emmanuel Grumbach 2017-06-29 17:11:24 UTC
Created attachment 257227
Fix candidate

Hey,

here is another fix candidate from our firmware team (David Meriin).

Please test with this one.

Thanks.
Comment 23 Emmanuel Grumbach 2017-07-06 05:36:00 UTC
Any news? :)
Comment 24 Paul 2017-07-06 05:50:31 UTC
Seems good so far. All that appears in the logs is 'failed to find station' and 'L1 Enabled - LTR Disabled' on occasion. I haven't seen it drop out with torture tests, repeated enable-disable, or sleeping.
Comment 25 Emmanuel Grumbach 2017-07-06 06:05:04 UTC
Can you please attach a dmesg output?
No ASSERT?
Comment 26 Paul 2017-07-06 06:27:12 UTC
Created attachment 257381
extra messages logged
Comment 27 Emmanuel Grumbach 2017-07-06 06:41:01 UTC
Great.
I'll leave this bug open until the code is delivered to the right streams in the firmware.

Thanks!
Comment 28 Emmanuel Grumbach 2017-07-16 11:47:10 UTC
I just pushed the Core26 (-29.ucode) firmware with the fix to our master branch in https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/linux-firmware.git

I am now closing the bug.

Thanks!
