2.6.38.5 started crashing on a Thinkpad T60 when using ad-hoc mode WLAN.
2.6.38.4 and older kernels work just fine. I have not managed to use magic SysRq to capture the crash to syslog.
I will try to bisect this further; there seem to be only a few changes in iwl3945 between .4 and .5.
Bisecting brought up a couple of different looking traces:
And the commit to blame is:
f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3 is the first bad commit
Author: Stanislaw Gruszka <firstname.lastname@example.org>
Date: Tue Mar 29 11:24:21 2011 +0200
iwl3945: disable hw scan by default
commit 0263aa45293838b514b8af674a03faf040991a90 upstream.
After new NetworkManager 0.8.996 changes, hardware scanning is causing
microcode errors as reported here:
and sometimes kernel crashes:
Also with hw scan there are very bad performance on some systems
as reported here:
Since Intel no longer supports 3945, there is no chance to get proper
firmware fixes, we need workaround problems by disable hardware scanning
Signed-off-by: Stanislaw Gruszka <email@example.com>
Signed-off-by: John W. Linville <firstname.lastname@example.org>
Signed-off-by: Greg Kroah-Hartman <email@example.com>
Userspace is running a few weeks old Debian unstable with network-manager 0.8.3.999-1. Full bisect log is:
git bisect start
# bad: [60584ef99395a89d136399bbc127289a4aa29dc7] Linux 2.6.38.5
git bisect bad 60584ef99395a89d136399bbc127289a4aa29dc7
# good: [8fd62c82872a5a721c9fb0071ca0f7a49c1732e4] Linux 2.6.38.4
git bisect good 8fd62c82872a5a721c9fb0071ca0f7a49c1732e4
# good: [75db8ad812878495309d3d0b40467e9b9b61b29a] UBIFS: fix master node recovery
git bisect good 75db8ad812878495309d3d0b40467e9b9b61b29a
# good: [28785447dc596d0612513010e7bb23cce9c88e50] mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups
git bisect good 28785447dc596d0612513010e7bb23cce9c88e50
# good: [0b7c6323a28f3fde67a26bc6b2a889d3f23b12c3] agp: fix arbitrary kernel memory writes
git bisect good 0b7c6323a28f3fde67a26bc6b2a889d3f23b12c3
# good: [87cb0add07ea816857eda33c70e274b2ec17bb2e] iwl3945: do not deprecate software scan
git bisect good 87cb0add07ea816857eda33c70e274b2ec17bb2e
# bad: [9ec3e481f696880fd11e24ff54da6252d3d1a986] iwlegacy: fix tx_power initialization
git bisect bad 9ec3e481f696880fd11e24ff54da6252d3d1a986
# bad: [f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3] iwl3945: disable hw scan by default
git bisect bad f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3
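For anyone who wants to re-check a log like the one above: a saved bisect log can be fed back to git with "git bisect replay". The sketch below only demonstrates the mechanics on a throwaway repository, not on the kernel tree. The repository contents, the commit subjects ("change N") and the "broken" marker file are all hypothetical stand-ins; on a real kernel tree each step would mean building, booting and testing ad-hoc mode by hand.

```shell
#!/bin/sh
# Toy demonstration of "git bisect log" and "git bisect replay".
# Everything here is made-up data in a temporary repository.
repo=$(mktemp -d) || exit 1
log=$(mktemp) || exit 1
cd "$repo" || exit 1
git init -q .
git config user.name "bisect demo"
git config user.email "demo@example.invalid"

# Ten commits; commit 6 is the one that introduces the "regression".
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 6 ]; then
        echo "broken $i" > state
    else
        echo "ok $i" > state
    fi
    git add state
    git commit -q -m "change $i"
    i=$((i + 1))
done

bad=$(git rev-parse HEAD)                    # newest commit, known bad
good=$(git rev-list --max-parents=0 HEAD)    # root commit, known good

# Automated bisect: the test command exits non-zero when the marker is present.
git bisect start "$bad" "$good" >/dev/null
git bisect run sh -c '! grep -q broken state' >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)

# Save the log, drop the bisect state, then rebuild it from the saved log.
git bisect log > "$log"
git bisect reset >/dev/null 2>&1
git bisect replay "$log" >/dev/null 2>&1
replayed=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $first_bad"
echo "after replay:     $replayed"
```

With a real log, the replay step would be run inside a clone that contains the listed commits, with the log saved verbatim to a file.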
I will try to reproduce tomorrow. Note that using the disable_hw_scan=0 module parameter should make things work as before.
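A minimal sketch of how that parameter could be set, for anyone following along. The parameter name disable_hw_scan is the one mentioned above; the modprobe.d file name is a hypothetical choice, and the commands that need root are left as comments so the snippet itself changes nothing on the system.

```shell
#!/bin/sh
# Build the module option that re-enables hardware scanning for iwl3945.
modparam='disable_hw_scan=0'
conf_line="options iwl3945 $modparam"
echo "$conf_line"

# One-off test (as root, with the wlan interface down):
#   modprobe -r iwl3945 && modprobe iwl3945 disable_hw_scan=0
#
# Persistent setting (as root; the file name is an arbitrary example):
#   echo "options iwl3945 disable_hw_scan=0" > /etc/modprobe.d/iwl3945-hwscan.conf
#
# If the driver exports the parameter through sysfs (an assumption), the
# active value can be read back with:
#   cat /sys/module/iwl3945/parameters/disable_hw_scan
```

Setting the value to 1 instead gives the software-scan behaviour that became the default with the commit above.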
(In reply to comment #2)
> I will try to reproduce tomorrow. Note using disable_hw_scan=0 module
> should make things works as before.
Yes, I noticed. A bisect with disable_hw_scan=1 would tell what really changed. 2.6.39-rc6 looked OK at first, but then:
I'm able to reproduce this on my laptop. Before the crash, there are a lot of microcode errors like this:
iwl3945 0000:03:00.0: Microcode SW error detected. Restarting 0x82000008.
iwl3945 0000:03:00.0: Loaded firmware version: 184.108.40.206
iwl3945 0000:03:00.0: Start IWL Error Log Dump:
iwl3945 0000:03:00.0: Status: 0x000202E4, count: 1
iwl3945 0000:03:00.0: Desc Time asrtPC blink2 ilink1 nmiPC Line
iwl3945 0000:03:00.0: SYSASSERT (0x5) 0000099916 0x008B6 0x13BE0 0x00320 0x00000 1095
iwl3945 0000:03:00.0: Start IWL Event Log Dump: display last 20 count
iwl3945 0000:03:00.0: 0000097008 0x00008000 0350
iwl3945 0000:03:00.0: 0000097655 0x000000d9 0106
iwl3945 0000:03:00.0: 0000097656 0x00000000 0302
iwl3945 0000:03:00.0: 0000097665 0x00008000 0350
iwl3945 0000:03:00.0: 0000097862 0x000000d9 0106
iwl3945 0000:03:00.0: 0000097864 0x00000000 0302
iwl3945 0000:03:00.0: 0000097872 0x00008000 0350
iwl3945 0000:03:00.0: 0000097917 0x000000d9 0106
iwl3945 0000:03:00.0: 0000097918 0x00000000 0301
iwl3945 0000:03:00.0: 0000098299 0x00000000 0356
iwl3945 0000:03:00.0: 0000099644 0x00000003 0310
iwl3945 0000:03:00.0: 0000099647 0x00000000 0302
iwl3945 0000:03:00.0: 0000099670 0x00000165 0353
iwl3945 0000:03:00.0: 0000099797 0x000000d9 0106
iwl3945 0000:03:00.0: 0000099799 0x00000000 0302
iwl3945 0000:03:00.0: 0000099807 0x00008000 0350
iwl3945 0000:03:00.0: 0000099905 0x04590010 0401
iwl3945 0000:03:00.0: 0000099913 0x000004a9 1000
iwl3945 0000:03:00.0: 0000099914 0x0000000c 0455
iwl3945 0000:03:00.0: 0000099917 0x00000100 0125
iwl3945 0000:03:00.0: Error Reply type 0x00000447 cmd REPLY_RXON (0x10) seq 0x0459 ser 0x000C0000
iwl3945 0000:03:00.0: Error setting Tx power (-5).
iwl3945 0000:03:00.0: Can't stop Rx DMA.
ieee80211 phy0: Hardware restart was requested
So this is similar to the problem I tried to avoid in infrastructure mode by disabling hw scan by default (ehh).
On my system, microcode errors happen when switching channels during a software scan. When configuring a constant IBSS channel:
iwconfig wlan0 mode ad-hoc
iwconfig wlan0 channel 1
iwconfig wlan0 essid "aaa"
the problem does not happen. Interestingly, the problem does not happen in infrastructure mode either. The difference between the two is in the filter settings, which gives some opportunity to find a fix.
However, first I will try some older kernels to see if the problem happens there. Perhaps it comes from some driver change that was not present in older kernels. Generally, solving this bug will take some time ...
(In reply to comment #4)
> However first I will try some older kernels, to see if problem happens
> Perhaps some driver changes that was not present in old kernel. Generally
> solving this bug will take some time ...
Last night I started bisecting with disable_hw_scan=1. Versions .39-rc6 down to .36 were affected, while the 2.6.32-trunk version from Debian unstable was not, but that one contains quite a lot of patches over vanilla v2.6.32.
Created attachment 56852 [details]
Proposed fix. It adds a check that the channel is IBSS capable before switching channels; that check was removed during the 2.6.35 development cycle. This seems to fix the problem on my system. The patch is against the wireless-testing tree.
Created attachment 56862 [details]
The same proposed fix for 2.6.38.
(In reply to comment #7)
> Created an attachment (id=56862) [details]
> The same proposed fix for 2.6.38.
Thanks, this fixes the crash.
(In reply to comment #9)
> Patch: https://bugzilla.kernel.org/attachment.cgi?id=56862
Is this going into some tree and eventually to stable updates? Should I do something?
I posted the patch on 7 May 2011. It will take some time until it lands in Linus' and the stable trees.
A patch referencing this bug report has been merged in v2.6.39:
Author: Stanislaw Gruszka <firstname.lastname@example.org>
Date: Sat May 7 17:46:21 2011 +0200
iwlegacy: fix IBSS mode crashes