2.6.38.5 started crashing on a ThinkPad T60 when using ad-hoc mode WLAN. 2.6.38.4 and older kernels work fine. I have not managed to use magic SysRq to capture the crash to syslog.
Traces:
http://mcfrisk.kapsi.fi/temp/wlan_crash/adhoc_wlan_crash_dump_1.jpg
http://mcfrisk.kapsi.fi/temp/wlan_crash/adhoc_wlan_crash_dump_2.jpg
Kernel config: http://mcfrisk.kapsi.fi/temp/wlan_crash/kernel_config.txt
hwinfo: http://mcfrisk.kapsi.fi/temp/wlan_crash/thinpad_t60_hwinfo.txt
I will try to bisect this further; there seem to be only a few changes in iwl3945 between .4 and .5.
Bisecting brought up a couple of different looking traces:
http://mcfrisk.kapsi.fi/temp/wlan_crash/adhoc_wlan_crash_dump_3.jpg
http://mcfrisk.kapsi.fi/temp/wlan_crash/adhoc_wlan_crash_dump_4.jpg
And the commit to blame is:
f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3 is the first bad commit
commit f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Tue Mar 29 11:24:21 2011 +0200

    iwl3945: disable hw scan by default

    commit 0263aa45293838b514b8af674a03faf040991a90 upstream.

    After new NetworkManager 0.8.996 changes, hardware scanning is causing
    microcode errors as reported here:
    https://bugzilla.redhat.com/show_bug.cgi?id=683571
    and sometimes kernel crashes:
    https://bugzilla.redhat.com/show_bug.cgi?id=688252

    Also with hw scan there are very bad performance on some systems
    as reported here:
    https://bugzilla.redhat.com/show_bug.cgi?id=671366

    Since Intel no longer supports 3945, there is no chance to get proper
    firmware fixes, we need workaround problems by disable hardware scanning
    by default.

    Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Userspace is running a few weeks old Debian unstable with network-manager 0.8.3.999-1.
Full bisect log is:
git bisect start
# bad: [60584ef99395a89d136399bbc127289a4aa29dc7] Linux 2.6.38.5
git bisect bad 60584ef99395a89d136399bbc127289a4aa29dc7
# good: [8fd62c82872a5a721c9fb0071ca0f7a49c1732e4] Linux 2.6.38.4
git bisect good 8fd62c82872a5a721c9fb0071ca0f7a49c1732e4
# good: [75db8ad812878495309d3d0b40467e9b9b61b29a] UBIFS: fix master node recovery
git bisect good 75db8ad812878495309d3d0b40467e9b9b61b29a
# good: [28785447dc596d0612513010e7bb23cce9c88e50] mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups
git bisect good 28785447dc596d0612513010e7bb23cce9c88e50
# good: [0b7c6323a28f3fde67a26bc6b2a889d3f23b12c3] agp: fix arbitrary kernel memory writes
git bisect good 0b7c6323a28f3fde67a26bc6b2a889d3f23b12c3
# good: [87cb0add07ea816857eda33c70e274b2ec17bb2e] iwl3945: do not deprecate software scan
git bisect good 87cb0add07ea816857eda33c70e274b2ec17bb2e
# bad: [9ec3e481f696880fd11e24ff54da6252d3d1a986] iwlegacy: fix tx_power initialization
git bisect bad 9ec3e481f696880fd11e24ff54da6252d3d1a986
# bad: [f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3] iwl3945: disable hw scan by default
git bisect bad f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3
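For anyone wanting to retrace this, a bisect log like the one above can be saved to a file (one command per line, as git prints it) and fed to `git bisect replay` in a kernel tree to re-create the same bisect state. As a small sketch, the commit the bisect converged on can also be read straight from the saved log: as long as git's suggested commits were the ones tested, every newly marked bad commit lies inside the previous range, so the last `git bisect bad` line names the narrowest bad commit. The function name `last_bad_commit` and the file name `bisect.log` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the hash from the last "git bisect bad <hash>"
# line of a saved bisect log. Lines starting with "#" are git's own
# annotations and are skipped because their first field is "#".
last_bad_commit() {
    # $1: path to a file holding the output of `git bisect log`
    awk '$1 == "git" && $2 == "bisect" && $3 == "bad" { hash = $4 }
         END { if (hash != "") print hash }' "$1"
}

# Example usage inside a kernel git tree (not run here):
#   git bisect log > bisect.log
#   last_bad_commit bisect.log
#   git bisect reset
#   git bisect replay bisect.log   # re-create the same bisect state
```

On the log in this report, the helper would print the hash of "iwl3945: disable hw scan by default", matching what git reported as the first bad commit.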
:-( I will try to reproduce tomorrow. Note that using the disable_hw_scan=0 module parameter should make things work as before.
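For reference, a module parameter like this is usually passed either on the `modprobe` command line or through an `options` line in a file under /etc/modprobe.d/. A minimal sketch, assuming the driver is built as a module named iwl3945; the helper name and the file name iwl3945.conf are illustrative only and not taken from this report:

```shell
#!/bin/sh
# Hypothetical helper: print the modprobe.d "options" line that sets
# iwl3945's disable_hw_scan parameter to the given value (0 or 1).
hw_scan_options_line() {
    printf 'options iwl3945 disable_hw_scan=%s\n' "$1"
}

# Example usage as root (not run here; the .conf file name is an assumption):
#   hw_scan_options_line 0 > /etc/modprobe.d/iwl3945.conf
#   modprobe -r iwl3945                  # unload the driver
#   modprobe iwl3945                     # reload; options are read from the .conf file
# Or, for a one-off test without a config file:
#   modprobe -r iwl3945
#   modprobe iwl3945 disable_hw_scan=0
```

The config file variant survives reboots, while the command-line variant is handy for quickly comparing behaviour with the parameter set to 0 and to 1.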
(In reply to comment #2)
> I will try to reproduce tomorrow. Note using disable_hw_scan=0 module parameter
> should make things works as before.
Yes, I noticed. Bisecting with disable_hw_scan=1 would tell what really changed. 2.6.39-rc6 looked OK at first, but then:
http://mcfrisk.kapsi.fi/temp/wlan_crash/adhoc_wlan_crash_dump_5.jpg
I'm able to reproduce this on my laptop. Before the crash, there are a lot of microcode errors like this:
iwl3945 0000:03:00.0: Microcode SW error detected. Restarting 0x82000008.
iwl3945 0000:03:00.0: Loaded firmware version: 15.32.2.9
iwl3945 0000:03:00.0: Start IWL Error Log Dump:
iwl3945 0000:03:00.0: Status: 0x000202E4, count: 1
iwl3945 0000:03:00.0: Desc Time asrtPC blink2 ilink1 nmiPC Line
iwl3945 0000:03:00.0: SYSASSERT (0x5) 0000099916 0x008B6 0x13BE0 0x00320 0x00000 1095
iwl3945 0000:03:00.0: Start IWL Event Log Dump: display last 20 count
iwl3945 0000:03:00.0: 0000097008 0x00008000 0350
iwl3945 0000:03:00.0: 0000097655 0x000000d9 0106
iwl3945 0000:03:00.0: 0000097656 0x00000000 0302
iwl3945 0000:03:00.0: 0000097665 0x00008000 0350
iwl3945 0000:03:00.0: 0000097862 0x000000d9 0106
iwl3945 0000:03:00.0: 0000097864 0x00000000 0302
iwl3945 0000:03:00.0: 0000097872 0x00008000 0350
iwl3945 0000:03:00.0: 0000097917 0x000000d9 0106
iwl3945 0000:03:00.0: 0000097918 0x00000000 0301
iwl3945 0000:03:00.0: 0000098299 0x00000000 0356
iwl3945 0000:03:00.0: 0000099644 0x00000003 0310
iwl3945 0000:03:00.0: 0000099647 0x00000000 0302
iwl3945 0000:03:00.0: 0000099670 0x00000165 0353
iwl3945 0000:03:00.0: 0000099797 0x000000d9 0106
iwl3945 0000:03:00.0: 0000099799 0x00000000 0302
iwl3945 0000:03:00.0: 0000099807 0x00008000 0350
iwl3945 0000:03:00.0: 0000099905 0x04590010 0401
iwl3945 0000:03:00.0: 0000099913 0x000004a9 1000
iwl3945 0000:03:00.0: 0000099914 0x0000000c 0455
iwl3945 0000:03:00.0: 0000099917 0x00000100 0125
iwl3945 0000:03:00.0: Error Reply type 0x00000447 cmd REPLY_RXON (0x10) seq 0x0459 ser 0x000C0000
iwl3945 0000:03:00.0: Error setting Tx power (-5).
iwl3945 0000:03:00.0: Can't stop Rx DMA.
ieee80211 phy0: Hardware restart was requested
So this is similar to the problem I tried to avoid in infrastructure mode by switching to disabled hw scan by default (ehh). On my system, the microcode errors happen when switching channels during a software scan.
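To gauge how often the firmware asserts before a crash, the restart messages can be counted in a saved kernel log. A minimal sketch, assuming the log was saved to a file with something like `dmesg > dmesg.txt`; the function name and file name are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: count lines reporting a firmware restart
# ("Microcode SW error detected") in a saved kernel log.
count_microcode_errors() {
    # $1: path to a saved kernel log
    # -F matches the text literally, -c prints the number of matching lines.
    # grep exits non-zero when the count is 0, so `|| true` keeps the
    # function's exit status clean while the "0" is still printed.
    grep -c -F 'Microcode SW error detected' "$1" || true
}

# Example usage (not run here):
#   dmesg > dmesg.txt
#   count_microcode_errors dmesg.txt
```

Comparing the count between a run with a fixed IBSS channel and a run where scanning is allowed gives a quick check of whether the errors track channel switches.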
When configuring a constant IBSS channel:
iwconfig wlan0 mode ad-hoc
iwconfig wlan0 channel 1
iwconfig wlan0 essid "aaa"
the problem does not happen. It is interesting that in infrastructure mode the problem does not happen either. The difference is in the filter settings, which gives some opportunity to find a fix. However, first I will try some older kernels to see if the problem happens there. Perhaps it comes from a driver change that was not present in older kernels. Generally, solving this bug will take some time ...
(In reply to comment #4)
> However first I will try some older kernels, to see if problem happens there.
> Perhaps some driver changes that was not present in old kernel. Generally
> solving this bug will take some time ...
Last night I started bisecting with disable_hw_scan=1. Kernels from .39-rc6 down to .36 were affected, while the 2.6.32-trunk version from Debian unstable was not, but that one contains quite a lot of patches over vanilla v2.6.32.
Created attachment 56852: check_ibss_channel.patch
Proposed fix. It adds a check that the channel is IBSS capable before switching channels; this check was removed during the 2.6.35 development cycle. This seems to fix the problem on my system. The patch is against the wireless-testing tree.
Created attachment 56862: check_ibss_channel_2.6.38.patch
The same proposed fix for 2.6.38.
(In reply to comment #7)
> Created an attachment (id=56862)
> check_ibss_channel_2.6.38.patch
>
> The same proposed fix for 2.6.38.
Thanks, this fixes the crash.
Patch: https://bugzilla.kernel.org/attachment.cgi?id=56862
(In reply to comment #9)
> Patch: https://bugzilla.kernel.org/attachment.cgi?id=56862
Is this going into some tree and eventually into stable updates? Is there anything I should do?
I posted the patch on 7 May 2011. It will take some time until it lands in Linus' and the stable trees.
A patch referencing this bug report has been merged in v2.6.39:
commit eb85de3f84868ca85703a23617b4079ce79a801e
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Sat May 7 17:46:21 2011 +0200

    iwlegacy: fix IBSS mode crashes