Bug 34452

Summary: iwl3945 ad-hoc mode crash with Thinkpad T60
Product: Networking Reporter: Mikko Rapeli (mikko.rapeli)
Component: Wireless Assignee: Stanislaw Gruszka (stf_xl)
Status: CLOSED CODE_FIX    
Severity: normal CC: florian, linville, maciej.rutecki, rjw, stf_xl
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.38.5 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 32012    
Attachments: check_ibss_channel.patch
check_ibss_channel_2.6.38.patch

Description Mikko Rapeli 2011-05-04 15:06:01 UTC
2.6.38.5 started crashing on a Thinkpad T60 when using ad-hoc mode WLAN.
2.6.38.4 and older kernels work just fine. I have not managed to use magic SysRq to capture the crash to syslog.

Traces:

http://mcfrisk.kapsi.fi/temp/wlan_crash/adhoc_wlan_crash_dump_1.jpg
http://mcfrisk.kapsi.fi/temp/wlan_crash/adhoc_wlan_crash_dump_2.jpg

Kernel config:

http://mcfrisk.kapsi.fi/temp/wlan_crash/kernel_config.txt

hwinfo:

http://mcfrisk.kapsi.fi/temp/wlan_crash/thinpad_t60_hwinfo.txt

I will try to bisect this further; there seem to be only a few changes in iwl3945 between .4 and .5.
Comment 1 Mikko Rapeli 2011-05-04 18:47:51 UTC
Bisecting brought up a couple of different looking traces:

http://mcfrisk.kapsi.fi/temp/wlan_crash/adhoc_wlan_crash_dump_3.jpg
http://mcfrisk.kapsi.fi/temp/wlan_crash/adhoc_wlan_crash_dump_4.jpg

And the commit to blame is:

f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3 is the first bad commit
commit f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Tue Mar 29 11:24:21 2011 +0200

    iwl3945: disable hw scan by default

    commit 0263aa45293838b514b8af674a03faf040991a90 upstream.

    After new NetworkManager 0.8.996 changes, hardware scanning is causing
    microcode errors as reported here:
    https://bugzilla.redhat.com/show_bug.cgi?id=683571
    and sometimes kernel crashes:
    https://bugzilla.redhat.com/show_bug.cgi?id=688252

    Also with hw scan there are very bad performance on some systems
    as reported here:
    https://bugzilla.redhat.com/show_bug.cgi?id=671366

    Since Intel no longer supports 3945, there is no chance to get proper
    firmware fixes, we need workaround problems by disable hardware scanning
    by default.

    Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Userspace is a few-weeks-old Debian unstable with network-manager 0.8.3.999-1. The full bisect log is:

git bisect start
# bad: [60584ef99395a89d136399bbc127289a4aa29dc7] Linux 2.6.38.5
git bisect bad 60584ef99395a89d136399bbc127289a4aa29dc7
# good: [8fd62c82872a5a721c9fb0071ca0f7a49c1732e4] Linux 2.6.38.4
git bisect good 8fd62c82872a5a721c9fb0071ca0f7a49c1732e4
# good: [75db8ad812878495309d3d0b40467e9b9b61b29a] UBIFS: fix master node recovery
git bisect good 75db8ad812878495309d3d0b40467e9b9b61b29a
# good: [28785447dc596d0612513010e7bb23cce9c88e50] mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups
git bisect good 28785447dc596d0612513010e7bb23cce9c88e50
# good: [0b7c6323a28f3fde67a26bc6b2a889d3f23b12c3] agp: fix arbitrary kernel memory writes
git bisect good 0b7c6323a28f3fde67a26bc6b2a889d3f23b12c3
# good: [87cb0add07ea816857eda33c70e274b2ec17bb2e] iwl3945: do not deprecate software scan
git bisect good 87cb0add07ea816857eda33c70e274b2ec17bb2e
# bad: [9ec3e481f696880fd11e24ff54da6252d3d1a986] iwlegacy: fix tx_power initialization
git bisect bad 9ec3e481f696880fd11e24ff54da6252d3d1a986
# bad: [f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3] iwl3945: disable hw scan by default
git bisect bad f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3
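
A minimal sketch of how the log above can be reused, assuming it has been saved to a file (the scratch file here is only an illustration). Once a bisect has run to completion, the last commit marked bad is the first bad commit, so it can be read straight out of the log:

```shell
# Verdict lines copied from the bisect log above into a scratch file.
log=$(mktemp)
cat > "$log" <<'EOF'
git bisect start
git bisect bad 60584ef99395a89d136399bbc127289a4aa29dc7
git bisect good 8fd62c82872a5a721c9fb0071ca0f7a49c1732e4
git bisect good 75db8ad812878495309d3d0b40467e9b9b61b29a
git bisect good 28785447dc596d0612513010e7bb23cce9c88e50
git bisect good 0b7c6323a28f3fde67a26bc6b2a889d3f23b12c3
git bisect good 87cb0add07ea816857eda33c70e274b2ec17bb2e
git bisect bad 9ec3e481f696880fd11e24ff54da6252d3d1a986
git bisect bad f8317705c4db3eaa68c9a9a7dd7dfc321f8057f3
EOF

# Keep the hash of the most recent "git bisect bad" line.
first_bad=$(awk '$1 == "git" && $2 == "bisect" && $3 == "bad" { h = $4 } END { print h }' "$log")
echo "first bad commit: $first_bad"

# Inside a kernel tree, the saved log can be fed back to git to repeat the run:
#   git bisect replay /path/to/saved-bisect-log
rm -f "$log"
```

The extracted hash should match the commit git reported as the first bad commit.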
Comment 2 Stanislaw Gruszka 2011-05-04 19:43:09 UTC
:-( 

I will try to reproduce this tomorrow. Note that using the disable_hw_scan=0 module parameter should make things work as before.
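
For reference, a rough sketch of checking and setting that parameter. It assumes the parameter is exported read-only under /sys/module/iwl3945/parameters/ while the module is loaded; the modprobe.d file name mentioned in the comments is only an example.

```shell
# Assumption: the parameter is readable in sysfs while iwl3945 is loaded.
param=/sys/module/iwl3945/parameters/disable_hw_scan

if [ -r "$param" ]; then
    echo "disable_hw_scan is currently $(cat "$param")"
else
    echo "iwl3945 is not loaded, or the parameter is not exposed in sysfs"
fi

# To reload the driver with hardware scanning re-enabled (needs root and
# drops the wireless link while the module is unloaded):
#   modprobe -r iwl3945
#   modprobe iwl3945 disable_hw_scan=0
#
# To make the setting persistent, a line like the following can go into a
# file under /etc/modprobe.d/ (for example iwl3945.conf):
#   options iwl3945 disable_hw_scan=0
```

Only the read-only check runs as written; the reload and the persistent option are left as comments because they change system state.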
Comment 3 Mikko Rapeli 2011-05-04 20:03:20 UTC
(In reply to comment #2)
> I will try to reproduce tomorrow. Note using disable_hw_scan=0 module
> parameter
> should make things works as before.

Yes, I noticed. A bisect with disable_hw_scan=1 would tell what really changed. 2.6.39-rc6 looked OK at first, but then:

http://mcfrisk.kapsi.fi/temp/wlan_crash/adhoc_wlan_crash_dump_5.jpg
Comment 4 Stanislaw Gruszka 2011-05-05 09:09:57 UTC
I'm able to reproduce this on my laptop. Before the crash, there are a lot of microcode errors like this:

iwl3945 0000:03:00.0: Microcode SW error detected. Restarting 0x82000008.
iwl3945 0000:03:00.0: Loaded firmware version: 15.32.2.9
iwl3945 0000:03:00.0: Start IWL Error Log Dump:
iwl3945 0000:03:00.0: Status: 0x000202E4, count: 1
iwl3945 0000:03:00.0: Desc       Time       asrtPC  blink2 ilink1  nmiPC   Line
iwl3945 0000:03:00.0: SYSASSERT     (0x5) 0000099916 0x008B6 0x13BE0 0x00320 0x00000 1095

iwl3945 0000:03:00.0: Start IWL Event Log Dump: display last 20 count
iwl3945 0000:03:00.0: 0000097008        0x00008000      0350
iwl3945 0000:03:00.0: 0000097655        0x000000d9      0106
iwl3945 0000:03:00.0: 0000097656        0x00000000      0302
iwl3945 0000:03:00.0: 0000097665        0x00008000      0350
iwl3945 0000:03:00.0: 0000097862        0x000000d9      0106
iwl3945 0000:03:00.0: 0000097864        0x00000000      0302
iwl3945 0000:03:00.0: 0000097872        0x00008000      0350
iwl3945 0000:03:00.0: 0000097917        0x000000d9      0106
iwl3945 0000:03:00.0: 0000097918        0x00000000      0301
iwl3945 0000:03:00.0: 0000098299        0x00000000      0356
iwl3945 0000:03:00.0: 0000099644        0x00000003      0310
iwl3945 0000:03:00.0: 0000099647        0x00000000      0302
iwl3945 0000:03:00.0: 0000099670        0x00000165      0353
iwl3945 0000:03:00.0: 0000099797        0x000000d9      0106
iwl3945 0000:03:00.0: 0000099799        0x00000000      0302
iwl3945 0000:03:00.0: 0000099807        0x00008000      0350
iwl3945 0000:03:00.0: 0000099905        0x04590010      0401
iwl3945 0000:03:00.0: 0000099913        0x000004a9      1000
iwl3945 0000:03:00.0: 0000099914        0x0000000c      0455
iwl3945 0000:03:00.0: 0000099917        0x00000100      0125
iwl3945 0000:03:00.0: Error Reply type 0x00000447 cmd REPLY_RXON (0x10) seq 0x0459 ser 0x000C0000
iwl3945 0000:03:00.0: Error setting Tx power (-5).
iwl3945 0000:03:00.0: Can't stop Rx DMA.
ieee80211 phy0: Hardware restart was requested
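
To check whether another machine hits the same firmware asserts before it crashes, the kernel log can be searched for the restart message. A small sketch; the helper name count_ucode_errors is made up for illustration, and the sample input is two lines copied from the dump above:

```shell
# Count firmware assert messages in kernel log text read from stdin.
count_ucode_errors() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -c 'Microcode SW error detected' || true
}

sample='iwl3945 0000:03:00.0: Microcode SW error detected. Restarting 0x82000008.
ieee80211 phy0: Hardware restart was requested'

n=$(printf '%s\n' "$sample" | count_ucode_errors)
echo "microcode errors seen: $n"

# On a live system (may need root, depending on kernel log restrictions):
#   dmesg | count_ucode_errors
```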

So this is similar to the problem I tried to avoid in infrastructure mode when switching to disabling hw scan by default (ehh).

On my system, microcode errors happen when switching channels during a software scan. When configuring a constant IBSS channel:

iwconfig wlan0 mode ad-hoc
iwconfig wlan0 channel 1
iwconfig wlan0 essid "aaa"

the problem does not happen. Interestingly, the problem also does not happen in infrastructure mode. The difference between the two is in the filter settings, which gives some opportunity to find a fix.

However, first I will try some older kernels to see if the problem happens there. Perhaps it comes from some driver change that was not present in older kernels. Generally, solving this bug will take some time ...
Comment 5 Mikko Rapeli 2011-05-05 09:38:07 UTC
(In reply to comment #4)
> However first I will try some older kernels, to see  if problem happens
> there.
> Perhaps some driver changes that was not present in old kernel. Generally
> solving this bug will take some time ...

Last night I started bisecting with disable_hw_scan=1: .39-rc6 down to .36 were affected, while the 2.6.32-trunk version from Debian unstable was not, but that one contains quite a lot of patches over vanilla v2.6.32.
Comment 6 Stanislaw Gruszka 2011-05-06 16:55:00 UTC
Created attachment 56852 [details]
check_ibss_channel.patch

Proposed fix. It adds a check that the channel is IBSS capable before switching channels; this check was removed during the 2.6.35 development cycle. This seems to fix the problem on my system. The patch is against the wireless-testing tree.
Comment 7 Stanislaw Gruszka 2011-05-06 16:56:30 UTC
Created attachment 56862 [details]
check_ibss_channel_2.6.38.patch

The same proposed fix for 2.6.38.
Comment 8 Mikko Rapeli 2011-05-07 06:18:53 UTC
(In reply to comment #7)
> Created an attachment (id=56862) [details]
> check_ibss_channel_2.6.38.patch
> 
> The same proposed fix for 2.6.38.

Thanks, this fixes the crash.
Comment 9 Florian Mickler 2011-05-07 12:13:40 UTC
Patch: https://bugzilla.kernel.org/attachment.cgi?id=56862
Comment 10 Mikko Rapeli 2011-05-10 08:27:38 UTC
(In reply to comment #9)
> Patch: https://bugzilla.kernel.org/attachment.cgi?id=56862

Is this going to some tree and eventually to stable updates? Should I do something?
Comment 11 Stanislaw Gruszka 2011-05-10 11:55:35 UTC
I posted the patch on 7 May 2011. It will take some time until it lands in Linus' and stable trees.
Comment 12 Florian Mickler 2011-05-19 07:22:29 UTC
A patch referencing this bug report has been merged in v2.6.39:

commit eb85de3f84868ca85703a23617b4079ce79a801e
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Sat May 7 17:46:21 2011 +0200

    iwlegacy: fix IBSS mode crashes