Bug 16176

Summary: Microcode errors with iwl3945
Product: Drivers Reporter: Steinar H. Gunderson (steinar+kernel)
Component: network-wirelessAssignee: Reinette Chatre (reinette.chatre)
Status: CLOSED UNREPRODUCIBLE    
Severity: normal CC: dode, linville, maciej.rutecki, reinette.chatre, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35-rc2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 16055    
Attachments: more internal scan debug

Description Steinar H. Gunderson 2010-06-10 19:28:47 UTC
Hi,

I upgraded to 2.6.35-rc2 today, and it suddenly started spewing these errors at me in the kernel log:

iwl3945 0000:0c:00.0: Microcode SW error detected. Restarting 0x82000008.
iwl3945 0000:0c:00.0: Loaded firmware version: 15.32.2.9
iwl3945 0000:0c:00.0: Start IWL Error Log Dump:
iwl3945 0000:0c:00.0: Status: 0x0002A2E4, count: 1
iwl3945 0000:0c:00.0: Desc       Time       asrtPC  blink2 ilink1  nmiPC   Line
iwl3945 0000:0c:00.0: SYSASSERT     (#5) 4272998189 0x008B6 0x13756 0x0031C 0x00000 764

iwl3945 0000:0c:00.0: Start IWL Event Log Dump: display last 20 count
iwl3945 0000:0c:00.0: 4272997361        0x00000000      0600
iwl3945 0000:0c:00.0: 4272997369        0x000000d9      0106
iwl3945 0000:0c:00.0: 4272997370        0x00000000      0302
iwl3945 0000:0c:00.0: 4272997436        0x00000000      0304
iwl3945 0000:0c:00.0: 4272997535        0x000000a2      0600
iwl3945 0000:0c:00.0: 4272997540        0x00000004      0631
iwl3945 0000:0c:00.0: 4272997552        0x55000190      0630
iwl3945 0000:0c:00.0: 4272997553        0x56017c78      0630
iwl3945 0000:0c:00.0: 4272997568        0x000000d9      0106
iwl3945 0000:0c:00.0: 4272997570        0x00000000      0302
iwl3945 0000:0c:00.0: 4272997632        0x00000000      0304
iwl3945 0000:0c:00.0: 4272997653        0x000000d9      0106
iwl3945 0000:0c:00.0: 4272997655        0x00000000      0302
iwl3945 0000:0c:00.0: 4272997695        0x000000d4      0303
iwl3945 0000:0c:00.0: 4272997763        0x000000d9      0106
iwl3945 0000:0c:00.0: 4272997765        0x00000000      0302
iwl3945 0000:0c:00.0: 4272997834        0x00000000      0304
iwl3945 0000:0c:00.0: 4272998136        0x44400080      0401
iwl3945 0000:0c:00.0: 4272998151        0x00000017      0452
iwl3945 0000:0c:00.0: 4272998190        0x00000100      0125
iwl3945 0000:0c:00.0: Error Reply type 0x000002FC cmd REPLY_SCAN_CMD (0x80) seq 0x4440 ser 0x00340000
iwl3945 0000:0c:00.0: Command REPLY_SCAN_CMD failed: FW Error
iwl3945 0000:0c:00.0: Can't stop Rx DMA.

and then several times before I tried to switch APs, and got:


BUG: unable to handle kernel NULL pointer dereference at 000000ae
IP: [<f82c6a5f>] iwl3945_request_scan+0x3d7/0x538 [iwl3945]
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/devices/platform/regulatory.0/uevent
Modules linked in: tun iwl3945 iwlcore mac80211 cfg80211 sha256_generic aes_i586 aes_generic cbc uinput binfmt_misc ppdev lp parport sco bridge stp llc bnep rfcomm l2cap bluetooth kvm_intel kvm cpufreq_userspace cpufreq_stats cpufreq_conservative cpufreq_powersave fuse dm_crypt dm_snapshot dm_mirror dm_region_hash dm_log ipv6 cpufreq_ondemand acpi_cpufreq freq_table mperf sbp2 loop snd_hda_codec_idt snd_hda_intel snd_hda_codec arc4 ecb snd_pcm_oss snd_mixer_oss snd_pcm joydev snd_seq_dummy pcmcia snd_seq_oss tpm_tis snd_seq_midi i2c_i801 snd_rawmidi snd_seq_midi_event yenta_socket dell_laptop pcmcia_rsrc dcdbas evdev snd_seq snd_timer snd_seq_device tpm processor pcmcia_core battery ac snd psmouse tpm_bios soundcore snd_page_alloc serio_raw ext3 jbd dm_mod i915 drm_kms_helper i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect drm i2c_core sd_mod ide_pci_generic piix ide_core ata_generic uhci_hcd tg3 intel_agp ata_piix ehci_hcd libphy ohci1394 ieee1394 usbcore thermal video output button agpgart [last unloaded: cfg80211]

Pid: 3679, comm: iwl3945 Tainted: G        W   2.6.35-rc2 #1 0TJ984/Latitude D420
EIP: 0060:[<f82c6a5f>] EFLAGS: 00010287 CPU: 0
EIP is at iwl3945_request_scan+0x3d7/0x538 [iwl3945]
EAX: 000000ae EBX: f67d0b40 ECX: 00000000 EDX: 00000000
ESI: f61bf8f2 EDI: 00000001 EBP: f2de3f5c ESP: f2de3f00
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process iwl3945 (pid: 3679, ti=f2de2000 task=f2daeff0 task.ti=f2de2000)
Stack:
 0057eff0 0016ff00 000000ae 00000001 f2daeff0 f2de3f70 c129e97b 005f1930
<0> 00000000 00000000 00000058 c149db00 c149db00 c149db00 f61bf800 00000000
<0> 00000000 00000000 00000001 008000d8 f67d0b40 f67d10b8 f67d6778 f2de3f70
Call Trace:
 [<c129e97b>] ? schedule+0x6ee/0x797
 [<f87945db>] ? iwl_bg_start_internal_scan+0x84/0xa1 [iwlcore]
 [<c103f794>] ? worker_thread+0x15d/0x1e1
 [<f8794557>] ? iwl_bg_start_internal_scan+0x0/0xa1 [iwlcore]
 [<c104285d>] ? autoremove_wake_function+0x0/0x2f
 [<c103f637>] ? worker_thread+0x0/0x1e1
 [<c10424af>] ? kthread+0x5f/0x64
 [<c1042450>] ? kthread+0x0/0x64
 [<c1002c36>] ? kernel_thread_helper+0x6/0x10
Code: c8 b8 01 00 00 00 c7 45 c8 00 00 00 00 d3 e0 66 89 55 a6 88 45 b0 83 e8 02 88 45 a9 e9 c3 00 00 00 8b 55 c4 8b 44 90 20 89 45 ac <39> 38 0f 85 ae 00 00 00 66 8b 48 06 89 fa 88 4e 01 89 d8 0f b6
EIP: [<f82c6a5f>] iwl3945_request_scan+0x3d7/0x538 [iwl3945] SS:ESP 0068:f2de3f00
CR2: 00000000000000ae
---[ end trace 8058ad57766ccf7c ]---

Going back to 2.6.34-rc6, no problems there.
Comment 1 Reinette Chatre 2010-06-11 18:46:55 UTC
We recently pushed some fixes for this issue upstream but they did not make it into 2.6.35-rc2 - they will be in rc3. Could you try with the latest linux-2.6?
Comment 2 Steinar H. Gunderson 2010-06-14 23:42:10 UTC
On Fri, Jun 11, 2010 at 06:46:59PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=16176
> We recently pushed some fixes for this issue upstream but they did not make
> it
> into 2.6.35-rc2 - they will be in rc3. Could you try with the latest
> linux-2.6?

I tried -rc3, and it was much, much worse, but in a different way. Basically,
holding up TCP connections for more than a few seconds at a time was
completely impossible. I could not even get through the captive portal here;
everything would just hang randomly. (Tried N browsers and telnet.)

/* Steinar */
Comment 3 Reinette Chatre 2010-06-15 17:28:52 UTC
Created attachment 26785 [details]
more internal scan debug

The commit I refer to in comment #1 is in rc3 and should take care of the BUG you encounter. You mention that things are behaving in a different way - are you still seeing the same errors? The fix I pointed out does not address the SYSASSERT. On a closer look that SYSASSERT is the hardware telling the driver that it is not legal to scan on channel zero. This is its regulatory enforcement. 

To look into this further I'd like to see some scan and also some general debugging. Please ensure your driver is compiled with debug support (CONFIG_IWLWIFI_DEBUG) and then load the module with debug flags as follows:
modprobe iwl3945 debug=0x43fff

I attach an additional debug patch that will get us a little more data if you are able to get it included in your driver that will be great.
Comment 4 Steinar H. Gunderson 2010-06-19 12:48:46 UTC
OK, well, I guess I have bad news :-) I didn't have time to check this last week, and now I've traveled back from the US to Switzerland again. I don't see these problems anymore — the oopses were fixed already in 2.6.35-rc3, but the TCP problems are nowhere to be seen. Of course, if this really was a regulatory domain problem, I'd probably have to travel back to the US and the offending APs to reproduce...

IPv6 routing is now, on the other hand, completely broken (I can only ping stuff on my local network, nothing that has to go through a default route), but I take it this is not iwl3945's fault.

IOW, this is now unreproducible for me, so if nobody else steps forward with the same problems, I guess the bug has to be closed.
Comment 5 Reinette Chatre 2010-06-21 21:49:46 UTC
(In reply to comment #4)
> IPv6 routing is now, on the other hand, completely broken (I can only ping
> stuff on my local network, nothing that has to go through a default route),
> but
> I take it this is not iwl3945's fault.

You may want to try http://git.kernel.org/?p=linux/kernel/git/iwlwifi/iwlwifi-2.6.git;a=commit;h=4d23e4e5eb50431426facf192354ad2506e2dd40 which fixes a multicast issue in iwl3945.
Comment 6 Reinette Chatre 2010-06-28 22:55:34 UTC
(In reply to comment #4)

> I don't see
> these problems anymore — the oopses were fixed already in 2.6.35-rc3, but the
> TCP problems are nowhere to be seen. Of course, if this really was a
> regulatory
> domain problem, I'd probably have to travel back to the US and the offending
> APs to reproduce...
> 

...

> IOW, this is now unreproducible for me, so if nobody else steps forward with
> the same problems, I guess the bug has to be closed.
Comment 7 Torsten Römer 2011-06-13 12:48:26 UTC
Exactly from the moment where I built NetworkManager 0.8.9997 and replaced 0.8.4 with it, I have a lot of the below in dmesg/syslog, and scanning Wifi networks doesn't seem to work.

I might have not built or set up NetworkManager entirely correct (even though everything else than scanning seems to work fine), but I am wondering how another version of NetworkManager can cause the iwl3945 Microcode SW error to happen?

I am running Ubuntu 11.04 2.6.38-8-generic #42-Ubuntu.

[ 5420.614860] iwl3945 0000:04:00.0: Microcode SW error detected. Restarting 0x82000008.
[ 5420.614872] iwl3945 0000:04:00.0: Loaded firmware version: 15.32.2.9
[ 5420.614908] iwl3945 0000:04:00.0: Start IWL Error Log Dump:
[ 5420.614914] iwl3945 0000:04:00.0: Status: 0x0002A2E4, count: 1
[ 5420.614919] iwl3945 0000:04:00.0: Desc       Time       asrtPC  blink2 ilink1  nmiPC   Line
[ 5420.615153] iwl3945 0000:04:00.0: SYSASSERT     (0x5) 2932061891 0x008B6 0x13756 0x0031C 0x00000 764
[ 5420.615156] 
[ 5420.615289] iwl3945 0000:04:00.0: Start IWL Event Log Dump: display last 20 count
[ 5420.615334] iwl3945 0000:04:00.0: 2932060759	0x00000006	0463
[ 5420.615360] iwl3945 0000:04:00.0: 2932060760	0x00000006	0462
[ 5420.615385] iwl3945 0000:04:00.0: 2932060795	0x00000000	0201
[ 5420.615411] iwl3945 0000:04:00.0: 2932060806	0x0000001c	0206
[ 5420.615437] iwl3945 0000:04:00.0: 2932060808	0x00400001	0204
[ 5420.615462] iwl3945 0000:04:00.0: 2932060812	0x00400001	0219
[ 5420.615488] iwl3945 0000:04:00.0: 2932060813	0x00020041	0211
[ 5420.615513] iwl3945 0000:04:00.0: 2932060818	0x00000000	0212
[ 5420.615539] iwl3945 0000:04:00.0: 2932061149	0x00000000	0215
[ 5420.615565] iwl3945 0000:04:00.0: 2932061223	0x00400008	0220
[ 5420.615590] iwl3945 0000:04:00.0: 2932061238	0x00000000	0301
[ 5420.615616] iwl3945 0000:04:00.0: 2932061507	0x000000d4	0303
[ 5420.615641] iwl3945 0000:04:00.0: 2932061834	0x442d0080	0401
[ 5420.615667] iwl3945 0000:04:00.0: 2932061848	0x00000017	0452
[ 5420.615692] iwl3945 0000:04:00.0: 2932061853	0x00103b6e	0450
[ 5420.615717] iwl3945 0000:04:00.0: 2932061859	0x00103b6e	0450
[ 5420.615743] iwl3945 0000:04:00.0: 2932061864	0x00103b6e	0450
[ 5420.615769] iwl3945 0000:04:00.0: 2932061869	0x00103b6e	0450
[ 5420.615794] iwl3945 0000:04:00.0: 2932061874	0x00103b6e	0450
[ 5420.615820] iwl3945 0000:04:00.0: 2932061893	0x00000100	0125
[ 5420.615848] iwl3945 0000:04:00.0: Error Reply type 0x000002FC cmd REPLY_SCAN_CMD (0x80) seq 0x442D ser 0x00340000
[ 5420.620278] iwl3945 0000:04:00.0: Can't stop Rx DMA.