Bug 15374

Summary: iwlagn Microcode SW error detected
Product: Drivers
Reporter: Trenton D. Adams (trenton.d.adams)
Component: network-wireless
Assignee: Reinette Chatre (reinette.chatre)
Status: RESOLVED CODE_FIX
Severity: normal
CC: claudiomkd, linville, reinette.chatre
Priority: P1
Hardware: All
OS: Linux
Kernel Version: vanilla 2.6.33-rc6
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  Ensure aggregation cleanup succeeds after firmware restart
  iwl (wlan0) log before the wifi stops working
  iwl (wlan0) log after the wifi stops working

Description Trenton D. Adams 2010-02-22 20:25:33 UTC
I can reproduce these problems every time, as long as I leave my machine running; it's just a matter of time.

lspci -vvv
04:00.0 Network controller: Intel Corporation PRO/Wireless 5300 AGN [Shiloh] Network Connection
        Subsystem: Intel Corporation Device 1121
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 25
        Region 0: Memory at f8000000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [c8] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0300c  Data: 41b1
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <32us
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140] Device Serial Number 00-21-6a-ff-ff-11-3d-62
        Kernel driver in use: iwlagn
        Kernel modules: iwlagn

Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: Microcode SW error detected.  Restarting 0x2000000.
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: Start IWL Error Log Dump:
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: Status: 0x000212E4, count: 5
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: Desc                               Time       data1      data2      line
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: NMI_INTERRUPT_WDG            (#04) 3992959648 0x00000002 0x07030000 61630
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: blink1  blink2  ilink1  ilink2
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: 0x005AA 0x006E8 0x008B2 0x0E5F2
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: Start IWL Event Log Dump: display last 20 entries
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951598675:0x00c00007:0310
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951598685:0x000000c3:0601
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951615519:0x0000010f:0106
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951615520:0x00000000:0302
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951615543:0x00000436:0323
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951615546:0x0000045c:0367
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951615547:0x00e3318b:0353
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951625427:0x0000010f:0106
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951625428:0x00000000:0302
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951625452:0x00000436:0323
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951625456:0x0000045c:0367
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951625457:0x00e3318c:0353
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951625467:0x00000000:0302
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951625489:0x00000436:0323
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951657233:0x0a54001c:0206
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951657235:0x00000001:0204
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951657239:0x2a1064a6:0227
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951657240:0xc0e006e8:0228
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951857216:0x000000d7:0123
Feb 22 13:53:40 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:1951857224:0x00000000:0125



Restarting the interface does absolutely nothing.  If I unload the module and reload it, it starts working again.  I noticed that /var/log/messages shows that it crashed on what I believe is the unload...




Feb 22 14:01:05 tdanotebook kernel: iwlagn 0000:04:00.0: Stopping AGG while state not ON or starting
Feb 22 14:01:05 tdanotebook kernel: iwlagn 0000:04:00.0: queue number out of range: 0, must be 10 to 19
Feb 22 14:01:05 tdanotebook kernel: ------------[ cut here ]------------
Feb 22 14:01:05 tdanotebook kernel: WARNING: at net/mac80211/agg-tx.c:152 ___ieee80211_stop_tx_ba_session+0x83/0x89 [mac80211]()
Feb 22 14:01:05 tdanotebook kernel: Hardware name: Studio 1737
Feb 22 14:01:05 tdanotebook kernel: Modules linked in: ppp_async crc_ccitt ipt_MASQUERADE iptable_nat nf_nat snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device vboxvideo drm ppp_mppe ppp_generic slhc mmc_block pl2303 usbserial appletouch cifs snd_hda_codec_atihdmi snd_hda_codec_idt fglrx(P) snd_hda_intel snd_hda_codec iwlagn iwlcore snd_pcm sdhci_pci snd_timer snd sdhci mac80211 cfg80211 mmc_core rfkill video snd_page_alloc led_class uvcvideo output pcspkr fuse raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod scsi_wait_scan sbp2
Feb 22 14:01:05 tdanotebook kernel: Pid: 634, comm: ip Tainted: P           2.6.33-rc6 #8
Feb 22 14:01:05 tdanotebook kernel: Call Trace:
Feb 22 14:01:05 tdanotebook kernel: [<ffffffffa0141903>] ? ___ieee80211_stop_tx_ba_session+0x83/0x89 [mac80211]
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff8103d817>] warn_slowpath_common+0x77/0xa4
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff8103d853>] warn_slowpath_null+0xf/0x11
Feb 22 14:01:05 tdanotebook kernel: [<ffffffffa0141903>] ___ieee80211_stop_tx_ba_session+0x83/0x89 [mac80211]
Feb 22 14:01:05 tdanotebook kernel: [<ffffffffa0141a19>] __ieee80211_stop_tx_ba_session+0x49/0x63 [mac80211]
Feb 22 14:01:05 tdanotebook kernel: [<ffffffffa014141a>] ieee80211_sta_tear_down_BA_sessions+0x1b/0x39 [mac80211]
Feb 22 14:01:05 tdanotebook kernel: [<ffffffffa0143c6f>] ieee80211_set_disassoc+0xeb/0x1e6 [mac80211]
Feb 22 14:01:05 tdanotebook kernel: [<ffffffffa01442c6>] ieee80211_mgd_deauth+0x4d/0x13b [mac80211]
Feb 22 14:01:05 tdanotebook kernel: [<ffffffffa0149b85>] ieee80211_deauth+0x19/0x1b [mac80211]
Feb 22 14:01:05 tdanotebook kernel: [<ffffffffa0121af7>] __cfg80211_mlme_deauth+0x10c/0x11b [cfg80211]
Feb 22 14:01:05 tdanotebook kernel: [<ffffffffa0125045>] __cfg80211_disconnect+0x10c/0x184 [cfg80211]
Feb 22 14:01:05 tdanotebook kernel: [<ffffffffa01133a3>] cfg80211_netdev_notifier_call+0x28f/0x421 [cfg80211]
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff814cf6b4>] notifier_call_chain+0x33/0x5b
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff8105ae64>] raw_notifier_call_chain+0xf/0x11
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813f119a>] dev_close+0x59/0x9b
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813f0b59>] dev_change_flags+0xba/0x180
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813f9795>] do_setlink+0x264/0x32c
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813fa3ef>] rtnl_newlink+0x2d6/0x4a5
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813fa1c6>] ? rtnl_newlink+0xad/0x4a5
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813e4200>] ? sk_wait_data+0x49/0xcb
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813eb198>] ? __skb_recv_datagram+0x12a/0x25c
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813fa0fb>] rtnetlink_rcv_msg+0x1c2/0x1e0
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813f9f39>] ? rtnetlink_rcv_msg+0x0/0x1e0
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff814032cf>] netlink_rcv_skb+0x3e/0x8e
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813f9f2f>] rtnetlink_rcv+0x27/0x31
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff81402ffa>] netlink_unicast+0x206/0x27c
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813ea218>] ? memcpy_fromiovec+0x4a/0x7d
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff8140380e>] netlink_sendmsg+0x253/0x266
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813e1286>] sock_sendmsg+0xbb/0xd4
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff81097d03>] ? __alloc_pages_nodemask+0x132/0x632
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813df8e0>] ? move_addr_to_kernel+0x39/0x50
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813ea3a0>] ? verify_iovec+0x5b/0x9b
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff813e1e5b>] sys_sendmsg+0x1fc/0x26b
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff810a9160>] ? handle_mm_fault+0x395/0x72c
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff811dfacb>] ? __up_read+0x9e/0xa7
Feb 22 14:01:05 tdanotebook kernel: [<ffffffff8100202b>] system_call_fastpath+0x16/0x1b
Feb 22 14:01:05 tdanotebook kernel: ---[ end trace 26497551434690b1 ]---
Feb 22 14:01:05 tdanotebook kernel: mac80211-phy0: failed to remove key (0, 00:1f:f3:c3:88:b4) from hardware (-22)
Feb 22 14:01:05 tdanotebook kernel: wlan0: deauthenticating from 00:1f:f3:c3:88:b4 by local choice (reason=3)


Furthermore, the behaviour when it does stop working is very odd.  All of my HTTP connections from my browser "hang" after resolving the IP and attempting to connect.  Ping still works just fine for pretty much any site, and I see no latency in ping, but HTTP does not work.  Perhaps it's related to packet size once it has this problem?

I have a hunch that this may be related to the amount of data that goes over the network over time, but I can't be certain.  Next time it happens, I will check the following...
1. before unloading, confirm that both sets of errors above did or did not occur
2. unload module and check for the messages above again.
3. use ifconfig to see how much the interface had transferred for its uptime.

I may also try and make it happen by sending large amounts of data over the network in a short period.
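
For step 3 above, a minimal Python sketch that pulls the byte counters out of ifconfig output so they can be compared between failures. It assumes the older net-tools output format with "RX bytes:" and "TX bytes:" on one line; the function name parse_ifconfig_bytes is made up for illustration.

```python
import re

# Matches the net-tools style counters, for example:
# "RX bytes:2436016088 (2.2 GiB)  TX bytes:530748011 (506.1 MiB)"
_BYTES_RE = re.compile(r"RX bytes:(\d+).*?TX bytes:(\d+)")


def parse_ifconfig_bytes(ifconfig_output):
    """Return (rx_bytes, tx_bytes) from ifconfig text, or None if not found."""
    match = _BYTES_RE.search(ifconfig_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Feeding it the saved output of "ifconfig wlan0" at each failure gives raw byte totals that are easier to compare than the rounded GiB/MiB figures.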

Is there anything else you would like me to do the next time it happens?
Comment 1 Trenton D. Adams 2010-03-01 03:42:46 UTC
1. the first message happens before I unload the module
2. the second message happens after I unload the module
3. my interface statistics are below.
wlan0     Link encap:Ethernet  HWaddr 00:21:6a:11:3d:62  
          inet addr:10.0.1.6  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::221:6aff:fe11:3d62/64 Scope:Link      
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1      
          RX packets:3061987 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3578411 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000                              
          RX bytes:2436016088 (2.2 GiB)  TX bytes:530748011 (506.1 MiB)

I'll give another set of stats next time it happens.
Comment 2 Reinette Chatre 2010-03-03 17:06:15 UTC
Which version of the firmware are you using? This is usually printed after the driver is loaded, when the interface is brought up.
Comment 3 Trenton D. Adams 2010-03-03 18:22:21 UTC
Feb 28 21:35:50 tdanotebook kernel: iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, 2.6.33-rc6-ks
Feb 28 21:35:50 tdanotebook kernel: iwlagn: Copyright(c) 2003-2009 Intel Corporation
Feb 28 21:35:50 tdanotebook kernel: iwlagn 0000:04:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Feb 28 21:35:50 tdanotebook kernel: iwlagn 0000:04:00.0: setting latency timer to 64
Feb 28 21:35:50 tdanotebook kernel: iwlagn 0000:04:00.0: Detected Intel Wireless WiFi Link 5300AGN REV=0x24
Feb 28 21:35:50 tdanotebook kernel: iwlagn 0000:04:00.0: Tunable channels: 13 802.11bg, 24 802.11a channels
Feb 28 21:35:50 tdanotebook kernel: iwlagn 0000:04:00.0: irq 25 for MSI/MSI-X
Feb 28 21:35:50 tdanotebook kernel: phy1: Selected rate control algorithm 'iwl-agn-rs'
Feb 28 21:35:50 tdanotebook kernel: iwlagn 0000:04:00.0: firmware: requesting iwlwifi-5000-2.ucode
Feb 28 21:35:50 tdanotebook kernel: iwlagn 0000:04:00.0: loaded firmware version 8.24.2.12
Feb 28 21:35:50 tdanotebook kernel: ADDRCONF(NETDEV_UP): wlan0: link is not ready
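
The firmware version can also be picked out of a saved log programmatically. This is a small sketch that assumes the "loaded firmware version X" wording shown in the log above; the helper name firmware_versions is hypothetical.

```python
import re

# Assumes the driver message format seen in this report:
# "iwlagn 0000:04:00.0: loaded firmware version 8.24.2.12"
_FW_RE = re.compile(r"loaded firmware version (\S+)")


def firmware_versions(log_lines):
    """Return every firmware version string found, in log order."""
    versions = []
    for line in log_lines:
        match = _FW_RE.search(line)
        if match:
            versions.append(match.group(1))
    return versions
```

It accepts any iterable of lines, so an open log file can be passed directly.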
Comment 4 Reinette Chatre 2010-03-03 21:34:38 UTC
Created attachment 25346 [details]
Ensure aggregation cleanup succeeds after firmware restart

Tracking down the firmware error that triggers the problem will take some time. While this is done, could you please extract the attached patch series and apply it to your kernel? It was created against 2.6.33. Can you test that kernel?

This series ensures that aggregation is cleaned up properly after a firmware restart.
Comment 5 Trenton D. Adams 2010-03-03 23:03:27 UTC
Yes, I can test a new kernel.  I have the linus git tree, is it on there somewhere?  If so, just let me know what to do.  Otherwise, I'll apply the patch.

Thanks.
Comment 6 Reinette Chatre 2010-03-03 23:07:52 UTC
(In reply to comment #5)
> Yes, I can test a new kernel.  I have the linus git tree, is it on there
> somewhere?  If so, just let me know what to do.  Otherwise, I'll apply the
> patch.
> 

Please apply these patches on top of vanilla 2.6.33. Thanks
Comment 7 Claudio M. Camacho 2010-03-08 18:01:25 UTC
Hello,

Is this bug fixed in 2.6.33? I have a strange problem as well.

I was using 2.6.31.x without problems. All the problems started with 2.6.32. After some time of using wifi, my card went dead, meaning that it was associated but there was no response whatsoever. Wifi did work for some hours first, and I suspect the time to failure depended on the volume of traffic.

Now, with 2.6.33, my wifi card (iwl5000) works for a few seconds (when I start downloading something) and then it stops working, just as with 2.6.32. I tried all versions of 2.6.32 and 2.6.33 but I had no luck.

Today I decided to try 2.6.31, since I thought the cause might be something other than the kernel, but I can confirm that it is the kernel. I am now running 2.6.31.12; wifi has been working for some hours and I have downloaded/uploaded a huge amount of data (in order to test it), and everything works OK.

Could this problem be related to this issue? Shall I provide more details or info about what is happening?

Thanks in advance,


Claudio M. Camacho
Comment 8 Reinette Chatre 2010-03-08 18:19:57 UTC
(In reply to comment #7)
> Could this problem be related to this issue? Shall I provide more details or
> info about what is happening?

Do your logs show the same symptoms? Specifically, are you seeing the same firmware error followed by similar errors? If so, then this could be the same problem, but it is hard to say without logs.
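
One rough way to check for "the same firmware error" is to compare the entry in the IWL error log dump while ignoring its time field. The sketch below assumes the dump line layout seen in this report (description, "(#NN)", time, data1, data2, line); the name error_signature is invented for illustration, and matching signatures are only a hint, not proof of a common cause.

```python
import re

# Assumed layout, taken from the dumps in this report:
# "NMI_INTERRUPT_WDG            (#04) 3992959648 0x00000002 0x07030000 61630"
_ERR_RE = re.compile(
    r"(\w+)\s+\(#(\d+)\)\s+\d+\s+(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+)\s+(\d+)"
)


def error_signature(log_line):
    """Return (desc, code, data1, data2, line) without the time, or None."""
    match = _ERR_RE.search(log_line)
    if match is None:
        return None
    desc, code, data1, data2, line_no = match.groups()
    return desc, int(code), data1.lower(), data2.lower(), int(line_no)
```

Two reports whose dumps give equal signatures are candidates for being the same issue; differing signatures suggest a separate bug report.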
Comment 9 Trenton D. Adams 2010-03-08 19:12:01 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Yes, I can test a new kernel.  I have the linus git tree, is it on there
> > somewhere?  If so, just let me know what to do.  Otherwise, I'll apply the
> > patch.
> > 
> 
> Please apply these patches on top of vanilla 2.6.33. Thanks

Okay, done.  Now I'm just waiting to see what happens.
Comment 10 Claudio M. Camacho 2010-03-08 19:15:24 UTC
Created attachment 25411 [details]
iwl (wlan0) log before the wifi stops working
Comment 11 Claudio M. Camacho 2010-03-08 19:15:53 UTC
Created attachment 25412 [details]
iwl (wlan0) log after the wifi stops working
Comment 12 Claudio M. Camacho 2010-03-08 19:17:39 UTC
Please notice that I don't get any error in dmesg (messages), I just get an additional line saying:

iwlagn 0000:04:00.0: iwl_tx_agg_start on ra = 00:1c:f0:f0:16:c4 tid = 0


After this, the wifi stops working. Actually, if I just leave a transfer open, it will download at irregular intervals, meaning that every now and then the wifi card works and some data is downloaded (for a couple of seconds) and then the wifi card stops again.

This is very strange.. I actually don't know how to profile this..
Comment 13 Reinette Chatre 2010-03-08 22:02:33 UTC
(In reply to comment #12)
> Please notice that I don't get any error in dmesg (messages), I just get an
> additional line saying:
> 
> iwlagn 0000:04:00.0: iwl_tx_agg_start on ra = 00:1c:f0:f0:16:c4 tid = 0
> 
> 
> After this, the wifi stops working. Actually, if I just leave a transfer open,
> it will download at irregular intervals, meaning that every now and then the
> wifi card works and some data is downloaded (for a couple of seconds) and then
> the wifi card stops again.
> 
> This is very strange.. I actually don't know how to profile this..

The logs you provided have nothing in common with the original report. Please stop confusing this bug report with your issue. You can submit a new bug report, but before you do so, please first check whether it is a duplicate of http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2120 or http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2129 . If it is, it may be better to add yourself to the CC list and add your logs to that bug report, since work is in progress on those issues.
Comment 14 Claudio M. Camacho 2010-03-09 06:05:40 UTC
I see, I know the logs are completely different.

I think the correct bug is #2129.

Sorry for the misunderstanding.
Comment 15 Reinette Chatre 2010-03-17 16:28:06 UTC
(In reply to comment #9)
> (In reply to comment #6)
> > 
> > Please apply these patches on top of vanilla 2.6.33. Thanks
> 
> Okay, done.  Now I'm just waiting to see what happens.

Trenton, how is the testing going?
Comment 16 Trenton D. Adams 2010-03-17 19:17:13 UTC
(In reply to comment #15)
> (In reply to comment #9)
> > (In reply to comment #6)
> > > 
> > > Please apply these patches on top of vanilla 2.6.33. Thanks
> > 
> > Okay, done.  Now I'm just waiting to see what happens.
> 
> Trenton, how is the testing going?

Nothing is happening so far.  My interface has received almost a GIG now, so I'm not quite up to what I was last time.  Did you fix something, or was the patch only for getting information at the time of the problem?

# dirty implying your patched version.

uname -a
Linux tdanotebook 2.6.33-dirty #9 SMP Mon Mar 8 12:26:52 CST 2010 x86_64 Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz GenuineIntel GNU/Linux

wlan0     Link encap:Ethernet  HWaddr 00:21:6a:11:3d:62
          inet addr:10.0.1.5  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::221:6aff:fe11:3d62/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2458152 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2847427 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:993953444 (947.9 MiB)  TX bytes:709225297 (676.3 MiB)
Comment 17 Trenton D. Adams 2010-03-17 19:21:00 UTC
oops, using two different emails apparently, removing one
Comment 18 Reinette Chatre 2010-03-17 20:43:25 UTC
(In reply to comment #16)
 
> Nothing is happening so far.  My interface has received almost a GIG now, so
> I'm not quite up to what I was last time.  Did you fix something, or was the
> patch only for getting information at the time of the problem?

The patch series was a fix. The firmware error may still occur, but there is nothing we can do about this at this time. What the patches do is let the driver recover cleanly if a firmware error does occur. Can you check your logs to see whether there was a firmware error as before? You can search for the string "Microcode SW error".

Your initial report stated "I can reproduce these problems every time, as long as I leave my machine running; it's just a matter of time." ... so it really looks as though these patches are working for you.
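
The log search suggested above can be scripted. This is a minimal sketch; the function name find_firmware_errors is hypothetical, and the default search string is the one quoted in this comment.

```python
def find_firmware_errors(log_lines, needle="Microcode SW error"):
    """Return the log lines that contain the firmware error string."""
    return [line.rstrip("\n") for line in log_lines if needle in line]
```

For example, passing an open handle on /var/log/messages (the file mentioned in the original description) returns each matching line with its syslog timestamp, which shows when the firmware restarted.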
Comment 19 Trenton D. Adams 2010-03-19 21:56:20 UTC
Hi Reinette,

Nothing in the log yet, since the kernel file datestamp.  I was taking a break from working though, for almost a week.  So, if it is tied to the amount of data transferred, I haven't reached the limit that I did previously.

I'll watch for it in the logs, and see if that comes up.  I'll also look into possibly doing a large transfer today.
Comment 20 Trenton D. Adams 2010-03-21 15:31:07 UTC
Hello Reinette,

Last night I did a large data transfer, and it did happen.  My network is still working.

Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: Microcode SW error detected.  Restarting 0x2000000.
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: Start IWL Error Log Dump:
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: Status: 0x000212E4, count: 5
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: Desc                               Time       data1      data2      line
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: NMI_INTERRUPT_WDG            (#04) 2976450706 0x00000002 0x07030000 61630
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: blink1  blink2  ilink1  ilink2
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: 0x005AA 0x006E8 0x008B2 0x0E5EA
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: Start IWL Event Log Dump: display last 20 entries
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557267118:0x00000436:0323
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557267143:0x00000000:1350
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557267144:0x00000000:1351
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557267144:0x00000000:1352
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557267145:0x00000002:1353
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557267691:0x00000094:0322
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557267879:0x0a17001c:0206
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557267881:0x00000001:0204
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557267885:0x2a100dfa:0227
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557267886:0xc0f00c00:0228
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557269748:0x00000000:0263
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557271373:0x0000010f:0106
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557271374:0x00000000:0302
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557271632:0x00000000:0351
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557271707:0x0a17001c:0206
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557271709:0x00000001:0204
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557271713:0x2a100dfa:0227
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557271714:0xc0f00930:0228
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557471682:0x000000d7:0123
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: EVT_LOGT:3557471690:0x00000000:0125
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: Stopping AGG while state not ON or starting
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: queue number out of range: 0, must be 10 to 19
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: Stopping AGG while state not ON or starting
Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: queue number out of range: 0, must be 10 to 19
Mar 21 00:07:56 tdanotebook kernel: iwlagn 0000:04:00.0: iwl_tx_agg_start on ra = 00:1f:f3:c3:88:b4 tid = 0


wlan0     Link encap:Ethernet  HWaddr 00:21:6a:11:3d:62
          inet addr:10.0.1.5  Bcast:10.0.1.255  Mask:255.255.255.0
          inet6 addr: fe80::221:6aff:fe11:3d62/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3533628 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4388833 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1618662921 (1.5 GiB)  TX bytes:1130521808 (1.0 GiB)
Comment 21 Reinette Chatre 2010-03-22 15:01:21 UTC
(In reply to comment #20)
> Last night I did a large data transfer, and it did happen.  My network is still
> working.
> 
> Mar 21 00:00:16 tdanotebook kernel: iwlagn 0000:04:00.0: Microcode SW error detected.  Restarting 0x2000000.


This was the goal of the patches. A firmware error occurs, which we cannot do anything about at this time, and the new changes enable the driver to recover without affecting the user.