Bug 11393

Summary: iwl4965 causes hard freeze
Product: Drivers
Reporter: Thomas Witt (witt_th)
Component: network-wireless
Assignee: Reinette Chatre (reinette.chatre)
Status: RESOLVED CODE_FIX
Severity: normal
CC: dchris, linville
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27-rc3
Subsystem:
Regression: ---
Bisected commit-id:

Description Thomas Witt 2008-08-21 13:09:49 UTC
Latest working kernel version: none
Earliest failing kernel version: 2.6.24, 2.6.25 (both Gentoo patched), 2.6.27-rc3 (vanilla)
Distribution: Gentoo
Hardware Environment: Lenovo Thinkpad T61
Software Environment: Gentoo ~amd64
Problem Description: My wireless networking runs just fine with the iwl4965 driver, wpa_supplicant and NetworkManager, but after a while (usually 1-5 minutes, at most about 1.5 hours) the entire system freezes.

The capslock LED blinks helplessly and nothing works, not even magic SysRq.

Steps to reproduce: I just installed Gentoo, started wlan, and it fails.

I finally managed to get some info out via netconsole:
http://www-public.tu-bs.de/~y0030095/files/crash.txt
Comment 1 John W. Linville 2008-08-22 07:29:24 UTC
In iwl_tx_cmd_complete:

        /* If a Tx command is being handled and it isn't in the actual
         * command queue then there a command routing bug has been introduced
         * in the queue management code. */
        if (txq_id != IWL_CMD_QUEUE_NUM)
                IWL_ERROR("Error wrong command queue %d command id 0x%X\n",
                          txq_id, pkt->hdr.cmd);
        BUG_ON(txq_id != IWL_CMD_QUEUE_NUM);

Zhu Yi, any idea what might cause that?
Comment 2 Zhu Yi 2008-08-24 22:18:23 UTC
Yup, we are tracking this bug on SLUB. Thomas, can you confirm you have also this bug on SLAB?
Comment 3 Thomas Witt 2008-08-25 00:10:06 UTC
(In reply to comment #2)
> Yup, we are tracking this bug on SLUB. Thomas, can you confirm you have also
> this bug on SLAB?

I actually use SLAB...
CONFIG_SLAB=y
Comment 4 Thomas Witt 2008-08-28 05:44:55 UTC
I just tested wlan with an 802.11g network, and it has been working for over two hours now without crashing. So it seems the bug reported above only affects 802.11n or 5 GHz networks.
Comment 5 Christophe Dumez 2008-10-31 15:53:29 UTC
Well, I am using SLUB and I used to experience the same kernel panic (bug report is here: https://bugs.launchpad.net/ubuntu/+source/linux-backports-modules-2.6.27/+bug/276990).

I applied this patch "http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=55d6a3cd0cc85ed90c39cf32e16f622bd003117b".

I have a lot of "WARNING: at /build/buildd/linux-backports-modules-2.6.27-2.6.27/debian/build/build-generic/compat-wireless-2.6/drivers/net/wireless/iwlwifi/iwl-tx.c:1204 iwl_tx_cmd_complete+0x2ce/0x2e0 [iwlcore]()
Oct 31 21:32:00 chris-xps kernel: [18062.008981] wrong command queue 31, command id 0x0" in my kern.log
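
The WARNING lines above suggest that, with the patch applied, the queue check from comment 1 now logs and returns instead of halting the machine. A minimal userspace sketch of that warn-and-drop pattern follows. It is not the driver code: `CMD_QUEUE_NUM` and `handle_cmd_complete` are hypothetical names, and the queue number 4 is an assumed value chosen only for illustration.

```c
#include <stdio.h>

/* Assumed value for illustration; the real constant lives in the driver headers. */
#define CMD_QUEUE_NUM 4

/* Hypothetical stand-in for the driver's command-completion handler.
 * Returns 0 when the completion is processed, -1 when it is dropped. */
static int handle_cmd_complete(int txq_id, unsigned int cmd_id)
{
    if (txq_id != CMD_QUEUE_NUM) {
        /* Warn and bail out instead of halting the whole machine. */
        fprintf(stderr, "wrong command queue %d, command id 0x%X\n",
                txq_id, cmd_id);
        return -1;
    }
    /* Normal completion handling would go here. */
    return 0;
}
```

Calling `handle_cmd_complete(31, 0x0)` in this sketch prints a message in the same shape as the kern.log entry and returns -1, while a completion on the expected queue returns 0. The trade-off is that the bogus completion is ignored rather than explained, so the underlying routing problem still needs a real fix.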

And here is the trace:

Oct 31 23:48:17 chris-xps kernel: [  756.402636] WARNING: at drivers/net/wireless/iwlwifi/iwl-tx.c:1196 iwl_tx_cmd_complete+0x2c9/0x2d0 [iwlcore]()
Oct 31 23:48:17 chris-xps kernel: [  756.402643] wrong command queue 63, command id 0x0
Oct 31 23:48:17 chris-xps kernel: [  756.402648] Modules linked in: iwlagn iwlcore rfkill led_class mac80211 cfg80211 aes_i586 aes_generic i915 drm binfmt_misc af_packet rfcomm bridge stp bnep sco l2cap ipv6 ppdev acpi_cpufreq cpufreq_userspace cpufreq_stats cpufreq_powersave cpufreq_ondemand freq_table cpufreq_conservative sbs sbshc pci_slot container iptable_filter ip_tables x_tables ext3 jbd mbcache sbp2 parport_pc lp parport joydev snd_hda_intel snd_pcm_oss snd_mixer_oss arc4 psmouse ecb snd_pcm crypto_blkcipher uvcvideo iTCO_wdt compat_ioctl32 dcdbas serio_raw videodev pcspkr v4l1_compat iTCO_vendor_support evdev sdhci_pci sdhci snd_seq_dummy mmc_core snd_seq_oss btusb snd_seq_midi snd_rawmidi snd_seq_midi_event bluetooth snd_seq snd_timer video output snd_seq_device snd soundcore intel_agp battery wmi button ac agpgart shpchp pci_hotplug snd_page_alloc jfs sr_mod cdrom pata_acpi sd_mod crc_t10dif sg ata_piix usbhid hid ohci1394 ieee1394 ahci ata_generic tg3 libphy libata scsi_mod dock ehci_hcd uhci_hcd usbco
Oct 31 23:48:17 chris-xps kernel: e thermal processor fan fbcon tileblit font bitblit softcursor fuse [last unloaded: cfg80211]
Oct 31 23:48:17 chris-xps kernel: [  756.402886] Pid: 0, comm: swapper Not tainted 2.6.27-7-generic #1
Oct 31 23:48:17 chris-xps kernel: [  756.402894]  [<c0131d65>] warn_slowpath+0x65/0x90
Oct 31 23:48:17 chris-xps kernel: [  756.402912]  [<c0136976>] ? set_normalized_timespec+0x16/0x90
Oct 31 23:48:17 chris-xps kernel: [  756.402922]  [<c037dfae>] ? account_scheduler_latency+0xe/0x220
Oct 31 23:48:17 chris-xps kernel: [  756.402934]  [<c0118e38>] ? read_hpet+0x8/0x20
Oct 31 23:48:17 chris-xps kernel: [  756.402944]  [<c014e63b>] ? getnstimeofday+0x4b/0x100
Oct 31 23:48:17 chris-xps kernel: [  756.402955]  [<c0136976>] ? set_normalized_timespec+0x16/0x90
Oct 31 23:48:17 chris-xps kernel: [  756.402965]  [<c0151c84>] ? clockevents_program_event+0x14/0x150
Oct 31 23:48:17 chris-xps kernel: [  756.402975]  [<c014b79e>] ? ktime_get+0x1e/0x40
Oct 31 23:48:17 chris-xps kernel: [  756.402985]  [<c015310b>] ? tick_dev_program_event+0x3b/0xc0
Oct 31 23:48:17 chris-xps kernel: [  756.402995]  [<f9243f49>] iwl_tx_cmd_complete+0x2c9/0x2d0 [iwlcore]
Oct 31 23:48:17 chris-xps kernel: [  756.403021]  [<c014a72d>] ? enqueue_hrtimer+0x7d/0x130
Oct 31 23:48:17 chris-xps kernel: [  756.403031]  [<f9493a39>] iwl_rx_handle+0xd9/0x260 [iwlagn]
Oct 31 23:48:17 chris-xps kernel: [  756.403047]  [<f949568d>] iwl4965_irq_tasklet+0x1ad/0x2f0 [iwlagn]
Oct 31 23:48:17 chris-xps kernel: [  756.403061]  [<c0118e38>] ? read_hpet+0x8/0x20
Oct 31 23:48:17 chris-xps kernel: [  756.403071]  [<c0137258>] tasklet_action+0x78/0x100
Oct 31 23:48:17 chris-xps kernel: [  756.403079]  [<c0137682>] __do_softirq+0x92/0x120
Oct 31 23:48:17 chris-xps kernel: [  756.403087]  [<c013776d>] do_softirq+0x5d/0x60
Oct 31 23:48:17 chris-xps kernel: [  756.403095]  [<c01378e5>] irq_exit+0x55/0x90
Oct 31 23:48:17 chris-xps kernel: [  756.403102]  [<c0106c1a>] do_IRQ+0x4a/0x80
Oct 31 23:48:17 chris-xps kernel: [  756.403111]  [<c0105003>] common_interrupt+0x23/0x30
Oct 31 23:48:17 chris-xps kernel: [  756.403119]  [<c01700d8>] ? __audit_mq_getsetattr+0x68/0xb0
Oct 31 23:48:17 chris-xps kernel: [  756.403134]  [<f885e800>] ? acpi_idle_enter_bm+0x268/0x2b7 [processor]
Oct 31 23:48:17 chris-xps kernel: [  756.403152]  [<c02dbf7b>] cpuidle_idle_call+0x7b/0xd0
Oct 31 23:48:17 chris-xps kernel: [  756.403161]  [<c010288d>] cpu_idle+0x7d/0x140
Oct 31 23:48:17 chris-xps kernel: [  756.403169]  [<c037a661>] start_secondary+0x9d/0xcc
Oct 31 23:48:17 chris-xps kernel: [  756.403179]  =======================
Oct 31 23:48:17 chris-xps kernel: [  756.403184] ---[ end trace be9a447c43dc9b0d
Comment 6 Reinette Chatre 2008-11-03 13:28:58 UTC
This problem is being tracked at http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1703
Comment 7 Thomas Witt 2008-11-12 10:34:10 UTC
So what's the status of this? Is there going to be a fix any time soon, or should I get another card? Is this driver even actually maintained?
Comment 8 John W. Linville 2008-11-12 10:38:20 UTC
Did you look at the link in comment 6? Have you contributed your information there?

Much as I would prefer for Intel to work on problems here, they seem much more cooperative with using their bugzilla instead...
Comment 9 Thomas Witt 2008-11-12 11:25:15 UTC
Hm. I would really appreciate it if someone emailed me when the tracking of a bug I filed is moved to another bugzilla that I'm not registered in.

So, can anyone recommend another wifi card for the T61? I would prefer an Atheros chipset. 
Comment 10 Reinette Chatre 2008-11-12 11:37:13 UTC
The tracking of the bug was not moved. It was originally submitted there and is currently being followed by 15 external people. Duplicating that effort for your convenience is not efficient.
Comment 11 Reinette Chatre 2008-11-21 22:30:58 UTC
This is a duplicate of bug 11983, which has a patch available. Could you please test it?
Comment 12 Thomas Witt 2008-11-22 05:12:05 UTC
Yeah, this is fixed by that patch.
Comment 13 Reinette Chatre 2008-12-04 16:15:57 UTC
Marking as fixed based on comment #12.