Bug 44251 - iwl4965 driver crashes randomly.
Summary: iwl4965 driver crashes randomly.
Status: CLOSED WILL_NOT_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: x86-64 Linux
Importance: P1 high
Assignee: Stanislaw Gruszka
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-04 19:55 UTC by Istvan Toth
Modified: 2018-06-06 15:42 UTC
CC List: 4 users

See Also:
Kernel Version: 3.5.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
syslog: iwl4965 crash (33.96 KB, application/octet-stream)
2012-07-04 19:55 UTC, Istvan Toth
full dmesg output (111.21 KB, text/plain)
2012-07-04 20:05 UTC, Istvan Toth
lspci -vvnn as root (35.24 KB, text/plain)
2012-07-04 20:06 UTC, Istvan Toth
DMI table (output of dmidecode as root) (14.64 KB, text/plain)
2012-07-04 20:08 UTC, Istvan Toth
/proc/interrupts (2.04 KB, text/plain)
2012-07-04 20:08 UTC, Istvan Toth
/proc/mtrr (419 bytes, text/plain)
2012-07-04 20:08 UTC, Istvan Toth
output of ver_linux script in kernel tree (1.28 KB, text/plain)
2012-07-04 20:09 UTC, Istvan Toth
iwlegacy_tracing_for_3.5.patch (3.51 KB, text/plain)
2012-08-02 14:40 UTC, Stanislaw Gruszka
trace output after iwl4965 crash (using iwlegacy_tracing_for_3.5.patch) (27.70 KB, text/plain)
2012-08-04 20:37 UTC, Istvan Toth
dmesg after iwl4965 crash (using iwlegacy_tracing_for_3.5.patch) (248.39 KB, text/plain)
2012-08-04 20:39 UTC, Istvan Toth
trace output (w/16M buffer) after iwl4965 crash (running kernel 3.5.0-rc7) (60 bytes, text/plain)
2012-08-16 21:40 UTC, Istvan Toth
dmesg after iwl4965 crash (running kernel 3.5.0-rc7) (248.23 KB, text/plain)
2012-08-16 21:41 UTC, Istvan Toth

Description Istvan Toth 2012-07-04 19:55:32 UTC
Created attachment 74821 [details]
syslog: iwl4965 crash

the wireless driver iwl4965 crashes randomly on my thinkpad t61 (the radio led turns off too).

reloading the iwl4965 module doesn't help. only rebooting solves the issue.

i cannot reproduce the bug on demand; it occurs randomly: sometimes rarely (after the radio has been working for hours), sometimes more often.

i've reported the bug on ubuntu's launchpad here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1019672

see the attachments there (using ubuntu 12.04 lts stock kernel).

and i attach a syslog with the same config but with the latest 3.5.0-rc5 mainline kernel.

the same bug is still there...
Comment 1 Istvan Toth 2012-07-04 20:05:46 UTC
Created attachment 74831 [details]
full dmesg output

my full dmesg output.

(the last lines, after the iwl4965 crash, show me inserting an atheros cardbus wireless card. it works smoothly...)
Comment 2 Istvan Toth 2012-07-04 20:06:53 UTC
Created attachment 74841 [details]
lspci -vvnn as root
Comment 3 Istvan Toth 2012-07-04 20:08:04 UTC
Created attachment 74851 [details]
DMI table (output of dmidecode as root)
Comment 4 Istvan Toth 2012-07-04 20:08:29 UTC
Created attachment 74861 [details]
/proc/interrupts
Comment 5 Istvan Toth 2012-07-04 20:08:52 UTC
Created attachment 74871 [details]
/proc/mtrr
Comment 6 Istvan Toth 2012-07-04 20:09:23 UTC
Created attachment 74881 [details]
output of ver_linux script in kernel tree
Comment 7 Stanislaw Gruszka 2012-07-09 08:58:11 UTC
It looks like the firmware hangs so badly that it is not possible to reset it. Or it could be a PCIe bus problem. Either way, this will not be easy to solve, but there are a few things to try.

There are some pending fixes that _may_ help with this problem:

http://marc.info/?l=linux-wireless&m=134140106808740&w=2
http://marc.info/?l=linux-wireless&m=134140341109543&w=2

Please apply those patches and see if they help by any chance. If not, I will need debugging logs; I'll let you know how to capture them.
Comment 8 Stanislaw Gruszka 2012-07-21 09:54:25 UTC
The patches from the comment above are now applied in the 3.5-rc7 kernel. Does the problem still occur on that version or later?
Comment 9 Istvan Toth 2012-07-21 17:04:15 UTC
pff... :)

I've begun testing on 3.5-rc7.

I've also applied your patches without looking at the source. :)
the patch ran without errors so maybe it wasn't harmful. :)

so the testing is in progress: there is no error yet.
Comment 10 Istvan Toth 2012-08-02 08:13:31 UTC
unfortunately I've encountered the same bug with the patched rc6 kernel and using the stock rc7 kernel too. :(
Comment 11 Stanislaw Gruszka 2012-08-02 14:40:28 UTC
Created attachment 76671 [details]
iwlegacy_tracing_for_3.5.patch

Debug patch. How to use it is described here:

https://bugzilla.kernel.org/show_bug.cgi?id=42766#c16

except we are on the 3.5 kernel now, and we are looking for "Queue stuck" or the first microcode error.
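
A minimal sketch of how the first relevant line could be located in a saved log. The function name and the search patterns are assumptions made for illustration; the exact wording of the driver's messages may differ, so the patterns may need adjusting.

```shell
#!/bin/sh
# Sketch: print the first line of a saved log that mentions a stuck queue
# or a microcode error, prefixed with its line number.
# ASSUMPTION: the patterns below only approximate the driver's wording.
first_failure_line() {
    # -n: show line numbers, -i: ignore case, -E: extended regex
    grep -n -i -E 'queue.*stuck|microcode.*error' "$1" | head -n 1
}
```

For example, after saving the log with `dmesg > dmesg.txt`, running `first_failure_line dmesg.txt` prints the earliest matching line, which is the point where reading the log should start.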
Comment 12 Istvan Toth 2012-08-04 07:16:13 UTC
is it possible to use it for the latest 3.6-rc1?
Comment 13 Istvan Toth 2012-08-04 07:41:00 UTC
the question is answered: the simple answer is no, the patch doesn't work out-of-the-box with 3.6-rc1. so i'm using it with 3.5-rc7.
Comment 14 Istvan Toth 2012-08-04 20:37:31 UTC
Created attachment 76811 [details]
trace output after iwl4965 crash (using iwlegacy_tracing_for_3.5.patch)
Comment 15 Istvan Toth 2012-08-04 20:39:07 UTC
Created attachment 76821 [details]
dmesg after iwl4965 crash (using iwlegacy_tracing_for_3.5.patch) 

related to trace.txt
Comment 16 Stanislaw Gruszka 2012-08-06 16:33:42 UTC
I'm sorry, but the trace does not include the amount of information I expected to see. It seems the default trace buffer size was changed.

Please increase the buffer size (with the following command) after "mount -t debugfs debugfs /sys/kernel/debug/" and before "modprobe iwl4965 debug=0x47833fff":

echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb

Sorry again for the wrong instructions.
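
The ordering described above can be sketched as a small script. The three commands are the ones quoted in this comment; the `DRY_RUN` switch and the helper names are hypothetical additions so that the sequence can be reviewed before running it as root. Any further steps from the linked instructions (bug 42766, comment 16) are not reproduced here.

```shell
#!/bin/sh
# Sketch: enlarge the ftrace buffer before loading iwl4965 with debugging on.
# ASSUMPTION: DRY_RUN and the helper names are illustrative, not part of any tool.

DRY_RUN="${DRY_RUN:-1}"                 # default: only print the commands
TRACING_DIR=/sys/kernel/debug/tracing
BUFFER_KB=16384                         # 16 MB trace buffer

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

setup_iwl4965_tracing() {
    run "mount -t debugfs debugfs /sys/kernel/debug/"
    run "echo $BUFFER_KB > $TRACING_DIR/buffer_size_kb"   # must come before modprobe
    run "modprobe iwl4965 debug=0x47833fff"
}

setup_iwl4965_tracing
```

With the default `DRY_RUN=1` the script only prints the three commands in order; running it as root with `DRY_RUN=0` executes them.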
Comment 17 Istvan Toth 2012-08-16 21:40:20 UTC
Created attachment 77851 [details]
trace output (w/16M buffer) after iwl4965 crash (running kernel 3.5.0-rc7)

finally got it.
Comment 18 Istvan Toth 2012-08-16 21:41:27 UTC
Created attachment 77861 [details]
dmesg after iwl4965 crash (running kernel 3.5.0-rc7)

and the dmesg log. maybe it's useful...
Comment 19 Stanislaw Gruszka 2018-06-06 15:42:26 UTC
Sorry, this will not be fixed.
