Bug 9860 - rt61pci crashes in 2.6.24
Summary: rt61pci crashes in 2.6.24
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: All Linux
Importance: P1 normal
Assignee: John W. Linville
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-01-31 14:54 UTC by Chris Clayton
Modified: 2008-03-05 09:23 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.24
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Screenshot of rt61pci oops (436.21 KB, image/jpeg)
2008-02-09 03:00 UTC, Chris Clayton
screenshot of the kernel panic (452.97 KB, image/jpeg)
2008-02-28 13:28 UTC, Edoardo Vacchi
some more output for the kpanic (202.19 KB, image/jpeg)
2008-03-04 08:10 UTC, Edoardo Vacchi

Description Chris Clayton 2008-01-31 14:54:34 UTC
Latest working kernel version: None - driver introduced in 2.6.24
Earliest failing kernel version: 2.6.24-rc8-git4
Distribution: Was once Peanut Linux (a Slackware derivative), but now highly customised
Hardware Environment: Belkin F5D7010 (11g) wireless adapter and a Draytek Vigor
2600WE AP (11b)
Software Environment: rt61pci driver on 2.6.24 kernel
Problem Description: I can reliably crash the driver by running the following script:

#!/bin/sh
for i in `seq 1 100`; do
    echo $i > itercount
    rm -f linux-2.6.23.tar.bz2
    wget ftp://192.168.1.10/pub/linux-2.6.23.tar.bz2
done

The most complete diagnostics I have copied from the console are:

Call Trace:
[<c011440a>] __update_rq_clock+0x1a/0xf0
[<e08ff59d>] rt2x00lib_txdone+0x9d/0xd0 [rt2x00lib]
[<e0909573>] rt61pci_txdone+0x153/0x1f0 [rt61pci]
[<e09096ad>] rt61pci_interrupt+0x9d/0xb0 [rt61pci]
[<c0140dc7>] handle_IRQ_event+0x27/0x60
[<c0141fbb>] handle_level_irq+0x6b/0x100
[<c0141f50>] handle_level_irq+0x0/0x100
[<c0105db8>] do_IRQ+0x68/0xc0
[<c011440a>] __update_rq_clock+0x1a/0xf0
[<c01042f3>] common_interrupt+0x23/0x28
[<c014007b>] encode_comp2_t+0x4b/0x80
[<c011007b>] acpi_copy_wakeup_routine+0x2b/0x9c
[<c02161fb>] acpi_processor_idle+0x25b/0x36e
[<c01020dc>] cpu_idle+0x5c/0x80
[<c040381b>] start_kernel+0x18b/0x1d0
[<c0403360>] unknown_bootoption+0x0/0x150
========================
Code: 00 00 00 e9 6c ff ff ff 8d b6 00 00 00 00 8d bf 00 00 00 00 55 89 cd 57 56
 53 83 ec 10 89 d3 89 44 24 08 ba 20 00 00 00 8b 40 58 <89> 43 14 a1 10 66 3e c0
 e8 6c f9 82 df 89 44 24 0c 85 c0 89 c7
EIP: [<e0930eb7>] ieee80211_tx_status_irqsafe+0x17/0x120 [mac80211] SS:ESP 0068:
c042ff4c
Kernel panic - not syncing: Fatal exception in interrupt
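
As a sanity check on a hand-copied trace like the one above: each entry has the form [<address>] symbol+offset/size, so the start of the function is the address minus the offset. The following is only a minimal sketch of that arithmetic; the helper name sym_base is made up for illustration and is not part of any kernel tooling.

```shell
#!/bin/sh
# sym_base ADDR OFFSET: print the start address of the function that a
# trace entry "[<ADDR>] symbol+OFFSET/size" refers to.
# ADDR is hex without a prefix (as printed in the trace); OFFSET keeps its 0x.
# Assumption: the shell does at least 64-bit arithmetic (bash and dash do on
# common platforms); a shell limited to 32-bit arithmetic would overflow here.
sym_base() {
    printf '%x\n' $(( 0x$1 - $2 ))
}

sym_base e0909573 0x153   # rt61pci_txdone+0x153/0x1f0
sym_base e09096ad 0x9d    # rt61pci_interrupt+0x9d/0xb0
```

For the two rt61pci entries this gives e0909420 and e0909610. Since e0909420 + 0x1f0 = e0909610, rt61pci_interrupt starts exactly where rt61pci_txdone ends, which suggests those two lines were copied from the console correctly.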

Steps to reproduce:

If I start any non-trivial network activity, the crash will occur. See the thread at http://marc.info/?l=linux-wireless&m=120098190722048&w=4 for all info collected so far.
Comment 1 John W. Linville 2008-02-01 07:24:09 UTC
Can you recreate this using a current net-2.6 kernel?  I would suggest a wireless-2.6 kernel but it is in a bit of flux ATM...
Comment 2 Chris Clayton 2008-02-02 10:43:50 UTC
On Friday 01 February 2008 15:24:09 you wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=9860
>
>
> linville@tuxdriver.com changed:
>
>            What    |Removed                     |Added
> ---------------------------------------------------------------------------
>                  CC|                            |IvDoorn@gmail.com
>          AssignedTo|networking_wireless@kernel- |linville@tuxdriver.com
>                    |bugs.osdl.org               |
>
>              Status|NEW                         |ASSIGNED
>           Component|Wireless                    |network-wireless
>             Product|Networking                  |Drivers
>
>
>
>
> ------- Comment #1 from linville@tuxdriver.com  2008-02-01 07:24 -------
> Can you recreate this using a current net-2.6 kernel?  I would suggest a
> wireless-2.6 kernel but it is in a bit of flux ATM...

I'm not a git user, so is there a tarball of current net-2.6 available 
anywhere, please? If so, I'll give it a go.

Chris
Comment 3 Chris Clayton 2008-02-03 10:58:37 UTC
I've tried 2.6.24-git13, which appears to be in roughly the same state as net-2.6 as far as the rt61pci driver is concerned, and found that I no longer get the lock-up and flashing LEDs. However, my network dies after only minimal activity. For example, if I try to ping another box on my network (a simple "ping 192.168.1.1"), my network locks up after between 3 and 10 pings. Similarly, my network dies before the whole of the home page at kernel.org has arrived. The only way I can get the network back is to unload and reload the driver.

(You will be pleased to hear, however, that the rtl8180 driver seems to be solid. It has survived 11 ftp transfers of the 2.6.23 kernel sources without failing.)
Comment 4 Chris Clayton 2008-02-05 00:13:14 UTC
Sorry, I should have said that when the network dies, there are no diagnostic messages in /var/log/syslog or in the output from dmesg.
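
Since nothing is logged when the link dies, one way to get at least a number to report is to count how many pings succeed before the first failure. This is a minimal sketch, not a tested diagnostic: the function name count_until_fail is hypothetical, and the commented usage line assumes the iputils ping options -c (count) and -w (deadline in seconds).

```shell
#!/bin/sh
# count_until_fail MAX CMD...: run CMD up to MAX times and print how many
# runs succeeded before the first failure (or MAX if none failed).
count_until_fail() {
    max=$1; shift
    n=0
    while [ "$n" -lt "$max" ]; do
        # Discard the command's own output; only its exit status matters.
        "$@" >/dev/null 2>&1 || break
        n=$((n + 1))
    done
    echo "$n"
}

# Example (not run here; needs the wireless link and a reachable host):
#   count_until_fail 100 ping -c 1 -w 5 192.168.1.1
```

Once the count stops short of the maximum, the driver can be unloaded and reloaded (modprobe -r rt61pci, then modprobe rt61pci) to bring the network back before the next run.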
Comment 5 Chris Clayton 2008-02-09 03:00:34 UTC
Created attachment 14764
Screenshot of rt61pci oops

Screenshot of rt61pci oops
Comment 6 Chris Clayton 2008-02-09 03:03:17 UTC
Mmm, my note disappeared somehow :(
The screenshot I attached a few moments ago is from 2.6.24.1. I didn't expect the release to have fixed the oops, but thought I'd give it a go anyway.

Is there anything I can do to help get to the bottom of this, bearing in mind that I am not a git user?
Comment 7 Daniel Drake 2008-02-13 13:56:48 UTC
Can you enable a framebuffer? That would allow us to see more of the crash message on screen. It is missing some crucial info which has been pushed off.
Comment 8 Chris Clayton 2008-02-14 11:57:38 UTC
I think I may have an explanation for the crashes I have been experiencing. Earlier this week, my wireless LAN locked up completely. None of the machines in my home (Linux and Windows) could connect. I powered the router off and on and since then the rt61pci driver is no longer failing. So, it would appear that there is a problem in the driver, but, at least for now, I won't be able to provide you with any additional information. Whatever the problem was, the rtl8180 driver in 2.6.25-{gitn,rc1} is not vulnerable to it, because it has worked reliably throughout.

I suggest I close the bug and report again if and when my AP begins to 'misbehave'. Is that OK, please?
Comment 9 Edoardo Vacchi 2008-02-17 03:30:12 UTC
I think I can confirm this bug, because I'm experiencing the same behaviour; I can't give details, though, because, as the OP said, I can't find anything in my logs.

What I can tell you is that with the Ubuntu hardy kernel, 2.6.24-4 works, while all the following updates do not.

I know it's little, but I hope it helps.
Comment 10 Edoardo Vacchi 2008-02-28 13:28:03 UTC
Created attachment 15072
screenshot of the kernel panic 

As proof of the bug occurring, I'm posting a screenshot (actually a photo of the display), though the output seems a bit different. I boot with acpi=force and the patch at bug #8896 is NOT applied (on that bug you can find more about my hardware, anyway).

Ubuntu 7.04 (gutsy) but with the kernel from hardy; the bug doesn't occur with 2.6.24-4 but does with the following updates.

HTH
Comment 11 Edoardo Vacchi 2008-03-04 08:10:47 UTC
Created attachment 15138
some more output for the kpanic

some more output
Comment 12 Edoardo Vacchi 2008-03-04 13:05:34 UTC
bug seems fixed in net-2.6 branch!
Comment 13 John W. Linville 2008-03-04 13:31:49 UTC
Excellent!  Closing on the basis of comment 12...
Comment 14 Edoardo Vacchi 2008-03-05 00:22:18 UTC
Well, if you could please provide a patch for 2.6.24 in the meantime, it would be great.

I had to recompile the whole branch and I think I've found some other regressions somewhere else I wouldn't want to have to deal with ATM ;)
Comment 15 Ivo van Doorn 2008-03-05 09:05:42 UTC
Providing that patch is a bit hard, I still have no idea what is/was causing that panic.
They all point to something in mac80211 and not rt2x00. Unfortunately they point to a function in mac80211 that is pretty straightforward where not much can go wrong.

I fear that the fix for this issue is more a side-effect of a patch that handled something completely different. And knowing that the rt2x00 in 2.6.24 handles some things completely differently than the current net-2.6 branch, with some very large patches to rework some internal workings, we probably can't find the correct solution by simply pointing at a particular patch.
Comment 16 Edoardo Vacchi 2008-03-05 09:23:59 UTC
Yes, I feared this problem; I'll try and see if people at ubuntu can backport the whole drivers/net/wireless + net/mac80211 trees
