Bug 13390
Description
Márton Németh
2009-05-26 19:43:56 UTC
Created attachment 21564 [details]
dmesg after turning on WiFi
Created attachment 21565 [details]
kernel .config for 2.6.30-rc7
Created attachment 21566 [details]
photo of the screen with crash messages
Created attachment 21568 [details]
another dmesg without freeze but with "possible recursive locking"
[ 317.646024] [ INFO: possible recursive locking detected ]
[ 317.647010] 2.6.30-rc7 #1
[ 317.647986] ---------------------------------------------
[ 317.648033] swapper/0 is trying to acquire lock:
[ 317.648033] (pTimer){+.-...}, at: [<c0134a71>] del_timer_sync+0x0/0x83
[ 317.648033]
[ 317.648033] but task is already holding lock:
[ 317.648033] (pTimer){+.-...}, at: [<c013420f>] run_timer_softirq+0x113/0x20b
[ 317.648033]
[ 317.648033] other info that might help us debug this:
[ 317.648033] 1 lock held by swapper/0:
[ 317.648033] #0: (pTimer){+.-...}, at: [<c013420f>] run_timer_softirq+0x113/0x20b
[ 317.648033]
[ 317.648033] stack backtrace:
[ 317.648033] Pid: 0, comm: swapper Tainted: G C 2.6.30-rc7 #1
[ 317.648033] Call Trace:
[ 317.648033] [<c034a44d>] ? printk+0xf/0x12
[ 317.648033] [<c014cfea>] __lock_acquire+0xbfa/0x128d
[ 317.648033] [<c0128848>] ? scheduler_tick+0x39/0x1b2
[ 317.648033] [<c014d730>] lock_acquire+0xb3/0xd6
[ 317.648033] [<c0134a71>] ? del_timer_sync+0x0/0x83
[ 317.648033] [<c0134aa5>] del_timer_sync+0x34/0x83
[ 317.648033] [<c0134a71>] ? del_timer_sync+0x0/0x83
[ 317.648033] [<f88dee93>] RTMP_OS_Del_Timer+0x10/0x1a [rt2860sta]
[ 317.648033] [<f88bb23e>] RTMPCancelTimer+0x22/0x4a [rt2860sta]
[ 317.648033] [<f88f30ff>] RT28xxPciMlmeRadioOFF+0x7a/0x261 [rt2860sta]
[ 317.648033] [<f88b5cb4>] MlmePeriodicExec+0xd0/0x427 [rt2860sta]
[ 317.648033] [<f88df1b1>] linux_MlmePeriodicExec+0x13/0x2a [rt2860sta]
[ 317.648033] [<c013428e>] run_timer_softirq+0x192/0x20b
[ 317.648033] [<c013420f>] ? run_timer_softirq+0x113/0x20b
[ 317.648033] [<f88df19e>] ? linux_MlmePeriodicExec+0x0/0x2a [rt2860sta]
[ 317.648033] [<c0130aa9>] __do_softirq+0xb1/0x187
[ 317.648033] [<c0130bb5>] do_softirq+0x36/0x5a
[ 317.648033] [<c0130d2f>] irq_exit+0x38/0x6f
[ 317.648033] [<c0104e31>] do_IRQ+0x71/0x87
[ 317.648033] [<c010392e>] common_interrupt+0x2e/0x34
[ 317.648033] [<c014007b>] ? enqueue_hrtimer+0x29/0x68
[ 317.648033] [<f83f928a>] ? acpi_idle_enter_bm+0x267/0x298 [processor]
[ 317.648033] [<c02cc101>] cpuidle_idle_call+0x65/0x98
[ 317.648033] [<c0102400>] cpu_idle+0x4e/0x7e
[ 317.648033] [<c033cc4f>] rest_init+0x67/0x69
[ 317.648033] [<c04c9811>] start_kernel+0x309/0x30e
[ 317.648033] [<c04c906a>] __init_begin+0x6a/0x6f
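For readers unfamiliar with this lockdep report: the trace shows del_timer_sync being called from inside a timer handler (MlmePeriodicExec, run from run_timer_softirq), and both the running timer and the one being cancelled belong to the lock class named pTimer. The trace alone does not say whether they are the same timer. The following is a minimal user-space sketch in Python, not kernel code; the Timer class, the running list and both functions are hypothetical stand-ins. It assumes the usual rule that a synchronous delete waits for a running handler to return, so only a delete issued from the timer's own handler can never finish.

```python
class Timer:
    """Hypothetical stand-in for a kernel timer; not the kernel's timer_list."""

    def __init__(self, name, callback):
        self.name = name
        self.callback = callback
        self.pending = False


# Timers whose handler is currently executing in this model.
running = []


def del_timer_sync(timer):
    """Model of a synchronous delete: deactivate, then wait for the handler."""
    timer.pending = False
    if timer in running:
        # Waiting for our own handler to return would never end,
        # so the model raises instead of hanging.
        raise RuntimeError(f"{timer.name}: del_timer_sync from its own handler")
    return True


def run_timer(timer):
    """Model of the softirq running one expired timer's handler."""
    running.append(timer)
    try:
        timer.callback(timer)
    finally:
        running.remove(timer)
```

In this model a handler may cancel a different timer, but raises as soon as it cancels itself. Lockdep works on lock classes rather than individual timers, which would explain why the report says "possible".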
Created attachment 21570 [details]
photo with a longer crash log
I managed to set a smaller console font with the command "consolechars -f /usr/share/consolefonts/lat2-08.psf.gz", so more text is visible in this photo.
Created attachment 21928 [details]
Panic
This is what I captured via netconsole.
The panic doesn't happen if I first bring ra0 down (here, via "ifdown ra0").
(In reply to comment #7)
> The panic doesn't happen if I first bring ra0 down (here, via "ifdown ra0").
What hardware are you using?
> What hardware are you using?
EeePC 901.
(In reply to comment #7)
> The panic doesn't happen if I first bring ra0 down (here, via "ifdown ra0").
"ifdown --force ra0" was what originally triggered this bug for me. I suspended the EeePC, and on resume the wifi was not able to scan for networks, so I tried "ifup ra0 && ifdown --force ra0 && ifup ra0". This triggered the panic. I was not able to reproduce this just now on my EeePC 901, though.
# echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove
works fine, removing the device and disposing of ra0. It's then safe to switch off the interface via rfkill.
Hello. I too am having trouble here, on an Eee PC 1000HE. The manual fix in Comment 11 ("echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove") also causes a freeze and requires a hard reboot, but only after a suspend-resume cycle. Also, "ifdown ra0" and "modprobe -r rt2860sta" both freeze the system, but again only after a suspend-resume cycle. It seems something in the suspend-resume process is breaking rt2860sta. My posting to the debian-eeepc-devel list gives more details:
http://lists.alioth.debian.org/pipermail/debian-eeepc-devel/2009-July/002397.html
[As an aside, how does one re-establish a network connection after running "echo 1 >/sys/bus/pci/devices/0000:01:00.0/remove"?]
Created attachment 22579 [details]
Work around the panic via changes in eeepc-laptop.
This patch is intended only as an EeePC-specific workaround for the panic and should no longer be used once rt2800pci becomes sufficiently useable or rt2860sta (presumably!) is fixed.
Created attachment 22629 [details]
Work around the panic via changes in eeepc-laptop.
Bug fix; won't kill wireless when toggling anything else.
Created attachment 22718 [details]
Work around the panic via changes in eeepc-laptop (2.6.31-rc6).
Updated the workaround patch for 2.6.31-rc6.
I tested the patch on kernel 2.6.30, hardware EEEpc 1000 (rt2860sta, same bug), and it works fine. But after suspending to disk (using s2disk), the freeze recurs, as described in the bug report. However, using shutdown instead of platform in /etc/uswsusp.conf is a workaround for this.
Correction: EEEpc 1000 should be EEEpc 1000h.
Created attachment 23279 [details]
Work around the panic via changes in eeepc-laptop (2.6.32-rc3)
Can you send this patch to the maintainer of this driver through email, with a Signed-off-by: line as described in the file Documentation/SubmittingPatches?
Done. I'll submit either/both of the older versions to -stable if this one is accepted as is. I put the patches here because I'm not sure whether it should be worked around in eeepc-laptop or dealt with in rt2860sta (and then there's rt2800pci, which I've not yet been adequately able to test – basically, I want to see if the patch is needed when that driver's in use).
Created attachment 23368 [details]
Hopefully a proper fix…
Hah. That easy, in the end. Test away :-)
(I'll send it with the proper Signed-Off-By etc. this evening.)
I tested the rt2860sta of comment 21. It fixes the bug. Thanks a lot.
Created attachment 23401 [details]
Backport to 2.6.30 (compile-tested only)
Here's my backport of the patch to 2.6.30.
GregKH: if you want this for -stable (should there be another 2.6.30 release), let me know (or you can use the description & S-O-B from the patch for 2.6.31 and .32-rc).
The fix from mainline works here.
Created attachment 23890 [details]
(partial revert) Fix for eeepc-laptop (needed by rt2800pci at least)
rfkill_switch_all calls the eeepc-laptop rfkill callback, which calls rfkill_unregister (because eeepc-laptop removes the device).
Both rfkill_switch_all and rfkill_unregister take the global rfkill mutex, so this cannot work.
Hence the return of the scheduled work for the eeepc hotplug.
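The locking problem described in this comment can be modelled in user space. The following is a minimal Python sketch, not kernel code: the function names only echo the kernel symbols from this report, the deque stands in for a workqueue, and threading.Lock stands in for the global rfkill mutex. It assumes, as the comment states, that the switch-all path and the unregister path take the same non-recursive mutex. A non-blocking acquire is used so the model reports the deadlock instead of hanging.

```python
import threading
from collections import deque

# Stand-in for the global rfkill mutex (non-recursive, like a kernel mutex).
rfkill_global_mutex = threading.Lock()
# Stand-in for a workqueue: callables to run later, outside the mutex.
pending_work = deque()
log = []


def rfkill_unregister():
    """Needs the global mutex; reports instead of hanging if it is held."""
    if not rfkill_global_mutex.acquire(blocking=False):
        log.append("unregister: would deadlock")
        return False
    try:
        log.append("unregister: ok")
        return True
    finally:
        rfkill_global_mutex.release()


def set_block_direct():
    """Hotplug inline: removes the device while the caller holds the mutex."""
    return rfkill_unregister()


def set_block_deferred():
    """Hotplug via scheduled work: only queues the removal for later."""
    pending_work.append(rfkill_unregister)
    return True


def rfkill_switch_all(set_block):
    """Calls the driver's set_block callback under the global mutex."""
    with rfkill_global_mutex:
        return set_block()


def run_pending_work():
    """Runs queued work after the mutex has been released."""
    results = []
    while pending_work:
        results.append(pending_work.popleft()())
    return results
```

Calling rfkill_switch_all(set_block_direct) fails to take the mutex a second time, while rfkill_switch_all(set_block_deferred) followed by run_pending_work() completes, which is the reason given here for going back to scheduled work.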
The trace with rt2800pci before the "(partial revert) Fix for eeepc-laptop (needed by rt2800pci at least)" patch was:
[ 2204.613861] Dumping ftrace buffer:
[ 2204.613861] (ftrace buffer empty)
[ 2218.320610] SysRq : Restore framebuffer console
[ 2228.303896] SysRq : Dump ftrace buffer
[ 2243.235978] SysRq : Show Blocked State
[ 2243.239948] task PC stack pid father
[ 2243.239948] events/0 D c0140031 0 9 2 0x00000000
[ 2243.239948] c2007e40 00000046 c2002a40 c0140031 c0118874 c022c95d ec803d50 c2007e40
[ 2243.239948] f6bec8c0 f68f8540 c05793e4 f704e490 f704e620 00000000 f8d46464 00000001
[ 2243.239948] f704e490 f704e620 ef1d007b 000000d8 00000000 ffffff10 c0126a38 f8d46458
[ 2243.239948] Call Trace:
[ 2243.239948] [<c0140031>] ? irq_exit+0x31/0x60
[ 2243.239948] [<c0118874>] ? smp_apic_timer_interrupt+0x54/0x80
[ 2243.239948] [<c022c95d>] ? sysfs_addrm_finish+0x1ad/0x1f0
[ 2243.239948] [<c0126a38>] ? mutex_spin_on_owner+0x88/0xb0
[ 2243.239948] [<c03b2a58>] ? __mutex_lock_slowpath+0xb8/0x110
[ 2243.239948] [<c03b2ded>] ? mutex_lock+0x1d/0x40
[ 2243.239948] [<f8d44f6b>] ? rfkill_unregister+0x4b/0xa0 [rfkill]
[ 2243.239948] [<f8a225d3>] ? wiphy_unregister+0x23/0x180 [cfg80211]
[ 2243.239948] [<f8b7c0dd>] ? ieee80211_unregister_hw+0xad/0xd0 [mac80211]
[ 2243.239948] [<f86195b3>] ? rt2x00leds_unregister_led+0x23/0x40 [rt2x00lib]
[ 2243.239948] [<f8614950>] ? rt2x00lib_remove_dev+0xa0/0xb0 [rt2x00lib]
[ 2243.239948] [<f86381fa>] ? rt2x00pci_remove+0x2a/0x70 [rt2x00pci]
[ 2243.239948] [<c02937e6>] ? pci_device_remove+0x16/0x40
[ 2243.239948] [<c02f1eae>] ? __device_release_driver+0x4e/0xa0
[ 2243.239948] [<c02f1fad>] ? device_release_driver+0x1d/0x30
[ 2243.239948] [<c02f1543>] ? bus_remove_device+0x73/0xa0
[ 2243.239948] [<c02efa0b>] ? device_del+0xfb/0x180
[ 2243.239948] [<c02efa98>] ? device_unregister+0x8/0x10
[ 2243.239948] [<c028e77e>] ? pci_stop_bus_device+0x4e/0x60
[ 2243.239948] [<c028e80a>] ? pci_remove_bus_device+0xa/0x90
[ 2243.239948] [<f8dc5980>] ? eeepc_rfkill_hotplug+0x60/0x140 [eeepc_laptop]
[ 2243.239948] [<f8dc5b5a>] ? eeepc_rfkill_set+0x2a/0x50 [eeepc_laptop]
[ 2243.239948] [<f8d444fd>] ? rfkill_set_block+0x5d/0xc0 [rfkill]
[ 2243.239948] [<f8d44637>] ? __rfkill_switch_all+0x57/0x70 [rfkill]
[ 2243.239948] [<f8d446fa>] ? rfkill_switch_all+0x5a/0x60 [rfkill]
[ 2243.239948] [<f8d4562f>] ? rfkill_op_handler+0xbf/0x180 [rfkill]
[ 2243.239948] [<c02e0f8f>] ? console_callback+0xbf/0x120
[ 2243.239948] [<c014f86b>] ? worker_thread+0x13b/0x200
[ 2243.239948] [<f8d45570>] ? rfkill_op_handler+0x0/0x180 [rfkill]
[ 2243.239948] [<c01534a0>] ? autoremove_wake_function+0x0/0x50
[ 2243.239948] [<c014f730>] ? worker_thread+0x0/0x200
[ 2243.239948] [<c0153184>] ? kthread+0x74/0x80
[ 2243.239948] [<c0153110>] ? kthread+0x0/0x80
[ 2243.239948] [<c0103a57>] ? kernel_thread_helper+0x7/0x10
By the way, even in my version of the patch I left the mutexes in. I had another locking issue (cfg80211 calls flush_work on the global queue in wiphy_unregister while we are already running a work item on the global queue, i.e. eeepc_hotplug_work; this one should be fixed soon in the wireless-testing tree) and had removed them. Now the cfg80211 issue is fixed, but I wonder whether this ehotk mutex is a problem, or of any use, in a workqueue implementation. The mutexes were not there in the previous "in tree" workqueue implementation either.
(In reply to comment #25)
> Created an attachment (id=23890) [details]
> (partial revert) Fix for eeepc-laptop (needed by rt2800pci at least)
I don't know what that patch is a partial reversion of (I'm not looking right now), but it's evident from the patch content that it isn't reverting what you claim that it's reverting...
This is a revert of the mutex implementation of eeepc-laptop:
http://git.kernel.org/?p=linux/kernel/git/linville/wireless-testing.git;a=commitdiff;h=dcf443b5813074031a45b05ad9c57da98bcae329
Sorry if my explanation was confusing. I was reading the previous diff about the switch from works to mutexes in eeepc-laptop. I said "partial revert" because I left the mutexes in my version of the patch while at the same time going back to scheduled works. I saw hints that mutexes in works are bad, so a complete revert that removes the mutexes looks more and more like the right thing to do. Were you looking at attachment id 23890?
Thanks for the lookup.
All USB bugs should be sent to the linux-usb@vger.kernel.org mailing list, and not entered into bugzilla. Please bring this issue up there, if it is still a problem in the latest kernel release.