Created attachment 21499 [details]
Screenshot of kernel crash output.

When the system shuts down or reboots, nearly every attempt hangs while killall5 is trying to kill all processes. Pressing the power button has no effect in most cases; only the magic SysRq key sequences work. The problem has occurred since 2.6.30-rc6; 2.6.30-rc5 shut down and rebooted without problems.
Created attachment 21500 [details]
kernel configuration

KMS and plymouth are used on this test system.
Created attachment 21501 [details] dmesg output
Created attachment 21502 [details] lspci -vvv output
I thought this might be EC-related at first glance, but given that the problem has only occurred since 2.6.30-rc6, it is not an EC problem. Martin, can you run git-bisect to find out which commit introduced the problem?
Reply-To: hugh.dickins@tiscali.co.uk

A git-bisect would indeed be worthwhile; but looking through the diff between 2.6.30-rc5 and 2.6.30-rc7 didn't show any likely candidates - I wonder if this will turn out to be something more elusive.

Is this an Acer Aspire One? Looks rather like it: I tried building your kernel (on openSUSE rather than Ubuntu) on mine, and running it: no luck reproducing your issue here.

I doubt this has got much to do with mlockall() or lru_add_drain_all() themselves: it looks rather as if an events thread has "gone away".

Would you mind applying the hacky patch below, and posting the screenshot from shutdown? I assume from the fact that you posted a photo, that nothing useful gets out to the logs: so here I'm trying to leave just the "events/0" and "events/1" stacktraces onscreen.

--- 2.6.30-rc7/kernel/hung_task.c	2009-04-08 14:59:26.000000000 +0100
+++ linux/kernel/hung_task.c	2009-05-25 18:45:11.000000000 +0100
@@ -98,7 +98,7 @@ static void check_hung_task(struct task_
 	printk(KERN_ERR "\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
 			" disables this message.\n");
 	sched_show_task(t);
-	__debug_show_held_locks(t);
+	show_state_filter(512);
 
 	touch_nmi_watchdog();
 
--- 2.6.30-rc7/kernel/sched.c	2009-05-09 09:24:35.000000000 +0100
+++ linux/kernel/sched.c	2009-05-25 19:08:05.000000000 +0100
@@ -6514,13 +6514,14 @@ void show_state_filter(unsigned long sta
 		 * console might take alot of time:
 		 */
 		touch_nmi_watchdog();
-		if (!state_filter || (p->state & state_filter))
+		if ((state_filter == 512 && !strncmp(p->comm, "events/", 7)) ||
+		    !state_filter || (p->state & state_filter))
 			sched_show_task(p);
 	} while_each_thread(g, p);
 
 	touch_all_softlockup_watchdogs();
 
-#ifdef CONFIG_SCHED_DEBUG
+#ifdef CONFIG_SCHED_DEBUG_NOT
 	sysrq_sched_debug_show();
 #endif
 	read_unlock(&tasklist_lock);
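For readers who want to see what the modified condition in show_state_filter() selects, here is a minimal user-space sketch of the same predicate. It is only an illustration: struct task_info, should_show_task() and EVENTS_ONLY_FILTER are hypothetical names that do not exist in the kernel, and the sketch assumes that the patch uses 512 purely as a sentinel meaning "also dump the events/ threads".

```c
#include <stdbool.h>
#include <string.h>

/* Sentinel passed by the debugging patch; the name is made up for this sketch. */
#define EVENTS_ONLY_FILTER 512UL

/* Hypothetical stand-in for the two task_struct fields the patch looks at. */
struct task_info {
	unsigned long state;	/* task state bits, like p->state */
	const char *comm;	/* task name, like p->comm */
};

/* Mirrors the patched condition: returns true if the task would be dumped. */
static bool should_show_task(const struct task_info *p,
			     unsigned long state_filter)
{
	/* New clause: with the sentinel, any task named "events/..." matches. */
	if (state_filter == EVENTS_ONLY_FILTER &&
	    strncmp(p->comm, "events/", 7) == 0)
		return true;

	/* Original behaviour: no filter shows all, else match on state bits. */
	return !state_filter || (p->state & state_filter) != 0;
}
```

With the sentinel, a task named "events/0" is selected regardless of its state, while a task named "bash" is selected only if its state happens to have bit 512 set. Any other filter value behaves as before the patch.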
Created attachment 21567 [details]
kernel output with last patch in thread for debugging

I've found out that the shutdown problem only occurs when I:
1) plug in my USB UMTS stick
2) go online (with network manager)
3) unplug the UMTS stick
4) shut down or reboot
Created attachment 21569 [details]
Shutdown problem without umts stick

OK, forget what I said about the shutdown problem only occurring with the UMTS stick. I have just had this problem without the USB device attached. But as you can see in the screenshot, when I use the magic key sequence to reboot the system, the network subsystem outputs something. Maybe the problem is related to the network code?
One more piece of information: I've now compared the shutdown/reboot process with rc5. The differences are:
- with rc5 the shutdown/reboot process begins immediately
- with rc6 and rc7 the shutdown process begins with a delay of ~2 seconds
- when the shutdown/reboot process hangs, the 2 lines from the network manager are missing

Hope this helps.
I tried to bisect the issue. Unfortunately that failed, because of a reiserfs bug after rc5 which causes the kernel to crash when it mounts the root fs. But I've found the reason for this issue. I remembered that I had compiled rc5 with a slightly different config than rc6 and rc7. In rc6/rc7 I had enabled more debugging options and also kdbg. Looking at the kernel messages when it stopped booting, I saw that the last output always came from kdbg. I then disabled kdbg and the additional debugging options in the current master. Now the issue seems to be gone: I've rebooted the kernel several times without any problems. I also compiled rc5 with the config of rc6/rc7 and it showed the same problems as rc6/rc7.
> I also compiled rc5 with the config of rc6/rc7 and it
> showed the same problems as rc6/rc7

Clearing the "regression" flag, since it is now unclear if this configuration ever worked on this machine.

Can you narrow the problem down to a single .config option?
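One way to narrow a set of suspect options down to a single one is to bisect the list: enable only half of the candidates, rebuild, test shutdown, and keep whichever half still shows the hang. The sketch below illustrates that loop in user-space C. It is not a kernel tool: bisect_options(), simulated_fails() and the CONFIG_EXAMPLE_* names are all hypothetical. It also assumes that exactly one option is responsible and that the options can be toggled independently, which real .config dependencies may not allow.

```c
#include <stddef.h>
#include <string.h>

/*
 * Callback type: returns nonzero if a kernel built with only the given
 * candidate options enabled still hangs on shutdown. In practice this
 * step is a manual rebuild and reboot.
 */
typedef int (*fails_fn)(const char *const *enabled, size_t count);

/*
 * Returns the index of the culprit in opts[0..n), assuming n >= 1 and
 * that exactly one option triggers the failure on its own.
 */
static size_t bisect_options(const char *const *opts, size_t n, fails_fn fails)
{
	size_t lo = 0, hi = n;	/* the culprit lies in [lo, hi) */

	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		/* Test with only the lower half, opts[lo..mid), enabled. */
		if (fails(opts + lo, mid - lo))
			hi = mid;	/* still hangs: culprit is in the lower half */
		else
			lo = mid;	/* works: culprit is in the upper half */
	}
	return lo;
}

/* Stand-in for a real test run: pretends CONFIG_EXAMPLE_D causes the hang. */
static int simulated_fails(const char *const *enabled, size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (strcmp(enabled[i], "CONFIG_EXAMPLE_D") == 0)
			return 1;
	return 0;
}
```

Called on a list of five placeholder options with simulated_fails as the callback, the loop needs three test builds to single out the fourth entry. With the real debugging options, each call to the callback corresponds to one rebuild and one shutdown test.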
On Sunday 07 June 2009, Martin Bammer wrote:
> Since i disabled most of the debug options this problem has gone. IMHO
> this issue has been caused by kdbg.
Dropping from the list of recent regressions as per comment #10.
So, can you reproduce this bug on any earlier kernel releases?