Bug 6977
Summary: | S3: SMP resume hang - 2.6.17 regression - Dell D420 | ||
---|---|---|---|
Product: | ACPI | Reporter: | Steinar H. Gunderson (steinar+kernel) |
Component: | Power-Sleep-Wake | Assignee: | acpi_power-sleep-wake |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | acpi-bugzilla, akpm, bunk, pavel |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.17 | Subsystem: | |
Regression: | --- | Bisected commit-id: |
Description
Steinar H. Gunderson
2006-08-09 02:11:32 UTC
FYI the commit is:

[PATCH] on_each_cpu(): disable local interrupts

When on_each_cpu() runs the callback on other CPUs, it runs with local interrupts disabled. So we should run the function with local interrupts disabled on this CPU, too.

And do the same for UP, so the callback is run in the same environment on both UP and SMP. (strictly it should do preempt_disable() too, but I think local_irq_disable is sufficiently equivalent).

Also uninlines on_each_cpu(). softirq.c was the most appropriate file I could find, but it doesn't seem to justify creating a new file.

Oh, and fix up that comment over (under?) x86's smp_call_function(). It drives me nuts.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

...okay, just add warn_on() to on_each_cpu, and notice where it hangs... but you'll need some kind of console for that. Does it work with minimal drivers?

The machine has serial if I hook it up to the docking station, but I don't think I have a proper cable here (that will have to wait a few weeks).

It does not work with minimal drivers.

Hi!
> ------- Additional Comments From sgunderson@bigfoot.com 2006-08-09 02:55 -------
> The machine has serial if I hook it up to the docking station, but I don't think
> I have a proper cable here (that will have to wait a few weeks).
>
> It does not work with minimal drivers.
You could also take kernel, replace half of for_each_cpus with new
variant, and see if it breaks... basically binary searching on
that. Unfortunately I do not see on_each_cpu used in kernel/power/...
Maybe the one in flush_tlb_all()?
Pavel
Andrew, we are talking http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=78eef01b0fae087c5fadbd85dd4fe2918c3a015f;hp=ac2b898ca6fb06196a26869c23b66afe7944e52e here. I think it is wrong: it now does local_irq_enable() unconditionally, even if interrupts were disabled before. That's probably what hurts suspend here, we call flush_tlb_all() from pretty low level code. Should we perhaps only restore to previous state of interrupts enabled/disabled instead of enabling unconditionally?

Did you try latest Linus's tree? we have some fixes about the issues, like changing flush_tlb_all to flush_tlb_local.

Indeed! 2.6.18-rc4 (commit 9f737633e6ee54fc174282d49b2559bd2208391d) works fine for me. It should be noted that I got the following during boot, though:

Lukewarm IQ detected in hotplug locking
BUG: warning at kernel/cpu.c:38/lock_cpu_hotplug()
 [<b0132383>] lock_cpu_hotplug+0x40/0x63
 [<b012a8e2>] __create_workqueue+0x50/0x11b
 [<f8af832e>] cpufreq_governor_dbs+0x91/0x29d [cpufreq_ondemand]
 [<b02162de>] __cpufreq_governor+0x3f/0xb9
 [<b0216425>] __cpufreq_set_policy+0xcd/0x100
 [<b0216f18>] store_scaling_governor+0x143/0x187
 [<b0216b86>] handle_update+0x0/0x5
 [<b01b4700>] kobject_cleanup+0x29/0x5e
 [<b0216dd5>] store_scaling_governor+0x0/0x187
 [<b02168af>] store+0x2e/0x3e
 [<b018af45>] sysfs_write_file+0x8c/0xb4
 [<b018aeb9>] sysfs_write_file+0x0/0xb4
 [<b0157e93>] vfs_write+0xa1/0x143
 [<b0158483>] sys_write+0x3c/0x63
 [<b0102c6b>] syscall_call+0x7/0xb

Overall, though, it suspends and resumes just fine. Is there any specific fix I can point to to get this backported to my distribution's 2.6.17 kernel?

Applying 55b2355eefc2f160246226d4d69fed431173a4d5 on top of 2.6.17 makes resume work for me -- not every time (it sometimes hangs during suspend, probably due to other bugs in the kernel), but much much better than it was in 2.6.17. Should this be closed, or could the fix go into 2.6.17.x?
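
For readers following the interrupt-state argument above, here is a minimal userspace sketch of the difference between enabling interrupts unconditionally and restoring the previous state. It is only a model, not the kernel code: the `irqs_enabled` flag, the `model_irq_*` helpers, the two `on_each_cpu_*` variants and `flush_stub` are all hypothetical names invented for illustration, standing in loosely for the ideas behind local_irq_disable()/local_irq_enable() and local_irq_save()/local_irq_restore().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

static void model_irq_disable(void) { irqs_enabled = false; }
static void model_irq_enable(void)  { irqs_enabled = true; }

/* Models the idea of local_irq_save(): remember the old state, then disable. */
static bool model_irq_save(void)
{
    bool was_enabled = irqs_enabled;
    irqs_enabled = false;
    return was_enabled;
}

/* Models the idea of local_irq_restore(): put back whatever state was saved. */
static void model_irq_restore(bool was_enabled) { irqs_enabled = was_enabled; }

typedef void (*callback_t)(void *info);

/* Behaviour questioned in the thread: interrupts are always on at exit. */
static void on_each_cpu_unconditional(callback_t func, void *info)
{
    assert(func != NULL);
    model_irq_disable();
    func(info);
    model_irq_enable();
}

/* Alternative raised in the thread: leave the caller's state untouched. */
static void on_each_cpu_restoring(callback_t func, void *info)
{
    assert(func != NULL);
    bool was_enabled = model_irq_save();
    func(info);
    model_irq_restore(was_enabled);
}

/* Placeholder callback; a real caller would do per-CPU work here. */
static void flush_stub(void *info) { (void)info; }

/*
 * Calls one variant from a context where interrupts are already off
 * (as in low-level suspend/resume code) and reports the state afterwards.
 */
static bool irqs_on_after_call_with_irqs_off(void (*variant)(callback_t, void *))
{
    irqs_enabled = false;
    variant(flush_stub, NULL);
    return irqs_enabled;
}
```

In this model, calling `irqs_on_after_call_with_irqs_off(on_each_cpu_unconditional)` reports that interrupts came back on even though the caller had them off, while the restoring variant leaves them off. Whether that is the actual cause of the hang on this machine is not established by the sketch; it only illustrates the concern raised in the comment.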