Bug 14101
Summary: | acpi_idle_enter_bm() calls sched_clock_cpu() with interrupts enabled - i8kmon - Vostro1710 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Ben Hutchings (ben) |
Component: | Platform | Assignee: | acpi_platform-drivers (acpi_platform-drivers) |
Status: | REJECTED DOCUMENTED | ||
Severity: | normal | CC: | akpm, lenb, tim.w.connors |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | http://bugs.debian.org/541677 | ||
Kernel Version: | 2.6.30-5 (Debian) | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
acpidump output
lspci output /proc/cpuinfo disable the SMI to see whether the issue still happens try the debug patch, which adds the debug macro to identify where the irq is re-enabled? |
Description
Ben Hutchings
2009-08-31 20:43:06 UTC
please supply the output from lspci -v (In reply to comment #1) > please supply the output from lspci -v I can ask the original reporter, but isn't code inspection sufficient? Hi, Ben Will you please attach the output of acpidump, dmesg, cat /proc/cpuinfo? Thanks. (In reply to comment #3) > Hi, Ben > Will you please attach the output of acpidump, dmesg, cat /proc/cpuinfo? > Thanks. dmesg output is in the Debian bug report. I will ask for the others. Created attachment 22980 [details]
acpidump output
Created attachment 22981 [details]
lspci output
Created attachment 22982 [details]
/proc/cpuinfo
Guys, we shouldn't need lspci and acpidump output to find out what this bug is. But I don't get it. acpi_idle_enter_bm() does local_irq_disable() prior to calling sched_clock_idle_sleep_event(), so why did the warning trigger? (In reply to comment #8) > Guys, we shouldn't need lspci and acpidump output to find out what this bug > is. > > But I don't get it. acpi_idle_enter_bm() does local_irq_disable() prior to > calling sched_clock_idle_sleep_event(), so why did the warning trigger? Yes, right, I misread the sense of the warning, so the summary is wrong. No idea what's happened here. If you refer back to the original Debian bug report, you'll see some other symptoms reported which might point at a deeper problem (hardware or SMM misbehaviour) or might be cascading faults. After checking the source code, it seems that the irq will be disabled before calling the function of sched_clock_cpu. Why is the irq enabled spontaneously? Maybe it is caused by SMM misbehaviour. Thanks. lspci shows that this is an 82801H (ICH8 Family) Yakui, maybe a test patch to flip the chipset bit that disables SMI/SMM will prove that the instability is due to SMM or not? Created attachment 23059 [details]
disable the SMI to see whether the issue still happens
Will you please try the debug patch on the latest kernel and see whether the issue still happens?
This debug patch is only to check whether the issue is caused by the SMI. In this patch the SMI is disabled after ACPI is enabled.
After the test, please attach the output of dmesg.
Thanks.
http://kerneloops.org/searchweek.php?search=acpi_idle_enter_bm shows that a lot (of apparently 32-bit machines) are running into this. dunno if they are all ICH8's Yakui, please add a 2nd debug patch with a WARN_ONCE in acpi_idle_enter_bm to figure out exactly where we first have interrupts disabled on us. I'll bet it is the actual entry to c3, but who knows... Created attachment 23673 [details]
try the debug patch, which adds the debug macro to identify where the irq is re-enabled?
This issue is different with the kerneloops in kerneloops.org. It seems that the issue in the kerneloops.org is caused by that condition_schedule is called in ISR/softirq after the irq is enabled. (When the cpu exits the C-state, it will be re-enable the interrupt. And then OS will process the ISR/softirq). But this issue in this bug is that the irq is re-enabled spontaneously before entering the deep C-state. Hi, Ben Will you please try the debug patch in comment #14 and attach the output of dmesg when the issue happens? Thanks. I'm the original reporter, and am now subscribed to this report. The laptop is currently in "production", so I can't apply the patch until I can afford to reboot and crash the machine, however, one bit of trivia that didn't get into this report is that the bug is related to the i8k module. With i8kmon running (which polls /proc/i8k I believe), the machine crashes in a small number of minutes (it stutters and pauses for a few seconds at a time and gets worse and worse until it panics). If on the console, I cat /proc/i8kmon in a tight bash loop, then I could induce the panic in a few seconds after a few stutters, and if I recall correctly, after seeing a couple of traces (can't remember whether they were BUGs or otherwise). The traces almost always involved "coretemp" if I recall correctly. The hardware isn't an inspiron 8000, but that module had worked on most dell laptops I had come across, and was giving sane temperatures (if not fan speed) and was able to control the fan, so I kept on using it until I worked out it was crashing 2.6.28 and above. If the debug will still be useful knowing these points, I'll get back to you when I'm able to apply the patch. Do you mean that the i8k module is used on your box? It seems that it will enter SMM mode in i8k. If the i8kmon polls the /proc/i8k interface, it will frequently enter the SMM mode. Maybe the IRQ will be enabled after entering the SMM. At the same time as the SMM is invisible to OS, we can't know where/when the IRQ is enabled. If the IRQ is enabled before we enter the C-state, sometimes it will complain the following warning message: ------------[ cut here ]------------ >> [ 391.549836] Pid: 0, comm: swapper Not tainted 2.6.30-1-amd64 #1 > [ 391.549841] Call Trace: > [ 391.549852] [<ffffffff80258897>] ? sched_clock_cpu+0x7b/0x202 > [ 391.549861] [<ffffffff80258897>] ? sched_clock_cpu+0x7b/0x202 > [ 391.549871] [<ffffffff8024238f>] ? warn_slowpath_common+0x77/0xa3 > [ 391.549880] [<ffffffff80258897>] ? sched_clock_cpu+0x7b/0x202 > [ 391.549905] [<ffffffffa020a4db>] ? acpi_idle_enter_bm+0x16c/0x2a8 > [processor] > [ 391.549916] [<ffffffff803fcfe0>] ? cpuidle_idle_call+0x8b/0xc7 > [ 391.549926] [<ffffffff8064e140>] ? early_idt_handler+0x0/0x71 > [ 391.549936] [<ffffffff8020edda>] ? cpu_idle+0x50/0x91 > [ 391.549943] [<ffffffff8064ec62>] ? start_kernel+0x37a/0x386 > [ 391.549951] [<ffffffff8064e3b7>] ? x86_64_start_kernel+0xf9/0x106 > [ 391.549957] ---[ end trace 1aa56acdd82f3805 ]--- How about this issue if you don't load the i8k module or try the patch in comment #12? thanks. From the log it seems that this issue is related with the SMI operation. In such case it is transparent to OS and we don't know when the irq is enabled. At the same time as there is no response for more than one month, the bug will be rejected. |