Distribution: SUSE 9.2 x86-64
Hardware Environment: Asus L5D
Problem Description: The box hangs solid while resuming from disk, after the
system image has been restored. Apparently, the hang occurs as a result of the
call to pci_write_config_word() with pmcsr = 0 in
drivers/pci/pci.c:pci_set_power_state(). In fact, the function
pci_write_config_word() completes successfully but the box hangs during
msleep() afterwards. It is certainly related to the fact that ACPI PCI links
are not set before pci_set_power_state() is called.
Steps to reproduce: Suspend the box to disk and try to resume it. The problem
is 100% reproducible.
Created attachment 4812
Output of acpidmp on Asus L5D
Attached is the output of acpidmp (the 2.6.12-rc1-mm3 kernel) on the affected box.
Created attachment 4813
The patch that needs to be reverted to fix the problem
Attached is the patch from the ACPI tree that needs to be reverted to fix the problem.
The problem seems to be related to the chipset (NVidia NForce 3) and/or BIOS,
as I am not able to reproduce it on another AMD64-based box with the AMD-81x1 chipset.
Oh well, could you please also attach the dmesg output? I suspect we should
disable the link device at suspend, but I'm not sure that will help. First,
though, I'd like to know which mode (PIC/APIC) the box is in.
I didn't see the issue on any of the boxes I tried, so it is quite possibly a
Created attachment 4817
The box is in the PIC mode (sorry, I should have said this earlier), because
there are problems with IO-APIC on it.
The attached dmesg output is from the kernel with the patch reverted.
ACPI: OEMB ...
>>> ERROR: Invalid checksum
ACPI: DSDT ...
Please verify that this machine is running the latest available BIOS.
Could you please try the debug patch below? It disables the link device at
suspend. If, with the patch applied, the system reports an oops caused by
'sleeping at atomic', please just ignore it. If it works, we will figure out a
real fix.
diff -puN drivers/acpi/pci_link.c~disable_link_device drivers/acpi/pci_link.c
--- 2.5/drivers/acpi/pci_link.c~disable_link_device 2005-03-31
+++ 2.5-root/drivers/acpi/pci_link.c 2005-03-31 13:28:12.101036184 +0800
@@ -733,6 +733,7 @@ irqrouter_suspend(
if (link->irq.active && link->irq.initialized)
link->irq.suspend_resume = 1;
+ acpi_ut_evaluate_object(link->handle, "_DIS", 0, NULL);
Created attachment 4828
Serial console log from 2.6.12-rc1-mm2
Referring to Comment #6:
Yes, it does. Unfortunately, the BIOS is buggy (e.g. the processor description
in the DSDT is wrong, etc.).
Referring to Comment #7:
This patch seems to help, but it causes some funny side-effects to appear.
Attached is the serial console log from 2.6.12-rc1-mm2 with the patch applied.
Referring to Comment #8:
Please ignore the "acpi_pci_link_allocate(): irq.active = ..." lines in the
log. They come from a printk that I've added to acpi_pci_link_allocate().
Referring to Comment #8:
Er, one more mistake of the same kind. Please ignore the
"pci_set_power_state(): pmcsr = ..." and "pci_write_config_word() successful"
lines in the log. They come from printks that I've added to
Great, it looks like we should disable the link device.
Does any device work with the patch (other than the devices on IRQ 11)? And what
about unloading the sound card driver?
Created attachment 4830
dmesg output from 2.6.12-rc1-mm2 with the debug patch
First, it appears that the log I posted is not a typical one.
Typically, if I don't unload the sound card driver before suspend, IRQ 11 is
disabled and the box goes into a never-ending loop while suspending devices (ie
during suspend), but magic SysRq works then.
If I unload the sound card driver before suspend, both IRQ 10 and IRQ 11 are
disabled and the box suspends and resumes successfully (IRQ 10 and 11 are
shared). The devices that are not on either IRQ 10 or IRQ 11 seem to work.
Attached is a log of a typical boot-suspend-resume cycle, from dmesg.
Created attachment 4831
Contents of /proc/interrupts
Attached is the contents of /proc/interrupts on the box.
Referring to Comment #12:
A correction: The infinite loop that occurs if the sound card driver is not
unloaded before suspend actually starts in device_resume() called from
kernel/power/swsusp.c:swsusp_write(), which is while _resuming_ devices during suspend.
OK, maybe we should mask all PIC/IO-APIC pins at suspend and only unmask them
once a device actually starts using them.
Please also try the debug patch below (together with the previous one). I'd like
to see whether it's a PIC issue.
diff -puN arch/i386/kernel/i8259.c~disable_link_device arch/i386/kernel/i8259.c
--- 2.5/arch/i386/kernel/i8259.c~disable_link_device 2005-04-01
+++ 2.5-root/arch/i386/kernel/i8259.c 2005-04-01 13:15:10.719027816 +0800
@@ -259,6 +259,8 @@ static void save_ELCR(char *trigger)
static int i8259A_resume(struct sys_device *dev)
+ /* mask interrupt 10,11 */
+ trigger &= 0xF3;
Referring to Comment #16:
This patch does not seem to change anything.
Hmm, maybe .suspend should call free_irq and .resume should call request_irq,
so that interrupts stay masked until devices request them.
Pavel, the PIC suspend/resume seems quite wrong to me. We should enable the
PIC/IRQ router on each device's request. Blindly restoring it leaves a small
window in which unexpected interrupts can occur.
I can confirm that the problem does not occur in the APIC mode (tested on
The problem is present in 2.6.12-rc3-mm2 (PIC mode).
Created attachment 5034
Hi Rafael, could you please try the debug patch? Please unload all PCI drivers
except yenta (not loading them at all is preferred, since some drivers' unload
routines are not reliable); my patch only changes the yenta driver. I'd like to
know whether yenta is OK after resume. The patch does the following:
1. If no driver uses the PCI IRQ router, the router will be disabled so it can
2. PCI drivers call pci_disable_device and free_irq at suspend, so the
PIC/IO-APIC pins are masked at suspend.
The goal of the patch is to leave the IO-APIC/PIC/IRQ router in the same state at
suspend as at boot time (at boot time the devices are disabled).
Adding Nigel to the list, since this also impacts suspend2.
Created attachment 5037
Patch to get the previous patch working on x86-64
Referring to Comment #21:
With the patch applied the box has suspended and resumed successfully. :-)
However, I was not quite sure which kernel you wanted me to test, so I tested
2.6.12-rc3-mm2 (2.6.12-rc3-mm3 does not boot on my box). If you want me to
test another one, please let me know.
Also, the system here is an x86-64, and your patch is against i386, apparently,
so I needed to apply some changes manually to arch/x86-64/kernel/mpparse.c (it
seems to be very similar to the i386 version, so I just copied the changes from
your patch - with some minor modifications). The attached patch contains the
changes that I made.
I booted the box to the runlevel 3 (ie no X). The network and USB drivers were
not loaded. Before suspend, I had unloaded the sound card driver manually.
The yenta driver was in memory all the time.
-mm2 should be ok. Any tree which includes the patch mentioned at comment 2
(the 'delete link resume' patch) is ok.
Pavel, Len & Nigel, do you have any objections to this proposal? Maybe I
should send a mail to the pm list.
Yes, please drop a mail to the pm list. Hopefully it'll get more
David, do you want me to comment on
Having u8 reserved:6 seems a little silly to me. Otherwise it looks okay.
Thanks, Pavel. I'd like to consider the proposal in comment 21, but I'll send an
email to the pm list explaining the proposal in detail soon, so please review it
Hi Rafael, without the patch, does the system work with only yenta loaded?
I'd like to show people a case where not calling free_irq is bad for some
Unfortunately (?), without the patch and with only yenta loaded the system
appears to work.
As of 2.6.12-rc3-mm2, the drivers that hang the kernel during resume (sometimes
also during suspend) are:
ohci-hcd (shares IRQ with yenta)
Can we have an update on this please? Is 2.6.12-rc5 fixed?
The 2.6.12-rc5 kernel is unaffected, as the problem is only present in -mm.
The problem is present in 2.6.12-rc5-mm1. Apparently, the following drivers
The box hangs solid during resume or suspend while executing the drivers'
The problem is not present in 2.6.12-rc5-mm2.
As described in the bug logs, this issue is now fixed.
Unfortunately, not quite so. It's back in 2.6.13-rc2-mm2.
The problem has made it to mainline, as it is present in 2.6.13-rc3.
I've just had some time to test this a bit more on Linux-2.6.13-rc3-git5 with a
patch that frees the IRQ in _suspend() and requests it again in _resume() for
After a fresh reboot the configuration of IRQs is the following:
0: 31855 XT-PIC timer
1: 262 XT-PIC i8042
2: 0 XT-PIC cascade
5: 2 XT-PIC ehci_hcd:usb3
8: 0 XT-PIC rtc
9: 280 XT-PIC acpi
10: 194 XT-PIC SysKonnect SK-98xx, NVidia nForce3
11: 42 XT-PIC ohci_hcd:usb1, ohci_hcd:usb2, yenta, yenta
12: 1115 XT-PIC i8042
14: 719 XT-PIC ide0
15: 6425 XT-PIC ide1
So, the involved PCI drivers are: ehci_hcd (IRQ5), sk98lin (IRQ10),
snd_intel8x0 (IRQ10), ohci_hcd (IRQ11), yenta_socket (IRQ11).
Evidently, if only one (arbitrary) of these drivers is loaded before resume,
the box always suspends and resumes successfully, so the problem does not seem
to be directly related to what these drivers do in their _suspend/_resume routines.
However, there are the following combinations of drivers that cause the box to
hang solid during resume:
snd_intel8x0 + ohci_hcd
snd_intel8x0 + ehci_hcd
sk98lin + ohci_hcd
sk98lin + ehci_hcd
The remaining combinations of drivers are apparently OK with respect to suspend/resume.
Thus it looks like the box hangs during resume if IRQ10 is occupied (by
anything) _and_ a USB controller is active at the same time.
It is also apparent that the box hangs because of the call to
pci_write_config_word() with pmcsr = 0 in
drivers/pci/pci.c:pci_set_power_state() called from
drivers/usb/core/hcd-pci.c:usb_hcd_pci_resume() for either the ehci_hcd or
the ohci_hcd driver, which happens _before_ the driver has a chance to request
an IRQ. It is also the first driver whose _resume() routine is called, so no
other driver can request its IRQ earlier.
Now it seems to me that, as a result of the call to pci_set_power_state(), an
IRQ (IRQ10?) is generated which is then mishandled, and this causes the box to hang.
Created attachment 5371
Patch to fix the problem on Asus L5D
The attached patch (the yenta part is not mine) fixes the problem on my box;
it has been tested on both 2.6.13-rc3-git5 and -git6.
As I said in Comment #36, I think that calling pci_set_power_state() from
usb_hcd_pci_resume() on my box causes an interrupt to occur which is mishandled
if at least one driver has not freed IRQ10 during suspend.
Apparently, all PCI drivers _must_ call free_irq() from their suspend routines
or some unrelated interrupts may be mishandled during resume.
As far as I am concerned, the problem is resolved.
Great analysis, Rafael. Please push it to the upstream kernel. I guess there will
be many similar patches soon. People seem to agree that we should call
free_irq/request_irq at suspend/resume time.
We sent this one to 2.6.13-rc5 and caused quite a stir,
resulting in Linus restoring the blind link restore code.
So this failure should be masked for 2.6.13 -- please re-open
if 2.6.13 is still an issue.