Bug 3469 - S3 resume: PCI Interrupt Link Device oops
Summary: S3 resume: PCI Interrupt Link Device oops
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake (show other bugs)
Hardware: i386 Linux
: P2 high
Assignee: Len Brown
: 4091 4159 5805 (view as bug list)
Depends on:
Reported: 2004-09-26 21:26 UTC by Shaohua
Modified: 2007-03-08 21:21 UTC (History)
9 users (show)

See Also:
Kernel Version: 2.6.7
Regression: ---
Bisected commit-id:

proposed patch (2.86 KB, patch)
2004-11-03 21:48 UTC, Shaohua
Details | Diff
patch to enable ACPICA run with IRQ disabled (3.17 KB, patch)
2004-11-07 17:47 UTC, Shaohua
Details | Diff
new patch free memory before suspend (3.77 KB, patch)
2004-11-21 19:30 UTC, Shaohua
Details | Diff
proposed patch part 1 (1.13 KB, patch)
2005-01-24 16:54 UTC, Shaohua
Details | Diff
proposed patch - part 2 (2.13 KB, patch)
2005-01-24 16:55 UTC, Shaohua
Details | Diff
proposed patch - part 3 (1.69 KB, patch)
2005-01-24 16:57 UTC, Shaohua
Details | Diff
call pci_enable_device from ne2k_pci_resume (420 bytes, patch)
2005-02-19 06:04 UTC, Laurent Riffard
Details | Diff
New patches (2.63 KB, patch)
2005-03-02 21:49 UTC, Shaohua
Details | Diff
patch to disable link device if no PCI device uses it. (14.49 KB, patch)
2005-07-27 23:02 UTC, Shaohua
Details | Diff
Shaohua's patch vs 2.6.13-rc5 (773 bytes, patch)
2005-08-03 20:37 UTC, Len Brown
Details | Diff
Shaohua's patch vs 2.6.13-rc5+ (1.31 KB, patch)
2005-08-03 20:40 UTC, Len Brown
Details | Diff
workaround patch (4.17 KB, patch)
2005-10-28 02:20 UTC, Shaohua
Details | Diff
workaround patch 2 (4.21 KB, patch)
2005-10-28 02:21 UTC, Shaohua
Details | Diff
patch vs 2.6.18-rc1 (5.05 KB, patch)
2006-07-09 22:32 UTC, Len Brown
Details | Diff
patch vs 2.6.18-rc1 (5.08 KB, patch)
2006-07-10 11:29 UTC, Len Brown
Details | Diff

Description Shaohua 2004-09-26 21:26:48 UTC
Hi guys,
	Lo and Behold, I managed to get S3 to work on my laptop by following
instructions on this website.


However, I've not tested it further(as in using the normal kernel boot
up line in grub.conf) to verify if it is the same whether or not I use

I'm getting these errors during Suspend to disk.

Warning: CPU frequency out of sync: cpufreq and timingcore thinks of 600000, 
is 1400000 kHz.
Debug: sleeping function called from invalid context at mm/slab.c:1989
in_atomic():1, irqs_disabled():0
 [<c01156ac>] __might_sleep+0xa5/0xaf
 [<c0135d3f>] __kmalloc+0x40/0x76
 [<c023bec9>] acpi_os_allocate+0xa/0xb
 [<c024fa1f>] acpi_ut_allocate+0x34/0x57
 [<c024f9b0>] acpi_ut_initialize_buffer+0x42/0x7d
 [<c024c85e>] acpi_rs_create_byte_stream+0x23/0x39
 [<c024dc1e>] acpi_rs_set_srs_method_data+0x1b/0x9d
 [<c011ecd4>] __group_send_sig_info+0x36/0x9d
 [<c011ee0a>] group_send_sig_info+0x58/0x61
 [<c02556fd>] acpi_pci_link_set+0xde/0x155
 [<c0255b0f>] irqrouter_resume+0x1c/0x30
 [<c027bfff>] sysdev_resume+0x3e/0xc7
 [<c027ea69>] device_power_up+0x5/0xa
 [<c012b332>] do_magic_resume_2+0x63/0xb9
 [<c02db0d9>] copy_done+0x49/0x4c
 [<c012b536>] software_suspend+0x98/0xae
 [<c0252322>] acpi_system_write_sleep+0x6a/0x85
 [<c014650c>] vfs_write+0xb8/0xe4
 [<c01465d6>] sys_write+0x3c/0x62
 [<c01059fd>] sysenter_past_esp+0x52/0x71
ACPI: PCI interrupt 0000:00:1d.0[A] -> GSI 11 (level, low) -> IRQ 11
PCI: Setting latency timer of device 0000:00:1d.0 to 64
ACPI: PCI interrupt 0000:00:1d.1[B] -> GSI 11 (level, low) -> IRQ 11
PCI: Setting latency timer of device 0000:00:1d.1 to 64
ACPI: PCI interrupt 0000:00:1d.2[C] -> GSI 11 (level, low) -> IRQ 11
PCI: Setting latency timer of device 0000:00:1d.2 to 64
ACPI: PCI interrupt 0000:00:1d.7[D] -> GSI 11 (level, low) -> IRQ 11
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 11 (level, low) -> IRQ 11
ACPI: PCI interrupt 0000:00:1f.5[B] -> GSI 7 (level, low) -> IRQ 7
PCI: Setting latency timer of device 0000:00:1f.5 to 64
radeonfb: resumed !
ACPI: PCI interrupt 0000:02:00.0[A] -> GSI 11 (level, low) -> IRQ 11
Fixing swap signatures... ok

I also noticed that now, suspend does not work as it should/or was when
I was using 2.6.7 kernel. 

Right now, it will go to the Freeing memory section.. then the screen
goes blank. (blacklight still on) and then stays that way for quite
sometime before I see the words "radeonfb: resumed" and when that
happens, it's as though time stood still. It won't start writing to the
disk until I press a key. (I usually just press the spacebar) and then
it will start writing and works OK from that. (I am writing on my laptop
which is been suspended(hibernated) and resumed a few times already.

Anything changed or broke??

Ow Mun Heng
Fedora GNU/Linux Core 2 on D600 1.4Ghz CPU kernel 
Neuromancer 01:06:38 up 1 day, 3:25, 1 user, load average: 0.73, 0.88,
Comment 1 Shaohua 2004-11-03 21:48:35 UTC
Created attachment 3951 [details]
proposed patch

The patch works for me.
Comment 2 Shaohua 2004-11-03 21:51:46 UTC
since no sure about if ACPI CA call some routines which may sleep in resume, I 
added many debug printk int the patches. Len and Venki, please give your 
Comment 3 Len Brown 2004-11-04 01:18:04 UTC
same in 2.6.9?
Comment 4 Shaohua 2004-11-07 17:47:11 UTC
Created attachment 3978 [details]
patch to enable ACPICA run with IRQ disabled

This patch tried to make ACPICA run with IRQ disabled. Main changes are to make
kmalloc atomic and use down_trylock with 'in_atomic' and 'irq disabled'.

I added some debug messages in the patch to alert us if ACPICA use
sleep-routines in atomic situation. I'd like we keep the debug messages in
final patch. 

Len, please consider merging it.
Comment 5 Len Brown 2004-11-10 19:20:27 UTC
disabling the complaint when we kmalloc with interrupts off
is probably not the right fix.

other methods to change things so the interpreter can run
with interrupts off is probably not the right fix.

the right fix probably entails enabling interrupts at the
point the interpreter is called upon to process AML
to restore the PCI interrupt links.  This has got
to be similar to when do the same thing booting the system.

If devices can't handle this, then we need to fix the device drivers.
Comment 6 Shaohua 2004-11-10 19:25:48 UTC
No, open IRQ is completely wrong. When resume no process is running including 
kswapd, so kmalloc (it possibly need kswapd running) will cause unexpected 
Comment 7 Shaohua 2004-11-16 18:26:26 UTC
Len, I think the only choice here is to make ACPICA use non-sleep kmalloc or 
something similar memory allocation methods.
If you really woried about the memory allocation failure, we can free (such as 
call 'drain_local_pages') some memory before suspend.
Comment 8 Andreas Mohr 2004-11-17 01:16:07 UTC
Or the other way around: allocating required memory *before* entering the
non-sleep area (which would have the advantage that it would be even safer than
hoping that we are able to re-alloc an area which has been freed directly
before). But this is perhaps not possible, since quite likely the allocation
would happen in an entirely different code area (the ACPI interpreter, right?).
OTOH, this could be solved by supplying a pre-allocated (i.e. pre-suspend)
memory area pointer parameter to those ACPI functions needing memory allocation,
and they'd only alloc by themselves in case this parameter is supplied as NULL.
This might easily still be not implementable, however (e.g. dynamic size
requirements), and I probably don't know what I'm talking about anyway, for lack
of ACPI involvement ;-)
Anyway, good luck!
Comment 9 Shaohua 2004-11-17 01:23:37 UTC
Yes, we actually discussed the idea (use memory pool). One point is we can't 
estimate the pre-allocate memory size. In addition, if we free some memory 
(like my comment #7), we can reach the same goal and no estimation is required.
Comment 10 Shaohua 2004-11-21 19:30:02 UTC
Created attachment 4106 [details]
new patch free memory before suspend

This is a new patch. It shrink memory before entering suspend, so resume code
have enough memory.
Comment 11 Len Brown 2004-12-01 19:21:05 UTC
+	if (in_atomic() || irqs_disabled())
+		return kmalloc(size, GFP_ATOMIC);
 	return kmalloc(size, GFP_KERNEL);

We can't do this.
Operating systems can not rely on luck to work.

+ shrink_all_memory(100);

Again, while it may address the symptom,
the real solution must be deterministic.

We have very similar tasks to perform on cold boot.
Perhaps we can make the resume sequence more similar to cold boot?
Comment 12 Shaohua 2004-12-01 19:31:33 UTC
Another idea would be PIC and IOAPIC don't resume, and device drivers resume 
code to re-init link devices. This is what you want, right?
But this is broken too. We can't call any non-atomic kmalloc in resume code 
path (not just the irq disabled issue), since kmalloc would require kswap, 
which is frozen till all devices resume (they requir good link device)

I actually don't understand why you said it's luck. Note, any kmalloc possibly 
will fail, not just atomic-kmalloc. Can you say it's luck?
Comment 13 Nigel Cunningham 2004-12-01 19:37:48 UTC
It's not deterministic because shrink_all_memory can fail to free memory.


Comment 14 Nigel Cunningham 2004-12-01 19:49:59 UTC
Is there an absolute maximum to the size of memory you'll need?

I'm wondering if simply using GFP_ATOMIC all the time is the way to do.

I'm not sure about STR, but swsusp eats all the memory it can, so any allocation 
shouldn't be a problem there. Suspend2 currently makes sure a reasonably large 
number of order 0 pages are available, plus one order 1 and one order 2 
allocation (it uses its own memory pool to ensure the image is consistent, 
although I'm looking at whether I can do away with it).
Comment 15 Shaohua 2004-12-01 20:18:31 UTC
We can't determine how many memory it requires. It diffs from BIOS to BIOS. 
But I guess several pages are sufficient.
Comment 16 Luming Yu 2004-12-01 23:44:10 UTC
To comment 12,
I actually don't understand why you said it's luck. Note, any kmalloc 
possibly  will fail, not just atomic-kmalloc. Can you say it's luck? 

To clearly understand the problem, we need to clarify what's the usage of
atomic kmalloc. " GFP_ATOMIC has only __GFP_HIGH flag set. And It will use
emergency pools (if they exit) but will not sleep .This flag would be used
by an interrupt handler for example. " (quoted from ULVM 7.4)

(Kmalloc is relying on slab allocator)

Essentially, atomic-kmalloc is relying on memory pool, which seems to be 
unacceptable, if we cannot predicate how many memory needed for ACPI 
If memory pool happens to satisfy the ACPI interpreter, it will success, 
otherwise, it will fail.  Mabye this is called lucky. :-)

Whatever solution, we need to know how many memory is needed to execute AML 
methods. Please note, those methods have been executed at boot up. At least we 
can know how many memory were allocated from acpi_os_allocate at boot time 
And, I believe ACPI interpreter also know the memory consumption if itself 
from the design point of view. 

A fundamental point is that if it boot up, then the physical memory is 
sufficient to the AML method's second time execution.
From the first time execution , we can know how the exact memory requirement.
Then, we can exactly allocate or reserve sufficient memory for the second time 
execution( then free).  I'm sure this is not called lucky. :-)

Comment 17 Shaohua 2004-12-02 00:48:51 UTC
If 'shrink_all_memory' failed, that means any other memory allocation method 
will fail, so we can actually ignore the case. If we are afraid of memory 
allocation failure, we can do something like this:

if (nr_free_pages() < 100) {
    printk("No enough memory, can't suspend\n");
    return -ENOMEM;

I wonder if there is any feasible method to determine how many memory the code 
required and it actually isn't required here. A guess works ok here.
Comment 18 Shaohua 2005-01-24 16:47:52 UTC
*** Bug 4091 has been marked as a duplicate of this bug. ***
Comment 19 Shaohua 2005-01-24 16:54:34 UTC
Created attachment 4456 [details]
proposed patch part 1
Comment 20 Shaohua 2005-01-24 16:55:54 UTC
Created attachment 4457 [details]
proposed patch - part 2
Comment 21 Shaohua 2005-01-24 16:57:24 UTC
Created attachment 4458 [details]
proposed patch - part 3
Comment 22 Venkatesh Pallipadi 2005-02-07 15:25:19 UTC
*** Bug 4159 has been marked as a duplicate of this bug. ***
Comment 23 Laurent Riffard 2005-02-19 06:04:33 UTC
Created attachment 4572 [details]
call pci_enable_device from ne2k_pci_resume


I had a similar warning when I suspended to disk a desktop machine with kernel

Applying the 3 proposed patches solved it, there was no more warnings.

I had to patch the driver for my network card because I was getting some errors
after resuming :
NETDEV WATCHDOG: eth0: transmit timed out
eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=5114.

Will these patches go to mainline or to -mm kernel tree ?
Comment 24 Shaohua 2005-03-02 21:49:16 UTC
Created attachment 4634 [details]
New patches

Just a minor change against previous patches. Len, please merge it. The patches
just fixed some drivers, we will fix more till we find more.
Comment 25 Shaohua 2005-03-02 21:56:09 UTC
BTW, the package is a tgz file.
Comment 26 Len Brown 2005-03-18 13:55:36 UTC
applied to acpi-test tree 
Comment 27 Shaohua 2005-07-27 23:02:22 UTC
Created attachment 5393 [details]
patch to disable link device if no PCI device uses it.
Comment 28 Len Brown 2005-08-03 20:34:42 UTC
shipped patch in comment #27 in 2.6.13-rc5
However, at the same time, Linus restored the
blind link resume code.
Comment 29 Len Brown 2005-08-03 20:37:19 UTC
Created attachment 5499 [details]
Shaohua's patch vs 2.6.13-rc5

Linus accepted this patch after 2.6.13-rc5 to make the
link reference count code not interfere with the
restored blind link restore code.
Comment 30 Len Brown 2005-08-03 20:40:01 UTC
Created attachment 5500 [details]
Shaohua's patch vs 2.6.13-rc5+

This patch addresses the original symptom of the kmalloc oops
by simply setting a flag during suspend and having ACPI
use GPF_ATOMIC at that time.  Hopefully the kmalloc will
never fail...

applied to-linus
Comment 31 Len Brown 2005-08-04 13:17:39 UTC
above fix is in Linus' git tree as of today.  Closing.
Comment 32 Shaohua 2005-10-28 02:20:32 UTC
Created attachment 6403 [details]
workaround patch

Previous patches just workaround the kmalloc issue, but we actually have the
semaphore issue. The two new patches (this onw and nex one) makes pci link
device a platform device, so its suspend/resume methods can be invoked with
interrupt disabled, which should also fix the semaphore issue.
Comment 33 Shaohua 2005-10-28 02:21:58 UTC
Created attachment 6404 [details]
workaround patch 2

This patch together with attachement 6403 should completely workaround the
Comment 34 Len Brown 2005-11-16 18:45:03 UTC
Unfortunately this pair of patches will open a window
where interrutps are enabled but the Link Devices are
not yet restored.  It is possible for an interrupt to
occur in that window and if it is directec to the wrong
IRQ it can kill another device.

Looks like we should instead move the acpi_in_resume
local workaround into the system state global workaround....
Comment 35 Shaohua 2006-01-11 18:55:20 UTC
*** Bug 5805 has been marked as a duplicate of this bug. ***
Comment 36 Len Brown 2006-07-09 22:32:37 UTC
Created attachment 8514 [details]
patch vs 2.6.18-rc1

applied to acpi-test tree.
Comment 37 Len Brown 2006-07-10 11:29:25 UTC
Created attachment 8520 [details]
patch vs 2.6.18-rc1

fix build error in previous patch, this one matches what was actually checked
Comment 38 Alan Stern 2006-07-10 11:58:40 UTC
This patch doesn't address a related problem.  In drivers/acpi/osl.c, the
acpi_os_wait_semaphore() routine will call either down() or
schedule_timeout_interruptible(), unless timeout is set to 0.  Neither behavior
is good when interrupts are disabled.

Also, what about all the GFP_KERNEL kmalloc calls in drivers/acpi/utils.c?
Comment 39 Len Brown 2006-07-10 12:06:12 UTC
re: utils.c
Can you identify any code that calls acpi_evaluate_integer()
with interrupts off?  This is not used in the course of evaluating
a PCI Interrupt Link Device at resume time, which is the issue at hand.

re: semaphore
Yes, analogous issue.  I think the answer is to BUG() when this
case is hit, as it should be impossible.
Comment 40 Len Brown 2007-03-08 21:21:45 UTC
akpm told us not to use system_state,
so as of 2.6.18 we went with the original suggestion of using GFP_ATOMIC.

Re: comment #38
acpi_evaluate_integer() in utils.c now looks like this:
kzalloc(sizeof(union acpi_object), irqs_disabled() ? GFP_ATOMIC: GFP_KERNEL);

acpi_os_wait_semaphore() in osl.c now starts with this:
         * This can be called during resume with interrupts off.
         * Like boot-time, we should be single threaded and will
         * always get the lock if we try -- timeout or not.
         * If this doesn't succeed, then we will oops courtesy of
         * might_sleep() in down().
        if (!down_trylock(sem))
                return AE_OK;


Note You need to log in before you can comment on or make changes to this bug.