Hi guys, Lo and Behold, I managed to get S3 to work on my laptop by following instructions on this website. http://www.loria.fr/~thome/d600/ However, I've not tested it further(as in using the normal kernel boot up line in grub.conf) to verify if it is the same whether or not I use video=radeonfb. I'm getting these errors during Suspend to disk. Warning: CPU frequency out of sync: cpufreq and timingcore thinks of 600000, is 1400000 kHz. Debug: sleeping function called from invalid context at mm/slab.c:1989 in_atomic():1, irqs_disabled():0 [<c01156ac>] __might_sleep+0xa5/0xaf [<c0135d3f>] __kmalloc+0x40/0x76 [<c023bec9>] acpi_os_allocate+0xa/0xb [<c024fa1f>] acpi_ut_allocate+0x34/0x57 [<c024f9b0>] acpi_ut_initialize_buffer+0x42/0x7d [<c024c85e>] acpi_rs_create_byte_stream+0x23/0x39 [<c024dc1e>] acpi_rs_set_srs_method_data+0x1b/0x9d [<c011ecd4>] __group_send_sig_info+0x36/0x9d [<c011ee0a>] group_send_sig_info+0x58/0x61 [<c02556fd>] acpi_pci_link_set+0xde/0x155 [<c0255b0f>] irqrouter_resume+0x1c/0x30 [<c027bfff>] sysdev_resume+0x3e/0xc7 [<c027ea69>] device_power_up+0x5/0xa [<c012b332>] do_magic_resume_2+0x63/0xb9 [<c02db0d9>] copy_done+0x49/0x4c [<c012b536>] software_suspend+0x98/0xae [<c0252322>] acpi_system_write_sleep+0x6a/0x85 [<c014650c>] vfs_write+0xb8/0xe4 [<c01465d6>] sys_write+0x3c/0x62 [<c01059fd>] sysenter_past_esp+0x52/0x71 ACPI: PCI interrupt 0000:00:1d.0[A] -> GSI 11 (level, low) -> IRQ 11 PCI: Setting latency timer of device 0000:00:1d.0 to 64 ACPI: PCI interrupt 0000:00:1d.1[B] -> GSI 11 (level, low) -> IRQ 11 PCI: Setting latency timer of device 0000:00:1d.1 to 64 ACPI: PCI interrupt 0000:00:1d.2[C] -> GSI 11 (level, low) -> IRQ 11 PCI: Setting latency timer of device 0000:00:1d.2 to 64 ACPI: PCI interrupt 0000:00:1d.7[D] -> GSI 11 (level, low) -> IRQ 11 PCI: Setting latency timer of device 0000:00:1d.7 to 64 ACPI: PCI interrupt 0000:00:1f.1[A] -> GSI 11 (level, low) -> IRQ 11 ACPI: PCI interrupt 0000:00:1f.5[B] -> GSI 7 (level, low) -> IRQ 7 PCI: Setting latency timer of device 0000:00:1f.5 to 64 radeonfb: resumed ! ACPI: PCI interrupt 0000:02:00.0[A] -> GSI 11 (level, low) -> IRQ 11 Fixing swap signatures... ok I also noticed that now, suspend does not work as it should/or was when I was using 2.6.7 kernel. Right now, it will go to the Freeing memory section.. then the screen goes blank. (blacklight still on) and then stays that way for quite sometime before I see the words "radeonfb: resumed" and when that happens, it's as though time stood still. It won't start writing to the disk until I press a key. (I usually just press the spacebar) and then it will start writing and works OK from that. (I am writing on my laptop which is been suspended(hibernated) and resumed a few times already. Anything changed or broke?? -- Ow Mun Heng Fedora GNU/Linux Core 2 on D600 1.4Ghz CPU kernel 2.6.8.1.Aug.29 Neuromancer 01:06:38 up 1 day, 3:25, 1 user, load average: 0.73, 0.88, 0.93
Created attachment 3951 [details] proposed patch The patch works for me.
since no sure about if ACPI CA call some routines which may sleep in resume, I added many debug printk int the patches. Len and Venki, please give your comments.
same in 2.6.9?
Created attachment 3978 [details] patch to enable ACPICA run with IRQ disabled This patch tried to make ACPICA run with IRQ disabled. Main changes are to make kmalloc atomic and use down_trylock with 'in_atomic' and 'irq disabled'. I added some debug messages in the patch to alert us if ACPICA use sleep-routines in atomic situation. I'd like we keep the debug messages in final patch. Len, please consider merging it.
disabling the complaint when we kmalloc with interrupts off is probably not the right fix. other methods to change things so the interpreter can run with interrupts off is probably not the right fix. the right fix probably entails enabling interrupts at the point the interpreter is called upon to process AML to restore the PCI interrupt links. This has got to be similar to when do the same thing booting the system. If devices can't handle this, then we need to fix the device drivers.
No, open IRQ is completely wrong. When resume no process is running including kswapd, so kmalloc (it possibly need kswapd running) will cause unexpected error.
Len, I think the only choice here is to make ACPICA use non-sleep kmalloc or something similar memory allocation methods. If you really woried about the memory allocation failure, we can free (such as call 'drain_local_pages') some memory before suspend.
Or the other way around: allocating required memory *before* entering the non-sleep area (which would have the advantage that it would be even safer than hoping that we are able to re-alloc an area which has been freed directly before). But this is perhaps not possible, since quite likely the allocation would happen in an entirely different code area (the ACPI interpreter, right?). OTOH, this could be solved by supplying a pre-allocated (i.e. pre-suspend) memory area pointer parameter to those ACPI functions needing memory allocation, and they'd only alloc by themselves in case this parameter is supplied as NULL. This might easily still be not implementable, however (e.g. dynamic size requirements), and I probably don't know what I'm talking about anyway, for lack of ACPI involvement ;-) Anyway, good luck!
Yes, we actually discussed the idea (use memory pool). One point is we can't estimate the pre-allocate memory size. In addition, if we free some memory (like my comment #7), we can reach the same goal and no estimation is required.
Created attachment 4106 [details] new patch free memory before suspend This is a new patch. It shrink memory before entering suspend, so resume code have enough memory.
+ if (in_atomic() || irqs_disabled()) + return kmalloc(size, GFP_ATOMIC); return kmalloc(size, GFP_KERNEL); We can't do this. Operating systems can not rely on luck to work. + shrink_all_memory(100); Again, while it may address the symptom, the real solution must be deterministic. We have very similar tasks to perform on cold boot. Perhaps we can make the resume sequence more similar to cold boot?
Another idea would be PIC and IOAPIC don't resume, and device drivers resume code to re-init link devices. This is what you want, right? But this is broken too. We can't call any non-atomic kmalloc in resume code path (not just the irq disabled issue), since kmalloc would require kswap, which is frozen till all devices resume (they requir good link device) I actually don't understand why you said it's luck. Note, any kmalloc possibly will fail, not just atomic-kmalloc. Can you say it's luck?
It's not deterministic because shrink_all_memory can fail to free memory. Regards, Nigel
Is there an absolute maximum to the size of memory you'll need? I'm wondering if simply using GFP_ATOMIC all the time is the way to do. I'm not sure about STR, but swsusp eats all the memory it can, so any allocation shouldn't be a problem there. Suspend2 currently makes sure a reasonably large number of order 0 pages are available, plus one order 1 and one order 2 allocation (it uses its own memory pool to ensure the image is consistent, although I'm looking at whether I can do away with it).
We can't determine how many memory it requires. It diffs from BIOS to BIOS. But I guess several pages are sufficient.
To comment 12, <--quote I actually don't understand why you said it's luck. Note, any kmalloc possibly will fail, not just atomic-kmalloc. Can you say it's luck? quote--> To clearly understand the problem, we need to clarify what's the usage of atomic kmalloc. " GFP_ATOMIC has only __GFP_HIGH flag set. And It will use emergency pools (if they exit) but will not sleep .This flag would be used by an interrupt handler for example. " (quoted from ULVM 7.4) (Kmalloc is relying on slab allocator) Essentially, atomic-kmalloc is relying on memory pool, which seems to be unacceptable, if we cannot predicate how many memory needed for ACPI interpreter. If memory pool happens to satisfy the ACPI interpreter, it will success, otherwise, it will fail. Mabye this is called lucky. :-) Whatever solution, we need to know how many memory is needed to execute AML methods. Please note, those methods have been executed at boot up. At least we can know how many memory were allocated from acpi_os_allocate at boot time And, I believe ACPI interpreter also know the memory consumption if itself from the design point of view. A fundamental point is that if it boot up, then the physical memory is sufficient to the AML method's second time execution. From the first time execution , we can know how the exact memory requirement. Then, we can exactly allocate or reserve sufficient memory for the second time execution( then free). I'm sure this is not called lucky. :-)
If 'shrink_all_memory' failed, that means any other memory allocation method will fail, so we can actually ignore the case. If we are afraid of memory allocation failure, we can do something like this: shrink_all_memory(100); if (nr_free_pages() < 100) { printk("No enough memory, can't suspend\n"); return -ENOMEM; } I wonder if there is any feasible method to determine how many memory the code required and it actually isn't required here. A guess works ok here.
*** Bug 4091 has been marked as a duplicate of this bug. ***
Created attachment 4456 [details] proposed patch part 1
Created attachment 4457 [details] proposed patch - part 2
Created attachment 4458 [details] proposed patch - part 3
*** Bug 4159 has been marked as a duplicate of this bug. ***
Created attachment 4572 [details] call pci_enable_device from ne2k_pci_resume Hello, I had a similar warning when I suspended to disk a desktop machine with kernel 2.6.11-rc3-mm2. Applying the 3 proposed patches solved it, there was no more warnings. I had to patch the driver for my network card because I was getting some errors after resuming : NETDEV WATCHDOG: eth0: transmit timed out eth0: Tx timed out, lost interrupt? TSR=0x3, ISR=0x3, t=5114. Will these patches go to mainline or to -mm kernel tree ?
Created attachment 4634 [details] New patches Just a minor change against previous patches. Len, please merge it. The patches just fixed some drivers, we will fix more till we find more.
BTW, the package is a tgz file.
applied to acpi-test tree
Created attachment 5393 [details] patch to disable link device if no PCI device uses it.
shipped patch in comment #27 in 2.6.13-rc5 However, at the same time, Linus restored the blind link resume code.
Created attachment 5499 [details] Shaohua's patch vs 2.6.13-rc5 Linus accepted this patch after 2.6.13-rc5 to make the link reference count code not interfere with the restored blind link restore code.
Created attachment 5500 [details] Shaohua's patch vs 2.6.13-rc5+ This patch addresses the original symptom of the kmalloc oops by simply setting a flag during suspend and having ACPI use GPF_ATOMIC at that time. Hopefully the kmalloc will never fail... applied to-linus
above fix is in Linus' git tree as of today. Closing.
Created attachment 6403 [details] workaround patch Previous patches just workaround the kmalloc issue, but we actually have the semaphore issue. The two new patches (this onw and nex one) makes pci link device a platform device, so its suspend/resume methods can be invoked with interrupt disabled, which should also fix the semaphore issue.
Created attachment 6404 [details] workaround patch 2 This patch together with attachement 6403 should completely workaround the issue.
Unfortunately this pair of patches will open a window where interrutps are enabled but the Link Devices are not yet restored. It is possible for an interrupt to occur in that window and if it is directec to the wrong IRQ it can kill another device. Looks like we should instead move the acpi_in_resume local workaround into the system state global workaround....
*** Bug 5805 has been marked as a duplicate of this bug. ***
Created attachment 8514 [details] patch vs 2.6.18-rc1 applied to acpi-test tree.
Created attachment 8520 [details] patch vs 2.6.18-rc1 fix build error in previous patch, this one matches what was actually checked in.
This patch doesn't address a related problem. In drivers/acpi/osl.c, the acpi_os_wait_semaphore() routine will call either down() or schedule_timeout_interruptible(), unless timeout is set to 0. Neither behavior is good when interrupts are disabled. Also, what about all the GFP_KERNEL kmalloc calls in drivers/acpi/utils.c?
re: utils.c Can you identify any code that calls acpi_evaluate_integer() with interrupts off? This is not used in the course of evaluating a PCI Interrupt Link Device at resume time, which is the issue at hand. re: semaphore Yes, analogous issue. I think the answer is to BUG() when this case is hit, as it should be impossible.
akpm told us not to use system_state, so as of 2.6.18 we went with the original suggestion of using GFP_ATOMIC. Re: comment #38 acpi_evaluate_integer() in utils.c now looks like this: kzalloc(sizeof(union acpi_object), irqs_disabled() ? GFP_ATOMIC: GFP_KERNEL); acpi_os_wait_semaphore() in osl.c now starts with this: /* * This can be called during resume with interrupts off. * Like boot-time, we should be single threaded and will * always get the lock if we try -- timeout or not. * If this doesn't succeed, then we will oops courtesy of * might_sleep() in down(). */ if (!down_trylock(sem)) return AE_OK; --- closed.