Bug 15407
Description
Bryn Hughes
2010-02-27 00:31:24 UTC
Please attach:
1. the dmesg output after boot
2. the acpidump output
3. the content of "/proc/interrupts" before suspend

Is this a regression? I mean, are you aware of any previous kernel release that doesn't have this problem? If yes, what's the earliest kernel that has this problem?

Created attachment 25314 [details]
output of dmesg from Thinkpad W510 (4319-29G)
Created attachment 25315 [details]
output of acpidump from Thinkpad W510 (4319-29G)
Created attachment 25316 [details]
/proc/interrupts from Thinkpad W510 (4319-29G)
I observed the same behaviour as Bryn on a Thinkpad W510 (4319-29G). The problem occurs with 2.6.32 and 2.6.33. I did not test any previous kernel versions.

Created attachment 25320 [details]
Output of dmesg on W510 (4318-CTO)
Created attachment 25321 [details]
Output of acpidump on W510 (4318-CTO)
Created attachment 25322 [details]
/proc/interrupts on W510 (4318-CTO)
I have tested on 2.6.31, 2.6.32 and 2.6.33 - all versions currently have the same behaviour.

Please also attach all the files in /sys/firmware/acpi/tables/dynamic. You can get a table by running "cat /sys/firmware/acpi/tables/dynamic/SSDTx > ssdtx.dat".

Please attach the dmesg output after the first resume.

Please attach a screenshot of the system when it hangs during the second resume.

Unfortunately, it is impossible to provide a screenshot of the second resume - the system gets dumped back to a BIOS screen that just has the Thinkpad logo, but is completely frozen. Linux is not running anymore at that point. The other attachments will be uploaded shortly...

Created attachment 25332 [details]
Contents of /sys/firmware/acpi/tables/dynamic/SSDT* (pre and post suspend)
Created attachment 25333 [details]
dmesg output post resume
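
For anyone collecting the same tables, the "cat ... > ssdtx.dat" step requested above can be repeated for every file in the directory. The following is only a minimal Python sketch of that idea: the function name, the lowercase ".dat" naming, and the optional suffix (used here to tell pre-suspend and post-resume dumps apart) are assumptions for illustration, not part of any tool mentioned in this report. Reading files under /sys/firmware/acpi/tables normally requires root.

```python
from pathlib import Path


def dump_dynamic_tables(src_dir="/sys/firmware/acpi/tables/dynamic",
                        dest_dir=".", suffix=""):
    """Copy each table file in src_dir to dest_dir as <name><suffix>.dat.

    Hypothetical helper: mirrors running
    "cat /sys/firmware/acpi/tables/dynamic/SSDTx > ssdtx.dat" once per table.
    Returns the list of files written (empty if src_dir does not exist).
    """
    src = Path(src_dir)
    if not src.is_dir():
        return []
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for table in sorted(src.iterdir()):
        if not table.is_file():
            continue
        # Raw bytes are copied unchanged; the tables are binary AML.
        out = dest / f"{table.name.lower()}{suffix}.dat"
        out.write_bytes(table.read_bytes())
        written.append(out)
    return written
```

Calling dump_dynamic_tables(suffix="-pre") before suspending and dump_dynamic_tables(suffix="-post") after the first resume would leave two sets of files that can be compared or attached.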
[ 2.068747] ------------[ cut here ]------------
[ 2.068759] WARNING: at /build/buildd/linux-2.6.32/arch/x86/kernel/hpet.c:392 hpet_next_event+0x7a/0x90()
[ 2.068763] Hardware name: 4318CTO
[ 2.068765] Modules linked in:
[ 2.068770] Pid: 0, comm: swapper Not tainted 2.6.32-15-generic #22-Ubuntu
[ 2.068773] Call Trace:
[ 2.068781] [<ffffffff81064f8b>] warn_slowpath_common+0x7b/0xc0
[ 2.068787] [<ffffffff81064fe4>] warn_slowpath_null+0x14/0x20
[ 2.068791] [<ffffffff810375da>] hpet_next_event+0x7a/0x90
[ 2.068796] [<ffffffff81037620>] hpet_legacy_next_event+0x10/0x20
[ 2.068804] [<ffffffff81090f94>] clockevents_program_event+0x54/0xa0
[ 2.068810] [<ffffffff810924c8>] tick_dev_program_event+0x48/0xd0
[ 2.068816] [<ffffffff81091e1e>] tick_broadcast_oneshot_control+0x11e/0x120
[ 2.068821] [<ffffffff81091630>] tick_notify+0x130/0x200
[ 2.068829] [<ffffffff81561cf7>] notifier_call_chain+0x47/0x90
[ 2.068835] [<ffffffff81088566>] raw_notifier_call_chain+0x16/0x20
[ 2.068841] [<ffffffff81090db7>] clockevents_notify+0x37/0x160
[ 2.068847] [<ffffffff813080a1>] lapic_timer_state_broadcast+0x46/0x48
[ 2.068852] [<ffffffff81308609>] acpi_idle_enter_bm+0x180/0x2b7
[ 2.068857] [<ffffffff81561cc6>] ? notifier_call_chain+0x16/0x90
[ 2.068863] [<ffffffff8144c9e7>] cpuidle_idle_call+0xa7/0x140
[ 2.068870] [<ffffffff81011ea3>] cpu_idle+0xb3/0x110
[ 2.068877] [<ffffffff81558f69>] start_secondary+0xa8/0xaa
[ 2.068881] ACPI Warning for \_PR_.CPU0._PSD: Return Package type mismatch at index 4 - found [NULL Object Descriptor], expected Integer
[ 2.068894] ---[ end trace 0bcff51cc3f6d229 ]---
[ 2.068898] (20090903/nspredef-1012)
[ 2.068910] ACPI: Invalid _PSD data
[ 2.069070] ACPI Warning for \_PR_.CPU1._PSD: Return Package type mismatch at index 4 - found [NULL Object Descriptor], expected Integer (20090903/nspredef-1012)
[ 2.069084] ACPI: Invalid _PSD data
[ 2.069239] ACPI Warning for \_PR_.CPU2._PSD: Return Package type mismatch at index 4 - found [NULL Object Descriptor], expected Integer (20090903/nspredef-1012)
[ 2.069252] ACPI: Invalid _PSD data
[ 2.069406] ACPI Warning for \_PR_.CPU3._PSD: Return Package type mismatch at index 4 - found [NULL Object Descriptor], expected Integer (20090903/nspredef-1012)
[ 2.069419] ACPI: Invalid _PSD data
[ 2.069572] ACPI Warning for \_PR_.CPU4._PSD: Return Package type mismatch at index 4 - found [NULL Object Descriptor], expected Integer (20090903/nspredef-1012)
[ 2.069585] ACPI: Invalid _PSD data
[ 2.069739] ACPI Warning for \_PR_.CPU5._PSD: Return Package type mismatch at index 4 - found [NULL Object Descriptor], expected Integer (20090903/nspredef-1012)
[ 2.069752] ACPI: Invalid _PSD data
[ 2.069905] ACPI Warning for \_PR_.CPU6._PSD: Return Package type mismatch at index 4 - found [NULL Object Descriptor], expected Integer (20090903/nspredef-1012)
[ 2.069918] ACPI: Invalid _PSD data
[ 2.070071] ACPI Warning for \_PR_.CPU7._PSD: Return Package type mismatch at index 4 - found [NULL Object Descriptor], expected Integer (20090903/nspredef-1012)
[ 2.070084] ACPI: Invalid _PSD data

Please blacklist processor.ko and boot with the boot option "hpet=disable" and see if the error messages above go away.
And check if the problem still exists.

Unfortunately the acpi processor module is compiled in to my kernel (Ubuntu) - I can compile a fresh one if you feel it would help. hpet=disable removes the hpet-related message, but the system starts running poorly - it runs slower and it has poor video performance at times.

Oh, and hpet=disable does not correct the problem on resume...

(In reply to comment #15)
> Unfortunately the acpi processor module is compiled in to my kernel (Ubuntu) -
> I can compile a fresh one if you feel it would help.

Yes, please. Thanks.

BTW: does the second resume still hang if you boot with "irqpoll"? Please attach the dmesg output after the first resume (boot with irqpoll).

Yes, the second resume also hangs when irqpoll is added to the kernel command line, see the attachment.

Created attachment 25393 [details]
dmesg output after 1st resume when "irqpoll" is used
OK, I was able to compile a kernel with ACPI_PROCESSOR as a module and then blacklist it. Confirmed that it was NOT loaded... The "Invalid _PSD data" messages do go away with ACPI_PROCESSOR blacklisted, but the system still gets the same error message on resume, and can't resume a second time.

I can confirm this is also the case for the Lenovo x201s. I have tried kernels 2.6.32-2, 2.6.33-1 and 2.6.34-rc4, they all exhibit the same problem. I would be happy to provide more info, please instruct what is needed.

(In reply to comment #21)
> I can confirm this is also the case for the Lenovo x201s. I have tried kernels
> 2.6.32-2, 2.6.33-1 and 2.6.34-rc4, they all exhibit the same problem. I would
> be happy to provide more info, please instruct what is needed.

2.6.34-rc2, sorry, typo.

I notice the status is still 'NEEDINFO' - what other info can we provide?

We have been working with Lenovo. Lenovo is resolving issues with their BIOS now. More info is in Launchpad, where we are tracking this issue: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/532374

Bryn, Stefan, others who can reproduce this bug:

Please try booting with this kernel command line option and see if it helps:

acpi_sleep=sci_force_enable

Please report back your results. Thanks.

Created attachment 26053 [details]
add various lenovo machines to acpisleep_dmi_table
Please try this patch, which adds the Lenovo laptops to acpisleep_dmi_table.
We have some information from Lenovo BIOS engineers that indicates forcing OSPM to write to SCI_EN after resume is what Windows does.
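
To make the mechanism concrete, here is a rough Python model of what such a quirk amounts to: match the machine's DMI strings against a table, and if it matches, set the SCI_EN bit again after resume. This is a sketch under stated assumptions, not the attached patch: the kernel code is C, the function names below are invented for illustration, the table entry is a made-up example rather than a copy of the real acpisleep_dmi_table, and the PM1 control register is modelled as a plain integer. The only hardware fact relied on is that SCI_EN is bit 0 of the PM1 control register in the ACPI specification.

```python
SCI_EN = 1 << 0  # bit 0 of the ACPI PM1 control register


def dmi_matches(system, entry):
    """True if every field in entry occurs as a substring of the system's value.

    Substring comparison is used here, similar in spirit to how DMI quirk
    tables are usually matched; the field names are hypothetical.
    """
    return all(needle in system.get(field, "")
               for field, needle in entry.items())


def needs_sci_en_quirk(system, quirk_table):
    """True if any quirk table entry matches this system's DMI strings."""
    return any(dmi_matches(system, entry) for entry in quirk_table)


def resume_pm1_control(pm1_cnt, force_sci_en):
    """Return the PM1 control value to use after resume.

    With the quirk active, SCI_EN is set regardless of what the firmware
    left in the register; otherwise the value is left untouched.
    """
    return pm1_cnt | SCI_EN if force_sci_en else pm1_cnt


# Illustrative entry only; not taken from the attached patch.
EXAMPLE_QUIRKS = [
    {"sys_vendor": "LENOVO", "product_version": "ThinkPad W510"},
]
```

For a system described as {"sys_vendor": "LENOVO", "product_version": "ThinkPad W510"}, needs_sci_en_quirk(..., EXAMPLE_QUIRKS) is true, and resume_pm1_control then returns the register value with bit 0 set.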
@Alex For this issue it may be best not to have these in the kernel, since Lenovo is resolving the issue in their updated BIOS. I believe this is mainly reserved for BIOSes that have no hope of changing.

Sorry to hijack this thread, but I have a very similar issue on my Acer Aspire 5720. It is well known that its BIOS hangs on the second resume from RAM. 'acpi_sleep=sci_force_enable' doesn't help, and neither did the countless other things I tried (I am not a newbie, I have even written some kernel code). So the only option left is to contact Acer's BIOS team. Maybe you know which strings to pull to make that possible?

Jerone: Given that the BIOS can't be updated under Linux, and given that there's no indication to the users that there's a BIOS bug, it makes sense to just make the software work. When hardware works fine under another OS, that indicates that we need to make Linux mimic the behaviour of that OS to the closest extent practical.

@Matthew Actually you can update the BIOS. For Thinkpads they provide a DOS-based ISO for updating the BIOS. Also, this just happened to be a case where things fell out of the ACPI specification by accident. Under Windows 7 they just force this ... but I'm not sure what the consequences are of doing this on other systems.

@Maxim I can ask some folks working for us who work with Acer. But no promises.

Jerone: The X201 has no optical drive.

@Matthew You use a USB CD-ROM. I just did this to an X201.

So people who don't own a USB optical drive get stuck with a broken OS? That's less than optimal.

@Matthew Also there is an issue with USB after suspend that we are looking to fix in the BIOS as well. So you will need to update anyway. If you have a problem with the BIOS updating methods, it's best to take it up with Lenovo.

The USB bug appears to be worked around by unloading and reloading the hci drivers, which is an indication that we're doing something wrong in our resume path. We should fix the bug, not get vendors to release BIOS updates.
@Matthew Hmm... I don't want to sidetrack this bug, but I tried this on a T410 and that did not help the USB bug. Will try again. We are tracking and discussing it on Launchpad here: https://bugs.launchpad.net/oem-priority/+bug/566149

matthew: googling around suggests that grub4dos can actually boot ISO images directly, which may be a way to do the BIOS flash for people with no optical drive. It's hedged around, but sounds like this might be the kind of ISO for which it'd work: http://diddy.boot-land.net/grub4dos/files/map.htm#hd32

Just a note on updating: the best method actually seems to be http://www.thinkwiki.org/wiki/BIOS_update_without_optical_disk

Patch is posted to linux-acpi, but I don't seem to have sufficient privileges to alter the bug state.

Okay, patch available at https://patchwork.kernel.org/patch/94711/

@Jerone Young. Was there any success contacting the Acer folks? This problem affects many people, and it is very frustrating. (In fact I recently switched to the nouveau drivers, which work almost perfectly here, and with a patch to skip the VT switch on suspend/resume, my system resumes from RAM and is ready to work instantly, seemingly even faster than in Windows.) But it happens only once.... :-(

Sorry for the typos above (need some sleep), but I want to add that if I had an email address for the team that writes the BIOS, I could team up with other users that suffer from the same problem and send a mail to that team together.

This issue is resolved both in the latest mainstream kernels and in the latest Lenovo BIOS for the affected machines. Either fix will work - the newer kernel duplicates the behaviour of that "Other" OS when required, while the Lenovo BIOS now follows the ACPI specifications properly.

@Bryn Hughes, I have an unrelated problem. My system is an Acer Aspire 5720, and it is reported at https://bugzilla.kernel.org/show_bug.cgi?id=13931. I asked Jerone Young to see if he could contact Acer about that problem. Sorry for the noise.
The init_set_sci_en_on_resume DMI patches mentioned in comment #4 (and others) shipped in 2.6.34. This bug report is closed. Note that we will try to clean up the workaround in 2.6.35 by doing away with the DMI lists.

(In reply to Alex Chiang from comment #25)
> Bryn, Stefan, others who can reproduce this bug:
>
> Please try booting with this kernel command line option and see if it helps:
>
> acpi_sleep=sci_force_enable
>
> Please report back your results. Thanks.

Hi, acpi_sleep=sci_force_enable fixes pm-suspend on my laptop, beautiful! I was searching for how to prevent suspend on lid close, which gives me corrupted disk access after wakeup. Someone pointed me to this bug; I saw some USB errors on wakeup that match the description, so I tested it and it works!

Created attachment 170841 [details]
dmidecode of laptop that needs acpi_sleep=sci_force_enable
Many thanks!
Sergio,

The sci_force_enable option was removed from the kernel in 2010, so that's unlikely to be what's helping you.

:/ On the second attempt it fails again. BTW, this parameter is still in the documentation: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/kernel-parameters.txt?id=refs/tags/v3.18.9#n359

sci_force_enable causes the kernel to set SCI_EN directly on resume from S1/S3 (which is against the ACPI spec, but some broken systems don't work without it).

We need to revert the documentation part of this patch: https://lists.ubuntu.com/archives/kernel-team/2010-April/010243.html
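
When comparing results between boots, it helps to confirm which acpi_sleep options were actually passed to the running kernel. The snippet below is a small Python sketch for that: the function names are made up for illustration, and it only inspects the command line text (acpi_sleep takes a comma-separated list of options). Whether the kernel still acts on sci_force_enable is a separate question that depends on the kernel version, as discussed above.

```python
def acpi_sleep_options(cmdline):
    """Return the options given via acpi_sleep=... on a kernel command line.

    Hypothetical helper: collects the comma-separated values of every
    acpi_sleep= token, in order of appearance.
    """
    options = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "acpi_sleep" and sep:
            options.extend(opt for opt in value.split(",") if opt)
    return options


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel."""
    with open(path) as handle:
        return handle.read()


def sci_force_enable_requested(cmdline):
    """True if sci_force_enable was passed on the given command line."""
    return "sci_force_enable" in acpi_sleep_options(cmdline)
```

On a booted system, sci_force_enable_requested(read_cmdline()) reports whether the option was present for the current boot.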