Bug 12873
Description
Kenney Phillis Jr.
2009-03-14 12:48:16 UTC
please try this test in the 2.6.29-rc7 kernel:
1. set CONFIG_ACPI_DEBUG,
2. boot with "acpi_debug.layer=0x44" and "acpi_debug.level=0x08000004",
3. attach the dmesg output after boot.
Created attachment 20554 [details]
dmesg log for 2.6.29rc8
Here's the latest kernel log. The kernel was compiled with CONFIG_ACPI_DEBUG=y and booted with the two parameters you requested.
Created attachment 20555 [details]
/var/log/kern.log
I couldn't capture the full dmesg log because it was too long; however, here is one of 3 files which are just as useful.
Created attachment 20556 [details]
/var/log/messages
Created attachment 20557 [details]
/var/log/syslog
(In reply to comment #1)
> please try this test in the 2.6.29-rc7 kernel,
> 1.set CONFIG_ACPI_DEBUG,
> 2.boot with "acpi_debug.layer=0x44" and "acpi_debug.level=0x08000004".
Sorry, they should be "acpi.debug_layer=0x44" and "acpi.debug_level=0x08000004". Will you please try again? Thanks.

I already figured that out, which is why the attachments in comments 3, 4 and 5 are all based on those boot options.

please attach the content of /proc/interrupts

Hi, Kenney
Will you please attach the output of "grep . /sys/firmware/acpi/interrupts/*"?
From the dmesg log in comment #2 it seems that the following warning message is printed and then ACPI irq 9 is disabled.
>irq 9: nobody cared (try booting with the "irqpoll" option)
Will you please try the following boot options?
a. acpi_sci=high
b. acpi_sci=edge
c. acpi_sci=edge acpi_sci=high
Will you please double check whether the 2.6.25 kernel works well? If it does, please attach the output of dmesg and "grep . /sys/firmware/acpi/interrupts/*".
Please also attach the output of acpidump.
Thanks.
Created attachment 20575 [details]
interrupts of 2.6.24-23-generic
Hi, Kenney
Please also attach the output of /proc/interrupts on the 2.6.29-rc8 failing kernel.
Thanks.
Created attachment 20577 [details]
interrupts of 2.6.29rc8
OK, here are the interrupts for the failing 2.6.29 kernel.
Created attachment 20579 [details]
Contents of grep . /sys/firmware/acpi/interrupts/* on 2.6.29rc7
I couldn't fully boot into 2.6.25. However, it was not a kernel issue: it stalled when the boot process stated:
Begin: Running /scripts/local-premount ...
done.
As for the test set, I'll provide it for 2.6.24 instead, since that kernel does fully boot.
Created attachment 20580 [details]
Test results for all 3 acpi_sci test parameters.
Interestingly enough, all 3 acpi_sci tests solved the irq bug. However, the system still handles the status changes sluggishly when I pull and insert the power plug. I also added the dmesg logs for what happened when I logged in and then went straight to suspend less than 20 seconds later. The result was a crash message in the dmesg, and the system was kicked out of suspend almost immediately.
Hi, Kenney
As the 2.6.25 kernel can't be booted normally, we can't know whether the following message is also printed there:
>irq 9: nobody cared (try booting with the "irqpoll" option)
On the latest kernel there is no such complaint if the boot option "acpi_sci=high" is added.
As there is no ACPI SCI interrupt override in the APIC table, the default configuration is used for the ACPI SCI interrupt (Interrupt Pin: 9; Mode: low, level). But from the test it seems that the correct mode for the ACPI SCI should be high/level. So in this case the following is printed and the ACPI SCI irq is disabled.
>irq 9: nobody cared (try booting with the "irqpoll" option)
At the same time, from the dmesg we know that there is the following message:
>ACPI: EC: driver started in poll mode
In this case the EC works in polling mode while doing EC transactions. If there is no EC GPE interrupt, maybe the hotkey notification event can't be triggered.
Will you please attach the output of acpidump?
Thanks.
Created attachment 20631 [details]
acpidump output on kernel release 2.6.29rc7
please attach the dmesg output of the 2.6.24 kernel.

(In reply to comment #0)
> Latest working kernel version: 2.6.25
It's weird that 2.6.25 worked well without any boot parameters.
(In reply to comment #13)
> I couldn't fully boot in to 2.6.25, however it was not a kernel issue, it
> stalled out when the boot process stated:
>
could you please fix this first, and then run git-bisect to see which commit introduced this regression?
Created attachment 20640 [details]
Dmesg from 2.6.26 kernel
I believe the regression is related to the start_secondary function, which was integrated into the main kernel in 2.6.26. I'm also attaching the relevant log.
Hi, Kenney
From the log in comment #18 it seems that the issue also exists on the 2.6.26 kernel. Can the hotkeys work after ACPI IRQ 9 is disabled?
Will you please also attach the output of dmesg on a working kernel? For example: 2.6.24/2.6.25.
Thanks.

Will you please try the following boot option?
a. acpi_sci=high
b. acpi_sci=edge
c. acpi_sci=edge acpi_sci=high
can you try the boot options as suggested in comment #9?
Created attachment 20673 [details]
logs from 2.6.26 and 2.6.24
Here is the dmesg log from 2.6.26 with the acpi_sci parameter set, and the working 2.6.24 kernel dmesg log.
Oh, I forgot to mention that after irq #9 was disabled the keyboard hotkeys did work on all versions.

Hi, Kenney
Do you mean that the hotkeys still work after the ACPI IRQ is disabled? Is the brightness increased/decreased by hotkey? If so, it seems that the hotkeys don't use the ACPI mechanism.
Another thing I care about is that there is no ACPI IRQ storm on the 2.6.24 kernel. In fact there is no change in the ACPI IRQ 9 configuration between 2.6.24 and the latest kernel. It is weird.
thanks.

Yes, most of the special hotkeys work... only 1 or 2 didn't, but those were not working to begin with, and are not really required. As for the hotkeys, they stopped working after I changed the acpi_sci settings.

As the ACPI SCI worked properly in 2.6.24 in its default mode (level, low), and the latest kernel uses the same mode, I don't think that any combination of "acpi_sci=" is going to fix the issue with the latest kernel.
What seems to have changed is that there is now a rash of ACPI SCI interrupts provoked, acpi_irq() does not claim them, and so the kernel shuts down IRQ9 as a screaming interrupt. ("irqpoll" will work around this symptom when IRQ 9 gets shut off. However, it will work only to the extent that there are other interrupts going on in the system to kick off the polling.)
comment #13 shows that there were 999 invocations of the acpi_sci, that all 999 were GPEs, and that all 999 were GPE 03:
gpe03: 999 enabled
...
gpe_all: 999
sci: 999
Was this grep taken after "irq 9: nobody cared"? What did "grep acpi /proc/interrupts" show?
Unfortunately, /sys/firmware/acpi/interrupts shows handled interrupts only, and does not count all calls to acpi_irq -- I'll send a debug patch to add that shortly...
In the meantime, it may be useful to try to isolate this issue in two ways.
1. disable all features possible and see if your interrupt still works. e.g. with CONFIG_ACPI=y, disable all the optional acpi drivers (e.g. battery, ac etc.) and see if irq9 still gets disabled. If not, add them back, say, starting with "button", until you see which driver provokes the breakage.
2. git bisect drivers/acpi/ec.c between the working and failing kernels to see if it was an EC specific change that provoked the issue.
Created attachment 21064 [details]
debug patch vs 2.6.30-rc2
Please apply this patch and show the output from
grep . /sys/firmware/acpi/interrupts/*
and
grep acpi /proc/interrupts
after the failure.
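For readers who want to compare such captures quickly, here is a minimal sketch that pulls the non-zero counters out of saved "grep . /sys/firmware/acpi/interrupts/*" output. It assumes each line looks like "path/name: count [status]", which matches the excerpt quoted in this thread; the function names and the sample text are illustrative only, and the zero-count line in the sample is invented.

```python
import re

def parse_acpi_counters(grep_output):
    """Map counter name -> (count, status) for lines like 'path/gpe03:  999  enabled'."""
    counters = {}
    for line in grep_output.splitlines():
        match = re.match(r"\s*(?:.*/)?(\w+):\s*(\d+)(?:\s+(.*\S))?\s*$", line)
        if not match:
            continue  # skip anything that does not look like a counter line
        name, count, status = match.groups()
        counters[name] = (int(count), status or "")
    return counters

def busy_counters(counters):
    """Keep only the counters that have fired at least once."""
    return {name: count for name, (count, _status) in counters.items() if count > 0}

# The three 999 values are the ones quoted in the thread; gpe00 is a made-up zero line.
SAMPLE = """\
/sys/firmware/acpi/interrupts/gpe00:       0   disabled
/sys/firmware/acpi/interrupts/gpe03:     999   enabled
/sys/firmware/acpi/interrupts/gpe_all:     999
/sys/firmware/acpi/interrupts/sci:     999
"""

print(busy_counters(parse_acpi_counters(SAMPLE)))
```

If a single GPE accounts for the whole gpe_all total, as gpe03 does in the quoted excerpt, that GPE is the one worth looking up in the DSDT.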
ping Kenney

I just tested the debug patch against kernel 2.6.30-rc2, and here are the results.
Created attachment 21147 [details]
Interrupts results with debug patch against 2.6.30-rc2
Created attachment 21148 [details]
contents of /sys/firmware/acpi/interrupts in 2.6.30-rc2 with patch
whelp, the test patch confirmed that indeed, we have a screaming interrupt that is getting vectored through the acpi_irq(). I guess we could have believed /proc/interrupts on that one...
Can you make the interrupt stop by any of the means in suggestions #1 and #2 in comment #25?

(In reply to comment #31)
> whelp, the test patch confirmed that indeed, we have a screaming interrupt
> that is getting vectored through the acpi_irq(). I guess we could have
> believed /proc/interrupts on that one...
>
> Can you make the interrupt stop by any of the means in suggestions
> #1 and #2 in comment #25?
I'll start with suggestion #1. I don't have much experience with changing the build configuration, so I usually ask for help with this. (In this last build I completely forgot to build all the modules, so it didn't even include the sound and video drivers.)
Suggestion #2: I still have not fixed the boot issues surrounding the 2.6.25 kernel, which I need to do before I can even think about running a git bisect, and I have no clue how to run a git bisect on this. All I know is that the break in ACPI occurred sometime between 2.6.24 and 2.6.26, but 2.6.25 didn't display the error message about acpi_irq up to the point where it tried to mount the file system.
Created attachment 21159 [details]
Dmesg from Kernel version 2.6.25.20.
I just got version 2.6.25.20 to boot, and it didn't generate an error about acpi_irq.
Created attachment 21160 [details]
/sys/firmware/acpi/interrupts/* from 2.6.25.20.
Created attachment 21161 [details]
/proc/interrupts from 2.6.25.20.
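As a companion to the /proc/interrupts captures attached here, the following is a rough sketch of how the acpi line could be extracted from such a file so that working and failing kernels can be compared. It assumes the usual layout of an IRQ label ending in ":", one count column per CPU, then the controller type and handler names. The sample text and all numbers in it are invented for illustration and are not taken from the attachments.

```python
def acpi_irq_counts(proc_interrupts_text):
    """Return (irq_label, [per-CPU counts]) for the line naming 'acpi', or None."""
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue  # header line or something unexpected
        if "acpi" not in fields[1:]:
            continue
        counts = []
        for field in fields[1:]:
            if not field.isdigit():
                break  # first non-numeric column ends the per-CPU counts
            counts.append(int(field))
        return fields[0].rstrip(":"), counts
    return None

# Hypothetical capture; the values are placeholders, not data from this bug.
SAMPLE = """\
           CPU0       CPU1
  0:        127          0   IO-APIC-edge      timer
  9:     100000          1   IO-APIC-fasteoi   acpi
"""

print(acpi_irq_counts(SAMPLE))
```

A very large count on the acpi line shortly after boot is consistent with the "screaming interrupt" described earlier in the thread.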
as this is a 2.6.26 regression, could you please use git-bisect to find out which patch introduced the bug?
BTW, agree with suggestion #2 in comment #25, ec is a suspect here, and git-bisect drivers/acpi/ec.c would be a better choice.

I ran the bisection, which produced this as a result:
git bisect start 'drivers/acpi/ec.c'
# bad: [bce7f793daec3e65ec5c5705d2457b81fe7b5725] Linux 2.6.26
git bisect bad bce7f793daec3e65ec5c5705d2457b81fe7b5725
# good: [4b119e21d0c66c22e8ca03df05d9de623d0eb50f] Linux 2.6.25
git bisect good 4b119e21d0c66c22e8ca03df05d9de623d0eb50f
# good: [223883b7aafa02410ed2e571d6032c876d0b23b8] ACPI: EC: Switch off GPE mode during suspend/resume
git bisect good 223883b7aafa02410ed2e571d6032c876d0b23b8
# good: [ce52ddf58cbc2c40f5f08d37d2217945e4d5adf3] ACPI: EC: Don't delete boot EC
git bisect good ce52ddf58cbc2c40f5f08d37d2217945e4d5adf3
# bad: [08acd4f8af42affd8cbed81cc1b69fa12ddb213f] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
git bisect bad 08acd4f8af42affd8cbed81cc1b69fa12ddb213f

I did a few more bisects at random to narrow the time frame down to a couple of hours. The break occurs sometime between these 2 commits:
# good: [c99fcf28b87d8cab592db7571e3164f5cb54c5b3] signals: send_group_sigqueue: don't take tasklist_lock
git bisect good c99fcf28b87d8cab592db7571e3164f5cb54c5b3
# bad: [08acd4f8af42affd8cbed81cc1b69fa12ddb213f] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
git bisect bad 08acd4f8af42affd8cbed81cc1b69fa12ddb213f

I think I found the source of my regression, and here's the bisect for it.
git bisect start 'drivers/acpi/scan.c'
# bad: [08acd4f8af42affd8cbed81cc1b69fa12ddb213f] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
git bisect bad 08acd4f8af42affd8cbed81cc1b69fa12ddb213f
# good: [ce52ddf58cbc2c40f5f08d37d2217945e4d5adf3] ACPI: EC: Don't delete boot EC
git bisect good ce52ddf58cbc2c40f5f08d37d2217945e4d5adf3
# bad: [729b2bdbfa19dd9be98dbd49caf2773b3271cc24] ACPI : Disable the device's ability to wake the sleeping system in the boot phase
git bisect bad 729b2bdbfa19dd9be98dbd49caf2773b3271cc24
# good: [5c9fcb5deef4d3a49798d76c48b726d2e3c7df72] ACPI: fix a regression of ACPI device driver autoloading
git bisect good 5c9fcb5deef4d3a49798d76c48b726d2e3c7df72
Created attachment 21185 [details]
Results from 2.6.30-rc4 with comparison of results
My last assumption proved true: the ACPI interrupt was indeed being ignored. The file includes the patch I made to fix my system, but it involved pulling out the other one. So I'll rework my patch a little to help fix this problem without voiding the effects of the previous patch.
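The bisection runs reported above are, at heart, a binary search between a known-good and a known-bad commit. The sketch below shows that idea on a plain list, assuming a linear history ordered oldest to newest; real git bisect also copes with merges, which this does not model. The commit labels and the is_bad test are placeholders, not the commits from this bug.

```python
def first_bad(commits, is_bad):
    """Binary-search for the first bad commit.

    commits is ordered oldest to newest; commits[0] is assumed good and
    commits[-1] is assumed bad, mirroring 'git bisect good' / 'git bisect bad'.
    Returns (first bad commit, number of commits that had to be tested).
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return commits[bad], steps

# Hypothetical history of 16 commits where the regression enters at c11.
history = [f"c{i:02d}" for i in range(16)]
culprit, tested = first_bad(history, lambda c: history.index(c) >= 11)
print(culprit, tested)
```

Each test halves the remaining range, which is why restricting the search to one path (drivers/acpi/ec.c, then drivers/acpi/scan.c) kept the number of kernels to build small.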
Created attachment 21189 [details]
2.6.30-rc4 patched Debug Log of ACPI
I looked at my patch and noticed that it might not be helpful, so I booted with the debug parameters given earlier. Interestingly enough, a few of the GPEs which should be on their own are getting routed through GPE03.
acpi.debug_layer=0x44 acpi.debug_level=0x08000004
I'll look at the debug options for ACPI to see if anything else might be useful
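Both debug parameters are bit masks, so it can help to see which individual bits a given value turns on before trying other combinations. The helper below is a small sketch that only lists bit positions; it deliberately does not name the layers or levels, since that mapping lives in the kernel's ACPI debug documentation and is not reproduced in this thread. The function names are illustrative.

```python
def set_bits(mask):
    """Return the positions of the bits that are set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

def describe(name, mask):
    """Format a mask as its individual single-bit values."""
    parts = ", ".join(f"bit {bit} (0x{1 << bit:08x})" for bit in set_bits(mask))
    return f"{name}=0x{mask:x}: {parts}"

print(describe("acpi.debug_layer", 0x44))
print(describe("acpi.debug_level", 0x08000004))
```

Each listed single-bit value can then be looked up individually, and several of them can be OR-ed together to build a new mask.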
(In reply to comment #41)
>
> I looked at my patch, and noticed that it might not be helpful,
I'm confused. You said that the issue goes away if you apply the patch in comment #40, right? What do you mean by saying "it might not be helpful"?
> so i ran the
> debug statements given earlier, it's interestingly enough a few of the GPE's
> which should be in their own are getting routed through GPE03.
>
> acpi.debug_layer=0x44 acpi.debug_level=0x08000004
>
> I'll look at the debug options for ACPI to see if anything else might be
> useful
please attach the acpidump output, so that we can see what GPE03 is used for.

Yes. The acpidump output most likely hasn't changed, but since this is a laptop, it is possible that the device at fault is PNP0C09, which is the smart controller for my battery, because the bulk of the interrupt calls occur when I unplug and plug in the power cord.
Created attachment 21234 [details]
ACPI Dump from patched kernel.
Here's the requested acpidump, taken when the irq is not disabled. However, the results do not change between this one and the last one.
Created attachment 21287 [details]
2.6.30-rc5 dmesg log
Another dmesg output, however, this time i booted with these flags:
acpi.debug_layer=0x000000f acpi.debug_level=0x000000ff
At around 300 seconds, I removed the AC adaptor for a little while and found a neat set of messages. It appears that "ACAD", "BAT1" and "EHC0" all send events on the GP03 (although this is also seen on my patched version, which reverts the code in the function acpi_bus_get_wakeup_device_flags back to what is seen in kernel revisions 2.6.25 and 2.6.24).
Created attachment 21300 [details]
try the debug patch, in which the _PSW object will be skipped in the course of scanning devices
Hi, Kenney
From the git-bisect log in comment #39 it seems that the issue is related to the following commit:
>ACPI : Disable the device's ability to wake the sleeping system in the boot phase
Will you please try the debug patch in comment #46 on the latest kernel (2.6.30-rc5) and see whether the issue still exists? In the debug patch the commit 729b2bdb is reverted.
Will you please also attach the output of dmidecode?
Thanks.
Created attachment 21305 [details]
DMI Decode results.
I applied the modifications in the patch attached to comment #46 and the issue did not persist. Also, here's the dmidecode output; the only 2 things edited out are the serial and uuid (those most likely don't matter).

Please try booting with the following parameters together
acpi_osi="!Windows 2006" acpi_osi="!Windows 2006 SP1"
and also just for grins, you might also try simply
acpi_osi=Linux

I tested it on the official Ubuntu 9.04 kernel, which is 2.6.28.
Default settings for acpi_osi:
Irq is ignored.
Sleep special function (Fn + F3) tries to suspend the machine.
Suspend fails on first try every time and complains about corrupt memory.
With acpi_osi set to either of the suggestions you made:
Irq is not disabled
Suspend works
Sleep special function (Fn + F3) does not try to put the machine to sleep; instead it generates a keyboard error.
Created attachment 21431 [details]
dmesg results from suspend
I figure I need to report that the system failed to resume in this test, although it enters suspend properly. This log has the dmesg with boot params of...
acpi_osi="!Windows 2006" acpi_osi="!Windows 2006 SP1"
The major failure is caused by corrupted lower memory on resume.
Created attachment 21432 [details]
dmesg results from suspend (acpi_osi=linux)
Another log, this time with acpi_osi=Linux (suspend fails because the system wakes up automatically due to the ignored irq).
Also, the error does not change from the other one (even though I added fglrx and madwifi to the mix). For the record, the graphics driver normally does not hang when fglrx is used.
> With acpi_osi set to either of the suggestions you made:
> Irq is not disabled
This suggests that of the 7 instances of _PSW in the DSDT,
the one that is causing the failure is the one that
is checking the type of the OS (TPOS) for Vista (0x40)
compatibility:
    Device (PB6)
    {
        Name (_ADR, 0x00060000)
        Name (MPRW, Package (0x02)
        {
            0x18,
            0x05
        })
        Method (_PRW, 0, NotSerialized)
        {
            \_SB.QWMI.PHSR (0x11, 0x02)
            Store (\_SB.PCI0.LPC0.OWNS, \_SB.QWMI.Q512)
            If (LEqual (\_SB.PCI0.LPC0.WOLI, 0x00))
            {
                Store (0x00, Index (MPRW, 0x01))
            }
            Else
            {
                Store (0x05, Index (MPRW, 0x01))
            }
            Return (MPRW)
        }
        Method (_PSW, 1, NotSerialized)
        {
            Store (Arg0, \_SB.PCI0.SMB.WOLE)
            If (LEqual (TPOS, 0x40))
            {
                Store (Arg0, \_SB.PCI0.SMB.WOLF)
            }
        }
The _PRW for this device says that it should be coming in
on GPE 0x18. However, in the working 2.6.25 kernel
there are no interrupts recorded on gpe-18, and in
2.6.30 this GPE is marked as disabled.
Just for grins...
The failing kernel has HPET support, the working kernel does not.
Please verify that you still see the failure with "hpet=disable"
Created attachment 21534 [details]
kernel log with hpet=disabled and acpi_osi=Linux flags
Alright, I ran the two tests with hpet.
First test: hpet=disabled boot param without acpi_osi modifications.
Result: disabled irq.
Second test: hpet=disabled and acpi_osi modifications.
Result: irq is not disabled, but the resume hangs. (Also, here's the kernel log.)
Created attachment 21589 [details]
suspend results - debug testing.
I did a few more tests with the /sys/power options, found out where 2.6.28 cuts out, and added a few notes. I will provide test results against the latest 2.6.30 release candidate. (2.6.30-rc8 shows a regression compared to 2.6.28 when dealing with the apic: it generates an apic error.)
ping kenney...
Created attachment 22015 [details]
acpi tests on 2.6.30
I didn't forget, but the results were the same. I did notice one interesting thing with the 2 acpi_osi configuration options: on 2.6.30 the monitor failed to reawaken upon resume during most of the tests, although I was able to restart the system. Also, to get the system to go into the suspend state when using acpi_osi, I had to move the mouse with the touchpad.
Hello, All,
I have the same Toshiba model and I have installed Ubuntu Linux on this machine recently. Naturally I have found this bug :) I have not done anything yet with the 2.6.30 kernel, but I still have some information that you might find interesting. While I'm not using strict technical terms, I hope it will help:
1. On a fresh installation (kernel 2.6.28-11) I can control power management without problems. Suspend is not working (but I think I have succeeded with the fglrx driver once). The irq 9 problem still exists.
2. Ubuntu's automatic update system updates the kernel to version 2.6.28-13, and here the power management problems begin. I can't control On Battery Power actions (in Ubuntu's Power Management). acpi_osi=Linux (or Windows) as proposed above solves this problem. Hibernation works without problems and I'm using it now, but I prefer Suspend, which is not working. With acpi_osi
3. Actually I'm not sure that "suspend" is not working. I don't see anything, but I can reset the computer by pressing the magic sequence alt+prtsc RSEINUB. That might be related to X drivers. I'm using radeonhd. radeon and fglrx work for me as well, but with fglrx the GPU fan runs like hell - I can hear a high-pitched sound that is really annoying, and I'm afraid that I could burn my GPU (the last thing I want to do).
Since I'm a developer I'm ready to play with git, the linux kernel, etc. (I have never done that before, but I believe it shouldn't be too hard with some instructions and help.) I don't have a lot of time, but I'm ready to help with this problem.

Item 2. I started writing a sentence and did not finish it: With acpi_osi=Linux the irq 9 message disappears. hpet=disabled seems to make no difference. Tested both with 2.6.28-11 and -13.

Yet another interesting thing: I have a Logitech wireless mouse, and when it is connected the computer resumes from the suspended state immediately after suspend. That happens with -11, but not with -13. Also, after resume Ctrl+Alt+Delete restarts the computer successfully with the -11 kernel.
Dalius, I found an interesting side note... Install the latest closed source ATI driver, and the system will actually suspend right, but this currently cannot be tested against 2.6.30 (the kernel module has not been updated enough yet).

Hi, Kenney
Thanks for the test. From the info in comment #52 it seems that the irq is not disabled in the boot phase with the boot option "acpi_osi=linux" or acpi_osi="!Windows 2006" acpi_osi="!Windows 2006 SP1".
The second issue is the low memory corruption after suspend/resume. This should be related to a BIOS bug. Of course I can add the box to the DMI quirk table for low memory corruption.
After adding the acpi_osi boot option, the following in the _PSW object won't be executed:
>If (LEqual (TPOS, 0x40)) { Store (Arg0, \_SB.PCI0.SMB.WOLF) }
In that case the OS won't complain that irq 9 is disabled. And from the AML code we know that WOLF is accessed through the I/O port behind the LPC (SMB) bridge. Maybe WOLF changes the interrupt polarity of ACPI interrupt 9.
Before commit 729b2bd was shipped, the kernel did not call the _PSW object except when entering suspend/resume. In that case the ACPI interrupt mode is low/level. But since the commit was shipped, the OS calls the _PSW object for the PB6 device. In that case the ACPI interrupt mode should be level/high.
And from the test it seems that this issue can be worked around by adding the boot option "acpi_osi=". So I will add this box to the DMI quirk table to enable the Linux OSI.
Thanks.
Created attachment 22437 [details]
add the quirk for Toshiba P305D to avoid the low memory corruption
Created attachment 22438 [details]
add the quirk for Toshiba P305D to enable Linux OSI
Hi, Kenney
Will you please try the debug patches on the latest kernel and see whether it works well?
Thanks.

close this bug as there is no response for more than a month.
please reopen it if the problem still exists in the latest upstream kernel.
Created attachment 23050 [details]
2.6.31-rc9 Test with default parameters.
Just got a few more tests on 2.6.31-rc9.
Default Option:
Suspend fails to even operate, due to irq 9 not being enabled, and with the radeon driver enabled the system fails to get video back.
Not windows 2006:
Suspend works flawlessly, however the led blinks a few times when trying to resume and video does not return.
Created attachment 23051 [details]
2.6.31-rc9 Test with acpi_osi set to not use windows 2006.
Created attachment 23053 [details]
2.6.31-rc9 Test after applying linux quirk patch.
I applied the Linux quirk patch, and the suspend still fails to reactivate the internal LCD upon resume. The only clue I have is that it is because the radeon driver fails to handle resume on the graphics chip I use properly. The chip to be precise is the Radeon HD3100 (RS780MC).
Created attachment 23060 [details]
Dmesg on 2.6.31 latest git (with working resume)
I managed to figure out why I could not get the video back upon resume: it's due to the video drivers.
The results were taken after the command:
s2ram -f --vbe_post
As for other notes: this is the resume testing, and with this I get no memory corruption. However, the system still locks up shortly afterwards. (Fixed by using the fglrx drivers.)
Created attachment 23103 [details]
grep of /sys/firmware/acpi/interrupts on 2.6.31 after linux quirk patch is applied.
I forgot to add a little more debug information: here are the ACPI interrupts after the Linux quirk patch is applied on 2.6.31.
Created attachment 23698 [details]
Dmesg 2.6.32-rc6 (No Patch)
I just tested against the latest upstream kernel, 2.6.32-rc6 at commit 7c9abfb884b8737f0afdc8a88bcea77526f0da87 and here is the dmesg log.
I should make note that after i booted with acpi_osi=Linux the irq is no longer ignored... This is also the case if i apply the patch which forces the linux entry in the dsdt.
As for the suspend and resume test results. They are failing due to radeon kms driver. This fault will most likely be fixed in a later revision of the driver, when the RS780 support is fully handled.
on a side note, a lot of the old kernel logs, because this is the latest kernel revision that is available.
*** This bug has been marked as a duplicate of bug 14736 ***