Bug 10444 - Early EC init causes boot failures on multiple brands and models
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Other
Hardware: All Linux
Importance: P1 high
Assignee: ykzhao
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-04-11 21:32 UTC by TJ
Modified: 2008-06-02 01:34 UTC
CC List: 1 user

See Also:
Kernel Version: 2.6.25-rc8
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
VGN-FE41Z R0200J3 (136.34 KB, text/plain)
2008-04-16 08:59 UTC, TJ

Description TJ 2008-04-11 21:32:49 UTC
Latest working kernel version: 2.6.22
Earliest failing kernel version: 2.6.24
Distribution: kernel.org, Ubuntu
Hardware Environment: Various laptops
Problem Description:

At boot time, early EC init can cause failures in various ways if the EC _REG() method makes Notify() calls, because the Notify() targets aren't yet in the namespace.

E.g.

ACPI: EC: acpi_ec_wait timeout, status=32, expect_event=1
ACPI: EC: read timeout, command=128

and

[ 18.127035] ACPI: EC: Look up EC in DSDT
[ 18.136591] ACPI: Interpreter enabled
[ 18.136594] ACPI: (supports S0 S3 S4 S5)
[ 18.136604] ACPI: Using IOAPIC for interrupt routing
[ 18.136844] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 18.634304] ACPI: EC: acpi_ec_wait timeout, status = 0, expect_event = 1
[ 18.634362] ACPI: EC: read timeout, command = 128
[ 18.634415] ACPI Exception (evregion-0420): AE_TIME, Returned by Handler for [EmbeddedControl] [20070126]
[ 18.634419] ACPI Exception (dswexec-0462): AE_TIME, While resolving operands for [OpcodeName unavailable] [20070126]
[ 18.634423] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.PCI0.LPCB.EC__._REG] (Node f7c4bd20), AE_TIME
[ 19.142018] ACPI: EC: missing confirmations, switch off interrupt mode.
[ 19.154286] ACPI: EC: GPE = 0x17, I/O: command/status = 0x66, data = 0x62
[ 19.154288] ACPI: EC: driver started in poll mode
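Many similar reports are arriving, so a small script that looks for this failure signature in an attached dmesg could help with triage. The following is a minimal Python sketch. The function name find_ec_reg_failures is hypothetical. The two patterns are copied from the log excerpts above, so other kernel versions may word these messages differently.

```python
import re
from typing import Dict, List, Union

# Patterns copied from the log excerpts in this report; other kernel
# versions may word these messages differently.
EC_TIMEOUT = re.compile(r"ACPI: EC: acpi_ec_wait timeout")
REG_FAILURE = re.compile(
    r"Method parse/execution failed \[(?P<path>[^\]]*\._REG)\].*AE_TIME"
)


def find_ec_reg_failures(dmesg_text: str) -> Dict[str, Union[int, List[str]]]:
    """Count EC wait timeouts and collect _REG methods that failed with AE_TIME."""
    timeouts = 0
    failed_reg_paths: List[str] = []
    for line in dmesg_text.splitlines():
        if EC_TIMEOUT.search(line):
            timeouts += 1
        match = REG_FAILURE.search(line)
        if match:
            failed_reg_paths.append(match.group("path"))
    return {"ec_timeouts": timeouts, "failed_reg_methods": failed_reg_paths}


if __name__ == "__main__":
    sample = (
        "[ 18.634304] ACPI: EC: acpi_ec_wait timeout, status = 0, expect_event = 1\n"
        "[ 18.634423] ACPI Error (psparse-0537): Method parse/execution failed "
        "[\\_SB_.PCI0.LPCB.EC__._REG] (Node f7c4bd20), AE_TIME\n"
    )
    print(find_ec_reg_failures(sample))
```

A report containing both an EC timeout and a failed _REG method is a strong candidate for this bug. A report containing only the timeout may have a different cause.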

Typically the EC _REG() method will contain something like:

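  // Arg0 is the address space ID (0x03 = EmbeddedControl);
  // Arg1 is 1 when the EC handler connects, 0 when it disconnects
  If (LEqual (Arg0, 0x03))
  {
   Store (Arg1, ECON)
   Store (BATP, BNUM)
   Store (RSCL, B0SC)
   Store (RPWR, PWRS)
   Notify (BAT0, 0x81)
   PNOT ()
  }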

Removing the Notify() call and booting the system with the modified DSDT succeeds.
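The ordering problem described above can be reduced to a toy model. This is not ACPICA code. It only encodes the hypothesis in this report: _REG fails if it calls Notify() on a target that is not yet present. The function reg_succeeds and the step names are hypothetical, and the model says nothing about the timing sensitivity discussed below.

```python
from typing import Sequence


def reg_succeeds(
    boot_order: Sequence[str],
    notify_target: str = "BAT0",
    include_notify: bool = True,
) -> bool:
    """Toy model of this report's hypothesis (not ACPICA code).

    boot_order lists, in order, the objects that become available and the
    point at which the EC handler runs _REG (the step named "_REG").
    _REG is treated as failing if it calls Notify() on an absent target.
    """
    present = set()
    for step in boot_order:
        if step == "_REG":
            return not include_notify or notify_target in present
        present.add(step)
    raise ValueError("boot_order never runs _REG")


if __name__ == "__main__":
    early = ["EC", "_REG", "BAT0"]  # early EC init: _REG runs before BAT0 exists
    late = ["EC", "BAT0", "_REG"]   # later init: BAT0 is already present
    print(reg_succeeds(early))                        # False
    print(reg_succeeds(late))                         # True
    print(reg_succeeds(early, include_notify=False))  # True: DSDT with Notify() removed
```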

What makes this unusual is that it seems to be *VERY* time-sensitive. With Ubuntu Hardy we've found that simply removing the "quiet" boot option is sufficient to work around it. Doing so generates more kernel messages and slightly slows the boot process.

The same timing sensitivity means that adding any kind of debugging messages to track it down also cures the issue!

Although I think I understand the reason for the commit that causes this (c04209a7948b95e8c52084e8595e74e9428653d3), the result is that a large number of systems that currently have no issues will start experiencing the side effects quite dramatically as they upgrade to newer kernels.

For example, I have a large collection of Sony Vaio DSDTs, and I analysed them to see how many have a Notify() call in the EC _REG() method. These models do:

VGN-AR31S-R0200J6, VGN-AR370E-R0200J6, VGN-C140G-R0030J4, VGN-C1S, VGN-C1ZB-R0034J4, VGN-C22GH-R0080J4, VGN-C240E-R0080J4, VGN-C2S-R0080J4, VGN-C2Z-R0080J4, VGN-FE11H-R0072J3, VGN-FE11H-R0074J3, VGN-FE11M-R0172J3, VGN-FE11M-R0174J3, VGN-FE21H-R0100J3, VGN-FE21M-R0130J3, VGN-FE31M, VGN-FE31M-R0170J3, VGN-FE41E-R0190J3, VGN-FE41M-R0190J3, VGN-FE41Z-R0200J3, VGN-FE45G-R0190J3, VGN-FE550G-R0074J3, VGN-FE590P-R0072J3, VGN-FE660G-R0133J3, VGN-FE670G-R0130J3, VGN-FE690-R0172J3, VGN-FE770G-R0173J3, VGN-FE830, VGN-FE870E-R0190J3, VGN-FE880EH-R0200J3, VGN-FS115M-R0104J0, VGN-FS215B-R0040J1, VGN-FS215E-R0040J1, VGN-FS285H-R0040J1, VGN-FS315E-R0084J1, VGN-FS315H-R0080J1, VGN-FS315S-R0084J1, VGN-FS660W-R0044J1, VGN-FS730W-R0080J1, VGN-FS740W-R0080J1, VGN-FS760W-R0080J1, VGN-FS965F-R0044J2, VGN-FS980-R0044J2, VGN-FZ11M-R0050J7, VGN-N130G-R0020J4, VGN-N230E-R0070J4, VGN-N31Z-R0100J4
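A survey like the one above could be automated with a short script. The following is a minimal Python sketch under several assumptions. Each DSDT is assumed to have been disassembled with iasl -d into a .dsl file named after its model/BIOS. The function names are hypothetical. The script checks every _REG method, not only the one inside the EC device scope. Its brace counting is also naive, so braces inside comments or strings would confuse it.

```python
import re
from pathlib import Path
from typing import List

REG_METHOD = re.compile(r"Method\s*\(\s*_REG\b")
NOTIFY_CALL = re.compile(r"\bNotify\s*\(")


def reg_bodies(dsl_text: str) -> List[str]:
    """Return the brace-delimited body of every _REG method in a .dsl listing."""
    bodies = []
    for match in REG_METHOD.finditer(dsl_text):
        start = dsl_text.find("{", match.end())
        if start == -1:
            continue
        depth = 0
        for pos in range(start, len(dsl_text)):
            char = dsl_text[pos]
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth == 0:  # matching close of the method body
                    bodies.append(dsl_text[start : pos + 1])
                    break
    return bodies


def reg_has_notify(dsl_text: str) -> bool:
    """True if any _REG method body contains a Notify() call."""
    return any(NOTIFY_CALL.search(body) for body in reg_bodies(dsl_text))


def scan_directory(directory: str) -> List[str]:
    """List the .dsl file stems (one per model/BIOS) whose _REG calls Notify()."""
    return sorted(
        path.stem
        for path in Path(directory).glob("*.dsl")
        if reg_has_notify(path.read_text(errors="replace"))
    )


if __name__ == "__main__":
    import sys

    for name in scan_directory(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(name)
```

Restricting the check to _REG bodies matters. Battery Notify() calls in _Qxx event handlers are normal and would otherwise produce false positives.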

We're tracking this as Ubuntu bug #191137:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/191137
Comment 1 TJ 2008-04-12 04:14:39 UTC
I neglected to mention that in many of the cases being reported to Ubuntu, the system fails to boot only when it is on AC power. If it is allowed to boot past the ACPI initialisation phase, AC power can then be reconnected without any problem.

More research is needed on a wider range of model/DSDT combinations, booted both with and without AC power.

My tentative theory is that when booting on AC power, a Notify(BAT0, 0x81) event (Battery Information Changed, per the ACPI specification) is triggered to indicate charging. When booting without AC power connected, this event presumably doesn't occur.
Comment 2 ykzhao 2008-04-13 18:49:55 UTC
Hi, TJ
    Will you please attach the output of acpidump?
    Thanks.
Comment 3 ykzhao 2008-04-13 19:10:27 UTC
Will you please try the latest kernel (2.6.25-rc9) and see whether the problem still exists?
Please attach the output of dmesg.
Thanks.
Comment 4 TJ 2008-04-16 08:59:12 UTC
Created attachment 15776
VGN-FE41Z R0200J3

This is from my PC, on which I was able to reproduce the issue. Remember that we're getting reports at Ubuntu covering several different models. I'm guessing we can expect more once 8.04 Hardy is released from beta at the end of this month.
Comment 5 TJ 2008-04-16 10:06:49 UTC
In the related Ubuntu bug report I suggested forcing the EC into interrupt mode with "ec_intr=1":

https://bugs.edge.launchpad.net/linux/+bug/191137/comments/40

One user has reported success with "quiet ec_intr=1" on Hardy's 2.6.24 kernel. I've just built 2.6.25-rc9 and will be testing it shortly.
Comment 6 TJ 2008-04-16 12:34:13 UTC
2.6.25-rc9 *with CONFIG_DEBUG=y* boots successfully. I now have to try another build without the debug statements, since they appear to be what masks this issue.

Also, I mentioned recommending the "ec_intr=1" parameter. However, on searching the kernel source (2.6.24 to 2.6.25-rc9) I can't find any mention of it, despite it being documented in Greg Kroah-Hartman's 'Linux Kernel in a Nutshell', chapter 9, 'Kernel Boot Command-Line Parameter Reference'.

If it has been removed, the success reported by the user must have been a fluke.
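The source search described above can be scripted so it is easy to repeat across kernel trees. In kernels of this era, built-in boot options are typically registered with the __setup() macro and documented in Documentation/kernel-parameters.txt. Searching .c, .h and .txt files for the bare name should therefore catch both the registration and the documentation. The following is a minimal Python sketch; matching_lines and search_tree are hypothetical names.

```python
from pathlib import Path
from typing import Iterator, List, Tuple

# File types to search: C sources/headers for a __setup("ec_intr=", ...)
# style registration, .txt for Documentation/kernel-parameters.txt.
SUFFIXES = {".c", ".h", ".txt"}


def matching_lines(text: str, needle: str) -> List[int]:
    """Return the 1-based line numbers of text that contain needle."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if needle in line]


def search_tree(root: str, needle: str = "ec_intr") -> Iterator[Tuple[str, int]]:
    """Yield (path, line number) for every mention of needle under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in SUFFIXES:
            text = path.read_text(errors="replace")
            for line_no in matching_lines(text, needle):
                yield str(path), line_no


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, line_no in search_tree(root):
        print(f"{path}:{line_no}")
```

Running it against two trees (for example, the last one known to document the option and 2.6.24) would show whether the parameter was removed or never existed in mainline.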
Comment 7 TJ 2008-04-16 23:10:05 UTC
2.6.25-rc9 boots successfully with no CONFIG_DEBUG_KERNEL set.

Any idea where to look in the source for commits that would cause this in 2.6.24? Ideally I'd like to cherry-pick the commit(s) that will prevent this error.
Comment 8 ykzhao 2008-04-18 01:21:48 UTC
Will you please use git-bisect to identify which commit causes the problem in 2.6.24?
Thanks.
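The bisection requested above is a binary search between the last known-good release (2.6.22) and the first known-bad one (2.6.24). git-bisect performs this search over the real commit graph, including merges. The sketch below models only a linear history. The function name first_bad is hypothetical, and is_bad stands for building a kernel, booting it, and checking dmesg for the EC timeout.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary-search a linear history for the first bad commit.

    Assumes commits[0] is known good (e.g. v2.6.22), commits[-1] is known
    bad (e.g. v2.6.24), and every commit after the first bad one is bad too.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):  # in practice: build, boot, check dmesg
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    history = ["v2.6.22", "c1", "c2", "c3", "c4", "c5", "v2.6.24"]
    print(first_bad(history, lambda c: history.index(c) >= 3))  # c3
```

The failure is timing-sensitive, so each is_bad test should reproduce the reported conditions: booting with "quiet", on AC power, and without debug options. Otherwise a bad commit may be misclassified as good and the search will converge on the wrong commit.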
Comment 9 ykzhao 2008-05-06 00:03:27 UTC
Will you please try the latest kernel and see whether the problem still exists?
Thanks.
Comment 10 ykzhao 2008-05-29 00:24:28 UTC
Since there has been no response for more than a month, this bug will be rejected.
If the problem still exists, please reopen it.
Thanks.
Comment 11 bekir serifoglu 2008-06-02 01:34:29 UTC
I had the problem with kernel 2.6.24. Then I compiled kernel 2.6.25.4 myself, and the problem seems to be gone.
