Subject : Oopses and ACPI problems (Linus 2.6.29-rc4)
Submitter : Darren Salt <linux@youmustbejoking.demon.co.uk>
Date : 2009-02-09 18:26
References : http://marc.info/?l=linux-kernel&m=123420431709877&w=4
Notify-Also : Matthew Garrett <mjg59@srcf.ucam.org>
Notify-Also : Corentin Chary <corentin.chary@gmail.com>
Notify-Also : "yakui_zhao" <yakui.zhao@intel.com>

This entry is being used for tracking a regression from 2.6.28. Please don't close it until the problem is fixed in the mainline.
http://marc.info/?l=linux-kernel&m=123427814117781&w=2 – patch fixing the ACPI hotkey reporting (mentioning here in case it gets lost)
Patch : http://marc.info/?l=linux-kernel&m=123427814117781&w=2
http://marc.info/?l=linux-acpi&m=123427780517184&w=4
> 5740294: eeepc-laptop: Implement rfkill hotplugging in eeepc-laptop
Reverting this fixes the rfkill oops; things work correctly again.

so there are two regressions and one bug in all, right?
1. hotkey reporting: fixed by the patch in comment #1.
2. kernel oops: should be fixed by commit 7695fb04aca62e2d8a7ca6ede50f6211e1d71e53.
3. long delay (26s!) during eeepc-laptop init.

I have an eeepc901 and I'll try to reproduce this problem with the latest upstream kernel.
(ref. comment no. 3) (and if bugzilla doesn't turn that into a link, it's buggy :-) )
1. Yes.
2. That was a different problem. The oopses in *this* report were caused by interaction between pciehp and eeepc-laptop due to pciehp_force=1; setting that to 0 (or removing pciehp) fixes things. (I don't have the ACPI hotplug driver built, but from what I've read I don't think that that would make any difference anyway.)
3. Yes, still seeing that in -rc5. Details at http://marc.info/?l=linux-kernel&m=123429556717946&w=2; BIOS 1808.
Yes, I can reproduce this problem. This is a regression: invoking the INIT method takes 15+ seconds in 2.6.29, where it used to take about 1s.
Created attachment 20324 [details] dmesg-loading-eeepc-laptop-in-2.6.29-rc5 dmesg shows that invoking INIT costs about 17s, while most of the time (14s +) is spent in reading HSTS (smbus host status register, io address 0x400) register. By reading the AML code, it seems that HSTS doesn't return the right state in time and the AML code keeps reading it until timeout...
it would be great if someone can provide the dmesg while loading eeepc-laptop driver in an earlier kernel (INIT takes 1s). please do remember to set CONFIG_ACPI_DEBUG, run "echo 0x80 > /sys/module/acpi/parameters/debug_layer" and "echo 0x04 > /sys/module/acpi/parameters/debug_level" before loading eeepc-laptop
Created attachment 20349 [details] Boot log for 2.6.28.4 with ACPI debugging info Since eeepc-laptop is built in, I appended "acpi.debug_layer=128 acpi.debug_level=4 printk.time=1" to the kernel command line.
yes, there are a lot of differences in the dmesg output. Darren, please do the test again, with "acpi.debug_layer=0x90 acpi.debug_level=0x404 printk.time=1". let's see what the different code paths are in the two kernels.
Created attachment 20359 [details] Boot log for 2.6.28.4 with many reams of ACPI debugging info A little light reading for somebody...
well, this seems to be another EC problem. eeepc-laptop needs to evaluate the INIT control method during initialization, which will invoke ECXW:

    Method (ECXW, 2, Serialized)
    {
        If (ECAV ())
        {
            If (LNot (Acquire (MUEC, 0xFFFF)))
            {
                IBFX ()
                Store (Arg0, EC66)
                IBFX ()
                Store (Arg1, EC62)
                IBFX ()
                Release (MUEC)
            }
        }
    }

And for this piece of code,

    IBFX ()
    Store (Arg0, EC66)
    IBFX ()
    Store (Arg1, EC62)
    IBFX ()

dmesg in 2.6.28.4 shows the correct log:

[ 3.894262] exregion-0289 [15] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066 ------ECIE is cleared (in IBFX)
[ 3.907389] exregion-0289 [16] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000066 ------Store (Arg0, EC66)
[ 3.920458] exregion-0289 [15] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066 ------ECIE is cleared (in IBFX)
[ 3.933580] exregion-0289 [16] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000062 ------Store (Arg1, EC62)
[ 3.946648] exregion-0289 [15] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066 ------ECIE is cleared (in IBFX)

But it's not in 2.6.29-rc5:

[ 2481.733642] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066 ------ECIE is cleared (in IBFX)
[ 2481.734109] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000066 ------Store (Arg0, EC66)
[ 2481.734490] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066 ------ECIE is set
[ 2481.734830] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=00000000000000E1 ------Store (Zero, DELY)
[ 2481.735287] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066 ------ECIE is cleared
[ 2481.735764] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000062 ------Store (Arg1, EC62)
[ 2481.736184] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066 ------ECIE is set
[ 2481.736529] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=00000000000000E1 ------Store (Zero, DELY)
[ 2481.736877] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066 ------ECIE is set
[ 2481.737244] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=00000000000000E1 ------Store (Zero, DELY)
[ 2481.737698] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066 ------ECIE is cleared

I don't know if this results in the long latency of eeepc initialization, but it may be related.
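For anyone comparing the two boot logs by hand, a small script can tally the System-IO accesses per port. This is only a sketch, not part of any kernel tooling: the function name tally_io and the SAMPLE list are made up for illustration, and it assumes (going by the annotations above) that "R/W 0" is a read and "R/W 1" is a write.

```python
import re
from collections import Counter

# Matches the exregion debug lines quoted in this report, e.g.
# "System-IO (width 8) R/W 0 Address=0000000000000066"
IO_LINE = re.compile(
    r"System-IO \(width (\d+)\) R/W ([01]) Address=([0-9A-Fa-f]+)"
)


def tally_io(lines):
    """Count accesses per (port, direction). Hypothetical helper.

    Assumption: R/W 0 means read and R/W 1 means write, as the
    annotations in the comment above suggest.
    """
    counts = Counter()
    for line in lines:
        m = IO_LINE.search(line)
        if m is None:
            continue  # not an exregion System-IO line
        direction = "write" if m.group(2) == "1" else "read"
        counts[(int(m.group(3), 16), direction)] += 1
    return counts


# First four 2.6.29-rc5 lines from the comment above (annotations dropped).
SAMPLE = [
    "[ 2481.733642] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066",
    "[ 2481.734109] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000066",
    "[ 2481.734490] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066",
    "[ 2481.734830] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=00000000000000E1",
]

if __name__ == "__main__":
    for (addr, direction), n in sorted(tally_io(SAMPLE).items()):
        print(f"0x{addr:02X} {direction}: {n}")
```

Feeding it the full dmesg from each kernel (for example via open(path) instead of SAMPLE) should make it easy to see which ports are polled far more often in one kernel than in the other.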
another log line that shows up many times in 2.6.29-rc5 is:

exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000400

this line is printed when invoking FSBT()-->RCLK()-->RBLK()-->SMBB():

    Method (SMBB, 2, NotSerialized)
    {
        ...
        Store (0x54, HSTC)
        Sleep (0x05)
        Store (0xFF, Local0)
        While (Local0)
        {
            Decrement (Local0)
            Sleep (0x02)
            If (And (HSTS, One))
            {
                Sleep (0x02)
            }
            If (And (HSTS, 0x02))
            {
                Store (Zero, Local0)
                Store (One, Local1)
            }
        }
        ...
    }

it seems that bit 1 of HSTS is not set, while it should be.

HST_STS - Host Status Register (SMBUS - D31:F3)
bit1: INTR - R/WC (special). This bit can only be set by termination of a command. INTR is not dependent on the INTREN bit (offset SMBASE + 02h, bit 0) of the Host controller register (offset 02h). It is only dependent on the termination of the command. If the INTREN bit is not set, then the INTR bit will be set, although the interrupt will not be generated. Software can poll the INTR bit in this non-interrupt case.
0 = Software clears this bit by writing a 1 to it. The ICH7 then deasserts the interrupt or SMI#.
1 = The source of the interrupt or SMI# was the successful completion of its last command.

this line in the dmesg may suggest the real problem:

ACPI: I/O resource 0000:00:1f.3 [0x400-0x41f] conflicts with ACPI region SMRG [0x400-0x40f]
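To get a feel for how much time one SMBB call can burn when INTR never shows up, the polling loop can be modelled outside the kernel. This is a rough sketch only: it covers just the part of SMBB shown above (the elided "..." parts are not modelled), the names smbb_poll and read_hsts are invented for illustration, and it takes the AML Sleep() values as exact milliseconds, whereas real sleeps are at least that long, so the results are lower bounds.

```python
def smbb_poll(read_hsts):
    """Model the polling loop from the SMBB excerpt above.

    read_hsts is a hypothetical callback taking the elapsed time in ms
    and returning the simulated HSTS value. Returns (elapsed_ms,
    completed), where completed mirrors Local1 being set to One.
    """
    elapsed = 5            # Sleep (0x05) after Store (0x54, HSTC)
    local0 = 0xFF          # Store (0xFF, Local0)
    completed = False
    while local0:
        local0 -= 1        # Decrement (Local0)
        elapsed += 2       # Sleep (0x02)
        if read_hsts(elapsed) & 0x01:   # If (And (HSTS, One))
            elapsed += 2                # Sleep (0x02)
        if read_hsts(elapsed) & 0x02:   # If (And (HSTS, 0x02)): INTR set
            local0 = 0
            completed = True
    return elapsed, completed


if __name__ == "__main__":
    # INTR never set, bit 0 clear: the loop runs all 255 iterations.
    print(smbb_poll(lambda t: 0x00))
    # INTR never set, bit 0 stuck at 1: each iteration sleeps twice.
    print(smbb_poll(lambda t: 0x01))
    # INTR already set on the first poll.
    print(smbb_poll(lambda t: 0x02))
```

Under these assumptions a single call that never sees INTR costs at least about half a second, and about a second if bit 0 stays set, so a number of such calls in a row during INIT would be enough to account for a delay of many seconds.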
it would be great if you could bisect drivers/acpi/ec.c and see if any EC commit introduced this regression.
cc Jean.
Does the problem disappear if you boot with acpi_enforce_resources=strict?
acpi_enforce_resources=strict does indeed make the problem disappear. If an ec.c bisection would still be useful, let me know...
do you have i2c_i801 driver built in 2.6.28.4?

*** This bug has been marked as a duplicate of bug 12376 ***
Yes, without obvious problems. There's that resource conflict, but it seems harmless (or at least hasn't caused me any problems other than that boot delay in .29-rc*). This leaves, for me, two post-2.6.28 regressions: the ACPI hotkey reporting and the pciehp/eeepc-laptop oops-causing interaction. Both have patches.