Bug 12706 - Oopses and ACPI problems (Linus 2.6.29-rc4)
Status: CLOSED DUPLICATE of bug 12376
Product: ACPI
Classification: Unclassified
Component: EC
Hardware: All Linux
Importance: P1 normal
Assigned To: Zhang Rui
Depends on:
Blocks: 12398
Reported: 2009-02-14 12:59 UTC by Rafael J. Wysocki
Modified: 2009-02-26 01:42 UTC (History)

See Also:
Kernel Version: 2.6.29-rc4
Tree: Mainline
Regression: Yes


Attachments
dmesg-loading-eeepc-laptop-in-2.6.29-rc5 (488.18 KB, text/plain)
2009-02-23 00:24 UTC, Zhang Rui
Boot log for 2.6.28.4 with ACPI debugging info (165.33 KB, text/plain)
2009-02-24 06:12 UTC, Darren Salt
Boot log for 2.6.28.4 with many reams of ACPI debugging info (348.03 KB, application/x-lzma)
2009-02-24 19:21 UTC, Darren Salt

Description Rafael J. Wysocki 2009-02-14 12:59:42 UTC
Subject    : Oopses and ACPI problems (Linus 2.6.29-rc4)
Submitter  : Darren Salt <linux@youmustbejoking.demon.co.uk>
Date       : 2009-02-09 18:26
References : http://marc.info/?l=linux-kernel&m=123420431709877&w=4
Notify-Also : Matthew Garrett <mjg59@srcf.ucam.org>
Notify-Also : Corentin Chary <corentin.chary@gmail.com>
Notify-Also : "yakui_zhao" <yakui.zhao@intel.com>

This entry is being used for tracking a regression from 2.6.28.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Darren Salt 2009-02-15 04:16:28 UTC
http://marc.info/?l=linux-kernel&m=123427814117781&w=2
 – patch fixing the ACPI hotkey reporting

(mentioning here in case it gets lost)
Comment 2 Rafael J. Wysocki 2009-02-15 06:05:38 UTC
Patch : http://marc.info/?l=linux-kernel&m=123427814117781&w=2
Comment 3 Zhang Rui 2009-02-16 01:02:13 UTC
http://marc.info/?l=linux-acpi&m=123427780517184&w=4

> 5740294: eeepc-laptop: Implement rfkill hotplugging in eeepc-laptop
Reverting this fixes the rfkill oops; things work correctly again.

So there are two regressions and one bug in all, right?
1. Hotkey reporting. Fixed by the patch in comment #1.
2. Kernel oops. Should be fixed by commit 7695fb04aca62e2d8a7ca6ede50f6211e1d71e53.
3. Long delay (26s!) during eeepc-laptop init. I have an eeepc901 and I'll try to reproduce this problem with the latest upstream kernel.
Comment 4 Darren Salt 2009-02-17 08:21:43 UTC
(ref. comment no. 3)
(and if bugzilla doesn't turn that into a link, it's buggy :-) )

1. Yes.

2. That was a different problem. The oopses in *this* report were caused by interaction between pciehp and eeepc-laptop due to pciehp_force=1; setting that to 0 (or removing pciehp) fixes things. (I don't have the ACPI hotplug driver built, but from what I've read I don't think that that would make any difference anyway.)

3. Yes, still seeing that in -rc5. Details at http://marc.info/?l=linux-kernel&m=123429556717946&w=2; BIOS 1808.
Comment 5 Zhang Rui 2009-02-23 00:13:31 UTC
Yes, I can reproduce this problem.
This is a regression: invoking the INIT method takes 15+ seconds in 2.6.29, whereas it used to take 1s.
Comment 6 Zhang Rui 2009-02-23 00:24:00 UTC
Created attachment 20324 [details]
dmesg-loading-eeepc-laptop-in-2.6.29-rc5

dmesg shows that invoking INIT takes about 17s, and most of that time (14s+) is spent reading the HSTS register (SMBus host status register, I/O address 0x400).
Reading the AML code, it seems that HSTS doesn't return the right state in time, so the AML code keeps reading it until it times out...
Comment 7 Zhang Rui 2009-02-23 01:07:28 UTC
It would be great if someone could provide the dmesg from loading the eeepc-laptop driver in an earlier kernel (where INIT takes 1s).
Please remember to set CONFIG_ACPI_DEBUG,
run "echo 0x80 > /sys/module/acpi/parameters/debug_layer"
and "echo 0x04 > /sys/module/acpi/parameters/debug_level"
before loading eeepc-laptop.
Comment 8 Darren Salt 2009-02-24 06:12:35 UTC
Created attachment 20349 [details]
Boot log for 2.6.28.4 with ACPI debugging info

Since eeepc-laptop is built in, I appended "acpi.debug_layer=128 acpi.debug_level=4 printk.time=1" to the kernel command line.
Comment 9 Zhang Rui 2009-02-24 17:45:48 UTC
yes, there are a lot of differences in the dmesg output.
Darren, please do the test again, with "acpi.debug_layer=0x90
acpi.debug_level=0x404 printk.time=1".
let's see what the different code paths are in the two kernels.
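
A small sketch for comparing the two sets of debug masks used in this thread. It only does the arithmetic (decimal vs. hex, and which single bits each mask contains); it does not name the ACPI layers or levels behind the bits, and the helper name set_bits is made up for illustration.

```python
def set_bits(mask):
    """Return the individual bit values that are set in mask, lowest first."""
    return [1 << i for i in range(mask.bit_length()) if mask & (1 << i)]

# Masks quoted in this thread.
first_layer, first_level = 128, 4          # kernel command line in comment 8
second_layer, second_level = 0x90, 0x404   # requested in comment 9

# The decimal values in comment 8 are the same masks as the hex values in comment 7.
print(first_layer == 0x80, first_level == 0x04)    # True True

# The second request keeps the earlier bits and adds one more bit to each mask.
print([hex(b) for b in set_bits(second_layer)])    # ['0x10', '0x80']
print([hex(b) for b in set_bits(second_level)])    # ['0x4', '0x400']
```

So the second run should produce a superset of the first run's output, which fits the much larger attachment that follows.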
Comment 10 Darren Salt 2009-02-24 19:21:11 UTC
Created attachment 20359 [details]
Boot log for 2.6.28.4 with many reams of ACPI debugging info

A little light reading for somebody...
Comment 11 Zhang Rui 2009-02-24 23:46:50 UTC
Well, this seems to be another EC problem.
eeepc-laptop needs to evaluate the INIT control method during initialization,
which will invoke ECXW:
                    Method (ECXW, 2, Serialized)
                    {
                        If (ECAV ())
                        {
                            If (LNot (Acquire (MUEC, 0xFFFF)))
                            {
                                IBFX ()
                                Store (Arg0, EC66)
                                IBFX ()
                                Store (Arg1, EC62)
                                IBFX ()
                                Release (MUEC)
                            }
                        }
                    }

And for this piece of code, 
                                IBFX ()
                                Store (Arg0, EC66)
                                IBFX ()
                                Store (Arg1, EC62)
                                IBFX ()
dmesg in 2.6.28.4 shows the correct log:
[    3.894262] exregion-0289 [15] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066  ------ECIE is cleared (in IBFX)
[    3.907389] exregion-0289 [16] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000066  ------Store (Arg0, EC66)
[    3.920458] exregion-0289 [15] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066  ------ECIE is cleared (in IBFX)
[    3.933580] exregion-0289 [16] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000062  ------Store (Arg1, EC62)
[    3.946648] exregion-0289 [15] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066  ------ECIE is cleared (in IBFX)

But the log in 2.6.29-rc5 is not correct:
[ 2481.733642] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066  ------ECIE is cleared (in IBFX)
[ 2481.734109] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000066  ------Store (Arg0, EC66)
[ 2481.734490] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066  ------ECIE is set
[ 2481.734830] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=00000000000000E1  ------Store (Zero, DELY)
[ 2481.735287] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066  ------ECIE is cleared
[ 2481.735764] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000062  ------Store (Arg1, EC62)
[ 2481.736184] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066  ------ECIE is set
[ 2481.736529] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=00000000000000E1  ------Store (Zero, DELY)
[ 2481.736877] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066  ------ECIE is set
[ 2481.737244] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=00000000000000E1  ------Store (Zero, DELY)
[ 2481.737698] exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066  ------ECIE is cleared

I don't know whether this causes the long latency of eeepc initialization,
but it may be related.
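
For anyone comparing the two attached logs, here is a rough sketch that reduces the System-IO trace lines to a compact read/write sequence so the two kernels can be diffed. The regular expression is written against the line format quoted above only; treating "R/W 0" as a read and "R/W 1" as a write is inferred from the annotations in this comment, and the function names are made up for illustration.

```python
import re

# Matches lines such as:
# [    3.894262] exregion-0289 [15] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+exregion-\d+\s+\[\d+\]\s+ex_system_io_space_han:\s+"
    r"System-IO \(width (?P<width>\d+)\) R/W (?P<rw>[01]) Address=(?P<addr>[0-9A-Fa-f]+)"
)

def parse_accesses(log_text):
    """Return (timestamp, 'R' or 'W', address) for every System-IO trace line."""
    accesses = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            kind = "W" if m.group("rw") == "1" else "R"
            accesses.append((float(m.group("ts")), kind, int(m.group("addr"), 16)))
    return accesses

def summarize(accesses):
    """Render the access sequence as e.g. 'R:0x66 W:0x66'."""
    return " ".join(f"{kind}:{addr:#x}" for _, kind, addr in accesses)

# The five 2.6.28.4 lines quoted above, without the hand-written annotations.
SAMPLE_2_6_28 = """\
[    3.894262] exregion-0289 [15] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066
[    3.907389] exregion-0289 [16] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000066
[    3.920458] exregion-0289 [15] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066
[    3.933580] exregion-0289 [16] ex_system_io_space_han: System-IO (width 8) R/W 1 Address=0000000000000062
[    3.946648] exregion-0289 [15] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000066
"""

print(summarize(parse_accesses(SAMPLE_2_6_28)))
# R:0x66 W:0x66 R:0x66 W:0x62 R:0x66
```

Running the same function over the 2.6.29-rc5 excerpt would show the extra reads of 0x66 and the writes to 0xE1 interleaved in that sequence.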
Comment 12 Zhang Rui 2009-02-24 23:54:35 UTC
Another log line that shows up many times in 2.6.29-rc5 is:
exregion-0290 [00] ex_system_io_space_han: System-IO (width 8) R/W 0 Address=0000000000000400

This line is printed when invoking FSBT()-->RCLK()-->RBLK()-->SMBB():
                    Method (SMBB, 2, NotSerialized)
                    {
                        ...
                        Store (0x54, HSTC)
                        Sleep (0x05)
                        Store (0xFF, Local0)
                        While (Local0)
                        {
                            Decrement (Local0)
                            Sleep (0x02)
                            If (And (HSTS, One))
                            {
                                Sleep (0x02)
                            }

                            If (And (HSTS, 0x02))
                            {
                                Store (Zero, Local0)
                                Store (One, Local1)
                            }
                        }
                        ...
                    }
It seems that bit 1 of HSTS is not set when it should be.

HST_STS—Host Status Register (SMBUS—D31:F3)
bit1: INTR — R/WC (special). This bit can only be set by termination of a command. INTR is not dependent on the INTREN bit (offset SMBASE + 02h, bit 0) of the Host controller register (offset 02h). It is only dependent on the termination of the command. If the INTREN bit is not set, then the INTR bit will be set, although the interrupt will not be  generated. Software can poll the INTR bit in this non-interrupt case.
  0 = Software clears this bit by writing a 1 to it. The ICH7 then deasserts the interrupt or SMI#.
  1 = The source of the interrupt or SMI# was the successful completion of its last command.
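
To get a feel for what one timed-out pass through the SMBB wait loop costs, here is a minimal model of the loop quoted above. It counts only the Sleep() calls (ACPI Sleep takes milliseconds) and ignores I/O and interpreter overhead; the elided parts of the method ("...") are not modelled, and the function and parameter names are made up for illustration.

```python
def smbb_poll_ms(read_hsts):
    """Model the SMBB wait loop; read_hsts() returns the current HSTS value.

    Returns (milliseconds spent in Sleep, whether bit 1 of HSTS was ever seen set).
    """
    elapsed = 5             # Sleep (0x05) after Store (0x54, HSTC)
    local0 = 0xFF           # Store (0xFF, Local0)
    completed = False       # stands in for Store (One, Local1)
    while local0:
        local0 -= 1         # Decrement (Local0)
        elapsed += 2        # Sleep (0x02)
        if read_hsts() & 0x01:
            elapsed += 2    # extra Sleep (0x02) while bit 0 is set
        if read_hsts() & 0x02:
            local0 = 0      # leave the loop
            completed = True
    return elapsed, completed

print(smbb_poll_ms(lambda: 0x02))   # (7, True)      bit 1 set on the first poll
print(smbb_poll_ms(lambda: 0x00))   # (515, False)   bit 1 never set, bit 0 clear
print(smbb_poll_ms(lambda: 0x01))   # (1025, False)  bit 1 never set, bit 0 stuck at 1
```

Under these assumptions each call that never sees bit 1 sleeps for roughly 0.5 to 1 second, so a delay of 14s+ would be consistent with a number of such calls timing out, although the logs in this report do not count them.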

This line in the dmesg may point to the real problem:
ACPI: I/O resource 0000:00:1f.3 [0x400-0x41f] conflicts with ACPI region SMRG [0x400-0x40f]
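
A quick check of the two ranges in that message, as a sketch: it computes the overlap of the two inclusive I/O ranges and confirms that the HSTS address from comment 6 (0x400) falls inside it. The helper name overlap is made up for illustration.

```python
def overlap(a, b):
    """Return the inclusive (start, end) overlap of two inclusive ranges, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

I801_IO = (0x400, 0x41F)    # I/O resource 0000:00:1f.3
SMRG = (0x400, 0x40F)       # ACPI region SMRG
HSTS_ADDR = 0x400           # HSTS I/O address given in comment 6

shared = overlap(I801_IO, SMRG)
print(tuple(hex(x) for x in shared))           # ('0x400', '0x40f')
print(shared[0] <= HSTS_ADDR <= shared[1])     # True
```

The whole SMRG region lies inside the PCI device's I/O resource, so both the AML code and whichever driver claims that resource can touch the register that SMBB polls.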
Comment 13 Zhang Rui 2009-02-24 23:58:28 UTC
It would be great if you could bisect drivers/acpi/ec.c and see whether any EC commit introduced this regression.
Comment 14 Zhang Rui 2009-02-25 00:15:05 UTC
cc Jean.
Comment 15 Jean Delvare 2009-02-25 00:47:26 UTC
Does the problem disappear if you boot with acpi_enforce_resources=strict?
Comment 16 Darren Salt 2009-02-25 12:07:59 UTC
acpi_enforce_resources=strict does indeed make the problem disappear.

If an ec.c bisection would still be useful, let me know...
Comment 17 Zhang Rui 2009-02-25 16:58:25 UTC
Do you have the i2c_i801 driver built in 2.6.28.4?

*** This bug has been marked as a duplicate of bug 12376 ***
Comment 18 Darren Salt 2009-02-25 19:09:56 UTC
Yes, without obvious problems. There's that resource conflict, but it seems harmless (or at least hasn't caused me any problems other than that boot delay in .29-rc*).

This leaves, for me, two post-2.6.28 regressions: the ACPI hotkey reporting and the pciehp/eeepc-laptop oops-causing interaction. Both have patches.
