Bug 10510

Summary: s3: interrupt storm upon resume - Asus M6700Me
Product: ACPI
Component: Power-Battery
Status: REJECTED WILL_NOT_FIX
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25
Subsystem:
Regression: ---
Bisected commit-id:
Reporter: Matthias Bläsing (matthias.blaesing)
Assignee: Zhang Rui (rui.zhang)
CC: acpi-bugzilla, akpm
Attachments: lspci -vvxx output
gzipped output (syslog) of run without battery
output (syslog) of test-run with battery module loaded (beware: get expanded to 3MB)
ACPI Interrupts monitored while a suspend/resume cyle w/ and w/o the battery module
events while removing/reinserting battery
try the debug patch in which the query_pending bit is clear after processing EC notification event

Description Matthias Bläsing 2008-04-23 13:03:26 UTC
Latest working kernel version: 2.6.24 (sort of, with a patch in osl.c, that flushes the work queue (bug number unknown, don't know how to tame the bugzilla beast))
Earliest failing kernel version: 2.6.25
Distribution: Debian SID
Hardware Environment: Asus M6700Me Notebook
Software Environment: ?!
Problem Description:

After I do a suspend/resume cycle my system feels generally sluggish. I think it's caused by the ACPI subsystem, as I get this:

root@prometheus:~# acpi -a
     Battery 1: charged, 100%, rate information unavailable.
     Battery 2: charged, 100%, rate information unavailable.
  AC Adapter 1: on-line
root@prometheus:~# acpi -a
     Battery 1: charged, 100%, rate information unavailable.
  AC Adapter 1: on-line
root@prometheus:~# acpi -a
     Battery 1: charged, 100%, rate information unavailable.
  AC Adapter 1: on-line
root@prometheus:~# acpi -a
     Battery 1: charged, 100%, rate information unavailable.
     Battery 2: charged, 100%, rate information unavailable.
  AC Adapter 1: on-line
root@prometheus:~# 

These commands were issued one after another, and the second battery stayed inserted the whole time. Another indicator is that with the above-mentioned patch (without it I get an oops) the multimedia keys are reported with a huge delay (if ever).

I can't do a bisect on this machine, as it's needed for work. But I can recompile with acpi debugging compiled in.
Comment 1 Zhang Rui 2008-04-23 19:54:33 UTC
(In reply to comment #0)
> Latest working kernel version: 2.6.24 (sort of, with a patch in osl.c, that
> flushes the work queue (bug number unknown, don't know how to tame the
> bugzilla beast))

The patch in bug 9772/2884?

> root@prometheus:~# acpi -a
>      Battery 1: charged, 100%, rate information unavailable.
>      Battery 2: charged, 100%, rate information unavailable.
>   AC Adapter 1: on-line
> root@prometheus:~# acpi -a
>      Battery 1: charged, 100%, rate information unavailable.
>   AC Adapter 1: on-line
> root@prometheus:~# acpi -a
>      Battery 1: charged, 100%, rate information unavailable.
>   AC Adapter 1: on-line
> root@prometheus:~# acpi -a
>      Battery 1: charged, 100%, rate information unavailable.
>      Battery 2: charged, 100%, rate information unavailable.
>   AC Adapter 1: on-line
> root@prometheus:~# 
> 
Please attach the acpidump output.
Please attach the dmesg output after this test.
Please re-do the test with acpid killed.
Comment 2 Andrew Morton 2008-04-23 21:43:10 UTC
Marked as regression.

Can you please send out that osl.c patch also?  It might have got lost.
Comment 3 Zhang Rui 2008-04-23 23:21:39 UTC
(In reply to comment #2)
> Marked as regression.
> 
> Can you please send out that osl.c patch also?  It might have got lost.
> 
It's bug #10265.

Matthias,
Does battery 2 keep popping up and disappearing all the time after S3?
Or do things return to normal after a few seconds?

This is a piece of BIOS code from your acpidump.
_L13() {
   ...
   If (\_SB.BT1S)
   {
           Store (0x00, \_SB.BT1S)
           Notify (\_SB.BAT1, 0x01)
   }
   Else
   {
           Notify (\_SB.BAT1, 0x00)
           Store (0x01, \_SB.BT1S)
   }
   ...
}
I don't know why the BIOS toggles the BAT1 present flag (BT1S) every time GPE 0x13 fires.
But this could well explain the symptom on your laptop.
The Linux/ACPI battery driver receives several ACPI battery notifications when resuming, checks the _STA method each time, and finds that BAT1 is absent and then becomes present again.
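
To illustrate the toggling described above, here is a toy Python model of the _L13 excerpt. It is only a sketch, not real ACPI code: the function names are made up, and it assumes that BAT1's _STA reports "present" exactly when BT1S is non-zero, which the excerpt itself does not show.

```python
# Toy model of the _L13 excerpt; not real ACPI code.
# Assumption: BAT1's _STA reports "present" exactly when BT1S is non-zero.


def fire_l13(bt1s):
    """Mimic one GPE 0x13 event: flip BT1S, return (new_bt1s, notify_value)."""
    if bt1s:
        return 0, 0x01  # Store (0x00, BT1S); Notify (BAT1, 0x01)
    return 1, 0x00      # Notify (BAT1, 0x00); Store (0x01, BT1S)


def simulate(initial_bt1s, events):
    """Return the BT1S value left behind by each of `events` GPE firings."""
    seen = []
    bt1s = initial_bt1s
    for _ in range(events):
        bt1s, _notify = fire_l13(bt1s)
        seen.append(bt1s)
    return seen


if __name__ == "__main__":
    # Starting from "present", the flag alternates on every firing.
    print(simulate(1, 4))
```

Under that assumption, every firing of the GPE flips the flag, so a driver that re-reads the presence state after each notification would see the battery alternately vanish and reappear.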

I don't think this is a regression.
Matthias,
You said that 2.6.24 with one extra patch doesn't have this problem. Can you double-check?
Comment 4 Matthias Bläsing 2008-04-25 14:13:08 UTC
Hey,

please excuse my fault, I booted into 2.6.24 (the one with the above-mentioned patch) and - yep I got the same symptoms. So I see two basic problems (maybe three):

1.) The battery reporting is flaky after resume from S3 (not sure whether this was the case with 2.6.23)
2.) Something breaks after resume with respect to the asus multimedia keys and
3.) (this might be selective attention) the system feels sluggish (afaik the CPU states are controlled via ACPI -- acpi_cpufreq might indicate that)

Removing acpid didn't help.

The multimedia keys stopped working completely after the resume until I removed the battery module. After that, the reaction to button presses was slow. Half an hour later the system felt "normal" again. Could it be that the ACPI events pile up and the system is slow to work through the pile?

Maybe I can try a whole suspend/resume cycle without the battery module - maybe that sheds a bit more light on this.
Comment 5 Matthias Bläsing 2008-04-26 06:25:03 UTC
Ok - battery seems a bit fishy. With the battery module loaded it takes approx. 10s to get a multimedia key-press registered. With the battery module removed, approx. 2s. I even removed the hardware (the second battery) and still got battery 2 reported ...

dbus-monitor on the system bus shows that hal reports removals and additions of the hardware nearly every second.

I had a look at /proc/interrupts and I'm not sure what to make of the fact that more ACPI interrupts are counted than timer interrupts. Is this considered normal?
Comment 6 Len Brown 2008-04-28 19:32:21 UTC
> please excuse my fault, I booted into 2.6.24 
> and - yep I got the same symptoms

clearing regression flag on this report.

> Debian SID

Does this distro try to remove the battery driver before suspend?
(that is the only way one can explain the need for the patch
in bug #10265).

What happens if the distro scripts are modified so that battery
is not unloaded (or if you boot a kernel with ACPI_BATTERY=y so
that the unload fails?)
Comment 7 Zhang Rui 2008-04-28 22:51:55 UTC
Please attach the output of "lspci -vvxx".
Comment 8 Matthias Bläsing 2008-04-30 05:29:17 UTC
I ran a few more tests and made sure that I didn't use the Debian scripts, but put the system to sleep by hand (echo mem > /sys/power/state).

When I remove the battery module prior to putting the system to sleep, I get a few ACPI interrupts (approx. 10 per second) for something like 20s, then the system is back to normal. When the battery module is still loaded I get the "storm" of approx. 1000 interrupts per second, and I'm not sure whether or not the situation settles down after some time.
Comment 9 Matthias Bläsing 2008-04-30 05:30:08 UTC
Created attachment 15987 [details]
lspci -vvxx output
Comment 10 Zhang Rui 2008-05-03 20:24:26 UTC
Please
make sure CONFIG_ACPI_DEBUG is set
echo 0x044 > /sys/module/acpi/parameters/debug_layer
echo 0x8800001f > /sys/module/acpi/parameters/debug_level
and re-do the same test you did in comment #8.
Please attach the dmesg output w/ and w/o the battery module.
Comment 11 Matthias Bläsing 2008-05-04 05:52:39 UTC
Some statistics before I attach the result from the syslog:

w/ battery:
acpi-ints before suspend/resume: 2968
acpi-ints after suspend/resume: 4986
acpi-ints 30s later: 10574

w/o battery:
acpi-ints before suspend/resume: 3664
acpi-ints after suspend/resume: 4156
acpi-ints 30s later: 4394
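
For reference, the rates implied by these counters can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the dictionary keys and function names are made up, and the "30s later" window is taken at face value as exactly 30 seconds.

```python
# Counter values copied from the statistics above.
WITH_BATTERY = {"before": 2968, "after": 4986, "later_30s": 10574}
WITHOUT_BATTERY = {"before": 3664, "after": 4156, "later_30s": 4394}


def summarize(counts, window_seconds=30):
    """Return interrupts counted across the cycle and the rate after resume."""
    return {
        "during_cycle": counts["after"] - counts["before"],
        "per_second_after_resume":
            (counts["later_30s"] - counts["after"]) / window_seconds,
    }


if __name__ == "__main__":
    for label, counts in (("w/ battery", WITH_BATTERY),
                          ("w/o battery", WITHOUT_BATTERY)):
        result = summarize(counts)
        print(label, result["during_cycle"],
              round(result["per_second_after_resume"], 1))
```

That works out to roughly 186 interrupts per second in the 30 seconds after resume with the battery module loaded, against roughly 8 per second without it.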
Comment 12 Matthias Bläsing 2008-05-04 05:54:15 UTC
Created attachment 16022 [details]
gzipped output (syslog) of run without battery
Comment 13 Matthias Bläsing 2008-05-04 05:55:04 UTC
Created attachment 16023 [details]
output (syslog) of test-run with battery module loaded (beware: get expanded to 3MB)
Comment 14 Zhang Rui 2008-05-05 00:05:20 UTC
(In reply to comment #11)
> Some statistics before I attach the result from the syslog:
> 
> w/ battery:
> acpi-ints before suspend/resume: 2968
> acpi-ints after suspend/resume: 4986
> acpi-ints 30s later: 10574
> 
> w/o battery:
> acpi-ints before suspend/resume: 3664
> acpi-ints after suspend/resume: 4156
> acpi-ints 30s later: 4394
> 
IMO, this is still too much.
Could you do the same test after killing all the processes that are reading /proc/acpi/event?

Could you please do this test
cd /sys/firmware/acpi/interrupts/
grep . *
and attach the result after resume. (w/ and w/o battery module)
Comment 15 Matthias Bläsing 2008-05-05 02:44:20 UTC
Created attachment 16027 [details]
ACPI Interrupts monitored while a suspend/resume cyle w/ and w/o the battery module

The result of the test is attached (acpi-interrups.tgz). This is how the data was created:

root@prometheus:~# cd /sys/firmware/acpi/interrupts/
root@prometheus:/sys/firmware/acpi/interrupts# /etc/init.d/acpid stop; modprobe -r battery; sleep 10; grep . * >> /tmp/acpi-interrups-battery-wo-prior; echo mem > /sys/power/state; grep . * >> /tmp/acpi-interrups-battery-wo-after; sleep 30; grep . * >> /tmp/acpi-interrups-battery-wo-30s; sleep 60; modprobe battery; sleep 60; grep . * >> /tmp/acpi-interrups-battery-w-prior; echo mem > /sys/power/state; grep . * >> /tmp/acpi-interrups-battery-w-after;sleep 30; grep . * >> /tmp/acpi-interrups-battery-w-30s
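
Snapshots taken this way can be compared with a short Python sketch like the one below, to see which counters grew the most between two points in time. It assumes each line of `grep . *` output has the form `name:count`, optionally followed by further fields. The sample values are invented for illustration and are not taken from the attachment.

```python
def parse_snapshot(text):
    """Parse 'name:count ...' lines into a {name: count} dictionary."""
    counts = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields or not fields[0].isdigit():
            continue  # skip anything that does not look like a counter line
        counts[name.strip()] = int(fields[0])
    return counts


def busiest(before, after, top=3):
    """Return up to `top` (name, increase) pairs, largest increase first."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return [(name, delta) for name, delta in ranked[:top] if delta > 0]


# Invented sample data, only to show the expected line format.
SAMPLE_BEFORE = "gpe13:      12\ngpe_all:     312\nsci:     313\n"
SAMPLE_AFTER = "gpe13:      15\ngpe_all:    5315\nsci:    5317\n"

if __name__ == "__main__":
    print(busiest(parse_snapshot(SAMPLE_BEFORE), parse_snapshot(SAMPLE_AFTER)))
```

Feeding it the "prior" and "30s" files from one run would show directly which GPE accounts for the bulk of the increase.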
Comment 16 Matthias Bläsing 2008-05-15 12:32:13 UTC
Some more info: It's not tied to S3 resume. It also happens when I remove/plug in the second battery. The uptime is currently approx. 10 min and I'm about to reach 150000 ACPI interrupts. I think this is the root problem and the suspend/resume cycle just makes it visible.
Comment 17 Matthias Bläsing 2008-05-17 02:19:35 UTC
I just had a thought: It seems that the ACPI subsystem is flooded with interrupts. Couldn't the interrupts be deactivated and the battery subsystem switched to a polling model (maybe monitoring the interrupt frequency from time to time)? Not sure this is possible, but I would rather lose the multimedia buttons than my battery reporting.
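
A rough Python sketch of what I mean, purely as an illustration of the idea and not of how the kernel handles this: the class name, the threshold of 100 interrupts per second and the "poll"/"interrupt" labels are all made up.

```python
class StormDetector:
    """Pick 'poll' or 'interrupt' mode from periodic counter samples."""

    def __init__(self, max_per_second=100.0):
        self.max_per_second = max_per_second  # arbitrary, hypothetical limit
        self.last_count = None
        self.last_time = None
        self.polling = False

    def sample(self, count, now):
        """Record the interrupt counter at time `now` (seconds), return mode."""
        if self.last_count is not None and now > self.last_time:
            rate = (count - self.last_count) / (now - self.last_time)
            # Above the limit: fall back to polling; below it: back to normal.
            self.polling = rate > self.max_per_second
        self.last_count, self.last_time = count, now
        return "poll" if self.polling else "interrupt"


if __name__ == "__main__":
    detector = StormDetector()
    print(detector.sample(4986, 0.0))    # first sample, no rate known yet
    print(detector.sample(10574, 30.0))  # counters from the w/ battery run
```

Checking the rate from time to time like this would also allow switching back to interrupts once the storm is over.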
Comment 18 Zhang Rui 2008-05-26 01:25:20 UTC
It seems that some user space scripts are invoked to query the AC/battery status, and this causes a lot of EC interrupts.
Could you please try the following test?
lsof /proc/acpi/event;
kill all the processes polling this file.
cat /proc/acpi/event
remove/plug in the second battery
see if the interrupt storm occurs and attach the output of /proc/acpi/event.
Comment 19 Matthias Bläsing 2008-06-07 06:27:24 UTC
Created attachment 16424 [details]
events while removing/reinserting battery

Yep, even without anything listening on /proc/acpi/event (apart from a cat) I got the storm. I stopped acpid, started cat, and removed the battery (some 100 interrupts every 1-2 minutes). When I reinserted the battery, the counter began to climb, and in about a minute it went from 9000 to approx. 30000 (and it's still going up).

This is kernel 2.6.26-rc5.
Comment 20 Zhang Rui 2008-08-28 02:04:52 UTC
Hi, Matthias,
sorry for the delay.

I'm afraid this is a BIOS/hardware issue that we can't fix in the Linux kernel.
GPE _L13 keeps on firing after resume, about 20 times in 30 minutes.
Each _L13 results in the loading/unloading of the battery driver, and loading/unloading the battery driver needs to access some EC address space, which may bring hundreds of interrupts.

Please check if upgrading the BIOS helps.

Reject this bug.
Comment 21 ykzhao 2008-09-02 22:35:20 UTC
Created attachment 17585 [details]
try the debug patch in which the query_pending bit is clear after processing EC notification event

Hi, Matthias
    Will you please try the debug patch on the latest kernel (2.6.27-rc4) and see whether the number of ACPI interrupts increases as fast as before?
   Thanks.
Comment 22 Matthias Bläsing 2008-09-05 12:21:32 UTC
Excuse me, but I sold the notebook approx. two months ago, so can't test anymore.
Comment 23 ykzhao 2008-09-09 02:09:22 UTC
Hi, Matthias
    Thanks for the notification.
    It doesn't matter. Maybe someone else can test the attached patch.

    Thanks.