Bug 10811 - program hangs reading ac/battery {state,info} - 2.6.24 regression
Summary: program hangs reading ac/battery {state,info} - 2.6.24 regression
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: EC (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Lin Ming
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-05-28 05:41 UTC by paolo
Modified: 2009-05-07 19:56 UTC (History)
6 users (show)

See Also:
Kernel Version: 2.6.24.7, 2.6.25.4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
acpidump -b of notebook Gericom 255/259 (6.05 KB, application/gzip)
2008-05-28 22:03 UTC, paolo
Details
full acpidump of Gericom nb (6.05 KB, application/octet-stream)
2008-06-03 21:45 UTC, paolo
Details
debug patch to collect interpreter mutex info (1.93 KB, patch)
2008-06-16 02:16 UTC, Lin Ming
Details | Diff
dmesg + mutex.log as requested (503.99 KB, application/octet-stream)
2008-06-20 14:45 UTC, paolo
Details
last klog till reader for BAT0/* got stuck in D state; no ac module, only battery loaded (41.46 KB, application/octet-stream)
2008-06-20 16:08 UTC, paolo
Details
custom dsdt patch (445 bytes, patch)
2008-08-25 00:55 UTC, Lin Ming
Details | Diff

Description paolo 2008-05-28 05:41:51 UTC
Latest working kernel version: 2.6.18 (Debian's, latest)
Earliest failing kernel version: 2.6.24.* + www.tuxonice.net patch.
Distribution: Debian
Hardware Environment: x86 notebooks
Software Environment: 
Problem Description: programs reading ac/battery status from /proc hang in 'D' state.

The latest kernel 2.6.18 from Debian is the last one working here, on a couple 
of notebooks (Toshiba, Gericom).

In the logs I get lots of 

INFO: task grep:5012 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
grep          D d139b040     0  5012   5010
       dc669e00 00000082 00000000 d139b040 d3be42c0 ddc96160 dc669e18 de8029c0 
       d3be4060 00000246 dc669e38 c03967d5 de8029c8 00000001 d3be4060 c011eb00 
       db433dfc dc6b3dfc de8029c0 00000282 dc669e38 de8029c0 00000000 0000ffff 
Call Trace:
 [<c03967d5>] __down+0x85/0x110
 [<c011eb00>] ? default_wake_function+0x0/0x10
 [<c03966e2>] __down_failed+0xa/0x10
 [<c02420d7>] acpi_os_wait_semaphore+0xad/0x16e
 [<c025ce38>] acpi_ut_acquire_mutex+0x66/0xd3
 [<c024fdae>] acpi_ex_enter_interpreter+0xa/0x25
 [<c025393f>] acpi_ns_evaluate+0x107/0x1b4
 [<c0253155>] acpi_evaluate_object+0x18d/0x259
 [<c0165aed>] ? get_page_from_freelist+0x6d/0xb0
 [<c02429cb>] acpi_evaluate_integer+0x7c/0xfb
 [<df84a03f>] acpi_ac_get_state+0x25/0x57 [ac]
 [<df84a086>] acpi_ac_seq_show+0x15/0x52 [ac]
 [<c01a0295>] seq_read+0xd5/0x2c0
 [<c01a01c0>] ? seq_read+0x0/0x2c0
 [<c01b9ae3>] proc_reg_read+0x93/0xc0
 [<c018445e>] vfs_read+0x8e/0x140
 [<c0183a12>] ? do_sys_open+0xc2/0xe0
 [<c01847ad>] sys_read+0x3d/0x70
 [<c010406e>] sysenter_past_esp+0x5f/0x85
 =======================

where such process is like:

  4972 ?        S      0:00     /USR/SBIN/CRON
 4973 ?        Ss     0:00       /bin/sh -c test -x /etc/init.d/anacron && /usr/sbin/invoke-rc.d anacron start >/dev/n
 4974 ?        S      0:00         /bin/sh /usr/sbin/invoke-rc.d anacron start
 5005 ?        S      0:00           /bin/sh /etc/init.d/anacron start
 5010 ?        S      0:00             /bin/sh /usr/bin/on_ac_power
 5012 ?        D      0:00               grep --quiet on-line /proc/acpi/ac_adapter/AC0/state

Kenel 2.6.22.* won't even boot, no matter what - it'd hangs aftre printing a 
few
'ACPI ..."
lines.

Sorry no screenshots handy - will try to make them if required.

acpi_serialize is set; I've tried many acpi_* bootparam combination without success.
Comment 1 Andrew Morton 2008-05-28 11:54:00 UTC
Can you tell us whether this occurs when the tuxonice patches
are not applied?

Thanks.
Comment 2 paolo 2008-05-28 20:21:44 UTC
Sure. I'm going to compile vanilla kernel and install+reboot as soon as those nb are available.
Comment 3 ykzhao 2008-05-28 20:52:01 UTC
Will you please attach the output of acpidump?
thanks.
Comment 4 paolo 2008-05-28 22:03:49 UTC
Created attachment 16311 [details]
acpidump -b of notebook Gericom 255/259
Comment 5 paolo 2008-05-28 22:05:02 UTC
For the Toshiba, pls see #7214:
http://bugzilla.kernel.org/attachment.cgi?id=9122&action=view
if DSDT is enough (nb unavail right now)
Comment 6 paolo 2008-05-31 13:27:38 UTC
(In reply to comment #1)
> Can you tell us whether this occurs when the tuxonice patches
> are not applied?

yes. Checked both vanilla .25.4 and .24.7: same result. I'm not even able to 
power off the pc, since laptop-mode hangs (while checking for ac/battery, of course).
This round could check the Gericom nb only, will try the Toshiba as well
when possible.
Didn't check .22.19 this time, since I didn't have a camera handy.


thanks

======
FYI:

RAM::487 MB
BIOS::V=American Megatrends Inc. v=080010 R=Date: 11/17/2004
CPU::Intel(R) Celeron(R) M processor 1400MHz, 175.000 MHz
Chassis::M=Gericom
Processor::M=Intel v=Intel(R) Celeron(R) M processor 1400MHz
System::M=Gericom P=255/259 Series
PCI::CardBus bridge: O2 Micro, Inc. OZ711M1/MC1 4-in-1 MemoryCardBus Controller (rev 20)
PCI::Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
PCI::Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01)
PCI::Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
PCI::FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
PCI::Host bridge: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
PCI::IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03)
PCI::ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 03)
PCI::Modem: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 03)
PCI::Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 03)
PCI::PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 83)
PCI::SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
PCI::System peripheral: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02)
PCI::System peripheral: O2 Micro, Inc. OZ711Mx 4-in-1 MemoryCardBus Accelerator
PCI::USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 03)
PCI::USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 03)
PCI::USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 03)
PCI::USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 03)
PCI::VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
Comment 7 Zhang Rui 2008-06-02 01:05:08 UTC
Please re-attach the acpidump w/o any parameter.
Comment 8 paolo 2008-06-03 21:45:57 UTC
Created attachment 16385 [details]
full acpidump of Gericom nb

ok, this is the vanila acpidump of that notebook.
Comment 9 paolo 2008-06-06 14:40:44 UTC
attachment
  http://bugzilla.kernel.org/attachment.cgi?id=16419&action=view
to #10811 has some more infos on this same notebook.
HTH
Comment 10 Len Brown 2008-06-12 19:37:59 UTC
so 2.6.18 was the latest version tested that worked,
2.6.22 doesn't boot, and 2.6.24,5 hang a process:

grep --quiet on-line /proc/acpi/ac_adapter/AC0/state

presumably calling...
acpi_evaluate_integer(ac->device->handle, "_PSR", NULL, &ac->state);

does 2.6.19, 2.6.20, or 2.6.21 work?

what do you see if you load the "ac" module,
but do not load the "battery" module?
Comment 11 paolo 2008-06-13 01:14:02 UTC
Iirc it worked up to .19.7; never used .20.x, .21.x due to boot troubles.
Iirc .2[01].x where rather problematic on other machines as well.

I'll check the 'ac' stuff (on 2.6.2[45].x) asap.
Comment 12 Len Brown 2008-06-13 18:27:30 UTC
> Iirc it worked up to .19.7; never used .20.x, .21.x due to boot troubles.
> Iirc .2[01].x where rather problematic on other machines as well.

I don't see that you reported any problems a year ago when
2.6.20 and 2.6.21 came out, did I miss something?
Comment 13 paolo 2008-06-15 11:25:56 UTC
(In reply to comment #12)
...
> I don't see that you reported any problems a year ago when
> 2.6.20 and 2.6.21 came out, did I miss something?

probably no; likely it was just me lacking mostly time to fiddle with cconfig
and bootparams and eventually file a report.
Comment 14 Lin Ming 2008-06-16 02:16:45 UTC
Created attachment 16503 [details]
debug patch to collect interpreter mutex info

Seems that interpreter mutex is acquired by some process that never release it.

Please apply this debug patch to linux-2.6.25 to collect interpreter mutex info.

1. apply the patch, build and boot
2. grep --quiet on-line /proc/acpi/ac_adapter/AC0/state
3. wait until the bug shows up
4. cat /proc/debug_mutex > mutex.log
5. dmesg > dmesg.log

Please attach mutex.log and dmesg.log
Comment 15 paolo 2008-06-20 14:45:38 UTC
Created attachment 16563 [details]
dmesg + mutex.log as requested

See also dmesg with full acpi debug on modprobe ac, battery.

And (liekly mostly but last part useless) kernel log with full acpi debug
till a loop read on BAT0/* hangs.

that was with 2.6.25.6 (+ tuxonice3-rc7)
Comment 16 paolo 2008-06-20 16:01:58 UTC
> what do you see if you load the "ac" module,
> but do not load the "battery" module?

usual 1 line ACPI... log - but see attachment to #15 for log with full acpi_debug.

with only ac and reading /state in 1s loop, got no hangs after ~1h.

with battery only, reader got stuck after a few mins. See attachment to next # (should be #17).
Comment 17 paolo 2008-06-20 16:08:56 UTC
Created attachment 16566 [details]
last klog till reader for BAT0/* got stuck in D state; no ac module, only battery loaded

kernel is as in #15: 2.6.25.6 with mutex debug patch.

last lines from dmesg:

DEBUG: acpi_dump_stack
----------------------
Pid: 4619, comm: hald
 [<c0262a2e>] ? acpi_dump_mutex+0x0/0x95
 [<c01b8f19>] proc_file_read+0x1c9/0x280
 [<c01b8d50>] ? proc_file_read+0x0/0x280
 [<c01b411c>] proc_reg_read+0x6c/0xc0
 [<c0180084>] vfs_read+0x94/0x130
 [<c01b40b0>] ? proc_reg_read+0x0/0xc0
 [<c01804bd>] sys_read+0x3d/0x70
 [<c0103fee>] sysenter_past_esp+0x5f/0x85
 =======================

and debug_mutex was:

ACPI Interpreter Mutex acquired by: hald, use count: 5920
Comment 18 paolo 2008-06-20 16:28:29 UTC
... funny, now closing shop, I see:

17444 ?        S      0:00     /bin/sh /etc/rc0.d/K20laptop-mode stop
17446 ?        S      0:00       /bin/bash /usr/sbin/laptop_mode stop init
17456 ?        S      0:00         /bin/bash /usr/sbin/laptop_mode stop init
17457 ?        D      0:00           cat /proc/acpi/button/lid/LID/state

and it hangs there, reading LID (as well as BAT0, of course).
Comment 19 Lin Ming 2008-06-22 20:03:38 UTC
> and debug_mutex was:
> 
> ACPI Interpreter Mutex acquired by: hald, use count: 5920
> 

Please "killall hald" and then re-test.
Comment 20 paolo 2008-06-26 14:34:03 UTC
ok, works ... till somthing happens. Tested twice, pattern seems clear:

logger -t acpi.debug -p kern.info ===ACPI DEBUG START===
while true;do
 cat /proc/acpi/battery/BAT0/* &>/dev/null & sleep 2; ps $! && break
 echo -n .
done
logger -t acpi.debug -p kern.info ===ACPI DEBUG END===

without hald, it goes on for hours, but during night only:

# grep === /var/log/socklog/main/current 
user.info: Jun 26 01:20:01 acpi.debug: ===ACPI DEBUG START===
user.info: Jun 26 08:39:44 acpi.debug: ===ACPI DEBUG END===

that's typically the time the user starts his's wmaker session, then a wmapp
starts to poll acpi:

# cat mutex_no-hald.log 
ACPI Interpreter Mutex acquired by: wmacpi, use count: 220414

and in dmesg:
DEBUG: acpi_dump_stack
----------------------
Pid: 12015, comm: wmacpi
 [<c0262a2e>] ? acpi_dump_mutex+0x0/0x95
 [<c01b8f19>] proc_file_read+0x1c9/0x280
 [<c01b8d50>] ? proc_file_read+0x0/0x280
 [<c01b411c>] proc_reg_read+0x6c/0xc0
 [<c0180084>] vfs_read+0x94/0x130
 [<c01b40b0>] ? proc_reg_read+0x0/0xc0
 [<c01804bd>] sys_read+0x3d/0x70
 [<c0103fee>] sysenter_past_esp+0x5f/0x85
 =======================

and from then on my process if locked solid:

12020 pts/1    D      0:00 cat /proc/acpi/battery/BAT0/alarm ...

that's 2.6.25.6.

Likely, without hald, during that test time acpi is read by my 'cat' only, 
since the nb is on ac and nothing bothers to poll the battery level, nor 
the ac state, since all is expectd to react on a acpid event. But wmacpi
polls ac/battery 3 times/min, clashes with 'cat' and one or the other gets 
stuck there.
Comment 21 paolo 2008-08-01 16:16:23 UTC
update:

2.6.25.11: same problem

but seems fixed in 2.6.26: it's running since ~1h with several processes
concurrently reading bat/ac status/info, on both the Gericom's and Toshiba's
notebooks, with no lockup.
Comment 22 Lin Ming 2008-08-04 02:16:41 UTC
paolo, 

Would you please do more test?
I'll close this bug if you can confirm it's really fixed in 2.6.26.
Thanks.
Comment 23 paolo 2008-08-04 02:42:06 UTC
yep, going to upgrade to 2.6.26.1 today or tomorrow, if all goes well users will stress the nb with normal day work till EOW; then I'll report the results.
Comment 24 paolo 2008-08-05 09:28:17 UTC
sigh, you're right to be prudent: 2.6.26.1 WFM on the Toshiba (~20h, at least 4 concurrent reading processes), but hangs immediately on the Gericom - here's dmesg dump (debug_mutex patch applied): 

DEBUG: acpi_dump_stack
----------------------
Pid: 5389, comm: cat
 [<c0268e5a>] ? acpi_dump_mutex+0x0/0x95
 [<c01bbd43>] proc_file_read+0x103/0x280
 [<c01bbc40>] ? proc_file_read+0x0/0x280
 [<c01b6e9c>] proc_reg_read+0x6c/0xc0
 [<c0181b04>] vfs_read+0x94/0x130
 [<c01b6e30>] ? proc_reg_read+0x0/0xc0
 [<c0181f3d>] sys_read+0x3d/0x70
 [<c0103059>] sysenter_past_esp+0x6a/0x91
 =======================
root@pavi:~# cat /proc/debug_mutex 
ACPI Interpreter Mutex acquired by: cat, use count: 5883
Comment 25 paolo 2008-08-24 15:19:24 UTC
With 2.6.26.3 and 2 processes 'cat /proc/.../state', I get cat hung in 'D'
state immediately on _both_ notebooks - so somehow .3 got worse than .2 :(
Comment 26 paolo 2008-08-24 15:21:46 UTC
- ... got worse than .2
+ ... got worse than .1
Comment 27 Lin Ming 2008-08-25 00:55:09 UTC
Created attachment 17428 [details]
custom dsdt patch

Does cat hang if you just run cat 2 times?

And would you please try the attached custom dsdt patch?
You can find info of how to use custome DSDT at,
http://www.lesswatts.org/projects/acpi/overridingDSDT.php
Comment 28 Lin Ming 2008-08-25 01:23:40 UTC
If bug still exist for the latest kernel, please apply the debug patch at #14.

and then run "cat /proc/<PID>/wchan"

<PID> is the pid of task who acquires the mutex, for example
DEBUG: acpi_dump_stack
----------------------
Pid: 12015, comm: wmacpi

cat /proc/12015/wchan
 
Comment 29 paolo 2008-08-25 02:05:41 UTC
(In reply to comment #27)
> Created an attachment (id=17428) [details]
> custom dsdt patch
> 
> Does cat hang if you just run cat 2 times?

2 or more processes trying to read from /state|/info
a single process seems to have no trouble - eg it run whole 
night till user logged in in the morning, hence firing up 
some acpi applet.
 
> And would you please try the attached custom dsdt patch?

ok, I'll try asap.
Comment 30 paolo 2008-08-25 23:01:01 UTC
Hung immediately again (2.6.26.3):

ACPI: DSDT 1EFD03F0, 3714 (r1  255II 255II001        1 INTL  2002026)
ACPI: Override [DSDT-255II001], this is unsafe: tainting kernel
ACPI: Table DSDT replaced by host OS
ACPI: DSDT 00000000, 316F (r1  255II 255II001        1 INTL 20060912)
ACPI: DSDT override uses original SSDTs unless "acpi_no_auto_ssdt"
Table [DSDT](id 0001) - 521 Objects with 49 Devices 144 Methods 17 Regions
ACPI: EC: Look up EC in DSDT

DEBUG: acpi_dump_stack
----------------------
Pid: 4586, comm: cat
 [<c0269722>] ? acpi_dump_mutex+0x0/0x95
 [<c01bc453>] proc_file_read+0x103/0x280
 [<c01bc350>] ? proc_file_read+0x0/0x280
 [<c01b75ac>] proc_reg_read+0x6c/0xc0
 [<c01821f4>] vfs_read+0x94/0x130
 [<c01b7540>] ? proc_reg_read+0x0/0xc0
 [<c018262d>] sys_read+0x3d/0x70
 [<c0103059>] sysenter_past_esp+0x6a/0x91
 =======================
~# cat /proc/debug_mutex 
ACPI Interpreter Mutex acquired by: cat, use count: 1162

~# cat /proc/4586/wchan 
down_timeout
#--------------------------------------------------------

should I try to override SSDT as well? worth trying?
Comment 31 Lin Ming 2008-08-27 17:31:13 UTC
> should I try to override SSDT as well? worth trying?

Thanks, not worth trying
Comment 32 ykzhao 2008-09-23 01:49:33 UTC
Hi, Paolo
    Will you please try the latest kernel(2.6.27-rc7) and see whether the problem still exists?
    From the acpidump it seems that the battery info is related with EC. EC internal register will be accessed to get the battery info. 
    Now a bunch of patches about EC are merged to the upstream kernel. Please try it again.
    Thanks.
Comment 33 paolo 2008-09-26 12:08:56 UTC
Alas, no joy:

while true;do cat /proc/acpi/battery/*/* /proc/acpi/ac_adapter/*/state;echo 1.$SECONDS;sleep 1;done & while true;do ca
t /proc/acpi/battery/*/* /proc/acpi/ac_adapter/*/state;echo 2.$SECONDS;sleep 1;done

# while true;do ps ax|grep ' [D.] ';sleep 3;done
 4931 pts/1    D      0:00 cat /proc/acpi/battery/BAT0/alarm /proc/acpi/battery/BAT0/info /proc/acpi/battery/BAT0/stat
e /proc/acpi/ac_adapter/AC0/state
 4931 pts/1    D      0:00 cat /proc/acpi/battery/BAT0/alarm /proc/acpi/battery/BAT0/info /proc/acpi/battery/BAT0/stat
e /proc/acpi/ac_adapter/AC0/state
...

0 seconds after issueing command above.
Comment 34 Lin Ming 2009-03-25 08:17:12 UTC
Hi, paolo

Would you please check if this bug still exists with latest kernel 2.6.29?

Thanks.
Comment 35 Zhang Rui 2009-05-05 06:43:05 UTC
close this bug as there is no response from the bug reporter.
please re-open it if the problem still exists in the latest kernel.

Note You need to log in before you can comment on or make changes to this bug.