Latest working kernel version: 2.6.18 (Debian's, latest) Earliest failing kernel version: 2.6.24.* + www.tuxonice.net patch. Distribution: Debian Hardware Environment: x86 notebooks Software Environment: Problem Description: programs reading ac/battery status from /proc hang in 'D' state. The latest kernel 2.6.18 from Debian is the last one working here, on a couple of notebooks (Toshiba, Gericom). In the logs I get lots of INFO: task grep:5012 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. grep D d139b040 0 5012 5010 dc669e00 00000082 00000000 d139b040 d3be42c0 ddc96160 dc669e18 de8029c0 d3be4060 00000246 dc669e38 c03967d5 de8029c8 00000001 d3be4060 c011eb00 db433dfc dc6b3dfc de8029c0 00000282 dc669e38 de8029c0 00000000 0000ffff Call Trace: [<c03967d5>] __down+0x85/0x110 [<c011eb00>] ? default_wake_function+0x0/0x10 [<c03966e2>] __down_failed+0xa/0x10 [<c02420d7>] acpi_os_wait_semaphore+0xad/0x16e [<c025ce38>] acpi_ut_acquire_mutex+0x66/0xd3 [<c024fdae>] acpi_ex_enter_interpreter+0xa/0x25 [<c025393f>] acpi_ns_evaluate+0x107/0x1b4 [<c0253155>] acpi_evaluate_object+0x18d/0x259 [<c0165aed>] ? get_page_from_freelist+0x6d/0xb0 [<c02429cb>] acpi_evaluate_integer+0x7c/0xfb [<df84a03f>] acpi_ac_get_state+0x25/0x57 [ac] [<df84a086>] acpi_ac_seq_show+0x15/0x52 [ac] [<c01a0295>] seq_read+0xd5/0x2c0 [<c01a01c0>] ? seq_read+0x0/0x2c0 [<c01b9ae3>] proc_reg_read+0x93/0xc0 [<c018445e>] vfs_read+0x8e/0x140 [<c0183a12>] ? do_sys_open+0xc2/0xe0 [<c01847ad>] sys_read+0x3d/0x70 [<c010406e>] sysenter_past_esp+0x5f/0x85 ======================= where such process is like: 4972 ? S 0:00 /USR/SBIN/CRON 4973 ? Ss 0:00 /bin/sh -c test -x /etc/init.d/anacron && /usr/sbin/invoke-rc.d anacron start >/dev/n 4974 ? S 0:00 /bin/sh /usr/sbin/invoke-rc.d anacron start 5005 ? S 0:00 /bin/sh /etc/init.d/anacron start 5010 ? S 0:00 /bin/sh /usr/bin/on_ac_power 5012 ? D 0:00 grep --quiet on-line /proc/acpi/ac_adapter/AC0/state Kenel 2.6.22.* won't even boot, no matter what - it'd hangs aftre printing a few 'ACPI ..." lines. Sorry no screenshots handy - will try to make them if required. acpi_serialize is set; I've tried many acpi_* bootparam combination without success.
Can you tell us whether this occurs when the tuxonice patches are not applied? Thanks.
Sure. I'm going to compile vanilla kernel and install+reboot as soon as those nb are available.
Will you please attach the output of acpidump? thanks.
Created attachment 16311 [details] acpidump -b of notebook Gericom 255/259
For the Toshiba, pls see #7214: http://bugzilla.kernel.org/attachment.cgi?id=9122&action=view if DSDT is enough (nb unavail right now)
(In reply to comment #1) > Can you tell us whether this occurs when the tuxonice patches > are not applied? yes. Checked both vanilla .25.4 and .24.7: same result. I'm not even able to power off the pc, since laptop-mode hangs (while checking for ac/battery, of course). This round could check the Gericom nb only, will try the Toshiba as well when possible. Didn't check .22.19 this time, since I didn't have a camera handy. thanks ====== FYI: RAM::487 MB BIOS::V=American Megatrends Inc. v=080010 R=Date: 11/17/2004 CPU::Intel(R) Celeron(R) M processor 1400MHz, 175.000 MHz Chassis::M=Gericom Processor::M=Intel v=Intel(R) Celeron(R) M processor 1400MHz System::M=Gericom P=255/259 Series PCI::CardBus bridge: O2 Micro, Inc. OZ711M1/MC1 4-in-1 MemoryCardBus Controller (rev 20) PCI::Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02) PCI::Ethernet controller: Atheros Communications, Inc. AR5212 802.11abg NIC (rev 01) PCI::Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10) PCI::FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) PCI::Host bridge: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02) PCI::IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03) PCI::ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 03) PCI::Modem: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 03) PCI::Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 03) PCI::PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 83) PCI::SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03) PCI::System peripheral: Intel Corporation 82852/82855 GM/GME/PM/GMV Processor to I/O Controller (rev 02) PCI::System peripheral: O2 Micro, Inc. OZ711Mx 4-in-1 MemoryCardBus Accelerator PCI::USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 03) PCI::USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 03) PCI::USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 03) PCI::USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 03) PCI::VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
Please re-attach the acpidump w/o any parameter.
Created attachment 16385 [details] full acpidump of Gericom nb ok, this is the vanila acpidump of that notebook.
attachment http://bugzilla.kernel.org/attachment.cgi?id=16419&action=view to #10811 has some more infos on this same notebook. HTH
so 2.6.18 was the latest version tested that worked, 2.6.22 doesn't boot, and 2.6.24,5 hang a process: grep --quiet on-line /proc/acpi/ac_adapter/AC0/state presumably calling... acpi_evaluate_integer(ac->device->handle, "_PSR", NULL, &ac->state); does 2.6.19, 2.6.20, or 2.6.21 work? what do you see if you load the "ac" module, but do not load the "battery" module?
Iirc it worked up to .19.7; never used .20.x, .21.x due to boot troubles. Iirc .2[01].x where rather problematic on other machines as well. I'll check the 'ac' stuff (on 2.6.2[45].x) asap.
> Iirc it worked up to .19.7; never used .20.x, .21.x due to boot troubles. > Iirc .2[01].x where rather problematic on other machines as well. I don't see that you reported any problems a year ago when 2.6.20 and 2.6.21 came out, did I miss something?
(In reply to comment #12) ... > I don't see that you reported any problems a year ago when > 2.6.20 and 2.6.21 came out, did I miss something? probably no; likely it was just me lacking mostly time to fiddle with cconfig and bootparams and eventually file a report.
Created attachment 16503 [details] debug patch to collect interpreter mutex info Seems that interpreter mutex is acquired by some process that never release it. Please apply this debug patch to linux-2.6.25 to collect interpreter mutex info. 1. apply the patch, build and boot 2. grep --quiet on-line /proc/acpi/ac_adapter/AC0/state 3. wait until the bug shows up 4. cat /proc/debug_mutex > mutex.log 5. dmesg > dmesg.log Please attach mutex.log and dmesg.log
Created attachment 16563 [details] dmesg + mutex.log as requested See also dmesg with full acpi debug on modprobe ac, battery. And (liekly mostly but last part useless) kernel log with full acpi debug till a loop read on BAT0/* hangs. that was with 2.6.25.6 (+ tuxonice3-rc7)
> what do you see if you load the "ac" module, > but do not load the "battery" module? usual 1 line ACPI... log - but see attachment to #15 for log with full acpi_debug. with only ac and reading /state in 1s loop, got no hangs after ~1h. with battery only, reader got stuck after a few mins. See attachment to next # (should be #17).
Created attachment 16566 [details] last klog till reader for BAT0/* got stuck in D state; no ac module, only battery loaded kernel is as in #15: 2.6.25.6 with mutex debug patch. last lines from dmesg: DEBUG: acpi_dump_stack ---------------------- Pid: 4619, comm: hald [<c0262a2e>] ? acpi_dump_mutex+0x0/0x95 [<c01b8f19>] proc_file_read+0x1c9/0x280 [<c01b8d50>] ? proc_file_read+0x0/0x280 [<c01b411c>] proc_reg_read+0x6c/0xc0 [<c0180084>] vfs_read+0x94/0x130 [<c01b40b0>] ? proc_reg_read+0x0/0xc0 [<c01804bd>] sys_read+0x3d/0x70 [<c0103fee>] sysenter_past_esp+0x5f/0x85 ======================= and debug_mutex was: ACPI Interpreter Mutex acquired by: hald, use count: 5920
... funny, now closing shop, I see: 17444 ? S 0:00 /bin/sh /etc/rc0.d/K20laptop-mode stop 17446 ? S 0:00 /bin/bash /usr/sbin/laptop_mode stop init 17456 ? S 0:00 /bin/bash /usr/sbin/laptop_mode stop init 17457 ? D 0:00 cat /proc/acpi/button/lid/LID/state and it hangs there, reading LID (as well as BAT0, of course).
> and debug_mutex was: > > ACPI Interpreter Mutex acquired by: hald, use count: 5920 > Please "killall hald" and then re-test.
ok, works ... till somthing happens. Tested twice, pattern seems clear: logger -t acpi.debug -p kern.info ===ACPI DEBUG START=== while true;do cat /proc/acpi/battery/BAT0/* &>/dev/null & sleep 2; ps $! && break echo -n . done logger -t acpi.debug -p kern.info ===ACPI DEBUG END=== without hald, it goes on for hours, but during night only: # grep === /var/log/socklog/main/current user.info: Jun 26 01:20:01 acpi.debug: ===ACPI DEBUG START=== user.info: Jun 26 08:39:44 acpi.debug: ===ACPI DEBUG END=== that's typically the time the user starts his's wmaker session, then a wmapp starts to poll acpi: # cat mutex_no-hald.log ACPI Interpreter Mutex acquired by: wmacpi, use count: 220414 and in dmesg: DEBUG: acpi_dump_stack ---------------------- Pid: 12015, comm: wmacpi [<c0262a2e>] ? acpi_dump_mutex+0x0/0x95 [<c01b8f19>] proc_file_read+0x1c9/0x280 [<c01b8d50>] ? proc_file_read+0x0/0x280 [<c01b411c>] proc_reg_read+0x6c/0xc0 [<c0180084>] vfs_read+0x94/0x130 [<c01b40b0>] ? proc_reg_read+0x0/0xc0 [<c01804bd>] sys_read+0x3d/0x70 [<c0103fee>] sysenter_past_esp+0x5f/0x85 ======================= and from then on my process if locked solid: 12020 pts/1 D 0:00 cat /proc/acpi/battery/BAT0/alarm ... that's 2.6.25.6. Likely, without hald, during that test time acpi is read by my 'cat' only, since the nb is on ac and nothing bothers to poll the battery level, nor the ac state, since all is expectd to react on a acpid event. But wmacpi polls ac/battery 3 times/min, clashes with 'cat' and one or the other gets stuck there.
update: 2.6.25.11: same problem but seems fixed in 2.6.26: it's running since ~1h with several processes concurrently reading bat/ac status/info, on both the Gericom's and Toshiba's notebooks, with no lockup.
paolo, Would you please do more test? I'll close this bug if you can confirm it's really fixed in 2.6.26. Thanks.
yep, going to upgrade to 2.6.26.1 today or tomorrow, if all goes well users will stress the nb with normal day work till EOW; then I'll report the results.
sigh, you're right to be prudent: 2.6.26.1 WFM on the Toshiba (~20h, at least 4 concurrent reading processes), but hangs immediately on the Gericom - here's dmesg dump (debug_mutex patch applied): DEBUG: acpi_dump_stack ---------------------- Pid: 5389, comm: cat [<c0268e5a>] ? acpi_dump_mutex+0x0/0x95 [<c01bbd43>] proc_file_read+0x103/0x280 [<c01bbc40>] ? proc_file_read+0x0/0x280 [<c01b6e9c>] proc_reg_read+0x6c/0xc0 [<c0181b04>] vfs_read+0x94/0x130 [<c01b6e30>] ? proc_reg_read+0x0/0xc0 [<c0181f3d>] sys_read+0x3d/0x70 [<c0103059>] sysenter_past_esp+0x6a/0x91 ======================= root@pavi:~# cat /proc/debug_mutex ACPI Interpreter Mutex acquired by: cat, use count: 5883
With 2.6.26.3 and 2 processes 'cat /proc/.../state', I get cat hung in 'D' state immediately on _both_ notebooks - so somehow .3 got worse than .2 :(
- ... got worse than .2 + ... got worse than .1
Created attachment 17428 [details] custom dsdt patch Does cat hang if you just run cat 2 times? And would you please try the attached custom dsdt patch? You can find info of how to use custome DSDT at, http://www.lesswatts.org/projects/acpi/overridingDSDT.php
If bug still exist for the latest kernel, please apply the debug patch at #14. and then run "cat /proc/<PID>/wchan" <PID> is the pid of task who acquires the mutex, for example DEBUG: acpi_dump_stack ---------------------- Pid: 12015, comm: wmacpi cat /proc/12015/wchan
(In reply to comment #27) > Created an attachment (id=17428) [details] > custom dsdt patch > > Does cat hang if you just run cat 2 times? 2 or more processes trying to read from /state|/info a single process seems to have no trouble - eg it run whole night till user logged in in the morning, hence firing up some acpi applet. > And would you please try the attached custom dsdt patch? ok, I'll try asap.
Hung immediately again (2.6.26.3): ACPI: DSDT 1EFD03F0, 3714 (r1 255II 255II001 1 INTL 2002026) ACPI: Override [DSDT-255II001], this is unsafe: tainting kernel ACPI: Table DSDT replaced by host OS ACPI: DSDT 00000000, 316F (r1 255II 255II001 1 INTL 20060912) ACPI: DSDT override uses original SSDTs unless "acpi_no_auto_ssdt" Table [DSDT](id 0001) - 521 Objects with 49 Devices 144 Methods 17 Regions ACPI: EC: Look up EC in DSDT DEBUG: acpi_dump_stack ---------------------- Pid: 4586, comm: cat [<c0269722>] ? acpi_dump_mutex+0x0/0x95 [<c01bc453>] proc_file_read+0x103/0x280 [<c01bc350>] ? proc_file_read+0x0/0x280 [<c01b75ac>] proc_reg_read+0x6c/0xc0 [<c01821f4>] vfs_read+0x94/0x130 [<c01b7540>] ? proc_reg_read+0x0/0xc0 [<c018262d>] sys_read+0x3d/0x70 [<c0103059>] sysenter_past_esp+0x6a/0x91 ======================= ~# cat /proc/debug_mutex ACPI Interpreter Mutex acquired by: cat, use count: 1162 ~# cat /proc/4586/wchan down_timeout #-------------------------------------------------------- should I try to override SSDT as well? worth trying?
> should I try to override SSDT as well? worth trying? Thanks, not worth trying
Hi, Paolo Will you please try the latest kernel(2.6.27-rc7) and see whether the problem still exists? From the acpidump it seems that the battery info is related with EC. EC internal register will be accessed to get the battery info. Now a bunch of patches about EC are merged to the upstream kernel. Please try it again. Thanks.
Alas, no joy: while true;do cat /proc/acpi/battery/*/* /proc/acpi/ac_adapter/*/state;echo 1.$SECONDS;sleep 1;done & while true;do ca t /proc/acpi/battery/*/* /proc/acpi/ac_adapter/*/state;echo 2.$SECONDS;sleep 1;done # while true;do ps ax|grep ' [D.] ';sleep 3;done 4931 pts/1 D 0:00 cat /proc/acpi/battery/BAT0/alarm /proc/acpi/battery/BAT0/info /proc/acpi/battery/BAT0/stat e /proc/acpi/ac_adapter/AC0/state 4931 pts/1 D 0:00 cat /proc/acpi/battery/BAT0/alarm /proc/acpi/battery/BAT0/info /proc/acpi/battery/BAT0/stat e /proc/acpi/ac_adapter/AC0/state ... 0 seconds after issueing command above.
Hi, paolo Would you please check if this bug still exists with latest kernel 2.6.29? Thanks.
close this bug as there is no response from the bug reporter. please re-open it if the problem still exists in the latest kernel.