Latest working kernel version: 2.6.23
Earliest failing kernel version: 2.6.24
Hardware Environment: notebook Acer TravelMate 4502WLMi, Pentium M 1.6GHz, ATI
Mobility Radeon 9700, two monitors (panel+LCD)
Software Environment: N/A
Reading of /proc/acpi/battery/BAT0/state ends in an endless loop. The first symptom was that the hald didn't want to finish startup and I figured out that the problem is in reading the battery status.
I patched (attached) sbs.c and sbshc.c to report method entry/return and also locking and I found that the execution ends in wait_transaction_complete (sbshc.c) in an endless wait_event_timeout - really strange.
Created attachment 15809 [details]
Patch for drivers/acpi (sbs.c, sbshc.c) to report method entry/return and locking
Created attachment 15810 [details]
Configuration used for 2.6.25
Created attachment 15811 [details]
Created attachment 15812 [details]
Created attachment 15813 [details]
Problem is here: /var/log/messages
Note: I again tried 2.6.24 (once) and it seems it boots-up well, so the regression is probably only in 2.6.25. I do not remember completely why I didn't use 2.6.24, but it was maybe because the battery status didn't get updated correctly (and also because of bug #9910). I will have a look at the latest 2.6.24 and provide some details at the end of the week. Thanks.
So the problem is in 2.6.24 already. I saw a timeout in reading from wait_transaction_complete after 4 minutes and 55 seconds. There are two questions:
1. Why the operation timed-out?
Some problem in EC code? I can provide more logs, but which kernel parameter enables EC logging?
2. Why it took so long (295 seconds instead of 1 second)?
Maybe some milliseconds->jiffies translation problem?
From the log in comment #5 it seems that OSPM will always call the function of smb_check_done after the lockup. But there is no wakeup event for the SBS HC controller. It looks inconsistent.
>SBS HC: smb_check_done, entry
>Apr 19 14:31:08 oldium ACPI: SBS HC: smb_check_done, return
Will you please try the latest kernel and see whether the problem can be reproduced? Please attach the output of dmesg.
Note: please apply the patch in comment #1.
I will try it with latest rc (2.6.26-rc1) in the evening. Logs from 2.6.25 doesn't contain the wake-up, because I didn't wait this long to get it (it might come). Anyway, I will wait for wake-up with 2.6.26-rc1.
Created attachment 16030 [details]
I started `watch -n 5 "cat /proc/acpi/battery/BAT0/state"` to see some output. I also unplugged the power cable (smbus_alarm at 23:17:44) and plugged it back (something around 23:20:09 or 23:20:24, I forgot to check the clock).
Just ask for more input, I will help as I can.
There is no long wait for timeout as can be seen in previous kernels. It looks like the battery status reading works. Anyway, there are still some "timed-out" results.
I will check tomorrow if the battery status changes in time (battery discharging), because I've seen the same "remaining capacity" value over this test - but the test was not long and there was no CPU load. I remember some problems with value stalling in 2.6.24, so I will check this tomorrow.
I've tried 2.6.26-rc2 (without my patch): no updates of /proc/acpi/battery/BAT0/state for more than 1 hour running on battery (the file had the same content). Do you need logs from 2.6.26-rc2? I think the problem should be visible (hopefully) in 2.6.26-rc1 logs already.
Please say if you need more logs from me, possibly with more logging options.
I've tried 2.6.26-rc6 now and the timeout doesn't expire again. See attached logs.
Created attachment 16476 [details]
Created attachment 16477 [details]
thanks for report
I tried 2.6.26-rc8, but no change. Is the patch you mention in PATCH_ALREADY_AVAILABLE resolution in the kernel already? May I try the patch myself?
I misunderstood your comment #14. I read it that you don't have problem with -rc6...
Created attachment 16644 [details]
Serialize SBSHC alarms
Please check if this patch helps.
Sorry for the late response, but I'm a little bit bussy. The patch makes no difference, `cat /proc/acpi/battery/BAT0/state` doesn't return (I waited two minutes).
Created attachment 17168 [details]
try the debug patch
Will you please try the debug patch and see whether the problem still exists?
Created attachment 17171 [details]
No success. The phenomenon is that the wait_transaction_complete (sbshc.c) uses wait_event_timeout with timeout 1000 [ms] that never expires. I don't know if there could be some conflict with something that smb_check_done does (call to smb_hc_read -> ec_read).
The attached log shows the problem in 2.6.26 (your patch plus my patch). The reading succeded once (`cat /proc/acpi/battery/BAT0/state`), but the second try failed - timeout didn't expire. Just to note: my former run succeded four times.
Created attachment 17237 [details]
try the debug patch
Will you please try the debug patch?
In this debug patch even the timeout expires, OS will try to check the SMbus status again to avoid the bogus timeout.
Of course the patch in comment #21 and your patch are still required.
After the test, please attach the output of dmesg.
(In reply to comment #23)
> In this debug patch even the timeout expires, OS will try to check the SMbus
> status again to avoid the bogus timeout.
The problem is that the timeout NEVER expires (at least it looks like it never expires, maybe it will in an hour).
I think the core of the problem is hidden in the wait_event_timeout macro (include/linux/wait.h). It tries to execute a check function and if it fails, it tries to sleep for a "timeout" time. It can be woken up, in this case the check function is called again and if it fails, the sleep _continues_ where it stopped. The smb_check_done takes some time to execute, but this time is not counted in this "timeout". And it looks like the sleep is interrupted quite fast, so the rest of the timeout gets lower only slightly. I didn't verify/measure this hypothesis, but I rewrote the wait_transaction_complete function to really take at most "timeout" time and it works.
Question is why the sleep is interrupted so often, I didn't investigate this.
I will attach patch and (my) debug output.
Created attachment 17275 [details]
Patch for wait_transaction_complete
This patch workarounds the problem with "endless" wait_transaction_complete execution in sbshc.c.
Created attachment 17276 [details]
[UPDATE] Patch for drivers/acpi (sbs.c, sbshc.c) to report method entry/return and locking
Created attachment 17277 [details]
2.6.26: /var/log/messages with workaround
The /var/log/messages file still contains some "timed-out" lines, but it is not a big problem now, because the possible lockup doesn't take more than 1 second (plus minus).
Will you please confirm whether the lockup issue disappears after applying the patch in comment #25? If the problem disappears, please send your patch to ACPI mail list and more people are involved with the discussion.
The patch in comment #25 is also reasonable, in which the polling mode is used and bogus smbus timeout is also avoided.
Maybe what you said in comment #24 is right. But in fact the weired is that smb_check_done is called so many times. When the function of wait_event_timeout is called, there are two possible wakeup events. One is timeout and the other is smbus alarm event. Only after the process is waked will the function of smb_check_done be called.
Thanks, I will write to the list after my vacation.
fixed by commit 266feefeb9ea2d846ac82eb6db1a54b230364ba4
I'm sorry, but I have to reopen the bug. The commited patch doesn't fix anything.
My problem is `wait_event_timeout` method itself. Moving discussion to ACPI mailing list.
How to mark this bug that the code fix (patch) is available in bug 9998? The patch from there is sufficient for me.
Easy, mark as duplicate of that bug.
Thanks for your report and testing.
*** This bug has been marked as a duplicate of bug 9998 ***