Bug 13306
Summary: | hibernate slow on _second_ run | ||
---|---|---|---|
Product: | ACPI | Reporter: | Johannes Berg (johannes) |
Component: | Power-Battery | Assignee: | Zhang Rui (rui.zhang) |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | normal | CC: | acpi-bugzilla, akpm, astarikovskiy, lenb, rjw, rui.zhang |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.30-rc5-wl-23297-gbf2c6a3-dirty | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216, 13070 | ||
Attachments: |
acpi dump
dmesg containing boot + multiple hibernates debug patch debug patch log |
Description
Johannes Berg
2009-05-14 09:34:32 UTC
What was the last kernel where it didn't happen? I'm fairly sure it started happening at -rc1 or -rc2 but haven't double-checked now. I just remember seeing it, swearing, and then not seeing it etc. until I realised it had to do with being the second hibernate. (In reply to comment #3) > This: http://marc.info/?l=linux-kernel&m=124219402109245&w=4 ? It's unclear what you're asking Johannes here? That email thread looks pretty different from the problem which Johannes is seeing? Sorry. There's a patch in the email that I'd recommend to try. Still, it's already in the Linus' tree, so I'd recommend trying the top of the Linus' tree, actually. :-) Johannes, does the latest upstream still have the issue? Sorry, it took me a while to find time to reproduce/test -- the issue seems to be fixed in current upstream (I'm currently at 363383277081ce831642b72df40932ee05ce40a2) Good, thanks for the verification. Unfortunately, the bug has reappeared with the latest tree -- it seems I need to actually work with the machine between hibernations in order to make it not slow. Note the timing in these resume runs that I still had in my dmesg buffer now, on 2.6.30-rc6-wl-24018-g8020553-dirty (which is wireless-testing/master merged with Linus's master), there's always a ~20 second delay there: [46821.976003] ACPI: Waking up from system sleep state S4 [46822.521462] HDA Intel 0000:00:08.0: restoring config space at offset 0x1 (was 0xb00006, writing 0xb00002) [46822.529406] forcedeth 0000:00:0a.0: restoring config space at offset 0x1 (was 0xb00003, writing 0xb00007) [46822.537330] ahci 0000:00:0b.0: restoring config space at offset 0x1 (was 0xb00403, writing 0xb00407) [46843.140620] pci 0000:00:03.2: PME# disabled [76097.115916] ACPI: Waking up from system sleep state S4 [76097.641471] HDA Intel 0000:00:08.0: restoring config space at offset 0x1 (was 0xb00006, writing 0xb00002) [76097.649409] forcedeth 0000:00:0a.0: restoring config space at offset 0x1 (was 0xb00003, writing 0xb00007) [76097.657325] ahci 0000:00:0b.0: restoring config space at offset 0x1 (was 0xb00403, writing 0xb00407) [76117.200757] pci 0000:00:03.2: PME# disabled [85368.496002] ACPI: Waking up from system sleep state S4 [85369.061463] HDA Intel 0000:00:08.0: restoring config space at offset 0x1 (was 0xb00006, writing 0xb00002) [85369.069382] pci 0000:00:09.0: restoring config space at offset 0x7 (was 0x628000f0, writing 0x228000f0) [85369.077592] forcedeth 0000:00:0a.0: restoring config space at offset 0x1 (was 0xb00003, writing 0xb00007) [85369.086071] ahci 0000:00:0b.0: restoring config space at offset 0x1 (was 0xb00403, writing 0xb00407) [85389.570622] pci 0000:00:03.2: PME# disabled On Tuesday 07 July 2009, you wrote:
> On Tue, 2009-07-07 at 02:00 +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.29 and 2.6.30.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.29 and 2.6.30. Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13306
> > Subject : hibernate slow on _second_ run
> > Submitter : Johannes Berg <johannes@sipsolutions.net>
> > Date : 2009-05-14 09:34 (54 days old)
>
> Still exists in 31-rc2.
On Monday 27 July 2009, Johannes Berg wrote:
> On Sun, 2009-07-26 at 22:45 +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.29 and 2.6.30.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.29 and 2.6.30. Please verify if it still should
> > be listed and let me know (either way).
>
> The problem persists in 31-rc3.
On Monday 27 July 2009, Johannes Berg wrote:
> On Sun, 2009-07-26 at 22:45 +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.29 and 2.6.30.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.29 and 2.6.30. Please verify if it still should
> > be listed and let me know (either way).
>
> The problem persists in 31-rc3.
Thanks for the update.
Unfortunately, I have no idea about what's wrong here.
Generally, there are three possible areas the problem may be related to:
(1) interrupts,
(2) CPU hotplug,
(3) ordering of the platform callbacks,
but I can't say much more than that.
Still persists in -rc4, but using "shutdown" mode works around it. That indicates a problem with ACPI AML. Does suspend to RAM work correctly at the same time? I can't STR because I have an nvidia card. Ouch. OK, can you please try STD with acpi_sleep=old_ordering, then? Interestingly, now after reproducing the problem with method=platform, method=shutdown also incurs the delay, where before I hibernated many times with method=shutdown w/o the problem -- it seems that once the problem is there it doesn't go away even with the shutdown method, but using shutdown method stops it from happening. (In reply to comment #16) > OK, can you please try STD with acpi_sleep=old_ordering, then? I suppose I'll have to reboot for that now? (In reply to comment #16) > OK, can you please try STD with acpi_sleep=old_ordering, then? Doesn't help either. So, I have no ideas except for the 'shutdown' mode workaround. Can you please build the current git kernel with CONFIG_PM_VERBOSE and attach dmesg output from that including one hibernation-resume cycle? Would -rc5 be current enough? Depends. :-) For the information I'd like to collect it would, for testing fixes that went in after -rc5, well, is wouldn't. ;-) [11639.607546] acpi device:36: legacy restore [11639.609099] acpi device:37: legacy restore [11639.610682] battery PNP0C0A:00: legacy restore [11640.002609] Clocksource tsc unstable (delta = -371189971 ns) [11660.130542] acpi LNXTHERM:00: legacy restore [11660.132622] pci 0000:00:00.0: restore [11660.134767] pci 0000:00:00.1: restore Hmm. I wonder if this commit changes anything: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=74b5820808215f65b70b05a099d6d3c969b82689 No such luck, that doesn't change anything (applied it as a patch to my tree) Hmm, ok, this is confusing -- I'm now seeing the problem even with the shutdown method, with the same message as in comment #23. I think the one time I tried whether or not shutdown was a workaround I had done a single suspend with "platform" and then future ones with "shutdown" which seemed to help at the time. In any case, "shutdown" is also incurring the delay now. Any help appreciated. Oh and the TSC line in the log isn't always there, I don't think it's related. I wonder if you can reproduce it with: # echo devices > /sys/power/pm_test and then # echo disk > /sys/power/state repeated several times in a row. If you can't, then please try with # echo core > /sys/power/pm_test instead of the first command. I can reproduce the delay (at battery legacy restore) after having already hibernated a bunch of times with "devices" in pm_test. I think you meant after a clean reboot though, I'll test that later. With this patch diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c index 58b4517..683b80a 100644 --- a/drivers/acpi/battery.c +++ b/drivers/acpi/battery.c @@ -499,6 +499,7 @@ static int acpi_battery_update(struct acpi_battery *battery) { int result, old_present = acpi_battery_present(battery); result = acpi_battery_get_status(battery); + printk(KERN_DEBUG "battery get status %d\n", result); if (result) return result; #ifdef CONFIG_ACPI_SYSFS_POWER @@ -507,19 +508,26 @@ static int acpi_battery_update(struct acpi_battery *battery) battery->update_time = 0; return 0; } + printk(KERN_DEBUG "checked battery present\n"); #endif if (!battery->update_time || old_present != acpi_battery_present(battery)) { + printk(KERN_DEBUG "battery present check again, getting info\n"); result = acpi_battery_get_info(battery); + printk(KERN_DEBUG "battery got info %d\n", result); if (result) return result; acpi_battery_init_alarm(battery); + printk(KERN_DEBUG "inited battery alarm\n"); } + printk("update done\n"); #ifdef CONFIG_ACPI_SYSFS_POWER if (!battery->bat.dev) sysfs_add_battery(battery); #endif - return acpi_battery_get_state(battery); + result = acpi_battery_get_state(battery); + printk(KERN_DEBUG "battery got state %d\n", result); + return result; } /* -------------------------------------------------------------------------- I get [19299.509310] battery PNP0C0A:00: legacy restore [19300.530433] battery get status 0 [19300.532196] checked battery present [19300.534051] battery present check again, getting info [19315.970806] battery got info 0 [19315.972903] inited battery alarm [19315.975057] update done [19320.150618] battery got state 0 so it seems that acpi_battery_get_info() takes ~15s, and acpi_battery_get_state() ~5s. This seems to be related to the embedded controller. Reassigning to ACPI. (In reply to comment #29) > [19299.509310] battery PNP0C0A:00: legacy restore > [19300.530433] battery get status 0 > [19300.532196] checked battery present > [19300.534051] battery present check again, getting info > [19315.970806] battery got info 0 > [19315.972903] inited battery alarm > [19315.975057] update done > [19320.150618] battery got state 0 > > so it seems that acpi_battery_get_info() takes ~15s, and > acpi_battery_get_state() ~5s. acpi_battery_get_info() just executes _BIF, and acpi_battery_get_state() executes _BST, and they both do little else so it seems those invocations cause the delay. please post outputs from dmesg and acpidump. do you see slow read from /proc/acpi/battery/*/info too after restore? does /proc/interrupts change during the read of the above file? I don't have /proc/acpi/battery/, but reading /sys/class/power_supply/BAT0/* is not quite that slow, about 4 seconds. will attach acpidump, but don't have a useful dmesg right now, will attach after my next reboot. The acpi interrupt doesn't change while reading the sysfs files. Created attachment 22836 [details]
acpi dump
Created attachment 23005 [details]
dmesg containing boot + multiple hibernates
Here's the requested dmesg output from boot including I think two hibernates.
Can't seem to be able to reopen this bug. Somebody please do that -- I've provided the info now. Created attachment 23115 [details]
debug patch
please apply this debug patch,
set CONFIG_ACPI_DEBUG
reboot with boot option "log-buf-len=4M"
re-do the test and attach the dmesg output after the bug is reproduced.
Created attachment 23122 [details]
debug patch
refreshed patch attached. please try this one
Created attachment 23170 [details]
log
Unfortunately truncated, but I had no idea it would be _this_ huge. My logbuf was only 2M, but 4M certainly wouldn't have been big enough.
If you tell me how to do the same with tracing instead of acpi, I can do -- but this took like 15 minutes to scroll by on the screen while hibernating so I won't be doing that again.
please run "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" before you do the second hibernation. does the long latency still exist? Note you need to set CONFIG_DETECT_HUNG_TASK and rebuilt your kernel if "/proc/sys/kernel/hung_task_timeout_secs" doesn't exist. Umm, I'm happy to try, but I'm sure that it's unrelated to the problem. The message just comes up because the debugging you asked for makes my system stall for long enough due to display output being slow that it thinks tasks are stuck. Without the debugging it doesn't come up. Hmm hmm. Looks like we may never find out -- it appears that my latest OSX upgrade changed the firmware too and now the ACPI battery reading doesn't stall this way any more. At least I haven't seen the problem since I used OSX a couple of days ago to do some stuff and upgraded it while I was using it... If this was a firmware bug, I don't want to know what it was. :-) Can't reproduce any more, marking so. |