Created attachment 21303 [details]
dmesg of 2.6.30-rc2 with ff69.. reverted
Hardware: IBM Thinkpad X40, 1.2 GHz Pentium M, 1.5 GB RAM
Software: Debian unstable, 32-bit
ff69f2bba67bd45514923aaedbf40fe351787c59 caused a regression when booting on my setup. See http://bugzilla.kernel.org/show_bug.cgi?id=13087 for reference. That issue has been fixed by f461ddea0af8b98e2b7940eba9c693b0ee44d64a. Unfortunately ff69.. also caused a resume hang on my setup, which is _not_ yet fixed.
Problem: 2.6.30-rc2 hangs when resuming in the second suspend/resume cycle. The fan and hard disk spin up and the LCD backlight switches on (showing the blinking cursor), but then the machine hangs. Only holding down the power button for 4 seconds helps. The suspend indicator light does not switch to blinking mode as it does during a normal resume (before it switches off completely).
Reverting ff69.. on top of 2.6.30-rc2 fixes the issue. The latest kernel I tried is v2.6.30-rc5-96-ga4d7749. There, however, reverting f461.. and ff69.. didn't fully fix the problem: I could resume only twice (instead of only once) before the machine hung on resume. Kernels before ff69 work flawlessly (at least the ones I've tested, and I update -linus from git fairly often).
I'll add the dmesg of one suspend-resume cycle of 2.6.30-rc2 with ff69 reverted.
Please test this patch: http://patchwork.kernel.org/patch/22499/ (ACPI: suspend: restore BM_RLD on resume)
Hi Daniel,
Will you please try the patch in comment #1 and see whether the issue still exists? If it does, will you please enable "CONFIG_PM_DEBUG" in the kernel configuration and run the following test?
a. echo core > /sys/power/pm_test
b. echo mem > /sys/power/state
It would be great if you could do several suspend/resume cycles.
Thanks.
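The two steps above can be wrapped in a small loop so that several cycles run back to back. This is only a sketch: the function name run_pm_test_cycles and its optional POWER_DIR argument are made up for illustration (the argument defaults to /sys/power and exists so the loop can be dry-run against an ordinary directory). It has to run as root on a kernel built with CONFIG_PM_DEBUG.

```shell
#!/bin/sh
# Sketch: run COUNT suspend test cycles with a given pm_test mode.
# run_pm_test_cycles and POWER_DIR are hypothetical names, not part of any tool.

# usage: run_pm_test_cycles MODE COUNT [POWER_DIR]
run_pm_test_cycles() {
    mode=$1
    count=$2
    power_dir=${3:-/sys/power}

    # pm_test only exists when the kernel was built with CONFIG_PM_DEBUG.
    if [ ! -w "$power_dir/pm_test" ]; then
        echo "pm_test not available in $power_dir" >&2
        return 1
    fi

    echo "$mode" > "$power_dir/pm_test" || return 1

    i=1
    while [ "$i" -le "$count" ]; do
        echo "cycle $i of $count (pm_test=$mode)"
        # On real sysfs this write blocks until the test cycle has finished.
        echo mem > "$power_dir/state" || return 1
        i=$((i + 1))
    done

    # Switch pm_test back to "none" so that later suspends are real ones.
    echo none > "$power_dir/pm_test"
}
```

For example, `run_pm_test_cycles core 5` would perform five "core" test cycles; the other modes listed in /sys/power/pm_test can be passed in the same way.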
On Tue, May 12, 2009 at 01:51:03AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #1 from Len Brown <len.brown@intel.com> 2009-05-12 01:51:02 ---
> Please test this patch:
> http://patchwork.kernel.org/patch/22499/
> (ACPI: suspend: restore BM_RLD on resume)
I've done about a dozen suspend/resume cycles with this patch applied. It didn't hang a single time and works rock solid.
Thanks for the speedy fix, Daniel
[I'll close the bug report as soon as your patch hits mainline and I've retested]
Handled-By : Len Brown <len.brown@intel.com> Patch : http://patchwork.kernel.org/patch/22499/
*** This bug has been marked as a duplicate of bug 13032 ***
Looks like my problem is not really fixed yet and the patch only papered over the real issue. I've tested several kernels since v2.6.30-rc4-289-g815ab0f (this is the patch I tested first, as merged in mainline) and all were broken. I'll now test the suggestion in comment #2.
Created attachment 21438 [details]
dmesg of 2.6.30-rc6-lockdep-00043-g22ef37e
The attached dmesg contains 5 runs of
# echo core > /sys/power/pm_test
# echo mem > /sys/power/state
The machine always resumed perfectly.
I've also tested the other options for /sys/power/pm_test (processors platform devices freezer). They all seem to work for multiple consecutive runs on 2.6.30-rc6-lockdep-00043-g22ef37e.
On Tuesday 26 May 2009, Daniel Vetter wrote:
> On Sun, May 24, 2009 at 09:11:52PM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.29. Please verify if it still should be listed and let me know
> > (either way).
> I've just tested 2.6.30-rc7 and the problem still exists.
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13277
> > Subject : Thinkpad X40 no longer resumes reliable since ff69f2bba67bd45514923aaedbf40fe351787c59
> > Submitter : Daniel Vetter <daniel@ffwll.ch>
> > Date : 2009-05-11 10:08 (14 days old)
> > Handled-By : Len Brown <len.brown@intel.com>
> > Patch : http://patchwork.kernel.org/patch/22499/
> This is not correct. This patch was merged, but does not fix the problem.
> It worked while I tested it separately, but obviously it only papered over
> the real issue in this very specific kernel.
Ignore-Patch : http://patchwork.kernel.org/patch/22499/
Thanks for the additional testing, Daniel.
What does this command show?:
grep . /sys/devices/system/clocksource/clocksource0/*
What happens if you boot the system with "clocksource=acpi_pm" (try all of the alternatives shown in available_clocksource)?
On Wed, May 27, 2009 at 02:06:36AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #11 from Len Brown <len.brown@intel.com> 2009-05-27 02:06:35 ---
> Thanks for the additional testing, Daniel.
>
> What does this command show?:
>
> grep . /sys/devices/system/clocksource/clocksource0/*
/sys/devices/system/clocksource/clocksource0/available_clocksource:hpet acpi_pm jiffies tsc
/sys/devices/system/clocksource/clocksource0/current_clocksource:hpet
> What happens if you boot the system with
> "clocksource=acpi_pm" (try all of the alternatives shown
> in available_clocksource)
hpet - default, crashes in second resume
acpi_pm - crashes like hpet
jiffies - works (I've done about ten suspend-resume cycles)
tsc - doesn't work, kernel switches back to hpet because the tsc is unstable
I've also tried echo jiffies > current_clocksource before suspending (normal setup, i.e. clocksource=hpet) as a workaround. But that does not work; it hangs when I try to suspend the first time.
Furthermore, suspend-to-disk has the same issue: in the second resume the kernel hangs right before it switches the suspend LED to blinking mode (indicating an ongoing suspend/resume operation). As with suspend-to-ram, I only see the console cursor frozen in the upper-left corner. SysRq also doesn't work.
-Daniel
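For anyone repeating the clocksource comparison above, the list of boot options to try can be derived from sysfs instead of being typed by hand. A minimal sketch, assuming the two sysfs files shown in the grep output; the function name show_clocksources and its optional directory argument are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Sketch: list every available clocksource and mark the one in use.
# show_clocksources and its DIR argument are hypothetical names.

# usage: show_clocksources [DIR]
show_clocksources() {
    cs_dir=${1:-/sys/devices/system/clocksource/clocksource0}

    current=$(cat "$cs_dir/current_clocksource") || return 1
    available=$(cat "$cs_dir/available_clocksource") || return 1

    # available_clocksource is a single space-separated line.
    for cs in $available; do
        if [ "$cs" = "$current" ]; then
            echo "$cs (current)"
        else
            echo "$cs (boot with clocksource=$cs to test)"
        fi
    done
}
```

Each line that is not marked as current names a value to pass on the kernel command line for the next test boot.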
Created attachment 21622 [details]
dmesg of 2.6.30-rc7-lockdep-00082-g07f4f3e freezing after resume
Just now my system froze a few seconds after resuming. This was with clocksource=jiffies. Luckily SysRq still worked, so I could capture the dmesg (relevant parts attached). Might this be related to the problem?
On Monday 08 June 2009, Daniel Vetter wrote:
> On Sun, Jun 07, 2009 at 11:52:49AM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.29. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13277
> > Subject : 2.6.30 regression - unreliable resume - bisected - Thinkpad X40
> > Submitter : Daniel Vetter <daniel@ffwll.ch>
> > Date : 2009-05-11 10:08 (28 days old)
> > Handled-By : Len Brown <len.brown@intel.com>
>
> I've just tested v2.6.30-rc8-34-g81ee1ba and this version has the exact
> same problem. I've also tested a few versions in the rc7-rc8 timeframe.
> Something must have slightly changed because with these kernels I've
> tested, the resume-hang was way less likely. I've tried tracking down the
> changeset that introduced this new behaviour, but it was too unreliable to
> classify for a bisect run. But with v2.6.30-rc8-34-g81ee1ba the kernel
> hangs again reliably in the second resume.
Can you reproduce the failure when you boot with "highres=off"? How about with "idle=poll"?
WARNING: at kernel/hrtimer.c:625 hres_timers_resume+0x2c/0x42()
...
hres_timers_resume() called with IRQs enabled!
Do you see this every time it hangs, and does it hang every time you see this?
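One way to approach the question above is to count how often the warning shows up in each saved log and compare that with which sessions hung. A rough sketch; the function name count_hres_warnings is made up for illustration, and the search string is the message text quoted above.

```shell
#!/bin/sh
# Sketch: count occurrences of the hres_timers_resume warning in a log.
# count_hres_warnings is a hypothetical helper name.

# usage: count_hres_warnings [LOGFILE]   (reads stdin when no file is given)
count_hres_warnings() {
    status=0
    # -F treats the pattern as a fixed string, -c prints the number of matching lines.
    n=$(grep -cF -- 'hres_timers_resume() called with IRQs enabled' "$@") || status=$?

    # grep exits with 1 when nothing matched; only a status above 1 is a real error.
    [ "$status" -le 1 ] || return "$status"
    echo "$n"
}
```

It can be fed a saved file (`count_hres_warnings dmesg.txt`, where dmesg.txt is a placeholder name) or the live log (`dmesg | count_hres_warnings`).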
> --- Comment #16 from Len Brown <len.brown@intel.com> 2009-06-09 02:04:59 ---
> WARNING: at kernel/hrtimer.c:625 hres_timers_resume+0x2c/0x42()
> ...
> hres_timers_resume() called with IRQs enabled!
>
> Do you see this every time it hangs,
> and does it hang every time you see this?
If it hangs, I don't see anything at all (it hangs before the console comes up). But when I apply the clocksource=jiffies workaround I sometimes see this. Just this morning the system behaved strangely after a resume (clocksource=jiffies workaround applied) - X hung up. And there was the same backtrace in the logs. So my gut feeling tells me this backtrace might be related to the problem I'm seeing. But I don't have further evidence.
-Daniel
[Moving some e-mail discussion back on the bug report -Daniel]
From: "Rafael J. Wysocki" <rjw@sisk.pl>
> > > I've just tested v2.6.30-rc8-34-g81ee1ba and this version has the exact
> > > same problem. I've also tested a few versions in the rc7-rc8 timeframe.
> > > Something must have slightly changed because with these kernels I've
> > > tested, the resume-hang was way less likely. I've tried tracking down the
> > > changeset that introduced this new behaviour, but it was too unreliable to
> > > classify for a bisect run. But with v2.6.30-rc8-34-g81ee1ba the kernel
> > > hangs again reliably in the second resume.
> >
> > Well, thanks for the update.
> >
> > I wonder what we've changed recently that it makes the problem more
> > reproducible for you. Puzzled.
>
> Actually, it was for a few kernel revisions _less_ reproducible (only hung
> after about a dozen suspend cycles). But that's nothing special: Since the
> regression was introduced there were already a few other kernels that
> almost never crashed. One of them was the reason I've preliminarily
> declared the bug fixed. But I was never able to pinpoint an exact cause
> for the change in behaviour. Most likely we have a very tight race window
> and when instructions get moved around a little bit due to totally
> unrelated changes, chances are massively lower that the kernel hangs (e.g.
> because of delay due to a cache-miss). At least that's the only consistent
> explanation I could come up with. And I always look at the patches/try to
> bisect when something changes.
>
> -Daniel
>
> PS: It's also possible that this is not really a regression, but the bug
> was just uncovered. I vaguely remember similar resume problems with this
> exact machine from a few years ago. But I can't remember any details nor
> which kernels might have been affected.
Hmm. Can you please try to comment out suspend_device_irqs() and resume_device_irqs() in drivers/base/power/main.c ?
Rafael
> --- Comment #15 from Len Brown <len.brown@intel.com> 2009-06-09 02:03:49 ---
> Can you reproduce the failure when you boot with "highres=off"?
> How about with "idle=poll"?
Base kernel was v2.6.30-rc8-34-g81ee1ba plus an unrelated revert. Results:
base: hung on 3rd resume
base + "highres=off": survived 10 resume-to-mem cycles
base + "idle=poll": hung on 2nd resume
I'm now using "highres=off" as a workaround to test some more. One recent kernel (without any workaround) also survived 10 resume cycles but then crashed after a few days of day-to-day use. I'll report back how this one fares after a few days of use.
-Daniel
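When comparing boots like the ones above, it is easy to lose track of which parameter the running kernel was actually started with. The following sketch checks /proc/cmdline for an exact parameter; the function name has_boot_param and its optional file argument are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: test whether the running kernel was booted with a given parameter.
# has_boot_param and its CMDLINE_FILE argument are hypothetical names.

# usage: has_boot_param PARAM [CMDLINE_FILE]
has_boot_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}

    cmdline=$(cat "$cmdline_file") || return 2

    # Pad with spaces so that only whole, space-separated words match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_boot_param highres=off && echo "booted with highres=off"` could be logged next to each test result.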
> Hmm. Can you please try to comment out suspend_device_irqs()
> and resume_device_irqs() in drivers/base/power/main.c ?
>
> Rafael
I've tested this against the same base kernel as the previous tests. It hung on the 4th resume.
-Daniel
> --- Comment #19 from Daniel Vetter <daniel@ffwll.ch> 2009-06-09 09:45:33 ---
> I'm now using "highres=off" as a workaround to test some more. One recent
> kernel (without any workaround) also survived 10 resume cycles but then
> crashed after a few days of day-to-day use. I'll report back how this one
> fares after a few days of use.
I've now been using this workaround for a few days with suspend-resume cycles under various conditions. The system never hung on resume, so this really prevents the bug.
-Daniel
> --- Comment #21 from Daniel Vetter <daniel@ffwll.ch> 2009-06-15 07:41:26 ---
> > --- Comment #19 from Daniel Vetter <daniel@ffwll.ch> 2009-06-09 09:45:33 ---
> > I'm now using "highres=off" as a workaround to test some more. One recent
> > kernel (without any workaround) also survived 10 resume cycles but then
> > crashed after a few days of day-to-day use. I'll report back how this one
> > fares after a few days of use.
> I've now been using this workaround for a few days with suspend-resume
> cycles under various conditions. The system never hung on resume, so this
> really prevents the bug.
I stand corrected: On 2.6.30-03984-g45e3e19, the highres=off workaround does not work anymore. Further, my laptop now hangs on the _first_ resume and no longer only on the second or a later resume cycle.
-Daniel
On Tuesday 07 July 2009, Daniel Vetter wrote:
> On Tue, Jul 07, 2009 at 02:00:35AM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.29 and 2.6.30.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.29 and 2.6.30. Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13277
> > Subject : 2.6.30 regression - hang on 2nd resume - bisected - Thinkpad X40
> > Submitter : Daniel Vetter <daniel@ffwll.ch>
> > Date : 2009-05-11 10:08 (57 days old)
> > Handled-By : Len Brown <len.brown@intel.com>
>
> I've now put two different recent kernel versions (2.6.31-rc1-00268 and
> 2.6.31-rc2) through a few days of real-world testing. The machine _never_
> hung on resume. I don't know by what means, but I'd say the problem's
> fixed and we can close this report.