Bug 13781

Summary: System freeze at resume after suspend to RAM
Product: Power Management Reporter: Christian Casteyde (casteyde.christian)
Component: Hibernation/Suspend Assignee: ykzhao (yakui.zhao)
Status: CLOSED CODE_FIX    
Severity: normal CC: rjw, yakui.zhao
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.31-rc2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216, 13615    
Attachments: dmesg output after boot for 2.6.31-rc3
lspci for 2.6.31-rc3
lsusb for 2.6.31-rc3
syslog errors at resume
Add the suspend/resume flag to avoid smp call in course of cpufreq_suspend/resume
dmesg for 2.6.31+proposed fix after resume

Description Christian Casteyde 2009-07-15 18:42:28 UTC
Created attachment 22360 [details]
dmesg output after boot for 2.6.31-rc3

Since the early 2.6.31-rc kernels (rc1 not tested; rc2 and rc3 fail), my laptop no longer resumes from suspend to RAM. It works perfectly with 2.6.30.1.

System : Acer Aspire 1511 LMi
CPU: Athlon 64 3GHz in 64-bit mode
Distro Bluewhite64 12.1 (64 bit native Slackware)
lspci, dmesg and lsusb appended for more info.
Comment 1 Christian Casteyde 2009-07-15 18:43:02 UTC
Created attachment 22361 [details]
lspci for 2.6.31-rc3
Comment 2 Christian Casteyde 2009-07-15 18:43:27 UTC
Created attachment 22362 [details]
lsusb for 2.6.31-rc3
Comment 3 Christian Casteyde 2009-07-15 18:50:44 UTC
Created attachment 22363 [details]
syslog errors at resume

It seems to be a locking problem that causes the freeze.
Comment 4 ykzhao 2009-07-21 06:45:32 UTC
Hi, Christian
    The log in comment #3 shows the following warning messages in the course of hibernation.
    >WARNING: at drivers/base/sys.c:411 sysdev_suspend+0x1f7/0x2b0()
Jul 15 20:13:03 athor kernel: Hardware name: Aspire 1510
Jul 15 20:13:03 athor kernel: Interrupts enabled after cpufreq_suspend+0x0/0x110
    >WARNING: at drivers/base/sys.c:475 sysdev_resume+0x9f/0xb0()
Jul 15 20:13:03 athor kernel: Hardware name: Aspire 1510
Jul 15 20:13:03 athor kernel: Interrupts enabled while resuming system devices

    Could you please confirm whether the issue occurs with S3 (suspend to RAM) or with hibernation?
   
    Thanks.
Comment 5 Christian Casteyde 2009-07-22 19:21:29 UTC
I still do not know exactly what the difference between S3 and hibernation is.
The laptop is suspended with the following script:

        # Stops network:
        /etc/rc.d/rc.inet1 eth1_stop

        # Flush all data to disk (just in case)
        LD_PRELOAD="" sync

        # Save the system time
        hwclock --systohc --utc

        # Suspend
        echo -n mem > /sys/power/state

        # 2.6.25 does not send lid event at resume:
        rm -rf /var/state/hibernating

        # Restore the system time
        hwclock --hctosys --utc

        # Reset the graphic card
        vbetool post

So it's basically a suspend to RAM for me...
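
For reference, the keyword written to /sys/power/state is what selects the sleep type, so the script above requests suspend to RAM rather than hibernation. A minimal sketch of that mapping follows; `describe_sleep_state` is a hypothetical helper written for illustration, and the ACPI state names assume the usual mapping on ACPI systems.

```shell
#!/bin/sh
# describe_sleep_state: hypothetical helper (not part of any tool) that maps
# a /sys/power/state keyword to the sleep type it requests.
# Assumption: the usual ACPI mapping (standby=S1, mem=S3, disk=S4).
describe_sleep_state() {
    case "$1" in
        standby) echo "power-on standby (ACPI S1)" ;;
        mem)     echo "suspend to RAM (ACPI S3)" ;;
        disk)    echo "hibernation (suspend to disk, ACPI S4)" ;;
        *)       echo "unknown" ;;
    esac
}

# List the states the running kernel offers, if the sysfs file is readable.
if [ -r /sys/power/state ]; then
    for s in $(cat /sys/power/state); do
        printf '%s: %s\n' "$s" "$(describe_sleep_state "$s")"
    done
fi
```

With this mapping, `echo -n mem > /sys/power/state` corresponds to S3, while hibernation would be requested with the `disk` keyword.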
Comment 6 ykzhao 2009-07-23 02:06:03 UTC
Hi, Christian
    Will you please try the debug patch in http://bugzilla.kernel.org/show_bug.cgi?id=13269#C8 and attach the output of dmesg after suspend/resume?
    Thanks.
Comment 7 ykzhao 2009-07-23 07:15:46 UTC
Created attachment 22463 [details]
Add the suspend/resume flag to avoid smp call in course of cpufreq_suspend/resume

Will you please try the attached debug patch and see whether the issue still exists?
   Thanks.
Comment 8 Christian Casteyde 2009-07-23 17:38:00 UTC
Created attachment 22476 [details]
dmesg for 2.6.31+proposed fix after resume

Here is the dmesg output after suspend/resume.
The kernel was built with both patches applied (the printk debug patch plus your proposed patch to avoid the SMP call during suspend/resume), and it seems to resume correctly now. By the way, my kernel is not SMP-enabled, but it is preempt-enabled. Without the proposed patch, it would hang on a black screen, so the patch improves the problem, if not fixes it.

Hope this dmesg output will help you to analyze/confirm.

Thanks
Comment 9 Christian Casteyde 2009-07-23 17:41:33 UTC
Btw, I also get new critical errors at resume
(maybe not the same problem, but as it's the same context, I report here)

Message from syslogd@athor at Thu Jul 23 19:39:43 2009 ...
athor kernel: Uhhuh. NMI received for unknown reason a0 on CPU 0.

Message from syslogd@athor at Thu Jul 23 19:39:43 2009 ...
athor kernel: Dazed and confused, but trying to continue

Message from syslogd@athor at Thu Jul 23 19:39:43 2009 ...
athor kernel: You have some hardware problem, likely on the PCI bus.

This is also a new problem that didn't exist with 2.6.30 kernels.

Otherwise, the system works after that.
Comment 10 ykzhao 2009-07-24 01:05:18 UTC
Hi, Christian
    Thanks for the test.
    It seems that the system now resumes correctly, and the kernel no longer emits the following warning messages after the patch is applied.
    >WARNING: at drivers/base/sys.c:411 sysdev_suspend+0x1f7/0x2b0()
Jul 15 20:13:03 athor kernel: Hardware name: Aspire 1510
Jul 15 20:13:03 athor kernel: Interrupts enabled after cpufreq_suspend+0x0/0x110
    >WARNING: at drivers/base/sys.c:475 sysdev_resume+0x9f/0xb0()
Jul 15 20:13:03 athor kernel: Hardware name: Aspire 1510
Jul 15 20:13:03 athor kernel: Interrupts enabled while resuming system devices

    So this bug will be marked as resolved, and I will try to push the patch from comment #7.
    thanks.
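
One way to confirm that the two warnings are gone is to count matching lines in a saved log. The sketch below is only an illustration: `count_irq_warnings` is a hypothetical helper, and the pattern is taken from the two "Interrupts enabled" messages quoted in this report.

```shell
#!/bin/sh
# count_irq_warnings: hypothetical helper that counts log lines matching the
# two "Interrupts enabled" messages quoted in this report.
# $1: path to a saved log (for example, dmesg or syslog output after resume).
count_irq_warnings() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the helper usable under set -e.
    grep -c -E 'Interrupts enabled (after|while resuming system devices)' "$1" || true
}
```

For example, after a suspend/resume cycle, saving the log with `dmesg > /tmp/after-resume.log` and running `count_irq_warnings /tmp/after-resume.log` should print 0 if neither warning was logged (the file path is arbitrary).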
Comment 11 Rafael J. Wysocki 2009-07-28 20:43:33 UTC
On Tuesday 28 July 2009, Christian Casteyde wrote:
> Still present in rc4, but a patch is pending for inclusion.
>
> On Sunday 26 July 2009, you wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.30.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=13781
> > Subject     : System freeze at resume after suspend to RAM
> > Submitter   : Christian Casteyde <casteyde.christian@free.fr>
> > Date        : 2009-07-15 18:42 (12 days old)
Comment 12 Rafael J. Wysocki 2009-07-28 22:27:21 UTC
Handled-By : Zhao Yakui <yakui.zhao@intel.com>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=22463
Comment 13 Rafael J. Wysocki 2009-08-05 11:48:55 UTC
Should be fixed by http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4bc5d34135039566b8d6efa2de7515b2be505da8 .

Please reopen in case it's not fixed.