Bug 13149
Summary: | system unstable after resume from suspend | ||
---|---|---|---|
Product: | Power Management | Reporter: | Gu Rui (chaos.proton) |
Component: | Hibernation/Suspend | Assignee: | power-management_other |
Status: | CLOSED UNREPRODUCIBLE | ||
Severity: | normal | CC: | akpm, andyv133, chaos.proton, gavin.kinsey, rjw, rui.zhang, rw, yakui.zhao |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.30-rc3 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216, 11808 | ||
Attachments: | full dmesg; config; config for 64bit kernel; dmesg of 64bit kernel; config for 32bit kernel; dmesg of 32bit kernel; Dmesg, including error.; Kernel config for Slackware64 huge 2.6.29.6; acpidump output; lspci -vxxx output |
Description
Gu Rui
2009-04-22 15:10:25 UTC
So this seems to be clock-related, doesn't it?
Based on the information given in the original report to me, yes. I can't reproduce this on any of my hardware though.
gack, what a nasty bug - crashes all over the place - it looks like memory got trashed. Rafael, what effect does "NEED_CLOCK_SYNC=1 in /etc/pm/config.d/defaults" have upon the kernel?
If NEED_CLOCK_SYNC=1, then on suspend, this is run:
/sbin/hwclock --systohc
and on resume, this is run:
/sbin/hwclock --hctosys
Created attachment 21110 [details]
full dmesg
Unfortunately, the bug still exists in 2.6.30-rc3-git1. I think the key parts are the following (the attachment is the full dmesg):
=====================================================
BUG: unable to handle kernel paging request at f76f8000
IP: [<c016f6e1>] get_page_from_freelist+0x2b1/0x480
*pde = 00007067 *pte = 7bec701a
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:14.4/0000:08:00.0/ssb1:0/net/eth0/carrier
Modules linked in: rfkill_input radeon drm snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ipv6 nls_cp936 vfat fat ext4 jbd2 crc16 cpufreq_conservative cpufreq_performance cpufreq_powersave powernow_k8 freq_table fuse snd_hda_codec_idt b43 snd_hda_intel snd_hda_codec mac80211 snd_hwdep snd_pcm snd_timer snd cfg80211 psmouse soundcore sdhci_pci i2c_piix4 sdhci rtc_cmos dell_laptop mmc_core rtc_core rfkill b44 ati_agp video input_polldev ohci_hcd shpchp ssb agpgart thermal processor thermal_sys mii i2c_core pcmcia pcmcia_core dcdbas ricoh_mmc output ac led_class sg ehci_hcd evdev rtc_lib serio_raw snd_page_alloc battery wmi k8temp hwmon button
Pid: 3627, comm: firefox-bin Not tainted (2.6.30-rc3-smp-00329-g0c8454f #101) Inspiron 1501
EIP: 0060:[<c016f6e1>] EFLAGS: 00010246 CPU: 0
EIP is at get_page_from_freelist+0x2b1/0x480
EAX: 00000000 EBX: c16edf00 ECX: 00000400 EDX: 00000003
ESI: 00000000 EDI: f76f8000 EBP: c16edf00 ESP: cf9b3e90
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process firefox-bin (pid: 3627, ti=cf9b2000 task=cf99fa70 task.ti=cf9b2000)
Stack:
00000002 00000044 00000246 f76f8000 c06133c0 00000002 00000000 00000000
001280d2 00000002 00000246 00000001 c05801c0 00000000 00000000 c0581aac
c05804a0 c057f9a0 c05804a0 cf99fa70 cf8d3d10 001280d2 c016fbbd c0581aa0
Call Trace:
[<c016fbbd>] ? __alloc_pages_internal+0xbd/0x480
[<c017ec55>] ? handle_mm_fault+0x385/0x610
[<c0119d92>] ? do_page_fault+0x132/0x290
[<c0119c60>] ? do_page_fault+0x0/0x290
[<c043fb2d>] ? error_code+0x6d/0x74
[<c0119c60>] ? do_page_fault+0x0/0x290
Code: 00 79 44 8b 44 24 2c 85 c0 7e 3c 89 dd 31 f6 8d 76 00 ba 03 00 00 00 89 e8 e8 1c e3 fa ff b9 00 04 00 00 89 44 24 0c 89 c7 31 c0 <f3> ab 8b 44 24 0c ba 03 00 00 00 83 c6 01 83 c5 20 e8 89 e1 fa
EIP: [<c016f6e1>] get_page_from_freelist+0x2b1/0x480 SS:ESP 0068:cf9b3e90
CR2: 00000000f76f8000
---[ end trace 1e8349f6b5d8828f ]---
note: firefox-bin[3627] exited with preempt_count 1
BUG: unable to handle kernel paging request at f76f9bf9
IP: [<c01735d1>] truncate_inode_pages_range+0x301/0x360
*pde = 00007067 *pte = 00336701
Oops: 0003 [#2] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:14.4/0000:08:00.0/ssb1:0/net/eth0/carrier
Modules linked in: rfkill_input radeon drm snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ipv6 nls_cp936 vfat fat ext4 jbd2 crc16 cpufreq_conservative cpufreq_performance cpufreq_powersave powernow_k8 freq_table fuse snd_hda_codec_idt b43 snd_hda_intel snd_hda_codec mac80211 snd_hwdep snd_pcm snd_timer snd cfg80211 psmouse soundcore sdhci_pci i2c_piix4 sdhci rtc_cmos dell_laptop mmc_core rtc_core rfkill b44 ati_agp video input_polldev ohci_hcd shpchp ssb agpgart thermal processor thermal_sys mii i2c_core pcmcia pcmcia_core dcdbas ricoh_mmc output ac led_class sg ehci_hcd evdev rtc_lib serio_raw snd_page_alloc battery wmi k8temp hwmon button
Pid: 2611, comm: syslogd Tainted: G D (2.6.30-rc3-smp-00329-g0c8454f #101) Inspiron 1501
EIP: 0060:[<c01735d1>] EFLAGS: 00010213 CPU: 0
EIP is at truncate_inode_pages_range+0x301/0x360
EAX: 00000000 EBX: c16edf20 ECX: 00000101 EDX: 00000407
ESI: 00000bf9 EDI: f76f9bf9 EBP: f50f594c ESP: f61a3c60
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process syslogd (pid: 2611, ti=f61a2000 task=f7162cb0 task.ti=f61a2000)
Stack:
0000000e 00000000 f6f1a030 f76f9000 00000407 00098bf9 00000000 00000099
ffffffff 00000bf9 00000000 00000000 f50f5974 f50f596c f50f594c 00000292
c012162e 00000206 00000000 00000003 f54bd03c f71fcc00 f54bd000 00000000
Call Trace:
[<c012162e>] ? __wake_up+0x3e/0x60
[<c0173647>] ? truncate_inode_pages+0x17/0x20
[<c017e75f>] ? vmtruncate+0x14f/0x180
[<c01e7683>] ? ext3_writeback_write_end+0x103/0x180
[<c016abb9>] ? generic_file_buffered_write+0x169/0x300
[<c016b1fa>] ? __generic_file_aio_write_nolock+0x23a/0x550
[<c013f210>] ? autoremove_wake_function+0x0/0x50
[<c016bfd0>] ? generic_file_aio_write+0x60/0xe0
[<c01a54f4>] ? dput+0x84/0x120
[<c01e4900>] ? ext3_file_write+0x30/0xc0
[<c01e48d0>] ? ext3_file_write+0x0/0xc0
[<c0194c0e>] ? do_sync_readv_writev+0xce/0x110
[<c01a54f4>] ? dput+0x84/0x120
[<c019ec5b>] ? getname+0x9b/0xd0
[<c013f210>] ? autoremove_wake_function+0x0/0x50
[<c0198897>] ? cp_new_stat64+0xf7/0x110
[<c020d5ac>] ? security_file_permission+0xc/0x10
[<c0194eea>] ? rw_verify_area+0x5a/0xd0
[<c01953ca>] ? do_readv_writev+0xaa/0x1a0
[<c01e48d0>] ? ext3_file_write+0x0/0xc0
[<c0195503>] ? vfs_writev+0x43/0x60
[<c0195607>] ? sys_writev+0x47/0x90
[<c0102ed5>] ? syscall_call+0x7/0xb
Code: 46 a4 fa ff 8b 74 24 24 c7 44 24 10 00 10 00 00 89 c7 29 74 24 10 01 f7 8b 54 24 10 89 44 24 0c 31 c0 c1 ea 02 89 d1 8b 54 24 10 <f3> ab f6 c2 02 74 02 66 ab f6 c2 01 74 01 aa 8b 44 24 0c ba 03
EIP: [<c01735d1>] truncate_inode_pages_range+0x301/0x360 SS:ESP 0068:f61a3c60
CR2: 00000000f76f9bf9
---[ end trace 1e8349f6b5d88290 ]---
note: syslogd[2611] exited with preempt_count 1
=====================================================
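For anyone reading the two oopses above: on x86 the number after "Oops:" is the page-fault error code, and the low bits of *pte are page-table entry flags. The following is only a minimal sketch for decoding them; the helper names decode_oops_error_code and decode_pte_flags are made up for illustration and are not part of any kernel tool, and only the lowest bits are interpreted.

```python
# Bits 0-2 of the x86 page-fault error code (the hex number after "Oops:").
# Each entry is (mask, meaning if set, meaning if clear).
ERROR_CODE_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
]


def decode_oops_error_code(code):
    """Describe bits 0-2 of a page-fault error code as a list of strings."""
    return [if_set if code & mask else if_clear
            for mask, if_set, if_clear in ERROR_CODE_BITS]


def decode_pte_flags(pte):
    """Extract the present (bit 0) and writable (bit 1) flags of an x86 PTE."""
    return {"present": bool(pte & 0x1), "writable": bool(pte & 0x2)}


if __name__ == "__main__":
    # Values copied from the two oopses quoted in this report.
    for label, code, pte in [("#1", 0x0002, 0x7bec701a),
                             ("#2", 0x0003, 0x00336701)]:
        print(label, decode_oops_error_code(code), decode_pte_flags(pte))
```

Read this way, oops #1 is a kernel-mode write to a page whose PTE is not marked present, and oops #2 is a kernel-mode write to a page that is present but not writable. In both traces the faulting bytes marked <f3> ab are a rep stos instruction, so the kernel was zero-filling memory at the address in EDI, which matches CR2.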
My laptop is a Dell Inspiron 1501 with an AMD Turion(tm) 64 X2 Mobile Technology TL-56. Even with NEED_CLOCK_SYNC=1 in /etc/pm/config.d/defaults the system won't survive, but it will run for about 30 minutes. Without NEED_CLOCK_SYNC=1, the system crashes as soon as I launch something (say, firefox) after resume.
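The NEED_CLOCK_SYNC behaviour described in this report (hwclock --systohc on suspend, hwclock --hctosys on resume) can be sketched roughly as below. This is not the actual pm-utils hook, only an illustration of the mapping; the function name clock_sync_command and the event names "suspend" and "resume" are assumptions made for the example, and nothing is executed.

```python
HWCLOCK = "/sbin/hwclock"  # path as quoted in this report


def clock_sync_command(event, need_clock_sync):
    """Return the hwclock argv for a power event, or None if nothing would run.

    `event` is a hypothetical label ("suspend" or "resume"), not a real
    pm-utils parameter.
    """
    if not need_clock_sync:
        return None
    if event == "suspend":
        # Copy the system time into the hardware clock before sleeping.
        return [HWCLOCK, "--systohc"]
    if event == "resume":
        # Set the system time from the hardware clock after waking up.
        return [HWCLOCK, "--hctosys"]
    return None


if __name__ == "__main__":
    for event in ("suspend", "resume"):
        print(event, clock_sync_command(event, need_clock_sync=True))
```

The function only builds the command line, so it can be inspected safely on any machine without touching the clock.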
Robby Workman, what configuration did you use? With the config shipped with Slackware I can trigger the bug above.
I'm using 2.6.29.1 (generic+initrd) in our -current tree. I'm on relatively old hardware (T41) most of the time, but I've not seen a pm regression in a long time (lucky me). I've even got two desktop boxes that I s2ram quite often, and aside from some issues with the r8169 modules having to be removed before suspend, it's been uneventful there too. Anyway...
Are you using the hugesmp or genericsmp kernel image? It shouldn't make a difference, but just in case you're using hugesmp, try genericsmp to be sure. It's possible, though unlikely, that something built into the hugesmp kernel doesn't like the corresponding module for it coexisting in the module tree.
Created attachment 21113 [details]
config
Currently I'm using a self-compiled 2.6.30-rc3-git1 kernel to see whether anyone has fixed this bug by accident. The configuration is based on that of kernel-generic-smp-2.6.29.1 in our -current tree. The only change I made was compiling in ext3 support, so there is no need to bother with an initrd. The attachment is the full configuration.
The bug still exists in 2.6.30-rc4-git1. I notice that in 2.6.27.21, once I resume from a long suspend, I get a screen saver on the screen that has just started up. But with recent kernels I don't see this. It seems it's too late to run /sbin/hwclock --hctosys after resume.
I installed slackware64 recently and found that suspend-resume works perfectly on the same hardware, while the 32-bit kernel is still not working... I hope this info helps you debug it.
IOW, with 2.6.30-rc7 64-bit suspend-resume works, while with 2.6.30-rc7 32-bit it doesn't work correctly. Is that correct? If that's the case, please attach the .config for both the 64-bit and 32-bit kernels. Also please attach a boot log (dmesg output) for each of them.
Created attachment 21515 [details]
config for 64bit kernel
Created attachment 21516 [details]
dmesg of 64bit kernel
Created attachment 21517 [details]
config for 32bit kernel
Created attachment 21518 [details]
dmesg of 32bit kernel
Crash info included.
*** Bug 14074 has been marked as a duplicate of this bug. ***
I believe I have this same issue, on the 2.6.29.6 Slackware64 kernel. Previously, this hardware suspended/resumed perfectly with the 2.6.28-15.49 kernel under 64 bit Ubuntu. Kernel config and dmesg to follow.
Created attachment 23080 [details]
Dmesg, including error.
Created attachment 23081 [details]
Kernel config for Slackware64 huge 2.6.29.6
Hi Andy,
Will you please double-check whether it works on the 2.6.29.6 kernel? Please wait for about 20 minutes after entering the sleep state. Will you please also attach the output of lspci -vxxx and acpidump? Thanks.
ykzhao, I'll post those up after this comment. Also, I think this might have something to do with CPU frequency scaling. I put it to sleep, let it sleep a half hour or so, then ran it for a good 45 minutes and never had any issues. I remembered that I had CPU frequency scaling disabled, re-enabled it, and repeated the process. It was immediately after it woke up that I received the kernel paging request error in dmesg. -Andy
Created attachment 23438 [details]
acpidump output
Created attachment 23439 [details]
lspci -vxxx output
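Since Andy's test above depends on whether CPU frequency scaling is active, it may help to record the cpufreq state alongside each suspend/resume attempt. The sketch below is an assumption-laden helper, not something used in this report: read_governors is a made-up name, and it simply reads the standard sysfs file scaling_governor for each CPU. How scaling was disabled on Andy's machine is not stated here, so an empty result only means that no cpufreq entries were found.

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each cpuN directory that exposes cpufreq to its current governor."""
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # Path layout: <root>/cpuN/cpufreq/scaling_governor
        cpu = path.split(os.sep)[-3]
        with open(path) as f:
            governors[cpu] = f.read().strip()
    return governors


if __name__ == "__main__":
    found = read_governors()
    print(found if found else "no cpufreq governors found")
```

Running it once before suspend and once after resume gives a simple record of which governor was in effect when the paging request error appeared.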
Andy and Gu Rui, does the problem still exist in the latest upstream kernel, say 2.6.32?
Hello Zhang Rui and other kernel devs, I'm sorry that I no longer have 32-bit Linux on my box (replaced by an ugly Windows...), so I can't test the 32-bit kernel. Anyway, the 64-bit kernel (2.6.32.2) works very well on my side. Thanks for your concern.
Closing this bug as it cannot be reproduced by the original bug reporter.