Bug 42691

Summary: crash or "list_del corruption. prev->next should be .. but was (null)" after resume from hibernation
Product: Power Management
Reporter: Arne Woerner (arne_woerner)
Component: Hibernation/Suspend
Assignee: power-management_other
Status: CLOSED CODE_FIX
Severity: high
CC: lenb, mishu
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.2.2-1.fc16.x86_64
Subsystem:
Regression: No
Bisected commit-id:
Attachments: syslog messages
contents of /proc/cpuinfo
output of lspci -v
output of lsusb -v
output of lspci -v

Description Arne Woerner 2012-01-30 08:22:55 UTC
Created attachment 72224
syslog messages

sometimes it even crashes when i log out of my gnome session...
but resume from hibernation or suspend crashes more often (roughly 30% of the time)...
i use Fedora 16 on an ASRock H61M-ITX with an Intel Core i7-2600K and no second graphics card...
maybe it is not hibernate/suspend related?
i can't say whether it is a new bug, because the box is new...
i play SecondLife a lot with /usr/lib64/xorg/modules/drivers/intel_drv.so and the onboard audio driver (HDMI and analog stereo)...

Linux version 3.2.2-1.fc16.x86_64 (mockbuild@x86-13.phx2.fedoraproject.org) (gcc version 4.6.2 20111027 (Red Hat 4.6.2-1) (GCC) ) #1 SMP Thu Jan 26 03:21:58 UTC 2012
Comment 1 Arne Woerner 2012-01-30 08:24:33 UTC
Created attachment 72225
contents of /proc/cpuinfo
Comment 2 Len Brown 2012-01-31 02:30:11 UTC
Is this really related to suspend or hibernate?

Can you reproduce it (by logging out, or whatever)
on a system where you have never suspended or hibernated
since booting?
Comment 3 Arne Woerner 2012-01-31 08:18:40 UTC
last night it refused to hibernate
(it got stuck at a black screen with a white cursor)...

i will just turn off the monitors for the next few days instead of using suspend/hibernate... playing SecondLife (once) after a fresh reboot and then logging off+on doesn't trigger the bug...

but i'm quite sure that the uptime before a hang/panic was more than a day (that means i hibernated/suspended it at least once), IIRC...

rtcwake or not rtcwake makes no difference...
Comment 4 Arne Woerner 2012-01-31 15:29:59 UTC
Created attachment 72245
output of lspci -v
Comment 5 Arne Woerner 2012-01-31 15:30:22 UTC
Created attachment 72246
output of lsusb -v
Comment 6 Arne Woerner 2012-01-31 15:31:02 UTC
Created attachment 72247
output of lspci -v
Comment 7 Arne Woerner 2012-02-01 10:52:06 UTC
no crash after 28hrs uptime and normal use (just no suspend)... :-) -arne
Comment 8 Arne Woerner 2012-02-01 12:30:33 UTC
the intel-gfx@lists.freedesktop.org mailing list told me this:
http://lists.freedesktop.org/archives/intel-gfx/2012-January/014825.html

seems to be a known problem...

-arne
Comment 9 Arne Woerner 2012-02-03 09:46:56 UTC
last night i hibernated again (after 65hrs uptime) and
today after thawing it showed this again:
list_add corruption. next->prev should be prev (ffff88023017cbf8), but was (null). (next=ffff88023017cbf8).
and it failed to reboot...

now i try to do it with tuxonice and shutdown method...

-arne
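
For context on the message quoted in this comment: its wording matches the kernel's linked-list debugging checks, which test that a node's neighbours still point at each other before a new entry is linked in. Below is a minimal Python sketch of that invariant, not the kernel's C code. `Node`, `describe` and `checked_list_add` are hypothetical names introduced only for illustration.

```python
class Node:
    """Hypothetical stand-in for a doubly linked list entry."""

    def __init__(self, name):
        self.name = name
        # An empty circular list head points at itself in both directions.
        self.prev = self
        self.next = self


def describe(node):
    """Render a link target the way the log does: a name, or (null)."""
    return "(null)" if node is None else node.name


def checked_list_add(new, prev, nxt):
    """Link `new` between `prev` and `nxt`, refusing if their links disagree."""
    if nxt.prev is not prev:
        raise RuntimeError(
            "list_add corruption. next->prev should be prev (%s), but was %s."
            % (prev.name, describe(nxt.prev))
        )
    if prev.next is not nxt:
        raise RuntimeError(
            "list_add corruption. prev->next should be next (%s), but was %s."
            % (nxt.name, describe(prev.next))
        )
    # Both neighbours agree, so it is safe to splice the new entry in.
    nxt.prev = new
    new.next = nxt
    new.prev = prev
    prev.next = new
```

In this sketch, a backward link that has been overwritten with a null value, as in the quoted log line, makes the first check fail before the list is modified.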
Comment 10 Arne Woerner 2012-02-03 12:44:39 UTC
/sys/power/disk:
"shutdown" isn't better than "platform"...

tuxonice:
i don't know how to activate it...
it seems like i need a custom kernel...

now i will use "halt -p" and restore the applications every morning...
as a workaround... :-)

-arne
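
The /sys/power/disk file mentioned in this comment lists the hibernation modes the kernel offers on one line, with the active mode in square brackets. A small Python sketch of reading that format follows. `parse_bracketed_choices` and `read_hibernation_modes` are hypothetical helpers, and the sample string in the usage note is an assumed example, not output captured from the reporter's machine.

```python
def parse_bracketed_choices(text):
    """Split a line such as "[platform] shutdown" into (choices, selected)."""
    choices = []
    selected = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            # The bracketed token is the currently active choice.
            token = token[1:-1]
            selected = token
        choices.append(token)
    return choices, selected


def read_hibernation_modes(path="/sys/power/disk"):
    """Read and parse the mode list; the file exists only on Linux with hibernation support."""
    with open(path) as handle:
        return parse_bracketed_choices(handle.read())
```

For an assumed input of `"[platform] shutdown reboot"`, `parse_bracketed_choices` returns the three mode names together with `"platform"` as the selected one. Switching modes, as the reporter did, is done by writing one of the listed names back to the same file as root.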
Comment 11 Arne Woerner 2012-02-10 23:33:19 UTC
today i got a new symptom:
i hibernated the box and when i came back i tried to thaw it...
everything went quite well (for some time just a cursor and then the background image on the right monitor)...
but then: the left monitor still had a black background with very fast scrolling messages (i couldn't read them)...

is there a workaround?
i mean: KMS is quite old... or isn't it KMS related?
could it increase overall stability if i buy a dedicated gfx card?

why are graphics cards so complicated? :-)

-arne
Comment 12 Arne Woerner 2012-02-11 15:38:06 UTC
3.2.5-3.fc16.x86_64 has this bug, too:
i just did a "find /sys | grep fan" after 2 otherwise successful hibernate/thaw cycles, and my GNOME crashed (b&w text mode with panic messages) and i had to reboot...
-arne
Comment 13 Arne Woerner 2012-02-21 22:33:12 UTC
it did it again...
this time it seems like it tried to execute an invalid instruction... :-)

kernel BUG at fs/dcache.c:154!
invalid opcode: 0000 [#1] SMP 
CPU 1  
Modules linked in: usblp tcp_lp ftdi_sio binfmt_misc bnep bluetooth rfkill ppdev parport_pc lp parport fuse nfs fscache auth_rpcgss nfs_acl lockd nf_conntrack_tftp ipt_LOG ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables coretemp w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd iTCO_wdt cdc_acm r8169 mii iTCO_vendor_support i2c_i801 soundcore snd_page_alloc microcode sunrpc uinput i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]

Pid: 70, comm: kswapd0 Not tainted 3.2.6-3.fc16.x86_64 #1 To Be Filled By O.E.M. To Be Filled By O.E.M./H61M-ITX
RIP: 0010:[<ffffffff8118dd04>]  [<ffffffff8118dd04>] d_free+0x64/0x70
RSP: 0018:ffff88022e19db20  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8802303d79c0 RCX: 00000000ffffffff 
RDX: 0000000000000002 RSI: ffff880214dab150 RDI: ffff8802303d79c0 
RBP: ffff88022e19db30 R08: ffff8802303d7a70 R09: ffffc90000002000 
R10: 000000000001ccf0 R11: 0000000000000002 R12: ffff880214dab0d0 
R13: ffff8802303d7cc0 R14: ffff8802303d7a70 R15: ffff8802303d7cc0 
FS:  0000000000000000(0000) GS:ffff88023fa40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000e8c0a040 CR3: 0000000211dad000 CR4: 00000000000406e0 
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 
Process kswapd0 (pid: 70, threadinfo ffff88022e19c000, task ffff88022e1ddc80)
Stack: 
 ffff8802303d7cc0 ffff8802303d79c0 ffff88022e19db60 ffffffff8118ee17
 ffff8802303d79c0 ffff88022e19dbf0 ffff880214dab0d0 ffff8802303d7a1c
 ffff88022e19dbc0 ffffffff8118eff7 0000000000000001 ffff8802303d79c0
Call Trace:
 [<ffffffff8118ee17>] d_kill+0xa7/0x100
 [<ffffffff8118eff7>] shrink_dentry_list+0x187/0x1e0
 [<ffffffff8118fc11>] prune_dcache_sb+0x121/0x140
 [<ffffffff8117c060>] prune_super+0x130/0x1a0
 [<ffffffff8112bab4>] shrink_slab+0x154/0x310
 [<ffffffff8112f22a>] balance_pgdat+0x4fa/0x6c0
 [<ffffffff8112f568>] kswapd+0x178/0x3d0
 [<ffffffff815df2c4>] ? __schedule+0x3d4/0x8c0
 [<ffffffff81090440>] ? remove_wait_queue+0x50/0x50
 [<ffffffff8112f3f0>] ? balance_pgdat+0x6c0/0x6c0
 [<ffffffff8108fb9c>] kthread+0x8c/0xa0
 [<ffffffff815ebaf4>] kernel_thread_helper+0x4/0x10
 [<ffffffff8108fb10>] ? kthread_worker_fn+0x190/0x190
 [<ffffffff815ebaf0>] ? gs_change+0x13/0x13  
Code: bb 90 00 00 00 74 18 48 c7 c6 e0 da 18 81 e8 b4 6a f5 ff 48 83 c4 08 5b 5d c3 0f 1f 44 00 00 e8 e3 fd ff ff 48 83 c4 08 5b 5d c3 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57 41 56 41 
RIP  [<ffffffff8118dd04>] d_free+0x64/0x70
 RSP <ffff88022e19db20>
HDMI hot plug event: Codec=3 Pin=7 Presence_Detect=0 ELD_Valid=0
HDMI status: Codec=3 Pin=7 Presence_Detect=0 ELD_Valid=0
HDMI hot plug event: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=0 
HDMI status: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=0

-arne
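
On the "invalid opcode" in this trace: on x86 the kernel's BUG() macro is compiled to the ud2 instruction (bytes 0f 0b), so an invalid-opcode trap is the expected way for a "kernel BUG at" assertion to fire and does not by itself mean that corrupted code was executed. The Code: line marks the byte at the faulting RIP with angle brackets. The following Python sketch pulls out that marked byte and its successor; `faulting_bytes` and `is_ud2` are hypothetical helper names, not part of any kernel tooling.

```python
def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the <xx> marker of an oops Code: line."""
    tokens = code_line.replace("Code:", "", 1).split()
    for index, token in enumerate(tokens):
        if token.startswith("<") and token.endswith(">"):
            # The marked byte plus the bytes that follow it.
            picked = [token[1:-1]] + tokens[index + 1:index + count]
            return bytes(int(value, 16) for value in picked)
    return None  # no marker found on this line


def is_ud2(code_line):
    """True when the faulting instruction starts with the ud2 encoding 0f 0b."""
    return faulting_bytes(code_line) == b"\x0f\x0b"
```

Applied to the Code: line above, where the marker reads `<0f> 0b`, the check reports ud2, which is consistent with the failure being an assertion at fs/dcache.c:154 rather than a jump into garbage.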
Comment 14 Arne Woerner 2012-02-22 07:29:11 UTC
today (after the first thaw since last reboot) it says this:
# find /sys | wc -l
find: WARNING: file `/sys/kernel/debug/dri/64/i915_blt_ringbuffer_data' appears to have mode 0000
23480
# ls -l /sys/kernel/debug/dri/64/i915_blt_ringbuffer_data
?--------- 1 root root 0 Feb 21 20:14 /sys/kernel/debug/dri/64/i915_blt_ringbuffer_data

looks like it needs a reboot...

why does it do that?

-arne
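
The find warning and the "?---------" printed by ls in this comment are both consistent with an entry whose reported mode is zero, meaning neither file-type bits nor permission bits are set. A hedged Python sketch for listing such entries under a directory follows. `is_zero_mode` and `find_zero_mode` are hypothetical names, and the sketch only detects the symptom; it says nothing about its cause.

```python
import os


def is_zero_mode(st_mode):
    """True when neither file-type nor permission bits are set."""
    return st_mode == 0


def find_zero_mode(root):
    """Return paths under `root` whose lstat() mode is zero."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # entry vanished or cannot be examined; skip it
            if is_zero_mode(mode):
                hits.append(path)
    return hits
```

Calling `find_zero_mode("/sys/kernel/debug/dri")` as root would be one way to check whether other debugfs entries besides i915_blt_ringbuffer_data are affected after a thaw.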
Comment 15 Arne Woerner 2012-02-25 07:18:01 UTC
with 3.2.7-1.fc16.x86_64 it still doesn't thaw properly (in spite of the changelog entry "3.2.6-4 Freeze all filesystems during system suspend/hibernate."):
------------[ cut here ]------------
kernel BUG at fs/inode.c:429!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: tcp_lp usblp binfmt_misc usb_storage ppdev parport_pc lp parport fuse bnep bluetooth rfkill nfs fscache auth_rpcgss nfs_acl lockd ipt_LOG nf_conntrack_ipv4 nf_conntrack_tftp ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables coretemp w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore r8169 snd_page_alloc iTCO_wdt mii cdc_acm i2c_i801 microcode iTCO_vendor_support sunrpc uinput i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]

Pid: 26157, comm: crond Tainted: G          I  3.2.7-1.fc16.x86_64 #1 To Be Filled By O.E.M. To Be Filled By O.E.M./H61M-ITX
RIP: 0010:[<ffffffff811921dc>]  [<ffffffff811921dc>] end_writeback+0x9c/0xa0
RSP: 0018:ffff880231539e08  EFLAGS: 00010207
RAX: ffff880230180c00 RBX: ffff880230180a30 RCX: dead000000200200
RDX: 000000000000002f RSI: ffff880230180ab0 RDI: ffff880230180b88
RBP: ffff880231539e18 R08: ffff88020cadc5f0 R09: 0000000000000001
R10: ffff88009e0aeb10 R11: 0000000000000001 R12: ffff880230180b28
R13: ffffffff816790a0 R14: ffffffff816790a0 R15: ffff880230180a30
FS:  00007f49d17e07c0(0000) GS:ffff88023fa40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff9405c030 CR3: 0000000209629000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process crond (pid: 26157, threadinfo ffff880231538000, task ffff8802314f2e40)
Stack:
 0000000000000000 ffff880230180a30 ffff880231539e48 ffffffff81192412
 ffff880231539e58 ffff880230180a30 ffff880230180ab0 ffff880232498800
 ffff880231539e78 ffffffff81192593 ffff88020cadc540 ffff880230180a30
Call Trace:
 [<ffffffff81192412>] evict+0x122/0x1a0
 [<ffffffff81192593>] iput+0x103/0x200
 [<ffffffff8118f0e0>] d_kill+0xf0/0x100
 [<ffffffff8118f772>] dput+0xe2/0x1b0
 [<ffffffff8117aad6>] fput+0x176/0x220
 [<ffffffff81177216>] filp_close+0x66/0x90
 [<ffffffff811772d8>] sys_close+0x98/0xf0
 [<ffffffff815e9d82>] system_call_fastpath+0x16/0x1b
Code: 02 00 00 00 48 c7 c2 a0 10 19 81 be 07 00 00 00 e8 fa e5 44 00 48 c7 83 98 00 00 00 60 00 00 00 48 83 c4 08 5b 5d c3 0f 0b 0f 0b <0f> 0b 0f 0b 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 89 fb
RIP  [<ffffffff811921dc>] end_writeback+0x9c/0xa0
 RSP <ffff880231539e08>
HDMI hot plug event: Codec=3 Pin=7 Presence_Detect=0 ELD_Valid=0
HDMI status: Codec=3 Pin=7 Presence_Detect=0 ELD_Valid=0
HDMI hot plug event: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=0
HDMI status: Codec=3 Pin=7 Presence_Detect=1 ELD_Valid=0
---[ end trace a7919e7f17c0a727 ]---

-arne
Comment 16 Arne Woerner 2012-02-29 07:28:56 UTC
since i started disabling the write cache of my hard disk (WDC WD10EARS-00Y5B1) 10 seconds before hibernation (with "shutdown"), i have been able to thaw 4 times without any intermediate reboot/oops/panic... :-)
-arne
Comment 17 Arne Woerner 2012-03-16 18:16:32 UTC
neither
(1) disabling the cache of my hard disc
nor
(2) emptying the swap area
is a workaround for this bug...

now i try tuxonice from atrpms...
Comment 18 Arne Woerner 2012-04-20 16:50:10 UTC
seems to work now...
with 3.3.2-1.fc17.x86_64...
Comment 19 Arne Woerner 2012-05-04 17:01:00 UTC
works now...