Bug 66181

Summary: WARNING: CPU: 3 PID: 3737 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
Product: Memory Management Reporter: Mikhail (mikhail.v.gavrilov)
Component: Other Assignee: Andrew Morton (akpm)
Status: RESOLVED DUPLICATE    
Severity: normal CC: alan
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.12 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg output
system log

Description Mikhail 2013-11-30 07:54:54 UTC

    
Comment 1 Mikhail 2013-11-30 07:55:15 UTC
[35507.083263] WARNING: CPU: 3 PID: 3737 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
[35507.083264] list_del corruption. prev->next should be ffff880068bc3d30, but was ffff88006abc3d30
[35507.083265] Modules linked in: rfcomm nls_utf8 vfat fat isofs fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack cfg80211 ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep iTCO_wdt iTCO_vendor_support ppdev joydev btusb option usb_wwan x86_pkg_temp_thermal bluetooth coretemp kvm_intel kvm cdc_ncm usbnet crct10dif_pclmul crc32_pclmul rfkill crc32c_intel ghash_clmulni_intel microcode snd_hda_codec_realtek snd_hda_codec_hdmi serio_raw snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device
[35507.083290]  i2c_i801 snd_pcm r8169 mii mei_me lpc_ich mfd_core snd_page_alloc mei shpchp snd_timer snd soundcore parport_pc parport binfmt_misc nfsd auth_rpcgss nfs_acl lockd sunrpc hid_logitech_dj usb_storage i915 i2c_algo_bit drm_kms_helper drm i2c_core video
[35507.083308] CPU: 3 PID: 3737 Comm: Compositor Tainted: G        W    3.12.0-2.fc21.x86_64+debug #1
[35507.083309] Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS F8 08/03/2013
[35507.083311]  0000000000000009 ffff880068bc3c08 ffffffff81742e92 ffff880068bc3c50
[35507.083315]  ffff880068bc3c40 ffffffff8107570d ffff880068bc3d28 ffff880068bc3d30
[35507.083318]  ffffffff829f2770 ffffffff829f2728 ffff880676cecac0 ffff880068bc3ca0
[35507.083322] Call Trace:
[35507.083327]  [<ffffffff81742e92>] dump_stack+0x54/0x74
[35507.083330]  [<ffffffff8107570d>] warn_slowpath_common+0x7d/0xa0
[35507.083333]  [<ffffffff8107577c>] warn_slowpath_fmt+0x4c/0x50
[35507.083337]  [<ffffffff8137937d>] ? plist_check_list+0x3d/0x50
[35507.083339]  [<ffffffff8138e631>] __list_del_entry+0xa1/0xd0
[35507.083342]  [<ffffffff813794df>] plist_del+0x3f/0x80
[35507.083346]  [<ffffffff810f6f51>] __unqueue_futex+0x31/0x60
[35507.083348]  [<ffffffff810f801d>] futex_wait+0xfd/0x2a0
[35507.083352]  [<ffffffff810a53b0>] ? hrtimer_get_res+0x50/0x50
[35507.083354]  [<ffffffff810a6514>] ? hrtimer_start_range_ns+0x14/0x20
[35507.083358]  [<ffffffff810f9de6>] do_futex+0xe6/0xc70
[35507.083361]  [<ffffffff810f4e78>] ? lock_release_non_nested+0x308/0x350
[35507.083365]  [<ffffffff811a68f7>] ? might_fault+0x57/0xb0
[35507.083368]  [<ffffffff810fa9e1>] SyS_futex+0x71/0x150
[35507.083371]  [<ffffffff81756450>] tracesys+0xdd/0xe2
[35507.083373] ---[ end trace f372458ccb4eb28f ]---
[35508.057193] ------------[ cut here ]------------
Comment 2 Mikhail 2013-11-30 07:55:40 UTC
Created attachment 116851 [details]
dmesg output
Comment 3 Mikhail 2013-11-30 15:58:50 UTC
Created attachment 116901 [details]
system log
Comment 4 Andrew Morton 2013-12-03 00:45:15 UTC
ffff880068bc3d30 versus ffff88006abc3d30.  A single bit change like this is very very often due to a hardware problem.
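To make the single-bit claim easy to check, here is a minimal sketch that XORs the two pointer values from the warning and lists the bit positions that differ. The helper name `flipped_bits` is only illustrative and is not part of any kernel tooling.

```python
def flipped_bits(expected: int, observed: int) -> list:
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = expected ^ observed  # XOR leaves a 1 only where the bits disagree
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Values copied from the "list_del corruption" line in the log above.
expected = 0xffff880068bc3d30  # what prev->next should have been
observed = 0xffff88006abc3d30  # what prev->next actually was

print(hex(expected ^ observed))           # 0x2000000
print(flipped_bits(expected, observed))   # [25]
```

Exactly one position is reported (bit 25), which matches the pattern described here: the stored pointer differs from the expected one by a single set bit rather than by an unrelated value.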
Comment 5 Mikhail 2013-12-03 01:37:24 UTC
This is a new machine based on a Haswell processor and a Z-series chipset. I can replace any part of the system under warranty if you can help me prove to the seller that the part is defective. Any idea how I can do that?
Comment 6 Andrew Morton 2013-12-03 02:25:52 UTC
You could try running memtest86 for a very long time (days).  I can't think of any other way which would convince the seller, sorry :(
Comment 7 Alan 2013-12-03 15:18:17 UTC
A single bit flip is most likely to be bad memory if it's hardware. It could be other things, but you would expect to see a machine check event from a CPU cache fault or thermal problem, and other diagnostics from things like SATA cable problems.

I would definitely give it a 24-hour run with memtest86 if possible and, depending on the DIMM configuration, look at testing it with only half the memory fitted and then with the other half.

It might show nothing useful, but memtest86 is at least an easy thing to do, and if it turns up errors you've nailed the problem.
Comment 8 Alan 2013-12-18 14:39:42 UTC

*** This bug has been marked as a duplicate of bug 64521 ***