Created attachment 120961 [details] Dmesg (trimmed) Captured using kdump/kexec. (ABRT seems unable to process for some reason so I did my best manually.) Apologies if it is a duplicate, a quick google didn't find it. Info on wireless chip from dmesg: [ 4.538099] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 3572, rev 0223 detected [ 4.563253] ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 0009 detected [ 4.573946] usbcore: registered new interface driver rt2800usb [ 11.956641] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin' [ 11.957474] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.29 Last part of dmesg including panic (note that I have moved some messages to make the panic more legible, originally they were all interleaved as per the timestamps): [19459.906320] wlp0s20u6u1u2: disassociated from 00:26:f2:fe:d6:55 (Reason: 7) [19459.965253] wlp0s20u6u1u2: deauthenticating from 00:26:f2:fe:d6:55 by local choice (reason=3) [19459.965280] cfg80211: Calling CRDA to update world regulatory domain [19459.966419] wlp0s20u6u1u2: authenticate with 00:26:f2:fe:d6:55 [19459.990863] wlp0s20u6u1u2: send auth to 00:26:f2:fe:d6:55 (try 1/3) [19459.991041] cfg80211: World regulatory domain updated: [19459.991043] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) [19459.991045] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) [19459.991046] cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) [19459.991047] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) [19459.991048] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) [19459.991049] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) [19459.992057] wlp0s20u6u1u2: authenticated [19459.993080] wlp0s20u6u1u2: associate with 00:26:f2:fe:d6:55 (try 1/3) [19459.994172] wlp0s20u6u1u2: RX AssocResp from 00:26:f2:fe:d6:55 (capab=0x11 
status=0 aid=2) [19460.004413] wlp0s20u6u1u2: associated [19460.004645] cfg80211: Calling CRDA for country: AU [19460.008199] cfg80211: Regulatory domain changed to country: AU [19460.008200] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) [19460.008201] cfg80211: (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm) [19460.008202] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2300 mBm) [19460.008204] cfg80211: (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2300 mBm) [19460.008205] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm) [19580.152090] ieee80211 phy2: rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry 2 in queue [19580.152097] ieee80211 phy2: rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry 2 in queue [19580.152099] ieee80211 phy2: rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry 2 in queue [19699.726188] wlp0s20u6u1u2: disassociated from 00:26:f2:fe:d6:55 (Reason: 7) [19699.784571] wlp0s20u6u1u2: deauthenticating from 00:26:f2:fe:d6:55 by local choice (reason=3) [19699.784605] cfg80211: Calling CRDA to update world regulatory domain [19699.785919] wlp0s20u6u1u2: authenticate with 00:26:f2:fe:d6:55 [19699.813369] wlp0s20u6u1u2: send auth to 00:26:f2:fe:d6:55 (try 1/3) [19699.813562] cfg80211: World regulatory domain updated: [19699.813563] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) [19699.813564] cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) [19699.813565] cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) [19699.813565] cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) [19699.813566] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) [19699.813566] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) [19699.814599] wlp0s20u6u1u2: authenticated [19699.815359] wlp0s20u6u1u2: associate 
with 00:26:f2:fe:d6:55 (try 1/3) [19699.816443] wlp0s20u6u1u2: RX AssocResp from 00:26:f2:fe:d6:55 (capab=0x11 status=0 aid=2) [19699.827322] wlp0s20u6u1u2: associated [19699.827543] cfg80211: Calling CRDA for country: AU [19699.831450] cfg80211: Regulatory domain changed to country: AU [19699.831450] cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) [19699.831451] cfg80211: (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm) [19699.831451] cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2300 mBm) [19699.831451] cfg80211: (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2300 mBm) [19699.831452] cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm) [19699.784881] ------------[ cut here ]------------ [19699.784905] kernel BUG at include/linux/skbuff.h:1434! [19699.784927] invalid opcode: 0000 [#1] SMP [19699.784944] Modules linked in: rfcomm fuse xt_CHECKSUM tun ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat [19699.785172] snd_hwdep microcode snd_seq snd_seq_device snd_pcm serio_raw i2c_i801 snd_page_alloc mei_me snd_ti [19699.785248] CPU: 7 PID: 43 Comm: ksoftirqd/7 Not tainted 3.12.6-300.fc20.x86_64 #1 [19699.785268] Hardware name: Gigabyte Technology Co., Ltd. 
Z87-HD3/Z87-HD3, BIOS F6 08/03/2013 [19699.785290] task: ffff88081aa7afd0 ti: ffff88081aaac000 task.ti: ffff88081aaac000 [19699.785309] RIP: 0010:[<ffffffff81665536>] [<ffffffff81665536>] __skb_pull.part.40+0x4/0x6 [19699.785334] RSP: 0018:ffff88081aaadc18 EFLAGS: 00010287 [19699.785348] RAX: 000000002a2058db RBX: ffff88081aaadc58 RCX: 0000000000000000 [19699.785366] RDX: 000000000000001a RSI: 000000000000006b RDI: ffff88081a045f80 [19699.785385] RBP: ffff88081aaadc18 R08: 0000c050faff7f5e R09: 000152e15d046df4 [19699.785404] R10: 52e15d046df455d6 R11: fef2260000004188 R12: ffff88081a045f80 [19699.785423] R13: ffff88081a045f80 R14: ffff88063abec600 R15: ffff88081aaadd38 [19699.785441] FS: 0000000000000000(0000) GS:ffff88083edc0000(0000) knlGS:0000000000000000 [19699.785462] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [19699.785477] CR2: 00007f313b69a000 CR3: 0000000807f9a000 CR4: 00000000001407e0 [19699.785495] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [19699.785513] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [19699.785531] Stack: [19699.785536] ffff88081aaadc28 ffffffff815561f3 ffff88081aaadc48 ffffffffa02ce5e4 [19699.785556] 0000000000000000 ffff8807e3ab9140 ffff88081aaadcb0 ffffffffa02cb845 [19699.788586] 0000000000001688 00000007001a01ce 0000000300000000 0000000200000000 [19699.790163] Call Trace: [19699.791638] [<ffffffff815561f3>] skb_pull+0x33/0x40 [19699.793297] [<ffffffffa02ce5e4>] rt2x00crypto_tx_remove_iv+0x54/0x70 [rt2x00lib] [19699.795214] [<ffffffffa02cb845>] rt2x00queue_write_tx_frame+0x2a5/0x410 [rt2x00lib] [19699.797103] [<ffffffffa02c8d88>] rt2x00mac_tx+0xa8/0x380 [rt2x00lib] [19699.799039] [<ffffffff8130b510>] ? 
timerqueue_add+0x60/0xb0 [19699.800950] [<ffffffffa05314b9>] __ieee80211_tx+0x249/0x350 [mac80211] [19699.802899] [<ffffffffa0534c46>] ieee80211_tx_pending+0x146/0x200 [mac80211] [19699.804577] [<ffffffff8106e26e>] tasklet_action+0x6e/0x110 [19699.806557] [<ffffffff8106e747>] __do_softirq+0xf7/0x240 [19699.808500] [<ffffffff8106e8c0>] run_ksoftirqd+0x30/0x50 [19699.810428] [<ffffffff810932ef>] smpboot_thread_fn+0xff/0x1b0 [19699.812348] [<ffffffff810931f0>] ? lg_local_lock+0x40/0x40 [19699.844129] [<ffffffff8108b0d0>] kthread+0xc0/0xd0 [19699.845599] [<ffffffff8108b010>] ? insert_kthread_work+0x40/0x40 [19699.847044] [<ffffffff8167207c>] ret_from_fork+0x7c/0xb0 [19699.848437] [<ffffffff8108b010>] ? insert_kthread_work+0x40/0x40 [19699.849857] Code: 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 e8 ca a8 81 48 89 04 24 31 [19699.851507] RIP [<ffffffff81665536>] __skb_pull.part.40+0x4/0x6 [19699.852951] RSP <ffff88081aaadc18> crash 7.0.3-1.fc20: KERNEL: /usr/lib/debug/lib/modules/3.12.6-300.fc20.x86_64/vmlinux DUMPFILE: /var/crash/127.0.0.1-2014.01.04-17:40:09/vmcore [PARTIAL DUMP] CPUS: 8 DATE: Sat Jan 4 17:40:05 2014 UPTIME: 05:28:36 LOAD AVERAGE: 1.45, 1.42, 1.05 TASKS: 588 NODENAME: <snip> RELEASE: 3.12.6-300.fc20.x86_64 VERSION: #1 SMP Mon Dec 23 16:44:31 UTC 2013 MACHINE: x86_64 (3492 Mhz) MEMORY: 32 GB PANIC: "kernel BUG at include/linux/skbuff.h:1434!" 
PID: 43 COMMAND: "ksoftirqd/7" TASK: ffff88081aa7afd0 [THREAD_INFO: ffff88081aaac000] CPU: 7 STATE: TASK_RUNNING (PANIC) crash> bt PID: 43 TASK: ffff88081aa7afd0 CPU: 7 COMMAND: "ksoftirqd/7" #0 [ffff88081aaad900] machine_kexec at ffffffff810495e2 #1 [ffff88081aaad950] crash_kexec at ffffffff810db133 #2 [ffff88081aaada18] oops_end at ffffffff8166ae60 #3 [ffff88081aaada40] die at ffffffff81015c2b #4 [ffff88081aaada70] do_trap at ffffffff8166a6f0 #5 [ffff88081aaadac0] do_invalid_op at ffffffff81012fa5 #6 [ffff88081aaadb60] invalid_op at ffffffff816737de [exception RIP: __skb_pull+4] RIP: ffffffff81665536 RSP: ffff88081aaadc18 RFLAGS: 00010287 RAX: 000000002a2058db RBX: ffff88081aaadc58 RCX: 0000000000000000 RDX: 000000000000001a RSI: 000000000000006b RDI: ffff88081a045f80 RBP: ffff88081aaadc18 R8: 0000c050faff7f5e R9: 000152e15d046df4 R10: 52e15d046df455d6 R11: fef2260000004188 R12: ffff88081a045f80 R13: ffff88081a045f80 R14: ffff88063abec600 R15: ffff88081aaadd38 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff88081aaadc20] skb_pull at ffffffff815561f3 #8 [ffff88081aaadc30] rt2x00crypto_tx_remove_iv at ffffffffa02ce5e4 [rt2x00lib] #9 [ffff88081aaadc50] rt2x00queue_write_tx_frame at ffffffffa02cb845 [rt2x00lib] #10 [ffff88081aaadcb8] rt2x00mac_tx at ffffffffa02c8d88 [rt2x00lib] #11 [ffff88081aaadd08] __ieee80211_tx at ffffffffa05314b9 [mac80211] #12 [ffff88081aaadd70] ieee80211_tx_pending at ffffffffa0534c46 [mac80211] #13 [ffff88081aaaddd8] tasklet_action at ffffffff8106e26e #14 [ffff88081aaaddf8] __do_softirq at ffffffff8106e747 #15 [ffff88081aaade68] run_ksoftirqd at ffffffff8106e8c0 #16 [ffff88081aaade80] smpboot_thread_fn at ffffffff810932ef #17 [ffff88081aaaded0] kthread at ffffffff8108b0d0 #18 [ffff88081aaadf50] ret_from_fork at ffffffff8167207c crash> bt -l PID: 43 TASK: ffff88081aa7afd0 CPU: 7 COMMAND: "ksoftirqd/7" #0 [ffff88081aaad900] machine_kexec at ffffffff810495e2 
/usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/machine_kexec_64.c: 266 #1 [ffff88081aaad950] crash_kexec at ffffffff810db133 /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/kernel/kexec.c: 1106 #2 [ffff88081aaada18] oops_end at ffffffff8166ae60 /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/dumpstack.c: 225 #3 [ffff88081aaada40] die at ffffffff81015c2b /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/dumpstack.c: 305 #4 [ffff88081aaada70] do_trap at ffffffff8166a6f0 /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/traps.c: 175 #5 [ffff88081aaadac0] do_invalid_op at ffffffff81012fa5 /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/traps.c: 218 #6 [ffff88081aaadb60] invalid_op at ffffffff816737de /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/entry_64.S: 1306 [exception RIP: __skb_pull+4] RIP: ffffffff81665536 RSP: ffff88081aaadc18 RFLAGS: 00010287 RAX: 000000002a2058db RBX: ffff88081aaadc58 RCX: 0000000000000000 RDX: 000000000000001a RSI: 000000000000006b RDI: ffff88081a045f80 RBP: ffff88081aaadc18 R8: 0000c050faff7f5e R9: 000152e15d046df4 R10: 52e15d046df455d6 R11: fef2260000004188 R12: ffff88081a045f80 R13: ffff88081a045f80 R14: ffff88063abec600 R15: ffff88081aaadd38 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff88081aaadc20] skb_pull at ffffffff815561f3 /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/net/core/skbuff.c: 1307 #8 [ffff88081aaadc30] rt2x00crypto_tx_remove_iv at ffffffffa02ce5e4 [rt2x00lib] #9 [ffff88081aaadc50] rt2x00queue_write_tx_frame at ffffffffa02cb845 [rt2x00lib] #10 [ffff88081aaadcb8] rt2x00mac_tx at ffffffffa02c8d88 [rt2x00lib] #11 [ffff88081aaadd08] __ieee80211_tx at ffffffffa05314b9 [mac80211] #12 [ffff88081aaadd70] ieee80211_tx_pending at ffffffffa0534c46 [mac80211] #13 [ffff88081aaaddd8] tasklet_action at ffffffff8106e26e 
/usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/include/asm/bitops.h: 111 #14 [ffff88081aaaddf8] __do_softirq at ffffffff8106e747 /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/kernel/softirq.c: 251 #15 [ffff88081aaade68] run_ksoftirqd at ffffffff8106e8c0 /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/kernel/softirq.c: 775 #16 [ffff88081aaade80] smpboot_thread_fn at ffffffff810932ef /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/kernel/smpboot.c: 160 #17 [ffff88081aaaded0] kthread at ffffffff8108b0d0 /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/kernel/kthread.c: 200 #18 [ffff88081aaadf50] ret_from_fork at ffffffff8167207c /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/entry_64.S: 555 crash> bt -F PID: 43 TASK: ffff88081aa7afd0 CPU: 7 COMMAND: "ksoftirqd/7" #0 [ffff88081aaad900] machine_kexec at ffffffff810495e2 ffff88081aaad908: 000000007ffafbff ffff880000000000 ffff88081aaad918: 0000000030001000 ffff880030001000 ffff88081aaad928: 0000000030000000 0000000000000000 ffff88081aaad938: ffff88081aaadb68 ffff88081aaad958 ffff88081aaad948: ffff88081aaada10 crash_kexec+99 #1 [ffff88081aaad950] crash_kexec at ffffffff810db133 ffff88081aaad958: ffff88081aaadd38 ffff88063abec600 ffff88081aaad968: [skbuff_head_cache] [skbuff_head_cache] ffff88081aaad978: ffff88081aaadc18 ffff88081aaadc58 ffff88081aaad988: fef2260000004188 52e15d046df455d6 ffff88081aaad998: 000152e15d046df4 0000c050faff7f5e ffff88081aaad9a8: 000000002a2058db 0000000000000000 ffff88081aaad9b8: 000000000000001a 000000000000006b ffff88081aaad9c8: [skbuff_head_cache] ffffffffffffffff ffff88081aaad9d8: __skb_pull+4 0000000000000010 ffff88081aaad9e8: 0000000000010287 ffff88081aaadc18 ffff88081aaad9f8: 0000000000000018 000000000000000b ffff88081aaada08: ffff88081aaadb68 ffff88081aaada38 ffff88081aaada18: oops_end+176 #2 [ffff88081aaada18] oops_end at ffffffff8166ae60 ffff88081aaada20: ffff88081aaadb68 
0000000000000246 ffff88081aaada30: kallsyms_token_index+6277 ffff88081aaada68 ffff88081aaada40: die+75 #3 [ffff88081aaada40] die at ffffffff81015c2b ffff88081aaada48: ffff88081aaadb68 [task_struct] ffff88081aaada58: [skbuff_head_cache] 0000000000000006 ffff88081aaada68: ffff88081aaadab8 do_trap+96 #4 [ffff88081aaada70] do_trap at ffffffff8166a6f0 ffff88081aaada78: ffff88063abec600 ffff88081aaadd38 ffff88081aaada88: kallsyms_token_index+6277 ffff88081aaadb68 ffff88081aaada98: 0000000000000000 [skbuff_head_cache] ffff88081aaadaa8: ffff88063abec600 ffff88081aaadd38 ffff88081aaadab8: ffff88081aaadb58 do_invalid_op+149 #5 [ffff88081aaadac0] do_invalid_op at ffffffff81012fa5 ffff88081aaadac8: 0000000000000004 [pid] ffff88081aaadad8: __skb_pull+4 [scsi_cmd_cache] ffff88081aaadae8: ffff88081aaadb20 set_track+97 ffff88081aaadaf8: 000000100000000f [scsi_cmd_cache] ffff88081aaadb08: init_object+61 ffffea0020556c00 ffff88081aaadb18: [kmem_cache] ffff88081aaadb78 ffff88081aaadb28: free_debug_processing+478 [blkdev_requests] ffff88081aaadb38: ffffea0001e66800 [kmem_cache] ffff88081aaadb48: 0000000000000001 [skbuff_head_cache] ffff88081aaadb58: ffff88081aaadc18 invalid_op+30 #6 [ffff88081aaadb60] invalid_op at ffffffff816737de [exception RIP: __skb_pull+4] RIP: ffffffff81665536 RSP: ffff88081aaadc18 RFLAGS: 00010287 RAX: 000000002a2058db RBX: ffff88081aaadc58 RCX: 0000000000000000 RDX: 000000000000001a RSI: 000000000000006b RDI: ffff88081a045f80 RBP: ffff88081aaadc18 R8: 0000c050faff7f5e R9: 000152e15d046df4 R10: 52e15d046df455d6 R11: fef2260000004188 R12: ffff88081a045f80 R13: ffff88081a045f80 R14: ffff88063abec600 R15: ffff88081aaadd38 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 ffff88081aaadb68: ffff88081aaadd38 ffff88063abec600 ffff88081aaadb78: [skbuff_head_cache] [skbuff_head_cache] ffff88081aaadb88: ffff88081aaadc18 ffff88081aaadc58 ffff88081aaadb98: fef2260000004188 52e15d046df455d6 ffff88081aaadba8: 000152e15d046df4 0000c050faff7f5e ffff88081aaadbb8: 000000002a2058db 
0000000000000000 ffff88081aaadbc8: 000000000000001a 000000000000006b ffff88081aaadbd8: [skbuff_head_cache] ffffffffffffffff ffff88081aaadbe8: __skb_pull+4 0000000000000010 ffff88081aaadbf8: 0000000000010287 ffff88081aaadc18 ffff88081aaadc08: 0000000000000018 ffff88081aaadc48 ffff88081aaadc18: ffff88081aaadc28 skb_pull+51 #7 [ffff88081aaadc20] skb_pull at ffffffff815561f3 ffff88081aaadc28: ffff88081aaadc48 rt2x00crypto_tx_remove_iv+84 #8 [ffff88081aaadc30] rt2x00crypto_tx_remove_iv at ffffffffa02ce5e4 [rt2x00lib] ffff88081aaadc38: 0000000000000000 [kmalloc-1024] ffff88081aaadc48: ffff88081aaadcb0 rt2x00queue_write_tx_frame+677 #9 [ffff88081aaadc50] rt2x00queue_write_tx_frame at ffffffffa02cb845 [rt2x00lib] ffff88081aaadc58: 0000000000001688 00000007001a01ce ffff88081aaadc68: 0000000300000000 0000000200000000 ffff88081aaadc78: 0000000000000002 0000006b001a006b ffff88081aaadc88: [kmalloc-1024] [skbuff_head_cache] ffff88081aaadc98: ffff88063abed7c0 ffff88063abec600 ffff88081aaadca8: ffff88081aaadd38 ffff88081aaadd00 ffff88081aaadcb8: rt2x00mac_tx+168 #10 [ffff88081aaadcb8] rt2x00mac_tx at ffffffffa02c8d88 [rt2x00lib] ffff88081aaadcc0: ffff88081aaadce8 ffff88081aaadce8 ffff88081aaadcd0: timerqueue_add+96 ffff88063abec720 ffff88081aaadce0: ffff88063abec600 0000000000000002 ffff88081aaadcf0: [skbuff_head_cache] ffff88081aaadd90 ffff88081aaadd00: ffff88081aaadd68 __ieee80211_tx+585 #11 [ffff88081aaadd08] __ieee80211_tx at ffffffffa05314b9 [mac80211] ffff88081aaadd10: 000001ce8109ef40 [kmalloc-8192] ffff88081aaadd20: 0000000000000000 ffff88081aaadd90 ffff88081aaadd30: 0000000141880180 0000000000000000 ffff88081aaadd40: ffff88063abed0f0 ffff88063abec720 ffff88081aaadd50: [skbuff_head_cache] ffff88081aaadd90 ffff88081aaadd60: ffff88063abec610 ffff88081aaaddd0 ffff88081aaadd70: ieee80211_tx_pending+326 #12 [ffff88081aaadd70] ieee80211_tx_pending at ffffffffa0534c46 [mac80211] ffff88081aaadd78: 0000000000000001 000000021aaaddb8 ffff88081aaadd88: ffff88063abec600 
ffff88081aaadd90 ffff88081aaadd98: ffff88081aaadd90 0000000000000000 ffff88081aaadda8: ffff88063abed248 0000000000000000 ffff88081aaaddb8: softirq_threads softirq_vec+48 ffff88081aaaddc8: 0000000000000001 ffff88081aaaddf0 ffff88081aaaddd8: tasklet_action+110 #13 [ffff88081aaaddd8] tasklet_action at ffffffff8106e26e ffff88081aaadde0: 0000000000000006 0000000000000006 ffff88081aaaddf0: ffff88081aaade60 __do_softirq+247 #14 [ffff88081aaaddf8] __do_softirq at ffffffff8106e747 ffff88081aaade00: ffff88081aaadfd8 0000000a0420a040 ffff88081aaade10: 000000010128452b 0000000000000006 ffff88081aaade20: ffff88081aaadfd8 ffff88081aaadfd8 ffff88081aaade30: 0000010000000007 [task_struct] ffff88081aaade40: [kmalloc-16] softirq_threads ffff88081aaade50: [task_struct] [task_struct] ffff88081aaade60: ffff88081aaade78 run_ksoftirqd+48 #15 [ffff88081aaade68] run_ksoftirqd at ffffffff8106e8c0 ffff88081aaade70: 0000000781667d89 ffff88081aaadec8 ffff88081aaade80: smpboot_thread_fn+255 #16 [ffff88081aaade80] smpboot_thread_fn at ffffffff810932ef ffff88081aaade88: 0000000000000000 ffff88081aaadea0 ffff88081aaade98: 0000000000000001 ffff88081aecdd30 ffff88081aaadea8: [kmalloc-16] smpboot_thread_fn ffff88081aaadeb8: 0000000000000000 0000000000000000 ffff88081aaadec8: ffff88081aaadf48 kthread+192 #17 [ffff88081aaaded0] kthread at ffffffff8108b0d0 ffff88081aaaded8: 0000000000000001 b48592a000000007 ffff88081aaadee8: [kmalloc-16] ed0ca34e00000000 ffff88081aaadef8: ed8e845800030003 ffff88081aaadf00 ffff88081aaadf08: ffff88081aaadf00 [anon_vma_chain] ffff88081aaadf18: ffffffff00000000 ffff88081aaadf20 ffff88081aaadf28: ffff88081aaadf20 kthread ffff88081aaadf38: 0000000000000000 0000000000000000 ffff88081aaadf48: ffff88081aecdd30 ret_from_fork+124 #18 [ffff88081aaadf50] ret_from_fork at ffffffff8167207c
Created attachment 120971 [details] Backtrace using crash/vmcore
Moving to drivers/wireless, as this looks like the driver not properly checking whether it got a short frame with no IV block and then trying to remove data that was not present.
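The kind of length check hypothesized above can be sketched in a few lines of Python. This is only a userspace model and not the rt2x00 code: the function name remove_iv and its parameters are made up for illustration, and the frame is a plain bytes object rather than an sk_buff.

```python
from typing import Tuple


def remove_iv(frame: bytes, iv_offset: int, iv_len: int) -> Tuple[bytes, bytes]:
    """Hypothetical model: strip iv_len bytes found at iv_offset in frame.

    Returns (iv, frame_without_iv). Refuses to touch a frame that is too
    short to contain the IV block instead of removing data that is absent.
    """
    if iv_len == 0:
        # Nothing to strip.
        return b"", frame
    if iv_offset < 0 or iv_len < 0 or iv_offset + iv_len > len(frame):
        raise ValueError("frame too short to contain an IV block")
    iv = frame[iv_offset:iv_offset + iv_len]
    stripped = frame[:iv_offset] + frame[iv_offset + iv_len:]
    return iv, stripped


if __name__ == "__main__":
    # Header (3 bytes), IV (4 bytes), payload.
    print(remove_iv(b"HDR" + b"IVIV" + b"payload", 3, 4))
    try:
        remove_iv(b"HDR", 3, 4)  # short frame, no IV present
    except ValueError as exc:
        print("rejected:", exc)
```

The point of the sketch is only the ordering: validate the offset and length against the frame size first, then copy and strip.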
I suspect that txdesc->iv_len somehow has a wrong value, but I am not sure how that could happen. Is this bug reproducible? Could you make the vmcore file available for download somewhere?
I do not know how to reproduce the crash. I have been having sporadic crashes, usually while the machine is unattended, for the past month or so. Since I got kdump working and submitted this bug, I have removed the hardware from the machine, but if it would be useful, I can use the device again and see if I can capture another vmcore. I have sent the original vmcore corresponding to the trace above to Stanislaw privately.
Thanks. I'm looking at the vmcore now, but analyzing a memory dump can be hard, so it may take some time.
crash> struct sk_buff ffff88081a045f80 | head -20
struct sk_buff {
  next = 0x0,
  prev = 0x0,
  tstamp = {
    tv64 = 0
  },
  sk = 0xffff8807c7fde880,
  dev = 0xffff8808039b42a0,
  cb = "P\000\200@\001\002\000\000\000\000\a(\000\000\000\000\000\000\000\000\000\000\000\000\252\252\003\000\000\000\b\000E\000\001\254lZ@\000\004\021T\256\300\250\003\226",
  _skb_refdst = 7784309262464843759,
  sp = 0x49544f4ef6959801,
  len = 706762971,
  data_len = 1414809632,
  mac_len = 12112,
  hdr_len = 11825,
  {
    csum = 1208618289,
    {
      csum_start = 3377,
      csum_offset = 18442

crash> rd ffff88081a045f80 30
ffff88081a045f80: 0000000000000000 0000000000000000 ................
ffff88081a045f90: 0000000000000000 ffff8807c7fde880 ................
ffff88081a045fa0: ffff8808039b42a0 0000020140800050 .B......P..@....
ffff88081a045fb0: 0000000028070000 0000000000000000 ...(............
ffff88081a045fc0: 000800000003aaaa 00405a6cac010045 ........E...lZ@.
ffff88081a045fd0: 9603a8c0ae541104 6c076c07faffffef ..T..........l.l
ffff88081a045fe0: 49544f4ef6959801 545448202a2058db ....NOTI.X * HTT
ffff88081a045ff0: 480a0d312e312f50 393332203a74736f P/1.1..Host: 239
ffff88081a046000: 3535322e3535322e 3039313a3035322e .255.255.250:190
ffff88081a046010: 65686361430a0d30 6c6f72746e6f432d 0..Cache-Control
ffff88081a046020: 67612d78616d203a 0000000000313d65 : max-age=1.....
ffff88081a046030: 0000000000000000 0000000000000000 ................
ffff88081a046040: 0000003e00600074 000002c00000020c t.`.>...........
ffff88081a046050: ffff8805399ca4f8 ffff8805399ca536 ...9....6..9....
ffff88081a046060: 0000000100000500 cccccccccccccccc ................

The sk_buff structure has been overwritten by network packet data (there is HTTP data where the actual skb len and data_len values should be). Alex, please install the Fedora kernel-debug package, run it for some time, and check whether it detects any problems.
If not, I will provide you with a kernel compiled with CONFIG_DEBUG_PAGEALLOC, which is an even more intensive memory corruption debugging method than those used in kernel-debug, but it slows the kernel down considerably. If this is a software bug, it should be detectable by the above methods, but it could also be a firmware bug or a DMA settings bug, which are not easy to detect. The rt2x00usb driver does not set up DMA mappings directly; that is done by the USB host driver. Which one are you using ("lsusb -t" should show that)?
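To illustrate the corruption visible in the crash output above: the len and data_len values that crash printed can be turned back into the bytes that sit in memory. This is a minimal Python sketch, assuming the two fields are adjacent little-endian 32-bit integers (which matches the word 545448202a2058db in the rd dump at ffff88081a045fe8). The helper name fields_as_bytes is made up for illustration.

```python
import struct

# Values copied from the "struct sk_buff" output of crash.
LEN = 706762971        # sk_buff.len
DATA_LEN = 1414809632  # sk_buff.data_len


def fields_as_bytes(length: int, data_len: int) -> bytes:
    """Pack two unsigned 32-bit values the way x86_64 stores them (little-endian)."""
    return struct.pack("<II", length, data_len)


if __name__ == "__main__":
    raw = fields_as_bytes(LEN, DATA_LEN)
    print(raw)             # b'\xdbX * HTT'
    print(LEN < DATA_LEN)  # True
```

The bytes " * HTT" line up with the "NOTI.X * HTTP/1.1" text in the rd dump, so both fields hold packet payload rather than lengths. The second line shows that len is smaller than data_len, which, as far as I can tell, is the condition the BUG_ON in __skb_pull tests.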
Thank you Stanislaw. I have downloaded kernel-debug, and will run that exclusively for a while. Some other info: I've been having sporadic crashes since performing a hardware upgrade at the start of December (new motherboard, CPU, RAM and SSD, but same graphics card and rt2800usb device). Previous machine was fairly stable, hardly any unexplained crashes. One other change is that the boot process on the new hardware is via UEFI rather than legacy BIOS. The machine is not overclocked in any way and memory timing is at default settings. Unfortunately the 32G of RAM is not ECC, and non-Xeon Haswell CPUs don't seem to support ECC RAM. Initial memtest86 with the new hardware detected no errors after 12 hours. I've had slub_debug=FZPU on the kernel command line since 2013-12-23. I've had kdump/crashkernel enabled since 2014-01-04, shortly before capturing this crash. The rt2800usb has been physically removed from the computer since 2014-01-04. The frequency of crashes does seem to have decreased since 2014-01-04, but I have had one more crash with IP in __slab_alloc called from __alloc_skb, unix_stream_sendmsg, reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1051476 As you can see, Dave Jones suspected a bitflip, so I ran memtest86 again for another 20 hours, with no errors detected. I've also run single and parallel instances of memtester from userspace for another ~24 hours in total, with no errors detected. 
--- lsusb -t /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M |__ Port 5: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/14p, 480M |__ Port 5: Dev 2, If 0, Class=Audio, Driver=snd-usb-audio, 12M |__ Port 5: Dev 2, If 1, Class=Audio, Driver=snd-usb-audio, 12M |__ Port 5: Dev 2, If 2, Class=Audio, Driver=snd-usb-audio, 12M |__ Port 6: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M |__ Port 1: Dev 4, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 7, If 0, Class=Mass Storage, Driver=usb-storage, 480M |__ Port 2: Dev 8, If 0, Class=Wireless, Driver=btusb, 12M |__ Port 2: Dev 8, If 1, Class=Wireless, Driver=btusb, 12M |__ Port 2: Dev 10, If 0, Class=Vendor Specific Class, Driver=rt2800usb, 480M |__ Port 3: Dev 5, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M |__ Port 4: Dev 6, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M |__ Port 4: Dev 6, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M --- lspci -v 00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06) Subsystem: Gigabyte Technology Co., Ltd Device 5000 Flags: bus master, fast devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 0000e000-0000efff Memory behind bridge: e0000000-f00fffff Capabilities: [88] Subsystem: Gigabyte Technology Co., Ltd Device 5000 Capabilities: [80] Power Management version 3 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [a0] Express Root Port (Slot+), MSI 00 Capabilities: [100] Virtual Channel Capabilities: [140] Root Complex Link Capabilities: [d94] #19 Kernel driver in use: pcieport 00:14.0 USB controller: Intel Corporation 8 
Series/C220 Series Chipset Family USB xHCI (rev 04) (prog-if 30 [XHCI]) Subsystem: Gigabyte Technology Co., Ltd Device 5007 Flags: bus master, medium devsel, latency 0, IRQ 43 Memory at f0200000 (64-bit, non-prefetchable) [size=64K] Capabilities: [70] Power Management version 2 Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+ Kernel driver in use: xhci_hcd 00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04) Subsystem: Gigabyte Technology Co., Ltd Device 1c3a Flags: bus master, fast devsel, latency 0, IRQ 45 Memory at f0218000 (64-bit, non-prefetchable) [size=16] Capabilities: [50] Power Management version 3 Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+ Kernel driver in use: mei_me 00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 04) Subsystem: Gigabyte Technology Co., Ltd Device a002 Flags: bus master, fast devsel, latency 0, IRQ 47 Memory at f0210000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 2 Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Virtual Channel Kernel driver in use: snd_hda_intel 00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d4) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 Capabilities: [40] Express Root Port (Slot-), MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001 Capabilities: [a0] Power Management version 3 Kernel driver in use: pcieport 00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d4) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, 
secondary=03, subordinate=03, sec-latency=0 I/O behind bridge: 0000d000-0000dfff Memory behind bridge: f0100000-f01fffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001 Capabilities: [a0] Power Management version 3 Kernel driver in use: pcieport 00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d4) (prog-if 01 [Subtractive decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=04, subordinate=05, sec-latency=0 Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001 Capabilities: [a0] Power Management version 3 00:1f.0 ISA bridge: Intel Corporation Z87 Express LPC Controller (rev 04) Subsystem: Gigabyte Technology Co., Ltd Device 5001 Flags: bus master, medium devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> Kernel driver in use: lpc_ich 00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 04) (prog-if 01 [AHCI 1.0]) Subsystem: Gigabyte Technology Co., Ltd Device b005 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 42 I/O ports at f070 [size=8] I/O ports at f060 [size=4] I/O ports at f050 [size=8] I/O ports at f040 [size=4] I/O ports at f020 [size=32] Memory at f0216000 (32-bit, non-prefetchable) [size=2K] Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [70] Power Management version 3 Capabilities: [a8] SATA HBA v1.0 Kernel driver in use: ahci 00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 04) Subsystem: Gigabyte Technology Co., Ltd Device 5001 Flags: medium devsel, IRQ 18 Memory at f0215000 (64-bit, non-prefetchable) [size=256] I/O ports at f000 [size=32] 01:00.0 VGA compatible controller: Advanced Micro 
Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 6770] (prog-if 00 [VGA controller]) Subsystem: Gigabyte Technology Co., Ltd Device 220c Flags: bus master, fast devsel, latency 0, IRQ 44 Memory at e0000000 (64-bit, prefetchable) [size=256M] Memory at f0020000 (64-bit, non-prefetchable) [size=128K] I/O ports at e000 [size=256] Expansion ROM at f0000000 [disabled] [size=128K] Capabilities: [50] Power Management version 3 Capabilities: [58] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [150] Advanced Error Reporting Kernel driver in use: radeon 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Juniper HDMI Audio [Radeon HD 5700 Series] Subsystem: Gigabyte Technology Co., Ltd Device aa58 Flags: bus master, fast devsel, latency 0, IRQ 48 Memory at f0040000 (64-bit, non-prefetchable) [size=16K] Capabilities: [50] Power Management version 3 Capabilities: [58] Express Legacy Endpoint, MSI 00 Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> Capabilities: [150] Advanced Error Reporting Kernel driver in use: snd_hda_intel 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. 
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06) Subsystem: Gigabyte Technology Co., Ltd Motherboard Flags: bus master, fast devsel, latency 0, IRQ 46 I/O ports at d000 [size=256] Memory at f0104000 (64-bit, non-prefetchable) [size=4K] Memory at f0100000 (64-bit, prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Endpoint, MSI 01 Capabilities: [b0] MSI-X: Enable- Count=4 Masked- Capabilities: [d0] Vital Product Data Capabilities: [100] Advanced Error Reporting Capabilities: [140] Virtual Channel Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00 Kernel driver in use: r8169 04:00.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 41) (prog-if 01 [Subtractive decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=04, secondary=05, subordinate=05, sec-latency=32 Capabilities: [90] Power Management version 2 Capabilities: [a0] Subsystem: Gigabyte Technology Co., Ltd Device 8892 --- cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 0: 134 0 0 0 0 0 0 0 IR-IO-APIC-edge timer 1: 15 0 0 0 0 0 0 0 IR-IO-APIC-edge i8042 8: 1 0 0 0 0 0 0 0 IR-IO-APIC-edge rtc0 9: 0 0 0 0 0 0 0 0 IR-IO-APIC-fasteoi acpi 40: 0 0 0 0 0 0 0 0 DMAR_MSI-edge dmar0 42: 17331 0 0 0 0 0 183103 0 IR-PCI-MSI-edge ahci 43: 212972 0 0 0 0 0 0 0 IR-PCI-MSI-edge xhci_hcd 44: 157 0 0 0 0 0 0 51950 IR-PCI-MSI-edge radeon 45: 13 0 0 0 0 0 0 0 IR-PCI-MSI-edge mei_me 46: 95727 0 0 0 0 0 0 0 IR-PCI-MSI-edge p4p1 47: 1579 0 0 0 0 0 0 0 IR-PCI-MSI-edge snd_hda_intel 48: 7224 0 0 0 0 0 0 0 IR-PCI-MSI-edge snd_hda_intel NMI: 49106 715007 30487 59315 14622 39729 216281 36009 Non-maskable interrupts LOC: 351949 451316 237676 336484 221566 270926 554846 464766 Local timer interrupts SPU: 0 0 0 0 0 0 0 0 Spurious interrupts PMI: 49106 715007 30487 59315 14622 39729 216281 36009 Performance monitoring interrupts IWI: 16272 10259 7591 10770 7830 14900 9828 23661 
IRQ work interrupts RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries RES: 15533 8333 5785 5419 30757 32330 21055 21429 Rescheduling interrupts CAL: 3892 2352 2214 2289 989 883 828 835 Function call interrupts TLB: 12772 14182 5244 13896 12839 10366 6443 10613 TLB shootdowns TRM: 113 113 113 113 113 113 113 113 Thermal event interrupts THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts MCE: 0 0 0 0 0 0 0 0 Machine check exceptions MCP: 26 26 26 26 26 26 26 26 Machine check polls ERR: 0 MIS: 0 --- Once again, thanks for your help.
OK, I have a new error message using 3.12.7-300.fc20.x86_64+debug, after more than 34 hours uptime and many more error-free loops of memtester: ============================================================================= BUG vm_area_struct (Not tainted): Poison overwritten ----------------------------------------------------------------------------- INFO: 0xffff8802ca728039-0xffff8802ca728039. First byte 0x4b instead of 0x6b INFO: Allocated in dup_mm+0x230/0x710 age=112633 cpu=0 pid=369 __slab_alloc+0x3eb/0x4fe kmem_cache_alloc+0x294/0x340 dup_mm+0x230/0x710 copy_process.part.23+0x12d4/0x1890 do_fork+0xce/0x450 SyS_clone+0x16/0x20 stub_clone+0x69/0x90 INFO: Freed in remove_vma+0x76/0x80 age=109627 cpu=5 pid=28464 __slab_free+0x3a/0x382 kmem_cache_free+0x356/0x370 remove_vma+0x76/0x80 exit_mmap+0xf4/0x170 mmput+0x7f/0x110 do_exit+0x2a5/0xcd0 do_group_exit+0x4c/0xc0 SyS_exit_group+0x14/0x20 system_call_fastpath+0x16/0x1b INFO: Slab 0xffffea000b29ca00 objects=32 used=32 fp=0x (null) flags=0x5ff00000004080 INFO: Object 0xffff8802ca728000 @offset=0 fp=0xffff8802ca72aa00 Object ffff8802ca728000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff8802ca728010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff8802ca728020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff8802ca728030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 4b 6b 6b 6b 6b 6b 6b kkkkkkkkkKkkkkkk Object ffff8802ca728040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff8802ca728050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff8802ca728060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff8802ca728070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff8802ca728080: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff8802ca728090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object 
ffff8802ca7280a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk Object ffff8802ca7280b0: 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkk. Redzone ffff8802ca7280b8: bb bb bb bb bb bb bb bb ........ Padding ffff8802ca7281f8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 7 PID: 5195 Comm: Xorg Tainted: G B 3.12.7-300.fc20.x86_64+debug #1 Hardware name: Gigabyte Technology Co., Ltd. Z87-HD3/Z87-HD3, BIOS F6 08/03/2013 ffff8802ca728000 ffff8807ec29bbd8 ffffffff81749d2b ffff88081d019200 ffff8807ec29bc18 ffffffff811d40fd 0000000000000008 ffff880200000001 ffff8802ca72803a ffff88081d019200 000000000000006b ffff8802ca728000 Call Trace: [<ffffffff81749d2b>] dump_stack+0x54/0x74 [<ffffffff811d40fd>] print_trailer+0x14d/0x200 [<ffffffff811d42ef>] check_bytes_and_report+0xcf/0x110 [<ffffffff811d5177>] check_object+0x1d7/0x250 [<ffffffff811b28e8>] ? mmap_region+0x348/0x5d0 [<ffffffff817474b2>] alloc_debug_processing+0x76/0x118 [<ffffffff817480ed>] __slab_alloc+0x3eb/0x4fe [<ffffffffa00171e9>] ? drm_gem_object_lookup+0x29/0x160 [drm] [<ffffffff811b28e8>] ? mmap_region+0x348/0x5d0 [<ffffffff811d6cc4>] kmem_cache_alloc+0x294/0x340 [<ffffffff811b28e8>] ? mmap_region+0x348/0x5d0 [<ffffffff811b28e8>] mmap_region+0x348/0x5d0 [<ffffffff811b2ed0>] do_mmap_pgoff+0x360/0x3f0 [<ffffffff8119cf50>] vm_mmap_pgoff+0x90/0xc0 [<ffffffff811b1423>] SyS_mmap_pgoff+0x1d3/0x270 [<ffffffff8101e7e2>] SyS_mmap+0x22/0x30 [<ffffffff8175d029>] system_call_fastpath+0x16/0x1b FIX vm_area_struct: Restoring 0xffff8802ca728039-0xffff8802ca728039=0x6b FIX vm_area_struct: Marking all objects used --- Possibly related: afterwards I ran slabinfo --validate, and saw this message: SLUB: vm_area_struct 1086 slabs counted but counter=1087 Is there enough info here to figure out who scribbled on the slab, or is it another mysterious bitflip? Process 369 is systemd-udevd, but process 28464 had already exited by the time I looked. 
I had one previous "BUG" message from the same boot, "MAX_LOCKDEP_ENTRIES too low!", but it didn't seem worthy of a report. I saved the details in case anyone is interested. I've also seen various stack depth reports that made me nervous, the latest being: btrfs (2954) used greatest stack depth: 1856 bytes left If I understand correctly, that means ~78% of the stack was used. Is 22% an adequate safety margin, or am I at risk of stack overflow? I don't use LVM or network mounts, so there shouldn't be that much layering involved.
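The stack-depth percentage quoted above can be checked with a quick sketch. It assumes an 8 KiB kernel stack (THREAD_SIZE), which I believe is the x86_64 default for 3.12-era kernels but which the report itself does not state; the constant and the helper name are illustrative only.

```python
# Assumption: kernel stacks are 8 KiB (THREAD_SIZE) on this x86_64 3.12 kernel.
THREAD_SIZE = 8192


def stack_usage(bytes_left, stack_size=THREAD_SIZE):
    """Return (fraction_used, fraction_free) for a 'greatest stack depth' report."""
    used = stack_size - bytes_left
    return used / stack_size, bytes_left / stack_size


# "btrfs (2954) used greatest stack depth: 1856 bytes left"
used, free = stack_usage(1856)
print(f"used {used:.1%}, free {free:.1%}")
```

Under that assumption the deepest call chain consumed roughly three quarters of the stack, which is in line with the estimate in the comment.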
This is a bit flip again (from 0x6b to 0x4b). Given that the problems started after you replaced the motherboard, this most likely looks like a hardware (or firmware) fault. A kernel bug is still possible, because new hardware means you are using different drivers, but kernel/driver memory corruption bugs are usually not single bit flips; they overwrite a larger region of memory. I launched a kernel build with DEBUG_PAGEALLOC here: http://koji.fedoraproject.org/koji/taskinfo?taskID=6434128 When a driver or some kernel subsystem writes to memory that does not belong to it, that kernel will crash, which allows the actual broken module to be detected. You can use debug_guardpage_minorder=1 (or 2) to increase the protected area and hence the probability of finding the faulty driver. Please try that kernel, possibly with the rt2x00 driver, to increase the chance of catching the corruption. If it does not detect a fault, or the kernel crashes in the old way, that will probably mean the problem lies in the hardware. In that case you can split the memory into two halves, run with only the first half and then only the second, and see which half does not cause corruption. Other than that, you can blacklist various modules you are using (e.g. visualization, networking, sound, ...) and see if that prevents the memory corruption.
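The single-bit claim is easy to verify by XOR-ing the expected and observed bytes. The following is only a sketch using the values from the "Poison overwritten" report; the helper name is made up for illustration.

```python
def flipped_bits(expected, found):
    """Return the bit positions (0 = least significant) that differ between two bytes."""
    diff = expected ^ found
    return [bit for bit in range(8) if diff & (1 << bit)]


# Values from the report: 0x4b was found where the free poison 0x6b was expected.
bits = flipped_bits(0x6B, 0x4B)
print(bits)  # exactly one entry means exactly one bit changed

# Offset of the corrupted byte inside the freed vm_area_struct object,
# computed from the addresses printed in the report.
offset = 0xFFFF8802CA728039 - 0xFFFF8802CA728000
print(hex(offset))
```

A result with one entry is consistent with a hardware-style bit flip; a software overwrite would more typically change several bits or several adjacent bytes.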
Thank you for the DEBUG_PAGEALLOC kernel, which I've been running. Here is another instance of slab corruption. ============================================================================= BUG kmalloc-32 (Not tainted): Redzone overwritten ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: 0xffff880551f33dab-0xffff880551f33dab. First byte 0x8c instead of 0xcc INFO: Allocated in radeon_fence_emit+0x2d/0x1c0 [radeon] age=119 cpu=3 pid=15902 __slab_alloc+0x3eb/0x4fe kmem_cache_alloc_trace+0x2a8/0x360 radeon_fence_emit+0x2d/0x1c0 [radeon] radeon_ib_schedule+0x222/0x2b0 [radeon] radeon_cs_ioctl+0x929/0xbb0 [radeon] drm_ioctl+0x512/0x650 [drm] do_vfs_ioctl+0x305/0x530 SyS_ioctl+0x81/0xa0 system_call_fastpath+0x16/0x1b INFO: Freed in radeon_semaphore_free+0x55/0x70 [radeon] age=119 cpu=3 pid=15902 __slab_free+0x3a/0x382 kfree+0x2c0/0x2d0 radeon_semaphore_free+0x55/0x70 [radeon] radeon_ib_schedule+0x1ce/0x2b0 [radeon] radeon_cs_ioctl+0x929/0xbb0 [radeon] drm_ioctl+0x512/0x650 [drm] do_vfs_ioctl+0x305/0x530 SyS_ioctl+0x81/0xa0 system_call_fastpath+0x16/0x1b INFO: Slab 0xffffea001547cc80 objects=22 used=12 fp=0xffff880551f32438 flags=0x5ff00000004081 INFO: Object 0xffff880551f33d88 @offset=7560 fp=0xffff880551f325a0 Bytes b4 ffff880551f33d78: 4f e5 cd 01 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a O.......ZZZZZZZZ Object ffff880551f33d88: 00 80 5f 0d 08 88 ff ff 00 00 00 00 6b 6b 6b 6b .._.........kkkk Object ffff880551f33d98: 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b a5 ............kkk. Redzone ffff880551f33da8: cc cc cc 8c cc cc cc cc ........ Padding ffff880551f33ee8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 3 PID: 15902 Comm: mplayer Tainted: G B 3.12.8-300.bz68171.fc20.x86_64+debug #1 Hardware name: Gigabyte Technology Co., Ltd. 
Z87-HD3/Z87-HD3, BIOS F6 08/03/2013 ffff880551f33d88 ffff88057b37dae0 ffffffff8174a25b ffff880819004480 ffff88057b37db20 ffffffff811d452d 0000000000000008 ffff880500000001 ffff880551f33dac ffff880819004480 00000000000000cc ffff880551f33d88 Call Trace: [<ffffffff8174a25b>] dump_stack+0x54/0x74 [<ffffffff811d452d>] print_trailer+0x14d/0x200 [<ffffffff811d471f>] check_bytes_and_report+0xcf/0x110 [<ffffffff811d5562>] check_object+0x192/0x250 [<ffffffff81747b3d>] free_debug_processing+0xb9/0x22a [<ffffffff817534e6>] ? _raw_spin_unlock_irqrestore+0x36/0x70 [<ffffffffa009e7e4>] ? radeon_fence_unref+0x34/0x40 [radeon] [<ffffffffa009e7e4>] ? radeon_fence_unref+0x34/0x40 [radeon] [<ffffffff81747ce8>] __slab_free+0x3a/0x382 [<ffffffff81391e4e>] ? debug_check_no_obj_freed+0x14e/0x250 [<ffffffff811d6e1c>] ? kfree+0xbc/0x2d0 [<ffffffffa009e7e4>] ? radeon_fence_unref+0x34/0x40 [radeon] [<ffffffff811d7020>] kfree+0x2c0/0x2d0 [<ffffffffa009e7e4>] radeon_fence_unref+0x34/0x40 [radeon] [<ffffffffa009ed5e>] radeon_sync_obj_unref+0xe/0x10 [radeon] [<ffffffffa006f87a>] ttm_bo_wait+0x13a/0x190 [ttm] [<ffffffffa00a15fe>] radeon_bo_wait+0x9e/0x140 [radeon] [<ffffffffa00b3c52>] radeon_gem_busy_ioctl+0x52/0x130 [radeon] [<ffffffffa0014e82>] drm_ioctl+0x512/0x650 [drm] [<ffffffff8130e1d5>] ? avc_has_perm+0x25/0x350 [<ffffffff81310773>] ? inode_has_perm.isra.48+0x53/0x80 [<ffffffff8120c7e5>] do_vfs_ioctl+0x305/0x530 [<ffffffff81310dab>] ? selinux_file_ioctl+0x5b/0x110 [<ffffffff8120ca91>] SyS_ioctl+0x81/0xa0 [<ffffffff8175d569>] system_call_fastpath+0x16/0x1b FIX kmalloc-32: Restoring 0xffff880551f33dab-0xffff880551f33dab=0xcc ============================================================================= BUG kmalloc-32 (Tainted: G B ): Redzone overwritten ----------------------------------------------------------------------------- INFO: 0xffff880551f33da8-0xffff880551f33daf. 
First byte 0xcc instead of 0xbb INFO: Allocated in radeon_fence_emit+0x2d/0x1c0 [radeon] age=998 cpu=3 pid=15902 __slab_alloc+0x3eb/0x4fe kmem_cache_alloc_trace+0x2a8/0x360 radeon_fence_emit+0x2d/0x1c0 [radeon] radeon_ib_schedule+0x222/0x2b0 [radeon] radeon_cs_ioctl+0x929/0xbb0 [radeon] drm_ioctl+0x512/0x650 [drm] do_vfs_ioctl+0x305/0x530 SyS_ioctl+0x81/0xa0 system_call_fastpath+0x16/0x1b INFO: Freed in radeon_semaphore_free+0x55/0x70 [radeon] age=998 cpu=3 pid=15902 __slab_free+0x3a/0x382 kfree+0x2c0/0x2d0 radeon_semaphore_free+0x55/0x70 [radeon] radeon_ib_schedule+0x1ce/0x2b0 [radeon] radeon_cs_ioctl+0x929/0xbb0 [radeon] drm_ioctl+0x512/0x650 [drm] do_vfs_ioctl+0x305/0x530 SyS_ioctl+0x81/0xa0 system_call_fastpath+0x16/0x1b INFO: Slab 0xffffea001547cc80 objects=22 used=21 fp=0xffff880551f32e10 flags=0x5ff00000004080 INFO: Object 0xffff880551f33d88 @offset=7560 fp=0xffff880551f325a0 Bytes b4 ffff880551f33d78: 08 e9 cd 01 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ Object ffff880551f33d88: 00 80 5f 0d 08 88 ff ff 00 00 00 00 6b 6b 6b 6b .._.........kkkk Object ffff880551f33d98: 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b a5 ............kkk. Redzone ffff880551f33da8: cc cc cc cc cc cc cc cc ........ Padding ffff880551f33ee8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 1 PID: 16782 Comm: dinoshade Tainted: G B 3.12.8-300.bz68171.fc20.x86_64+debug #1 Hardware name: Gigabyte Technology Co., Ltd. Z87-HD3/Z87-HD3, BIOS F6 08/03/2013 ffff880551f33d88 ffff880566ead8e8 ffffffff8174a25b ffff880819004480 ffff880566ead928 ffffffff811d452d 0000000000000008 ffff880500000001 ffff880551f33db0 ffff880819004480 00000000000000bb ffff880551f33d88 Call Trace: [<ffffffff8174a25b>] dump_stack+0x54/0x74 [<ffffffff811d452d>] print_trailer+0x14d/0x200 [<ffffffff811d471f>] check_bytes_and_report+0xcf/0x110 [<ffffffff811d5562>] check_object+0x192/0x250 [<ffffffffa00fcf5c>] ? 
radeon_semaphore_create+0x2c/0xe0 [radeon] [<ffffffff817479e2>] alloc_debug_processing+0x76/0x118 [<ffffffff8174861d>] __slab_alloc+0x3eb/0x4fe [<ffffffffa00fcf5c>] ? radeon_semaphore_create+0x2c/0xe0 [radeon] [<ffffffffa00fdacb>] ? radeon_sa_bo_new+0x27b/0x480 [radeon] [<ffffffff811d7e98>] kmem_cache_alloc_trace+0x2a8/0x360 [<ffffffffa00fcf5c>] ? radeon_semaphore_create+0x2c/0xe0 [radeon] [<ffffffffa00fcf5c>] radeon_semaphore_create+0x2c/0xe0 [radeon] [<ffffffffa00b4700>] radeon_ib_get+0x50/0x110 [radeon] [<ffffffffa00b749d>] radeon_cs_ioctl+0x82d/0xbb0 [radeon] [<ffffffffa0014e82>] drm_ioctl+0x512/0x650 [drm] [<ffffffff81310773>] ? inode_has_perm.isra.48+0x53/0x80 [<ffffffff8120c7e5>] do_vfs_ioctl+0x305/0x530 [<ffffffff81310dab>] ? selinux_file_ioctl+0x5b/0x110 [<ffffffff8120ca91>] SyS_ioctl+0x81/0xa0 [<ffffffff8175d569>] system_call_fastpath+0x16/0x1b FIX kmalloc-32: Restoring 0xffff880551f33da8-0xffff880551f33daf=0xbb FIX kmalloc-32: Marking all objects used --- My main question: does this look like another hardware bitflip? I guess it does, if I'm reading these correctly - if the second reported "redzone overwritten" is just a false alarm, a consequence of SLUB getting confused as to what the redzone should be after the first redzone corruption was repaired. I still don't have a good way to reproduce these errors. I guess I'll just have to try memtest86 again.
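One way to sanity-check the reading of the two redzone reports is to compare the observed bytes with the marker values SLUB uses. This is a rough sketch: the constants are the ones I recall from the kernel's include/linux/poison.h, and the classify helper is hypothetical, not a kernel or tool API.

```python
# Marker bytes as I recall them from include/linux/poison.h (treat as an assumption).
MARKERS = {
    0x6B: "POISON_FREE",
    0xA5: "POISON_END",
    0x5A: "POISON_INUSE",
    0xBB: "SLUB_RED_INACTIVE",
    0xCC: "SLUB_RED_ACTIVE",
}


def classify(expected, found):
    """Return ('marker', name) if the found byte is itself a known SLUB marker,
    otherwise ('bits', n) where n is the number of bits that differ."""
    if found in MARKERS:
        return ("marker", MARKERS[found])
    return ("bits", bin(expected ^ found).count("1"))


# First report: 0x8c found where 0xcc was expected.
print(classify(0xCC, 0x8C))
# Second report: 0xcc found where 0xbb was expected.
print(classify(0xBB, 0xCC))
```

The first case differs from the expected marker by a single bit, while in the second case the byte found is a valid marker for the other object state. That fits the interpretation that the second report is a side effect of the earlier repair rather than fresh corruption, although it does not prove it.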
> My main question: does this look like another hardware bitflip? Yes. Sorry for not mentioning this before: it is better to run DEBUG_PAGEALLOC on the non-debug Fedora kernel variant, as it then has a better chance of catching driver corruption. Anyway, the problem unfortunately looks more like a hardware issue, and removing components (e.g. not using some CPU features, disabling other hardware by blacklisting modules, removing half of the physical memory modules, etc.) may be a better strategy for figuring out what causes the corruption.
Other users have reported corruption on Gigabyte *87* boards as well, so this really looks like a hardware problem. Some people report that using only 2 DIMM slots helped them: http://www.tonymacx86.com/general-help/121042-solved-4-dimm-crashing-freezing-ga-z87.html I think this bug can be closed as a duplicate of https://bugzilla.kernel.org/show_bug.cgi?id=64521
*** This bug has been marked as a duplicate of bug 64521 ***