Bug 68171 - rt2x00: skb_pull panic in crypto_tx_remove_iv
Status: RESOLVED DUPLICATE of bug 64521
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
 
Reported: 2014-01-05 08:12 UTC by Alex Outhred
Modified: 2014-03-01 21:00 UTC

Kernel Version: 3.12.6-300.fc20.x86_64
Regression: No


Attachments
Dmesg (trimmed) (7.41 KB, application/octet-stream), 2014-01-05 08:12 UTC, Alex Outhred
Backtrace using crash/vmcore (13.86 KB, text/plain), 2014-01-05 08:15 UTC, Alex Outhred

Description Alex Outhred 2014-01-05 08:12:45 UTC
Created attachment 120961: Dmesg (trimmed)

Captured using kdump/kexec. (ABRT was unable to process the dump for some reason, so I did my best manually.) Apologies if this is a duplicate; a quick Google search didn't find one.

Info on wireless chip from dmesg:

[    4.538099] ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 3572, rev 0223 detected
[    4.563253] ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 0009 detected
[    4.573946] usbcore: registered new interface driver rt2800usb
[   11.956641] ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
[   11.957474] ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.29

Last part of dmesg, including the panic (note that I have moved some messages to make the panic more legible; originally they were all interleaved as per the timestamps):

[19459.906320] wlp0s20u6u1u2: disassociated from 00:26:f2:fe:d6:55 (Reason: 7)
[19459.965253] wlp0s20u6u1u2: deauthenticating from 00:26:f2:fe:d6:55 by local choice (reason=3)
[19459.965280] cfg80211: Calling CRDA to update world regulatory domain
[19459.966419] wlp0s20u6u1u2: authenticate with 00:26:f2:fe:d6:55
[19459.990863] wlp0s20u6u1u2: send auth to 00:26:f2:fe:d6:55 (try 1/3)
[19459.991041] cfg80211: World regulatory domain updated:
[19459.991043] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[19459.991045] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[19459.991046] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[19459.991047] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[19459.991048] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[19459.991049] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[19459.992057] wlp0s20u6u1u2: authenticated
[19459.993080] wlp0s20u6u1u2: associate with 00:26:f2:fe:d6:55 (try 1/3)
[19459.994172] wlp0s20u6u1u2: RX AssocResp from 00:26:f2:fe:d6:55 (capab=0x11 status=0 aid=2)
[19460.004413] wlp0s20u6u1u2: associated
[19460.004645] cfg80211: Calling CRDA for country: AU
[19460.008199] cfg80211: Regulatory domain changed to country: AU
[19460.008200] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[19460.008201] cfg80211:   (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
[19460.008202] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2300 mBm)
[19460.008204] cfg80211:   (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2300 mBm)
[19460.008205] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)
[19580.152090] ieee80211 phy2: rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry 2 in queue 
[19580.152097] ieee80211 phy2: rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry 2 in queue 
[19580.152099] ieee80211 phy2: rt2800usb_entry_txstatus_timeout: Warning - TX status timeout for entry 2 in queue 
[19699.726188] wlp0s20u6u1u2: disassociated from 00:26:f2:fe:d6:55 (Reason: 7)
[19699.784571] wlp0s20u6u1u2: deauthenticating from 00:26:f2:fe:d6:55 by local choice (reason=3)
[19699.784605] cfg80211: Calling CRDA to update world regulatory domain
[19699.785919] wlp0s20u6u1u2: authenticate with 00:26:f2:fe:d6:55
[19699.813369] wlp0s20u6u1u2: send auth to 00:26:f2:fe:d6:55 (try 1/3)
[19699.813562] cfg80211: World regulatory domain updated:
[19699.813563] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[19699.813564] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[19699.813565] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[19699.813565] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[19699.813566] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[19699.813566] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[19699.814599] wlp0s20u6u1u2: authenticated
[19699.815359] wlp0s20u6u1u2: associate with 00:26:f2:fe:d6:55 (try 1/3)
[19699.816443] wlp0s20u6u1u2: RX AssocResp from 00:26:f2:fe:d6:55 (capab=0x11 status=0 aid=2)
[19699.827322] wlp0s20u6u1u2: associated
[19699.827543] cfg80211: Calling CRDA for country: AU
[19699.831450] cfg80211: Regulatory domain changed to country: AU
[19699.831450] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[19699.831451] cfg80211:   (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm)
[19699.831451] cfg80211:   (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2300 mBm)
[19699.831451] cfg80211:   (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2300 mBm)
[19699.831452] cfg80211:   (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)
[19699.784881] ------------[ cut here ]------------
[19699.784905] kernel BUG at include/linux/skbuff.h:1434!
[19699.784927] invalid opcode: 0000 [#1] SMP 
[19699.784944] Modules linked in: rfcomm fuse xt_CHECKSUM tun ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat 
[19699.785172]  snd_hwdep microcode snd_seq snd_seq_device snd_pcm serio_raw i2c_i801 snd_page_alloc mei_me snd_ti
[19699.785248] CPU: 7 PID: 43 Comm: ksoftirqd/7 Not tainted 3.12.6-300.fc20.x86_64 #1
[19699.785268] Hardware name: Gigabyte Technology Co., Ltd. Z87-HD3/Z87-HD3, BIOS F6 08/03/2013
[19699.785290] task: ffff88081aa7afd0 ti: ffff88081aaac000 task.ti: ffff88081aaac000
[19699.785309] RIP: 0010:[<ffffffff81665536>]  [<ffffffff81665536>] __skb_pull.part.40+0x4/0x6
[19699.785334] RSP: 0018:ffff88081aaadc18  EFLAGS: 00010287
[19699.785348] RAX: 000000002a2058db RBX: ffff88081aaadc58 RCX: 0000000000000000
[19699.785366] RDX: 000000000000001a RSI: 000000000000006b RDI: ffff88081a045f80
[19699.785385] RBP: ffff88081aaadc18 R08: 0000c050faff7f5e R09: 000152e15d046df4
[19699.785404] R10: 52e15d046df455d6 R11: fef2260000004188 R12: ffff88081a045f80
[19699.785423] R13: ffff88081a045f80 R14: ffff88063abec600 R15: ffff88081aaadd38
[19699.785441] FS:  0000000000000000(0000) GS:ffff88083edc0000(0000) knlGS:0000000000000000
[19699.785462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19699.785477] CR2: 00007f313b69a000 CR3: 0000000807f9a000 CR4: 00000000001407e0
[19699.785495] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19699.785513] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19699.785531] Stack:
[19699.785536]  ffff88081aaadc28 ffffffff815561f3 ffff88081aaadc48 ffffffffa02ce5e4
[19699.785556]  0000000000000000 ffff8807e3ab9140 ffff88081aaadcb0 ffffffffa02cb845
[19699.788586]  0000000000001688 00000007001a01ce 0000000300000000 0000000200000000
[19699.790163] Call Trace:
[19699.791638]  [<ffffffff815561f3>] skb_pull+0x33/0x40
[19699.793297]  [<ffffffffa02ce5e4>] rt2x00crypto_tx_remove_iv+0x54/0x70 [rt2x00lib]
[19699.795214]  [<ffffffffa02cb845>] rt2x00queue_write_tx_frame+0x2a5/0x410 [rt2x00lib]
[19699.797103]  [<ffffffffa02c8d88>] rt2x00mac_tx+0xa8/0x380 [rt2x00lib]
[19699.799039]  [<ffffffff8130b510>] ? timerqueue_add+0x60/0xb0
[19699.800950]  [<ffffffffa05314b9>] __ieee80211_tx+0x249/0x350 [mac80211]
[19699.802899]  [<ffffffffa0534c46>] ieee80211_tx_pending+0x146/0x200 [mac80211]
[19699.804577]  [<ffffffff8106e26e>] tasklet_action+0x6e/0x110
[19699.806557]  [<ffffffff8106e747>] __do_softirq+0xf7/0x240
[19699.808500]  [<ffffffff8106e8c0>] run_ksoftirqd+0x30/0x50
[19699.810428]  [<ffffffff810932ef>] smpboot_thread_fn+0xff/0x1b0
[19699.812348]  [<ffffffff810931f0>] ? lg_local_lock+0x40/0x40
[19699.844129]  [<ffffffff8108b0d0>] kthread+0xc0/0xd0
[19699.845599]  [<ffffffff8108b010>] ? insert_kthread_work+0x40/0x40
[19699.847044]  [<ffffffff8167207c>] ret_from_fork+0x7c/0xb0
[19699.848437]  [<ffffffff8108b010>] ? insert_kthread_work+0x40/0x40
[19699.849857] Code: 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 e8 ca a8 81 48 89 04 24 31 
[19699.851507] RIP  [<ffffffff81665536>] __skb_pull.part.40+0x4/0x6
[19699.852951]  RSP <ffff88081aaadc18>

crash 7.0.3-1.fc20:

      KERNEL: /usr/lib/debug/lib/modules/3.12.6-300.fc20.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2014.01.04-17:40:09/vmcore  [PARTIAL DUMP]
        CPUS: 8
        DATE: Sat Jan  4 17:40:05 2014
      UPTIME: 05:28:36
LOAD AVERAGE: 1.45, 1.42, 1.05
       TASKS: 588
    NODENAME: <snip>
     RELEASE: 3.12.6-300.fc20.x86_64
     VERSION: #1 SMP Mon Dec 23 16:44:31 UTC 2013
     MACHINE: x86_64  (3492 Mhz)
      MEMORY: 32 GB
       PANIC: "kernel BUG at include/linux/skbuff.h:1434!"
         PID: 43
     COMMAND: "ksoftirqd/7"
        TASK: ffff88081aa7afd0  [THREAD_INFO: ffff88081aaac000]
         CPU: 7
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 43     TASK: ffff88081aa7afd0  CPU: 7   COMMAND: "ksoftirqd/7"
 #0 [ffff88081aaad900] machine_kexec at ffffffff810495e2
 #1 [ffff88081aaad950] crash_kexec at ffffffff810db133
 #2 [ffff88081aaada18] oops_end at ffffffff8166ae60
 #3 [ffff88081aaada40] die at ffffffff81015c2b
 #4 [ffff88081aaada70] do_trap at ffffffff8166a6f0
 #5 [ffff88081aaadac0] do_invalid_op at ffffffff81012fa5
 #6 [ffff88081aaadb60] invalid_op at ffffffff816737de
    [exception RIP: __skb_pull+4]
    RIP: ffffffff81665536  RSP: ffff88081aaadc18  RFLAGS: 00010287
    RAX: 000000002a2058db  RBX: ffff88081aaadc58  RCX: 0000000000000000
    RDX: 000000000000001a  RSI: 000000000000006b  RDI: ffff88081a045f80
    RBP: ffff88081aaadc18   R8: 0000c050faff7f5e   R9: 000152e15d046df4
    R10: 52e15d046df455d6  R11: fef2260000004188  R12: ffff88081a045f80
    R13: ffff88081a045f80  R14: ffff88063abec600  R15: ffff88081aaadd38
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88081aaadc20] skb_pull at ffffffff815561f3
 #8 [ffff88081aaadc30] rt2x00crypto_tx_remove_iv at ffffffffa02ce5e4 [rt2x00lib]
 #9 [ffff88081aaadc50] rt2x00queue_write_tx_frame at ffffffffa02cb845 [rt2x00lib]
#10 [ffff88081aaadcb8] rt2x00mac_tx at ffffffffa02c8d88 [rt2x00lib]
#11 [ffff88081aaadd08] __ieee80211_tx at ffffffffa05314b9 [mac80211]
#12 [ffff88081aaadd70] ieee80211_tx_pending at ffffffffa0534c46 [mac80211]
#13 [ffff88081aaaddd8] tasklet_action at ffffffff8106e26e
#14 [ffff88081aaaddf8] __do_softirq at ffffffff8106e747
#15 [ffff88081aaade68] run_ksoftirqd at ffffffff8106e8c0
#16 [ffff88081aaade80] smpboot_thread_fn at ffffffff810932ef
#17 [ffff88081aaaded0] kthread at ffffffff8108b0d0
#18 [ffff88081aaadf50] ret_from_fork at ffffffff8167207c

crash> bt -l
PID: 43     TASK: ffff88081aa7afd0  CPU: 7   COMMAND: "ksoftirqd/7"
 #0 [ffff88081aaad900] machine_kexec at ffffffff810495e2
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/machine_kexec_64.c: 266
 #1 [ffff88081aaad950] crash_kexec at ffffffff810db133
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/kernel/kexec.c: 1106
 #2 [ffff88081aaada18] oops_end at ffffffff8166ae60
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/dumpstack.c: 225
 #3 [ffff88081aaada40] die at ffffffff81015c2b
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/dumpstack.c: 305
 #4 [ffff88081aaada70] do_trap at ffffffff8166a6f0
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/traps.c: 175
 #5 [ffff88081aaadac0] do_invalid_op at ffffffff81012fa5
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/traps.c: 218
 #6 [ffff88081aaadb60] invalid_op at ffffffff816737de
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/entry_64.S: 1306
    [exception RIP: __skb_pull+4]
    RIP: ffffffff81665536  RSP: ffff88081aaadc18  RFLAGS: 00010287
    RAX: 000000002a2058db  RBX: ffff88081aaadc58  RCX: 0000000000000000
    RDX: 000000000000001a  RSI: 000000000000006b  RDI: ffff88081a045f80
    RBP: ffff88081aaadc18   R8: 0000c050faff7f5e   R9: 000152e15d046df4
    R10: 52e15d046df455d6  R11: fef2260000004188  R12: ffff88081a045f80
    R13: ffff88081a045f80  R14: ffff88063abec600  R15: ffff88081aaadd38
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88081aaadc20] skb_pull at ffffffff815561f3
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/net/core/skbuff.c: 1307
 #8 [ffff88081aaadc30] rt2x00crypto_tx_remove_iv at ffffffffa02ce5e4 [rt2x00lib]
 #9 [ffff88081aaadc50] rt2x00queue_write_tx_frame at ffffffffa02cb845 [rt2x00lib]
#10 [ffff88081aaadcb8] rt2x00mac_tx at ffffffffa02c8d88 [rt2x00lib]
#11 [ffff88081aaadd08] __ieee80211_tx at ffffffffa05314b9 [mac80211]
#12 [ffff88081aaadd70] ieee80211_tx_pending at ffffffffa0534c46 [mac80211]
#13 [ffff88081aaaddd8] tasklet_action at ffffffff8106e26e
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/include/asm/bitops.h: 111
#14 [ffff88081aaaddf8] __do_softirq at ffffffff8106e747
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/kernel/softirq.c: 251
#15 [ffff88081aaade68] run_ksoftirqd at ffffffff8106e8c0
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/kernel/softirq.c: 775
#16 [ffff88081aaade80] smpboot_thread_fn at ffffffff810932ef
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/kernel/smpboot.c: 160
#17 [ffff88081aaaded0] kthread at ffffffff8108b0d0
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/kernel/kthread.c: 200
#18 [ffff88081aaadf50] ret_from_fork at ffffffff8167207c
    /usr/src/debug/kernel-3.12.fc20/linux-3.12.6-300.fc20.x86_64/arch/x86/kernel/entry_64.S: 555

crash> bt -F
PID: 43     TASK: ffff88081aa7afd0  CPU: 7   COMMAND: "ksoftirqd/7"
 #0 [ffff88081aaad900] machine_kexec at ffffffff810495e2
    ffff88081aaad908: 000000007ffafbff ffff880000000000 
    ffff88081aaad918: 0000000030001000 ffff880030001000 
    ffff88081aaad928: 0000000030000000 0000000000000000 
    ffff88081aaad938: ffff88081aaadb68 ffff88081aaad958 
    ffff88081aaad948: ffff88081aaada10 crash_kexec+99   
 #1 [ffff88081aaad950] crash_kexec at ffffffff810db133
    ffff88081aaad958: ffff88081aaadd38 ffff88063abec600 
    ffff88081aaad968: [skbuff_head_cache] [skbuff_head_cache] 
    ffff88081aaad978: ffff88081aaadc18 ffff88081aaadc58 
    ffff88081aaad988: fef2260000004188 52e15d046df455d6 
    ffff88081aaad998: 000152e15d046df4 0000c050faff7f5e 
    ffff88081aaad9a8: 000000002a2058db 0000000000000000 
    ffff88081aaad9b8: 000000000000001a 000000000000006b 
    ffff88081aaad9c8: [skbuff_head_cache] ffffffffffffffff 
    ffff88081aaad9d8: __skb_pull+4     0000000000000010 
    ffff88081aaad9e8: 0000000000010287 ffff88081aaadc18 
    ffff88081aaad9f8: 0000000000000018 000000000000000b 
    ffff88081aaada08: ffff88081aaadb68 ffff88081aaada38 
    ffff88081aaada18: oops_end+176     
 #2 [ffff88081aaada18] oops_end at ffffffff8166ae60
    ffff88081aaada20: ffff88081aaadb68 0000000000000246 
    ffff88081aaada30: kallsyms_token_index+6277 ffff88081aaada68 
    ffff88081aaada40: die+75           
 #3 [ffff88081aaada40] die at ffffffff81015c2b
    ffff88081aaada48: ffff88081aaadb68 [task_struct]    
    ffff88081aaada58: [skbuff_head_cache] 0000000000000006 
    ffff88081aaada68: ffff88081aaadab8 do_trap+96       
 #4 [ffff88081aaada70] do_trap at ffffffff8166a6f0
    ffff88081aaada78: ffff88063abec600 ffff88081aaadd38 
    ffff88081aaada88: kallsyms_token_index+6277 ffff88081aaadb68 
    ffff88081aaada98: 0000000000000000 [skbuff_head_cache] 
    ffff88081aaadaa8: ffff88063abec600 ffff88081aaadd38 
    ffff88081aaadab8: ffff88081aaadb58 do_invalid_op+149 
 #5 [ffff88081aaadac0] do_invalid_op at ffffffff81012fa5
    ffff88081aaadac8: 0000000000000004 [pid]            
    ffff88081aaadad8: __skb_pull+4     [scsi_cmd_cache] 
    ffff88081aaadae8: ffff88081aaadb20 set_track+97     
    ffff88081aaadaf8: 000000100000000f [scsi_cmd_cache] 
    ffff88081aaadb08: init_object+61   ffffea0020556c00 
    ffff88081aaadb18: [kmem_cache]     ffff88081aaadb78 
    ffff88081aaadb28: free_debug_processing+478 [blkdev_requests] 
    ffff88081aaadb38: ffffea0001e66800 [kmem_cache]     
    ffff88081aaadb48: 0000000000000001 [skbuff_head_cache] 
    ffff88081aaadb58: ffff88081aaadc18 invalid_op+30    
 #6 [ffff88081aaadb60] invalid_op at ffffffff816737de
    [exception RIP: __skb_pull+4]
    RIP: ffffffff81665536  RSP: ffff88081aaadc18  RFLAGS: 00010287
    RAX: 000000002a2058db  RBX: ffff88081aaadc58  RCX: 0000000000000000
    RDX: 000000000000001a  RSI: 000000000000006b  RDI: ffff88081a045f80
    RBP: ffff88081aaadc18   R8: 0000c050faff7f5e   R9: 000152e15d046df4
    R10: 52e15d046df455d6  R11: fef2260000004188  R12: ffff88081a045f80
    R13: ffff88081a045f80  R14: ffff88063abec600  R15: ffff88081aaadd38
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    ffff88081aaadb68: ffff88081aaadd38 ffff88063abec600 
    ffff88081aaadb78: [skbuff_head_cache] [skbuff_head_cache] 
    ffff88081aaadb88: ffff88081aaadc18 ffff88081aaadc58 
    ffff88081aaadb98: fef2260000004188 52e15d046df455d6 
    ffff88081aaadba8: 000152e15d046df4 0000c050faff7f5e 
    ffff88081aaadbb8: 000000002a2058db 0000000000000000 
    ffff88081aaadbc8: 000000000000001a 000000000000006b 
    ffff88081aaadbd8: [skbuff_head_cache] ffffffffffffffff 
    ffff88081aaadbe8: __skb_pull+4     0000000000000010 
    ffff88081aaadbf8: 0000000000010287 ffff88081aaadc18
    ffff88081aaadc08: 0000000000000018 ffff88081aaadc48 
    ffff88081aaadc18: ffff88081aaadc28 skb_pull+51      
 #7 [ffff88081aaadc20] skb_pull at ffffffff815561f3
    ffff88081aaadc28: ffff88081aaadc48 rt2x00crypto_tx_remove_iv+84 
 #8 [ffff88081aaadc30] rt2x00crypto_tx_remove_iv at ffffffffa02ce5e4 [rt2x00lib]
    ffff88081aaadc38: 0000000000000000 [kmalloc-1024]   
    ffff88081aaadc48: ffff88081aaadcb0 rt2x00queue_write_tx_frame+677 
 #9 [ffff88081aaadc50] rt2x00queue_write_tx_frame at ffffffffa02cb845 [rt2x00lib]
    ffff88081aaadc58: 0000000000001688 00000007001a01ce 
    ffff88081aaadc68: 0000000300000000 0000000200000000 
    ffff88081aaadc78: 0000000000000002 0000006b001a006b 
    ffff88081aaadc88: [kmalloc-1024]   [skbuff_head_cache] 
    ffff88081aaadc98: ffff88063abed7c0 ffff88063abec600 
    ffff88081aaadca8: ffff88081aaadd38 ffff88081aaadd00 
    ffff88081aaadcb8: rt2x00mac_tx+168 
#10 [ffff88081aaadcb8] rt2x00mac_tx at ffffffffa02c8d88 [rt2x00lib]
    ffff88081aaadcc0: ffff88081aaadce8 ffff88081aaadce8 
    ffff88081aaadcd0: timerqueue_add+96 ffff88063abec720 
    ffff88081aaadce0: ffff88063abec600 0000000000000002 
    ffff88081aaadcf0: [skbuff_head_cache] ffff88081aaadd90 
    ffff88081aaadd00: ffff88081aaadd68 __ieee80211_tx+585 
#11 [ffff88081aaadd08] __ieee80211_tx at ffffffffa05314b9 [mac80211]
    ffff88081aaadd10: 000001ce8109ef40 [kmalloc-8192]   
    ffff88081aaadd20: 0000000000000000 ffff88081aaadd90 
    ffff88081aaadd30: 0000000141880180 0000000000000000 
    ffff88081aaadd40: ffff88063abed0f0 ffff88063abec720 
    ffff88081aaadd50: [skbuff_head_cache] ffff88081aaadd90 
    ffff88081aaadd60: ffff88063abec610 ffff88081aaaddd0 
    ffff88081aaadd70: ieee80211_tx_pending+326 
#12 [ffff88081aaadd70] ieee80211_tx_pending at ffffffffa0534c46 [mac80211]
    ffff88081aaadd78: 0000000000000001 000000021aaaddb8 
    ffff88081aaadd88: ffff88063abec600 ffff88081aaadd90 
    ffff88081aaadd98: ffff88081aaadd90 0000000000000000 
    ffff88081aaadda8: ffff88063abed248 0000000000000000 
    ffff88081aaaddb8: softirq_threads  softirq_vec+48   
    ffff88081aaaddc8: 0000000000000001 ffff88081aaaddf0 
    ffff88081aaaddd8: tasklet_action+110 
#13 [ffff88081aaaddd8] tasklet_action at ffffffff8106e26e
    ffff88081aaadde0: 0000000000000006 0000000000000006 
    ffff88081aaaddf0: ffff88081aaade60 __do_softirq+247 
#14 [ffff88081aaaddf8] __do_softirq at ffffffff8106e747
    ffff88081aaade00: ffff88081aaadfd8 0000000a0420a040 
    ffff88081aaade10: 000000010128452b 0000000000000006 
    ffff88081aaade20: ffff88081aaadfd8 ffff88081aaadfd8 
    ffff88081aaade30: 0000010000000007 [task_struct]    
    ffff88081aaade40: [kmalloc-16]     softirq_threads  
    ffff88081aaade50: [task_struct]    [task_struct]    
    ffff88081aaade60: ffff88081aaade78 run_ksoftirqd+48 
#15 [ffff88081aaade68] run_ksoftirqd at ffffffff8106e8c0
    ffff88081aaade70: 0000000781667d89 ffff88081aaadec8 
    ffff88081aaade80: smpboot_thread_fn+255 
#16 [ffff88081aaade80] smpboot_thread_fn at ffffffff810932ef
    ffff88081aaade88: 0000000000000000 ffff88081aaadea0 
    ffff88081aaade98: 0000000000000001 ffff88081aecdd30 
    ffff88081aaadea8: [kmalloc-16]     smpboot_thread_fn 
    ffff88081aaadeb8: 0000000000000000 0000000000000000 
    ffff88081aaadec8: ffff88081aaadf48 kthread+192      
#17 [ffff88081aaaded0] kthread at ffffffff8108b0d0
    ffff88081aaaded8: 0000000000000001 b48592a000000007 
    ffff88081aaadee8: [kmalloc-16]     ed0ca34e00000000 
    ffff88081aaadef8: ed8e845800030003 ffff88081aaadf00 
    ffff88081aaadf08: ffff88081aaadf00 [anon_vma_chain] 
    ffff88081aaadf18: ffffffff00000000 ffff88081aaadf20 
    ffff88081aaadf28: ffff88081aaadf20 kthread          
    ffff88081aaadf38: 0000000000000000 0000000000000000
    ffff88081aaadf48: ffff88081aecdd30 ret_from_fork+124 
#18 [ffff88081aaadf50] ret_from_fork at ffffffff8167207c
Comment 1 Alex Outhred 2014-01-05 08:15:08 UTC
Created attachment 120971: Backtrace using crash/vmcore
Comment 2 Alan 2014-01-06 17:22:57 UTC
Moving to drivers/wireless, as this looks like the driver failing to check whether it got a short frame with no IV block and then trying to remove data that was not present.
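
The hypothesis in this comment can be illustrated with a small user-space model. This is only a sketch and not the rt2x00 code: `strip_iv`, `header_len`, and `iv_len` are hypothetical names, and the model assumes the IV sits directly after the frame header. It refuses to strip an IV from a frame that is too short to contain one.

```python
def strip_iv(frame: bytes, header_len: int, iv_len: int) -> bytes:
    """Return frame with the iv_len bytes that follow the header removed.

    Hypothetical model: raises ValueError rather than removing data
    that is not present in a short frame.
    """
    if header_len < 0 or iv_len < 0:
        raise ValueError("lengths must be non-negative")
    if len(frame) < header_len + iv_len:
        raise ValueError("frame too short to contain header and IV")
    return frame[:header_len] + frame[header_len + iv_len:]


header = bytes(24)        # placeholder 24-byte header (assumed size)
iv = b"\x01" * 8          # placeholder 8-byte IV (assumed size)
payload = b"data"

# A well-formed frame: the IV is cut out, header and payload remain.
stripped = strip_iv(header + iv + payload, header_len=24, iv_len=8)
```

Calling `strip_iv(bytes(10), 24, 8)` raises `ValueError`, which is the kind of early rejection that a length check before the pull would provide.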
Comment 3 Stanislaw Gruszka 2014-01-14 18:12:11 UTC
I suspect that txdesc->iv_len somehow has a wrong value, but I'm not sure how this could happen.

Is this bug reproducible? Could you make the vmcore file available for download somewhere?
Comment 4 Alex Outhred 2014-01-16 10:37:38 UTC
I do not know how to reproduce the crash. I have been having sporadic crashes, usually while the machine is unattended, for the past month or so. Since I got kdump working and submitted this bug, I have removed the hardware from the machine, but if it would be useful, I can use the device again and see if I can capture another vmcore.

I have sent the original vmcore corresponding to the trace above to Stanislaw privately.
Comment 5 Stanislaw Gruszka 2014-01-17 09:30:22 UTC
Thanks, I'm looking at the vmcore now, but analyzing a memory dump can be hard, so it may take some time...
Comment 6 Stanislaw Gruszka 2014-01-17 15:11:25 UTC
crash> struct sk_buff ffff88081a045f80 | head -20
struct sk_buff {
  next = 0x0, 
  prev = 0x0, 
  tstamp = {
    tv64 = 0
  }, 
  sk = 0xffff8807c7fde880, 
  dev = 0xffff8808039b42a0, 
  cb = "P\000\200@\001\002\000\000\000\000\a(\000\000\000\000\000\000\000\000\000\000\000\000\252\252\003\000\000\000\b\000E\000\001\254lZ@\000\004\021T\256\300\250\003\226", 
  _skb_refdst = 7784309262464843759, 
  sp = 0x49544f4ef6959801, 
  len = 706762971, 
  data_len = 1414809632, 
  mac_len = 12112, 
  hdr_len = 11825, 
  {
    csum = 1208618289, 
    {
      csum_start = 3377, 
      csum_offset = 18442
crash> rd ffff88081a045f80 30
ffff88081a045f80:  0000000000000000 0000000000000000   ................
ffff88081a045f90:  0000000000000000 ffff8807c7fde880   ................
ffff88081a045fa0:  ffff8808039b42a0 0000020140800050   .B......P..@....
ffff88081a045fb0:  0000000028070000 0000000000000000   ...(............
ffff88081a045fc0:  000800000003aaaa 00405a6cac010045   ........E...lZ@.
ffff88081a045fd0:  9603a8c0ae541104 6c076c07faffffef   ..T..........l.l
ffff88081a045fe0:  49544f4ef6959801 545448202a2058db   ....NOTI.X * HTT
ffff88081a045ff0:  480a0d312e312f50 393332203a74736f   P/1.1..Host: 239
ffff88081a046000:  3535322e3535322e 3039313a3035322e   .255.255.250:190
ffff88081a046010:  65686361430a0d30 6c6f72746e6f432d   0..Cache-Control
ffff88081a046020:  67612d78616d203a 0000000000313d65   : max-age=1.....
ffff88081a046030:  0000000000000000 0000000000000000   ................
ffff88081a046040:  0000003e00600074 000002c00000020c   t.`.>...........
ffff88081a046050:  ffff8805399ca4f8 ffff8805399ca536   ...9....6..9....
ffff88081a046060:  0000000100000500 cccccccccccccccc   ................

The sk_buff structure has been overwritten by network packet data (there is HTTP data where the actual skb len and data_len values should be).
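
The overlap can be checked by hand from the `rd` output above. The sketch below is only an illustration, not part of the crash session: it takes the 64-bit word printed at ffff88081a045fe8 and splits it into two little-endian 32-bit fields. That `len` and `data_len` live in that word is inferred from the `struct sk_buff` output, not taken from the kernel headers.

```python
import struct

# 64-bit word shown by `rd` at ffff88081a045fe8
# (the second word on the ffff88081a045fe0 row).
word = 0x545448202A2058DB

# Bytes as they sit in memory on little-endian x86-64.
raw = struct.pack("<Q", word)

# Interpret the same 8 bytes as two consecutive 32-bit fields.
length, data_len = struct.unpack("<II", raw)

print(length, data_len)          # compare with len and data_len in the struct dump
print(hex(length))               # compare with RAX in the oops register dump
print(raw[1:].decode("ascii"))   # printable part of the overwriting packet data
print(length < data_len)         # whether a len >= data_len sanity check would fail
```

The two integers come out as 706762971 and 1414809632, the same values crash prints for `len` and `data_len`, and the low half, 0x2a2058db, is also the value in RAX in the oops. The printable bytes are part of the "* HTTP/1.1" text visible in the dump. Because this `len` is smaller than `data_len`, a consistency check of that form would fail, which fits the BUG in `__skb_pull`, assuming that is what the check at skbuff.h:1434 tests.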

Alex, please install the Fedora kernel-debug package, run it for some time, and check whether it detects any problems. If not, I will provide you with a kernel compiled with CONFIG_DEBUG_PAGEALLOC, which is an even more intensive memory-corruption debugging method than those used in kernel-debug, but it slows the kernel down considerably.

If this is a software bug, it should be detectable by the above methods, but it could also be a firmware bug or a DMA settings bug, neither of which is easy to detect. The rt2x00usb driver does not set up DMA mappings directly; that is done by the USB host controller driver. Which one are you using ("lsusb -t" should show that)?
Comment 7 Alex Outhred 2014-01-18 07:02:11 UTC
Thank you Stanislaw. I have downloaded kernel-debug, and will run that exclusively for a while.

Some other info: 

I've been having sporadic crashes since performing a hardware upgrade at the start of December (new motherboard, CPU, RAM and SSD, but same graphics card and rt2800usb device). Previous machine was fairly stable, hardly any unexplained crashes. One other change is that the boot process on the new hardware is via UEFI rather than legacy BIOS. The machine is not overclocked in any way and memory timing is at default settings. Unfortunately the 32G of RAM is not ECC, and non-Xeon Haswell CPUs don't seem to support ECC RAM. Initial memtest86 with the new hardware detected no errors after 12 hours. 

I've had slub_debug=FZPU on the kernel command line since 2013-12-23. I've had kdump/crashkernel enabled since 2014-01-04, shortly before capturing this crash. The rt2800usb has been physically removed from the computer since 2014-01-04.
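
For readers unfamiliar with the option, the letters in `slub_debug=FZPU` each enable one SLUB debugging feature. The sketch below expands them; the descriptions follow the kernel's SLUB documentation as best I recall it, and the helper name `describe_slub_debug` is hypothetical.

```python
# Flag letters used on this machine's command line; other letters exist
# but are deliberately not listed here.
SLUB_DEBUG_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning",
    "U": "user tracking (alloc and free)",
}


def describe_slub_debug(option):
    """Expand a 'slub_debug=<letters>' option into readable descriptions."""
    name, _, letters = option.partition("=")
    if name != "slub_debug":
        raise ValueError("not a slub_debug option: %r" % option)
    # Unknown letters are reported rather than silently dropped.
    return [SLUB_DEBUG_FLAGS.get(ch, "unknown flag %r" % ch) for ch in letters]


enabled = describe_slub_debug("slub_debug=FZPU")
for line in enabled:
    print(line)
```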

The frequency of crashes does seem to have decreased since 2014-01-04, but I have had one more crash with IP in __slab_alloc called from __alloc_skb, unix_stream_sendmsg, reported at:
https://bugzilla.redhat.com/show_bug.cgi?id=1051476

As you can see, Dave Jones suspected a bitflip, so I ran memtest86 again for another 20 hours, with no errors detected. I've also run single and parallel instances of memtester from userspace for another ~24 hours in total, with no errors detected.

---

lsusb -t

/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    |__ Port 5: Dev 2, If 0, Class=Mass Storage, Driver=usb-storage, 5000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/14p, 480M
    |__ Port 5: Dev 2, If 0, Class=Audio, Driver=snd-usb-audio, 12M
    |__ Port 5: Dev 2, If 1, Class=Audio, Driver=snd-usb-audio, 12M
    |__ Port 5: Dev 2, If 2, Class=Audio, Driver=snd-usb-audio, 12M
    |__ Port 6: Dev 3, If 0, Class=Hub, Driver=hub/4p, 480M
        |__ Port 1: Dev 4, If 0, Class=Hub, Driver=hub/2p, 480M
            |__ Port 1: Dev 7, If 0, Class=Mass Storage, Driver=usb-storage, 480M
            |__ Port 2: Dev 8, If 0, Class=Wireless, Driver=btusb, 12M
            |__ Port 2: Dev 8, If 1, Class=Wireless, Driver=btusb, 12M
        |__ Port 2: Dev 10, If 0, Class=Vendor Specific Class, Driver=rt2800usb, 480M
        |__ Port 3: Dev 5, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
        |__ Port 4: Dev 6, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
        |__ Port 4: Dev 6, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M

---

lspci -v

00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
	Subsystem: Gigabyte Technology Co., Ltd Device 5000
	Flags: bus master, fast devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: e0000000-f00fffff
	Capabilities: [88] Subsystem: Gigabyte Technology Co., Ltd Device 5000
	Capabilities: [80] Power Management version 3
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [a0] Express Root Port (Slot+), MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [140] Root Complex Link
	Capabilities: [d94] #19
	Kernel driver in use: pcieport

00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 04) (prog-if 30 [XHCI])
	Subsystem: Gigabyte Technology Co., Ltd Device 5007
	Flags: bus master, medium devsel, latency 0, IRQ 43
	Memory at f0200000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
	Kernel driver in use: xhci_hcd

00:16.0 Communication controller: Intel Corporation 8 Series/C220 Series Chipset Family MEI Controller #1 (rev 04)
	Subsystem: Gigabyte Technology Co., Ltd Device 1c3a
	Flags: bus master, fast devsel, latency 0, IRQ 45
	Memory at f0218000 (64-bit, non-prefetchable) [size=16]
	Capabilities: [50] Power Management version 3
	Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Kernel driver in use: mei_me

00:1b.0 Audio device: Intel Corporation 8 Series/C220 Series Chipset High Definition Audio Controller (rev 04)
	Subsystem: Gigabyte Technology Co., Ltd Device a002
	Flags: bus master, fast devsel, latency 0, IRQ 47
	Memory at f0210000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 2
	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Kernel driver in use: snd_hda_intel

00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d4) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	Capabilities: [40] Express Root Port (Slot-), MSI 00
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001
	Capabilities: [a0] Power Management version 3
	Kernel driver in use: pcieport

00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d4) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff
	Memory behind bridge: f0100000-f01fffff
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001
	Capabilities: [a0] Power Management version 3
	Kernel driver in use: pcieport

00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d4) (prog-if 01 [Subtractive decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=00, secondary=04, subordinate=05, sec-latency=0
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Device 5001
	Capabilities: [a0] Power Management version 3

00:1f.0 ISA bridge: Intel Corporation Z87 Express LPC Controller (rev 04)
	Subsystem: Gigabyte Technology Co., Ltd Device 5001
	Flags: bus master, medium devsel, latency 0
	Capabilities: [e0] Vendor Specific Information: Len=0c <?>
	Kernel driver in use: lpc_ich

00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 04) (prog-if 01 [AHCI 1.0])
	Subsystem: Gigabyte Technology Co., Ltd Device b005
	Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 42
	I/O ports at f070 [size=8]
	I/O ports at f060 [size=4]
	I/O ports at f050 [size=8]
	I/O ports at f040 [size=4]
	I/O ports at f020 [size=32]
	Memory at f0216000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [70] Power Management version 3
	Capabilities: [a8] SATA HBA v1.0
	Kernel driver in use: ahci

00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 04)
	Subsystem: Gigabyte Technology Co., Ltd Device 5001
	Flags: medium devsel, IRQ 18
	Memory at f0215000 (64-bit, non-prefetchable) [size=256]
	I/O ports at f000 [size=32]

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 6770] (prog-if 00 [VGA controller])
	Subsystem: Gigabyte Technology Co., Ltd Device 220c
	Flags: bus master, fast devsel, latency 0, IRQ 44
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0020000 (64-bit, non-prefetchable) [size=128K]
	I/O ports at e000 [size=256]
	Expansion ROM at f0000000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Legacy Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150] Advanced Error Reporting
	Kernel driver in use: radeon

01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Juniper HDMI Audio [Radeon HD 5700 Series]
	Subsystem: Gigabyte Technology Co., Ltd Device aa58
	Flags: bus master, fast devsel, latency 0, IRQ 48
	Memory at f0040000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 3
	Capabilities: [58] Express Legacy Endpoint, MSI 00
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150] Advanced Error Reporting
	Kernel driver in use: snd_hda_intel

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)
	Subsystem: Gigabyte Technology Co., Ltd Motherboard
	Flags: bus master, fast devsel, latency 0, IRQ 46
	I/O ports at d000 [size=256]
	Memory at f0104000 (64-bit, non-prefetchable) [size=4K]
	Memory at f0100000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
	Kernel driver in use: r8169

04:00.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 41) (prog-if 01 [Subtractive decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=04, secondary=05, subordinate=05, sec-latency=32
	Capabilities: [90] Power Management version 2
	Capabilities: [a0] Subsystem: Gigabyte Technology Co., Ltd Device 8892

---

cat /proc/interrupts

           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:        134          0          0          0          0          0          0          0  IR-IO-APIC-edge      timer
  1:         15          0          0          0          0          0          0          0  IR-IO-APIC-edge      i8042
  8:          1          0          0          0          0          0          0          0  IR-IO-APIC-edge      rtc0
  9:          0          0          0          0          0          0          0          0  IR-IO-APIC-fasteoi   acpi
 40:          0          0          0          0          0          0          0          0  DMAR_MSI-edge      dmar0
 42:      17331          0          0          0          0          0     183103          0  IR-PCI-MSI-edge      ahci
 43:     212972          0          0          0          0          0          0          0  IR-PCI-MSI-edge      xhci_hcd
 44:        157          0          0          0          0          0          0      51950  IR-PCI-MSI-edge      radeon
 45:         13          0          0          0          0          0          0          0  IR-PCI-MSI-edge      mei_me
 46:      95727          0          0          0          0          0          0          0  IR-PCI-MSI-edge      p4p1
 47:       1579          0          0          0          0          0          0          0  IR-PCI-MSI-edge      snd_hda_intel
 48:       7224          0          0          0          0          0          0          0  IR-PCI-MSI-edge      snd_hda_intel
NMI:      49106     715007      30487      59315      14622      39729     216281      36009   Non-maskable interrupts
LOC:     351949     451316     237676     336484     221566     270926     554846     464766   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:      49106     715007      30487      59315      14622      39729     216281      36009   Performance monitoring interrupts
IWI:      16272      10259       7591      10770       7830      14900       9828      23661   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:      15533       8333       5785       5419      30757      32330      21055      21429   Rescheduling interrupts
CAL:       3892       2352       2214       2289        989        883        828        835   Function call interrupts
TLB:      12772      14182       5244      13896      12839      10366       6443      10613   TLB shootdowns
TRM:        113        113        113        113        113        113        113        113   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:         26         26         26         26         26         26         26         26   Machine check polls
ERR:          0
MIS:          0

---

Once again, thanks for your help.
Comment 8 Alex Outhred 2014-01-20 11:59:54 UTC
OK, I have a new error message using 3.12.7-300.fc20.x86_64+debug, after more than 34 hours uptime and many more error-free loops of memtester:

=============================================================================
BUG vm_area_struct (Not tainted): Poison overwritten
-----------------------------------------------------------------------------

INFO: 0xffff8802ca728039-0xffff8802ca728039. First byte 0x4b instead of 0x6b
INFO: Allocated in dup_mm+0x230/0x710 age=112633 cpu=0 pid=369
        __slab_alloc+0x3eb/0x4fe
        kmem_cache_alloc+0x294/0x340
        dup_mm+0x230/0x710
        copy_process.part.23+0x12d4/0x1890
        do_fork+0xce/0x450
        SyS_clone+0x16/0x20
        stub_clone+0x69/0x90
INFO: Freed in remove_vma+0x76/0x80 age=109627 cpu=5 pid=28464
        __slab_free+0x3a/0x382
        kmem_cache_free+0x356/0x370
        remove_vma+0x76/0x80
        exit_mmap+0xf4/0x170
        mmput+0x7f/0x110
        do_exit+0x2a5/0xcd0
        do_group_exit+0x4c/0xc0
        SyS_exit_group+0x14/0x20
        system_call_fastpath+0x16/0x1b
INFO: Slab 0xffffea000b29ca00 objects=32 used=32 fp=0x          (null) flags=0x5ff00000004080
INFO: Object 0xffff8802ca728000 @offset=0 fp=0xffff8802ca72aa00

Object ffff8802ca728000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff8802ca728010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff8802ca728020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff8802ca728030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 4b 6b 6b 6b 6b 6b 6b  kkkkkkkkkKkkkkkk
Object ffff8802ca728040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff8802ca728050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff8802ca728060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff8802ca728070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff8802ca728080: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff8802ca728090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff8802ca7280a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Object ffff8802ca7280b0: 6b 6b 6b 6b 6b 6b 6b a5                          kkkkkkk.
Redzone ffff8802ca7280b8: bb bb bb bb bb bb bb bb                          ........
Padding ffff8802ca7281f8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
CPU: 7 PID: 5195 Comm: Xorg Tainted: G    B        3.12.7-300.fc20.x86_64+debug #1
Hardware name: Gigabyte Technology Co., Ltd. Z87-HD3/Z87-HD3, BIOS F6 08/03/2013
 ffff8802ca728000 ffff8807ec29bbd8 ffffffff81749d2b ffff88081d019200
 ffff8807ec29bc18 ffffffff811d40fd 0000000000000008 ffff880200000001
 ffff8802ca72803a ffff88081d019200 000000000000006b ffff8802ca728000
Call Trace:
 [<ffffffff81749d2b>] dump_stack+0x54/0x74
 [<ffffffff811d40fd>] print_trailer+0x14d/0x200
 [<ffffffff811d42ef>] check_bytes_and_report+0xcf/0x110
 [<ffffffff811d5177>] check_object+0x1d7/0x250
 [<ffffffff811b28e8>] ? mmap_region+0x348/0x5d0
 [<ffffffff817474b2>] alloc_debug_processing+0x76/0x118
 [<ffffffff817480ed>] __slab_alloc+0x3eb/0x4fe
 [<ffffffffa00171e9>] ? drm_gem_object_lookup+0x29/0x160 [drm]
 [<ffffffff811b28e8>] ? mmap_region+0x348/0x5d0
 [<ffffffff811d6cc4>] kmem_cache_alloc+0x294/0x340
 [<ffffffff811b28e8>] ? mmap_region+0x348/0x5d0
 [<ffffffff811b28e8>] mmap_region+0x348/0x5d0
 [<ffffffff811b2ed0>] do_mmap_pgoff+0x360/0x3f0
 [<ffffffff8119cf50>] vm_mmap_pgoff+0x90/0xc0
 [<ffffffff811b1423>] SyS_mmap_pgoff+0x1d3/0x270
 [<ffffffff8101e7e2>] SyS_mmap+0x22/0x30
 [<ffffffff8175d029>] system_call_fastpath+0x16/0x1b
FIX vm_area_struct: Restoring 0xffff8802ca728039-0xffff8802ca728039=0x6b

FIX vm_area_struct: Marking all objects used

---

Possibly related: afterwards I ran slabinfo --validate, and saw this message:

SLUB: vm_area_struct 1086 slabs counted but counter=1087

Is there enough info here to figure out who scribbled on the slab, or is it another mysterious bitflip? Process 369 is systemd-udevd, but process 28464 had already exited by the time I looked.

I had one previous "BUG" message from the same boot, "MAX_LOCKDEP_ENTRIES too low!", but it didn't seem worthy of a report. Saved details in case anyone is interested. I've also seen various stack depth reports that made me nervous, the latest being:

btrfs (2954) used greatest stack depth: 1856 bytes left

If I understand correctly, that means ~78% of the stack was used. Is 22% an adequate safety margin, or am I at risk of stack overflow? I don't use LVM or network mounts, so there shouldn't be that much layering involved.
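
As a rough check of that percentage, the arithmetic can be sketched as below. The 8 KiB stack size is an assumption (the usual per-task kernel stack on x86-64 for kernels of this era; the dmesg line itself does not state it), and the variable names are purely illustrative.

```python
# Assumption: 8 KiB kernel stack per task (not stated in the dmesg line).
THREAD_SIZE = 8192

# From "btrfs (2954) used greatest stack depth: 1856 bytes left".
bytes_left = 1856

bytes_used = THREAD_SIZE - bytes_left
pct_used = 100.0 * bytes_used / THREAD_SIZE

print(f"used {bytes_used} of {THREAD_SIZE} bytes ({pct_used:.1f}%)")
```

If the stack size on the running kernel differs, only THREAD_SIZE needs to change.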
Comment 9 Stanislaw Gruszka 2014-01-21 15:18:46 UTC
This is a bit flip again (from 0x6b to 0x4b). Given that the problems started to happen after replacing the motherboard, this most likely looks like a hardware (or firmware) fault. There is still a possibility of a kernel bug, because with different H/W you start to use different drivers, but kernel/driver memory corruption bugs are usually not a single bit flip but a larger memory overwrite.
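
A minimal sketch of how the "single bit flip" reading can be checked: XOR the expected and observed bytes and count the differing bits. The helper name flipped_bits is hypothetical, not part of any kernel tooling.

```python
def flipped_bits(expected, found):
    """Return the bit positions (0 = least significant) that differ."""
    diff = expected ^ found
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Values from the "Poison overwritten" report: 0x4b found instead of 0x6b.
bits = flipped_bits(0x6b, 0x4b)
kind = "single-bit flip" if len(bits) == 1 else "multi-bit change"
print(bits, kind)
```

A driver scribbling over memory would normally change several bits or several bytes, so a one-element result is what points towards hardware here.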

I launched a kernel build with DEBUG_PAGEALLOC here:
http://koji.fedoraproject.org/koji/taskinfo?taskID=6434128
When a driver or some kernel subsystem starts to write to memory that does not belong to it, that kernel will crash, which allows the actual broken module to be identified. You can use debug_guardpage_minorder=1 (or 2) to increase the protected area and hence the probability of finding the faulty driver. Please try that kernel, possibly with the rt2x00 driver in use, to increase the chance of catching the corruption.

If it does not detect a fault, or the kernel crashes in the old way, that will probably mean the problem lies in the H/W. In that case you can split the memory into two halves, run with only the first half and then only the second, and see which half does not cause corruption.

Other than that, you can blacklist various modules you are using (e.g. visualization, networking, sound, ...) and see if that prevents the memory corruption.
Comment 10 Alex Outhred 2014-01-24 22:52:11 UTC
Thank you for the DEBUG_PAGEALLOC kernel, which I've been running.

Here is another instance of slab corruption.

=============================================================================
BUG kmalloc-32 (Not tainted): Redzone overwritten
-----------------------------------------------------------------------------

Disabling lock debugging due to kernel taint
INFO: 0xffff880551f33dab-0xffff880551f33dab. First byte 0x8c instead of 0xcc
INFO: Allocated in radeon_fence_emit+0x2d/0x1c0 [radeon] age=119 cpu=3 pid=15902
        __slab_alloc+0x3eb/0x4fe
        kmem_cache_alloc_trace+0x2a8/0x360
        radeon_fence_emit+0x2d/0x1c0 [radeon]
        radeon_ib_schedule+0x222/0x2b0 [radeon]
        radeon_cs_ioctl+0x929/0xbb0 [radeon]
        drm_ioctl+0x512/0x650 [drm]
        do_vfs_ioctl+0x305/0x530
        SyS_ioctl+0x81/0xa0
        system_call_fastpath+0x16/0x1b
INFO: Freed in radeon_semaphore_free+0x55/0x70 [radeon] age=119 cpu=3 pid=15902
        __slab_free+0x3a/0x382
        kfree+0x2c0/0x2d0
        radeon_semaphore_free+0x55/0x70 [radeon]
        radeon_ib_schedule+0x1ce/0x2b0 [radeon]
        radeon_cs_ioctl+0x929/0xbb0 [radeon]
        drm_ioctl+0x512/0x650 [drm]
        do_vfs_ioctl+0x305/0x530
        SyS_ioctl+0x81/0xa0
        system_call_fastpath+0x16/0x1b
INFO: Slab 0xffffea001547cc80 objects=22 used=12 fp=0xffff880551f32438 flags=0x5ff00000004081
INFO: Object 0xffff880551f33d88 @offset=7560 fp=0xffff880551f325a0

Bytes b4 ffff880551f33d78: 4f e5 cd 01 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  O.......ZZZZZZZZ
Object ffff880551f33d88: 00 80 5f 0d 08 88 ff ff 00 00 00 00 6b 6b 6b 6b  .._.........kkkk
Object ffff880551f33d98: 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b a5  ............kkk.
Redzone ffff880551f33da8: cc cc cc 8c cc cc cc cc                          ........
Padding ffff880551f33ee8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
CPU: 3 PID: 15902 Comm: mplayer Tainted: G    B        3.12.8-300.bz68171.fc20.x86_64+debug #1
Hardware name: Gigabyte Technology Co., Ltd. Z87-HD3/Z87-HD3, BIOS F6 08/03/2013
 ffff880551f33d88 ffff88057b37dae0 ffffffff8174a25b ffff880819004480
 ffff88057b37db20 ffffffff811d452d 0000000000000008 ffff880500000001
 ffff880551f33dac ffff880819004480 00000000000000cc ffff880551f33d88
Call Trace:
 [<ffffffff8174a25b>] dump_stack+0x54/0x74
 [<ffffffff811d452d>] print_trailer+0x14d/0x200
 [<ffffffff811d471f>] check_bytes_and_report+0xcf/0x110
 [<ffffffff811d5562>] check_object+0x192/0x250
 [<ffffffff81747b3d>] free_debug_processing+0xb9/0x22a
 [<ffffffff817534e6>] ? _raw_spin_unlock_irqrestore+0x36/0x70
 [<ffffffffa009e7e4>] ? radeon_fence_unref+0x34/0x40 [radeon]
 [<ffffffffa009e7e4>] ? radeon_fence_unref+0x34/0x40 [radeon]
 [<ffffffff81747ce8>] __slab_free+0x3a/0x382
 [<ffffffff81391e4e>] ? debug_check_no_obj_freed+0x14e/0x250
 [<ffffffff811d6e1c>] ? kfree+0xbc/0x2d0
 [<ffffffffa009e7e4>] ? radeon_fence_unref+0x34/0x40 [radeon]
 [<ffffffff811d7020>] kfree+0x2c0/0x2d0
 [<ffffffffa009e7e4>] radeon_fence_unref+0x34/0x40 [radeon]
 [<ffffffffa009ed5e>] radeon_sync_obj_unref+0xe/0x10 [radeon]
 [<ffffffffa006f87a>] ttm_bo_wait+0x13a/0x190 [ttm]
 [<ffffffffa00a15fe>] radeon_bo_wait+0x9e/0x140 [radeon]
 [<ffffffffa00b3c52>] radeon_gem_busy_ioctl+0x52/0x130 [radeon]
 [<ffffffffa0014e82>] drm_ioctl+0x512/0x650 [drm]
 [<ffffffff8130e1d5>] ? avc_has_perm+0x25/0x350
 [<ffffffff81310773>] ? inode_has_perm.isra.48+0x53/0x80
 [<ffffffff8120c7e5>] do_vfs_ioctl+0x305/0x530
 [<ffffffff81310dab>] ? selinux_file_ioctl+0x5b/0x110
 [<ffffffff8120ca91>] SyS_ioctl+0x81/0xa0
 [<ffffffff8175d569>] system_call_fastpath+0x16/0x1b
FIX kmalloc-32: Restoring 0xffff880551f33dab-0xffff880551f33dab=0xcc

=============================================================================
BUG kmalloc-32 (Tainted: G    B       ): Redzone overwritten
-----------------------------------------------------------------------------

INFO: 0xffff880551f33da8-0xffff880551f33daf. First byte 0xcc instead of 0xbb
INFO: Allocated in radeon_fence_emit+0x2d/0x1c0 [radeon] age=998 cpu=3 pid=15902
        __slab_alloc+0x3eb/0x4fe
        kmem_cache_alloc_trace+0x2a8/0x360
        radeon_fence_emit+0x2d/0x1c0 [radeon]
        radeon_ib_schedule+0x222/0x2b0 [radeon]
        radeon_cs_ioctl+0x929/0xbb0 [radeon]
        drm_ioctl+0x512/0x650 [drm]
        do_vfs_ioctl+0x305/0x530
        SyS_ioctl+0x81/0xa0
        system_call_fastpath+0x16/0x1b
INFO: Freed in radeon_semaphore_free+0x55/0x70 [radeon] age=998 cpu=3 pid=15902
        __slab_free+0x3a/0x382
        kfree+0x2c0/0x2d0
        radeon_semaphore_free+0x55/0x70 [radeon]
        radeon_ib_schedule+0x1ce/0x2b0 [radeon]
        radeon_cs_ioctl+0x929/0xbb0 [radeon]
        drm_ioctl+0x512/0x650 [drm]
        do_vfs_ioctl+0x305/0x530
        SyS_ioctl+0x81/0xa0
        system_call_fastpath+0x16/0x1b
INFO: Slab 0xffffea001547cc80 objects=22 used=21 fp=0xffff880551f32e10 flags=0x5ff00000004080
INFO: Object 0xffff880551f33d88 @offset=7560 fp=0xffff880551f325a0

Bytes b4 ffff880551f33d78: 08 e9 cd 01 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
Object ffff880551f33d88: 00 80 5f 0d 08 88 ff ff 00 00 00 00 6b 6b 6b 6b  .._.........kkkk
Object ffff880551f33d98: 00 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b a5  ............kkk.
Redzone ffff880551f33da8: cc cc cc cc cc cc cc cc                          ........
Padding ffff880551f33ee8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
CPU: 1 PID: 16782 Comm: dinoshade Tainted: G    B        3.12.8-300.bz68171.fc20.x86_64+debug #1
Hardware name: Gigabyte Technology Co., Ltd. Z87-HD3/Z87-HD3, BIOS F6 08/03/2013
 ffff880551f33d88 ffff880566ead8e8 ffffffff8174a25b ffff880819004480
 ffff880566ead928 ffffffff811d452d 0000000000000008 ffff880500000001
 ffff880551f33db0 ffff880819004480 00000000000000bb ffff880551f33d88
Call Trace:
 [<ffffffff8174a25b>] dump_stack+0x54/0x74
 [<ffffffff811d452d>] print_trailer+0x14d/0x200
 [<ffffffff811d471f>] check_bytes_and_report+0xcf/0x110
 [<ffffffff811d5562>] check_object+0x192/0x250
 [<ffffffffa00fcf5c>] ? radeon_semaphore_create+0x2c/0xe0 [radeon]
 [<ffffffff817479e2>] alloc_debug_processing+0x76/0x118
 [<ffffffff8174861d>] __slab_alloc+0x3eb/0x4fe
 [<ffffffffa00fcf5c>] ? radeon_semaphore_create+0x2c/0xe0 [radeon]
 [<ffffffffa00fdacb>] ? radeon_sa_bo_new+0x27b/0x480 [radeon]
 [<ffffffff811d7e98>] kmem_cache_alloc_trace+0x2a8/0x360
 [<ffffffffa00fcf5c>] ? radeon_semaphore_create+0x2c/0xe0 [radeon]
 [<ffffffffa00fcf5c>] radeon_semaphore_create+0x2c/0xe0 [radeon]
 [<ffffffffa00b4700>] radeon_ib_get+0x50/0x110 [radeon]
 [<ffffffffa00b749d>] radeon_cs_ioctl+0x82d/0xbb0 [radeon]
 [<ffffffffa0014e82>] drm_ioctl+0x512/0x650 [drm]
 [<ffffffff81310773>] ? inode_has_perm.isra.48+0x53/0x80
 [<ffffffff8120c7e5>] do_vfs_ioctl+0x305/0x530
 [<ffffffff81310dab>] ? selinux_file_ioctl+0x5b/0x110
 [<ffffffff8120ca91>] SyS_ioctl+0x81/0xa0
 [<ffffffff8175d569>] system_call_fastpath+0x16/0x1b
FIX kmalloc-32: Restoring 0xffff880551f33da8-0xffff880551f33daf=0xbb

FIX kmalloc-32: Marking all objects used

---

My main question: does this look like another hardware bitflip?

I guess it does, if I'm reading these correctly - if the second reported "redzone overwritten" is just a false alarm, a consequence of SLUB getting confused as to what the redzone should be after the first redzone corruption was repaired.
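
A rough sketch of that reading, using the SLUB debug byte patterns from include/linux/poison.h. The describe helper and its wording are hypothetical; it only classifies a found/expected byte pair and does not prove what actually happened.

```python
# SLUB debug byte patterns, as defined in include/linux/poison.h.
PATTERNS = {
    0xbb: "SLUB_RED_INACTIVE (redzone of a free object)",
    0xcc: "SLUB_RED_ACTIVE (redzone of an allocated object)",
    0x6b: "POISON_FREE (freed object contents)",
    0xa5: "POISON_END (last byte of freed object)",
    0x5a: "POISON_INUSE (padding / uninitialised)",
}


def describe(found, expected):
    """Classify a mismatch between the byte found and the byte expected."""
    if found == expected:
        return "ok"
    if found in PATTERNS:
        # A legitimate pattern, just for a different object state.
        return "valid pattern for another state: " + PATTERNS[found]
    if bin(found ^ expected).count("1") == 1:
        return "single-bit flip"
    return "multi-bit corruption"


print(describe(0x8c, 0xcc))  # first report: 0x8c instead of 0xcc
print(describe(0xcc, 0xbb))  # second report: 0xcc instead of 0xbb
```

The first mismatch differs from the expected byte in exactly one bit, while the second is a whole redzone filled with the other legitimate redzone value, which is at least consistent with it being a follow-on effect of the earlier "Restoring ...=0xcc" fix rather than a new corruption.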

I still don't have a good way to reproduce these errors. I guess I'll just have to try memtest86 again.
Comment 11 Stanislaw Gruszka 2014-01-27 12:00:56 UTC
> My main question: does this look like another hardware bitflip?
Yes.

Sorry for not mentioning that before: it is better to run the DEBUG_PAGEALLOC kernel on the non-debug Fedora kernel variant, as it then has more chance of catching driver corruption.

Anyway, the problem unfortunately looks more like a hardware issue, and removing components (i.e. stop using some CPU features or some other hardware by blacklisting modules, removing half of the physical memory modules, etc.) may be a better strategy to figure out what causes the corruption.
Comment 12 Stanislaw Gruszka 2014-03-01 16:52:06 UTC
Other users have reported corruption on Gigabyte *87* boards as well, so this really looks like a h/w problem.

Some people report that using only 2 DIMM slots helped them:
http://www.tonymacx86.com/general-help/121042-solved-4-dimm-crashing-freezing-ga-z87.html

I think this bug can be closed as a duplicate of
https://bugzilla.kernel.org/show_bug.cgi?id=64521
Comment 13 Alan 2014-03-01 21:00:05 UTC

*** This bug has been marked as a duplicate of bug 64521 ***
