Most recent kernel where this bug did *NOT* occur: unknown
Distribution: gentoo
Hardware Environment: AMD-K6, 400MHz, 288MB Ram
Software Environment: ip6sic (http://ip6sic.sourceforge.net/)
Problem Description:

When running ip6sic against the loopback interface, I get the following kernel messages:

[ 199.514486] Slab corruption: start=d0505554, len=156
[ 199.514704] Redzone: 0x5a2cf071/0x5a2cf071.
[ 199.514859] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[ 199.515301] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[ 199.517096] Single bit error detected. Probably bad RAM.
[ 199.517255] Run memtest86+ or a similar memory test tool.
[ 199.517413] Prev obj: start=d05054ac, len=156
[ 199.517568] Redzone: 0x170fc2a5/0x170fc2a5.
[ 199.517719] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 199.518088] 000: b4 52 50 d0 0c 52 50 d0 08 c6 4a d0 00 00 00 00
[ 199.519851] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e0 a6 85 d0
[ 199.521614] Next obj: start=d05055fc, len=156
[ 199.521771] Redzone: 0x170fc2a5/0x170fc2a5.
[ 199.521922] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 199.522327] 000: 04 54
[ 199.868486] Slab corruption: start=d0505554, len=156
[ 199.869836] Redzone: 0x5a2cf071/0x5a2cf071.
[ 199.870975] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[ 199.872820] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[ 199.879166] Single bit error detected. Probably bad RAM.
[ 199.880347] Run memtest86+ or a similar memory test tool.
[ 199.881539] Prev obj: start=d05054ac, len=156
[ 199.882679] Redzone: 0x170fc2a5/0x170fc2a5.
[ 199.883983] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 199.885807] 000: b4 52 50 d0 0c 52 50 d0 08 c6 4a d0 00 00 00 00
[ 199.892047] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e0 a6 85 d0
[ 199.898369] Next obj: start=d05055fc, len=156
[ 199.899511] Redzone: 0x170fc2a5/0x170fc2a5.
[ 199.900642] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 199.902471] 000: 04 54 50 d0 b4 52 50 d0 08 c6 4a d0 00 00 00 00
[ 199.908818] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a ec a8 85 d0
[ 200.947125] Slab corruption: start=d0505554, len=156
[ 200.948445] Redzone: 0x5a2cf071/0x5a2cf071.
[ 200.949578] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[ 200.951417] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[ 200.957775] Single bit error detected. Probably bad RAM.
[ 200.958960] Run memtest86+ or a similar memory test tool.
[ 200.960155] Prev obj: start=d05054ac, len=156
[ 200.961299] Redzone: 0x170fc2a5/0x170fc2a5.
[ 200.962429] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 200.964342] 000: b4 52 50 d0 0c 52 50 d0 08 c6 4a d0 00 00 00 00
[ 200.970601] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e0 a6 85 d0
[ 200.976934] Next obj: start=d05055fc, len=156
[ 200.978074] Redzone: 0x170fc2a5/0x170fc2a5.
[ 200.979200] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 200.981021] 000: 04 54 50 d0 b4 52 50 d0 08 c6 4a d0 00 00 00 00
[ 200.987374] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a ec a8 85 d0

I compiled the kernel with network debugging, rebooted and started ip6sic again:

[ 141.573883] Slab corruption: start=d1a30b1c, len=156
[ 141.575258] Redzone: 0x5a2cf071/0x5a2cf071.
[ 141.576277] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[ 141.577909] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[ 141.583277] Single bit error detected. Probably bad RAM.
[ 141.584321] Run memtest86+ or a similar memory test tool.
[ 141.585442] Prev obj: start=d1a30a74, len=156
[ 141.586452] Redzone: 0x170fc2a5/0x170fc2a5.
[ 141.587453] Last user: [<c04660cc>](__alloc_skb+0x2c/0x120)
[ 141.589059] 000: cc 09 a3 d1 fc fa a2 d1 00 00 00 00 00 00 00 00
[ 141.594442] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 141.600358] Next obj: start=d1a30bc4, len=156
[ 141.601416] Redzone: 0x170fc2a5/0x170fc2a5.
[ 141.602420] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 141.604032] 000: 6c 0c a3 d1 34 05 a3 d1 40 6e 48 d0 00 00 00 00
[ 141.609495] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e4 7d 24 d1
[ 141.861710] Slab corruption: start=d1a30b1c, len=156
[ 141.862832] Redzone: 0x5a2cf071/0x5a2cf071.
[ 141.863844] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[ 141.865522] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[ 141.870941] Single bit error detected. Probably bad RAM.
[ 141.871993] Run memtest86+ or a similar memory test tool.
[ 141.873052] Prev obj: start=d1a30a74, len=156
[ 141.874068] Redzone: 0x170fc2a5/0x170fc2a5.
[ 141.875143] Last user: [<c04660cc>](__alloc_skb+0x2c/0x120)
[ 141.876758] 000: cc 09 a3 d1 fc fa a2 d1 00 00 00 00 00 00 00 00
[ 141.882185] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 141.887653] Next obj: start=d1a30bc4, len=156
[ 141.888668] Redzone: 0x170fc2a5/0x170fc2a5.
[ 141.889671] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 141.891274] 000: 6c 0c a3 d1 34 05 a3 d1 40 6e 48 d0 00 00 00 00
[ 141.896735] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e4 7d 24 d1
[ 142.160467] Slab corruption: start=d1a30b1c, len=156
[ 142.161618] Redzone: 0x5a2cf071/0x5a2cf071.
[ 142.162627] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[ 142.164243] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[ 142.169686] Single bit error detected. Probably bad RAM.
[ 142.170729] Run memtest86+ or a similar memory test tool.
[ 142.171784] Prev obj: start=d1a30a74, len=156
[ 142.172796] Redzone: 0x170fc2a5/0x170fc2a5.
[ 142.173798] Last user: [<c04660cc>](__alloc_skb+0x2c/0x120)
[ 142.175463] 000: cc 09 a3 d1 fc fa a2 d1 00 00 00 00 00 00 00 00
[ 142.180869] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 142.186327] Next obj: start=d1a30bc4, len=156
[ 142.187343] Redzone: 0x170fc2a5/0x170fc2a5.
[ 142.188349] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 142.189952] 000: 6c 0c a3 d1 34 05 a3 d1 40 6e 48 d0 00 00 00 00
[ 142.195522] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e4 7d 24 d1

I ran memtester (a userspace memory tester), which gave no errors, so I also ran memtest86+. The first 1 1/2 passes gave no errors, but I'll keep it running for a few more hours.

Steps to reproduce:
ip6sic -i lo -d ::1 -p 1000

In case there are some config options or patches I should try, just say so. I'll attach a .config in a few hours, but I guess I should let memtest86+ run for a few more passes.
Reply-To: akpm@linux-foundation.org

Begin forwarded message:

Date: Thu, 22 Feb 2007 07:56:27 -0800
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 8057] New: slab corruption running ip6sic

http://bugzilla.kernel.org/show_bug.cgi?id=8057

Summary: slab corruption running ip6sic
Kernel Version: 2.6.21-rc1
Status: NEW
Severity: normal
Owner: yoshfuji@linux-ipv6.org
Submitter: snakebyte@gmx.de
Created attachment 10500: .config
Memtest86+ ran for 9 hours (7 passes) and detected no errors, so it looks like the memory is fine. Just to be sure, I retried triggering this, and after ~20 packets of ip6sic-generated data the slab got corrupted again:

[ 148.087390] Slab corruption: start=d1a301ec, len=156
[ 148.088686] Redzone: 0x5a2cf071/0x5a2cf071.
[ 148.089727] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[ 148.091387] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[ 148.096855] Single bit error detected. Probably bad RAM.
[ 148.097930] Run memtest86+ or a similar memory test tool.
[ 148.099014] Prev obj: start=d1a30144, len=156
[ 148.100054] Redzone: 0x170fc2a5/0x170fc2a5.
[ 148.101076] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 148.102797] 000: 7c 68 3c d1 8c 04 a3 d1 08 66 4a d0 00 00 00 00
[ 148.108207] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 9c 71 75 d1
[ 148.114101] Next obj: start=d1a30294, len=156
[ 148.115187] Redzone: 0x170fc2a5/0x170fc2a5.
[ 148.116221] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 148.117853] 000: 9c 00 a3 d1 7c 68 3c d1 08 66 4a d0 00 00 00 00
[ 148.123317] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 30 6e 4a d0
I cannot reproduce this so far. Anyone else?
Created attachment 10505: Boot log
Created attachment 10506: Packet sent by ip6sic
Did some more tests today. I was unable to reproduce this with udpsic6, icmpsic6 or tcpsic6 from the ipsic package. To get a clean pcap dump, and to be sure the corruption was caused by exactly two packets, I rebooted and ran

ip6sic -d ::1 -i lo -f 100 -T 0 -I 0 -c 100 -4 100 -6 0 -U 0 -p 2 -w test.pcap

which gave me the following messages. The trylock failure is new; I have never seen that one before.

[ 165.734936] Slab corruption: start=d1a30144, len=156
[ 165.735054] BUG: spinlock trylock failure on UP on CPU#0, sshd/3636
[ 165.735073] lock: d1f85180, .magic: dead4ead, .owner: sshd/3636, .owner_cpu: 0
[ 165.735091] [<c01042ba>] show_trace_log_lvl+0x1a/0x40
[ 165.735136] [<c0104a92>] show_trace+0x12/0x20
[ 165.735158] [<c0104b99>] dump_stack+0x19/0x20
[ 165.735180] [<c038542d>] spin_bug+0x8d/0xe0
[ 165.735228] [<c03854bd>] _raw_spin_trylock+0x3d/0x60
[ 165.735253] [<c0539afa>] _spin_trylock+0x1a/0x80
[ 165.735289] [<c0476530>] netpoll_send_skb+0x90/0x160
[ 165.735317] [<c0477504>] netpoll_send_udp+0x204/0x280
[ 165.735342] [<c041bc46>] write_msg+0x46/0x80
[ 165.735382] [<c0119428>] __call_console_drivers+0x48/0x60
[ 165.735412] [<c0119484>] _call_console_drivers+0x44/0x80
[ 165.735433] [<c01195e6>] release_console_sem+0xe6/0x200
[ 165.735456] [<c0119d32>] vprintk+0x1b2/0x380
[ 165.735479] [<c0119f1b>] printk+0x1b/0x20
[ 165.735500] [<c016651f>] check_poison_obj+0x15f/0x1e0
[ 165.735533] [<c0166650>] cache_alloc_debugcheck_after+0xb0/0x180
[ 165.735557] [<c0167b63>] kmem_cache_alloc+0x63/0xe0
[ 165.735579] [<c0465634>] skb_clone+0x34/0x1e0
[ 165.735613] [<c046adf1>] dev_hard_start_xmit+0xb1/0x260
[ 165.735639] [<c047900d>] __qdisc_run+0xad/0x1c0
[ 165.735663] [<c046cab5>] dev_queue_xmit+0x1b5/0x280
[ 165.735687] [<c0489f7b>] ip_output+0x13b/0x240
[ 165.735716] [<c04894c2>] ip_queue_xmit+0x1c2/0x480
[ 165.735736] [<c04990a2>] tcp_transmit_skb+0x482/0x760
[ 165.735771] [<c049a98c>] __tcp_push_pending_frames+0x10c/0x8a0
[ 165.735796] [<c048fd3a>] tcp_sendmsg+0x77a/0xb40
[ 165.735817] [<c04ab72e>] inet_sendmsg+0x2e/0x60
[ 165.735854] [<c045f396>] sock_aio_write+0xf6/0x120
[ 165.735876] [<c016abed>] do_sync_write+0xcd/0x120
[ 165.735901] [<c016b5a5>] vfs_write+0x145/0x160
[ 165.735923] [<c016bb38>] sys_write+0x38/0x80
[ 165.735943] [<c0102cb0>] syscall_call+0x7/0xb
[ 165.735964] =======================
[ 165.738660] Redzone: 0x5a2cf071/0x5a2cf071.
[ 165.738753] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[ 165.739032] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[ 165.740223] Single bit error detected. Probably bad RAM. 0c 0f a3 d1 00 00 00
[ 165.742061] 010: 00 00 00 00 68 33 37 d1
[ 165.743184] Next obj: start=d1a301ec, len=156
[ 165.743323] Redzone: 0x170fc2a5/0x170fc2a5.
[ 165.743410] Last user: [<c04660cc>](__alloc_skb+0x2c/0x120)
[ 165.743691] 000: 54 05 2b c1 00 00 00 00 00 00 00 00
[ 165.744809] 010: 00 00 00 00 00 50 f8 d1 00 00 00 00 60 90 a7 d1

So, another reboot, and just one packet sent this time:

ip6sic -d ::1 -i lo -f 100 -T 0 -I 0 -c 100 -4 100 -6 0 -U 0 -p 1 -w test2.pcap

(test2.pcap is attached)

[ 131.597361] Slab corruption: start=d17b7ddc, len=156
[ 131.597503] BUG: spinlock trylock failure on UP on CPU#0, sshd/3642
[ 131.597522] lock: d1dce180, .magic: dead4ead, .owner: sshd/3642, .owner_cpu: 0
[ 131.597540] [<c01042ba>] show_trace_log_lvl+0x1a/0x40
[ 131.597583] [<c0104a92>] show_trace+0x12/0x20
[ 131.597604] [<c0104b99>] dump_stack+0x19/0x20
[ 131.597625] [<c038542d>] spin_bug+0x8d/0xe0
[ 131.597667] [<c03854bd>] _raw_spin_trylock+0x3d/0x60
[ 131.597689] [<c0539afa>] _spin_trylock+0x1a/0x80
[ 131.597727] [<c0476530>] netpoll_send_skb+0x90/0x160
[ 131.597754] [<c0477504>] netpoll_send_udp+0x204/0x280
[ 131.597776] [<c041bc46>] write_msg+0x46/0x80
[ 131.597808] [<c0119428>] __call_console_drivers+0x48/0x60
[ 131.597835] [<c0119484>] _call_console_drivers+0x44/0x80
[ 131.597856] [<c01195e6>] release_console_sem+0xe6/0x200
[ 131.597878] [<c0119d32>] vprintk+0x1b2/0x380
[ 131.597899] [<c0119f1b>] printk+0x1b/0x20
[ 131.597917] [<c016651f>] check_poison_obj+0x15f/0x1e0
[ 131.597948] [<c0166650>] cache_alloc_debugcheck_after+0xb0/0x180
[ 131.597971] [<c0167b63>] kmem_cache_alloc+0x63/0xe0
[ 131.597992] [<c0465634>] skb_clone+0x34/0x1e0
[ 131.598024] [<c046adf1>] dev_hard_start_xmit+0xb1/0x260
[ 131.598050] [<c047900d>] __qdisc_run+0xad/0x1c0
[ 131.598073] [<c046cab5>] dev_queue_xmit+0x1b5/0x280
[ 131.598095] [<c0489f7b>] ip_output+0x13b/0x240
[ 131.598124] [<c04894c2>] ip_queue_xmit+0x1c2/0x480
[ 131.598145] [<c04990a2>] tcp_transmit_skb+0x482/0x760
[ 131.598177] [<c049a98c>] __tcp_push_pending_frames+0x10c/0x8a0
[ 131.598202] [<c048fd3a>] tcp_sendmsg+0x77a/0xb40
[ 131.598223] [<c04ab72e>] inet_sendmsg+0x2e/0x60
[ 131.598257] [<c045f396>] sock_aio_write+0xf6/0x120
[ 131.598278] [<c016abed>] do_sync_write+0xcd/0x120
[ 131.598303] [<c016b5a5>] vfs_write+0x145/0x160
[ 131.598325] [<c016bb38>] sys_write+0x38/0x80
[ 131.598345] [<c0102cb0>] syscall_call+0x7/0xb
[ 131.598363] =======================
[ 131.600987] Redzone: 0x5a2cf071/0x5a2cf071.
[ 131.601082] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[ 131.601368] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[ 131.602572] Single bit error detected. Probably bad RAM.
[ 131.602664] Run memtest86+ or a similar memory test tool.
[ 131.602758] Prev obj: start=d17b7d34, len=156
[ 131.602848] Redzone: 0x170fc2a5/0x170fc2a5.
[ 131.602936] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[ 131.603214] 000: 8c 7c 7b d1 7c 00 2b c1 08 26 4d d0 00 00 00 00
[ 131.604499] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a d8 ba a7 d1
[ 131.605628] Next obj: start=d17b7e84, len=156
[ 131.605721] Redzone: 0x170fc2a5/0x170fc2a5.
[ 131.605811] Last user: [<c04660cc>](__alloc_skb+0x2c/0x120)
[ 131.606103] 000: 4c 77 7b d1 2c 7f 7b d1 00 00 00 00 00 00 00 00
[ 131.607244] 010: 00 00 00 00 00 e0 dc d1 00 00 00 00 fc 98 52 d0
To provide some more information: I ran memtester again, which gave no errors. I then updated from gcc 4.1.1 (Gentoo 4.1.1-r3) to 4.1.2 (Gentoo 4.1.2), ran make clean and make mrproper, and completely rebuilt the kernel (same config). I got no slab warnings during either the gcc build or the kernel build. I then replayed the captured traffic (using tcpreplay) and got slab corruption again.
I just set up a UML on another computer to _really_ make sure it is not the RAM, and I also get slab corruption there when running ip6sic:

Slab corruption: start=0e0c2124, len=152
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<0819a473>](kfree_skbmem+0x43/0xa0)
080: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Probably bad RAM.
Run a memory test tool.
Prev obj: start=0e0c2080, len=152
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<0819a61f>](skb_clone+0x3f/0x1b0)
000: 3c 2f 0c 0e c8 21 0c 0e 60 56 0c 0e 00 00 00 00
010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a b0 2c 39 08
Next obj: start=0e0c21c8, len=152
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<0819a61f>](skb_clone+0x3f/0x1b0)
000: 80 20 0c 0e 78 29 0c 0e 60 56 0c 0e 00 00 00 00
010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 8c 20 39 08
Slab corruption: start=0e0c2124, len=152
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<0819a473>](kfree_skbmem+0x43/0xa0)
080: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Probably bad RAM.
Run a memory test tool.
Prev obj: start=0e0c2080, len=152
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<0819a61f>](skb_clone+0x3f/0x1b0)
000: 3c 2f 0c 0e c8 21 0c 0e 60 56 0c 0e 00 00 00 00
010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a b0 2c 39 08
Next obj: start=0e0c21c8, len=152
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<0819a61f>](skb_clone+0x3f/0x1b0)
000: 80 20 0c 0e 78 29 0c 0e 60 56 0c 0e 00 00 00 00
010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 8c 20 39 08
Created attachment 10533: stripped down config

I played around with the .config, and it turns out that with this config, switching INET6_IPCOMP off makes this disappear; switching it back on makes it appear again.
Created attachment 10538: user mode .config

Just thought I should add the UML .config too; maybe that helps you reproduce this. Btw: did you have slab debugging enabled when trying?
Since the toggled bit is always at the same offset in the slab memory block, and the memory was always last touched by kfree_skbmem(), I guess it is safe to assume that this object is a struct sk_buff. pahole tells me that offset 0x84 matches the atomic_t users field with my .config.

Can this be a double kfree_skb(struct sk_buff *skb)? We free it the first time because the users count got decremented to zero, and slab fills the object with its poison bytes (0x6b). Then we pass it to kfree_skb() a second time, which does its atomic_dec_and_test(&skb->users) check. That check returns false, since the low byte just got decremented from the 0x6b slab poison value to 0x6a, so we don't free it a second time.
To verify my thesis in the previous entry, I applied the following patch, which basically just checks whether we try to kfree_skb() a previously freed chunk:

--- linux/net/core/skbuff.c.orig	2007-02-28 11:34:13.865540564 +0100
+++ linux/net/core/skbuff.c	2007-02-28 11:52:45.437717125 +0100
@@ -407,6 +407,10 @@ void kfree_skb(struct sk_buff *skb)
 {
 	if (unlikely(!skb))
 		return;
+#ifdef CONFIG_DEBUG_SLAB
+	WARN_ON(unlikely((skb->users.counter & 0xFFFF) == 0x6b6b));
+#endif
+
 	if (likely(atomic_read(&skb->users) == 1))
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))

With this I get the following:

[ 153.609611] BUG: at net/core/skbuff.c:411 kfree_skb()
[ 153.609786] [<c01042ba>] show_trace_log_lvl+0x1a/0x40
[ 153.610018] [<c0104a92>] show_trace+0x12/0x20
[ 153.610229] [<c0104b99>] dump_stack+0x19/0x20
[ 153.610438] [<c0465917>] kfree_skb+0x57/0x80
[ 153.610657] [<c04e4144>] tunnel46_rcv+0x64/0xa0
[ 153.610873] [<c04bf3ca>] ip6_input+0xca/0x2e0
[ 153.611093] [<c04bf898>] ipv6_rcv+0x218/0x320
[ 153.611304] [<c046a997>] netif_receive_skb+0x197/0x2e0
[ 153.611516] [<c046c526>] process_backlog+0x86/0x100
[ 153.611725] [<c046c789>] net_rx_action+0xa9/0x1e0
[ 153.611930] [<c011e1fb>] __do_softirq+0x5b/0xc0
[ 153.612147] [<c0105e48>] do_softirq+0x88/0xe0
[ 153.612361] [<c011e4a4>] local_bh_enable+0xa4/0x160
[ 153.612572] [<c046c958>] dev_queue_xmit+0x98/0x280
[ 153.612782] [<c04e5d8a>] packet_sendmsg+0x1ea/0x240
[ 153.612991] [<c045ffac>] sock_sendmsg+0xcc/0x100
[ 153.613202] [<c0460995>] sys_sendto+0xb5/0xe0
[ 153.613412] [<c04612a2>] sys_socketcall+0x1a2/0x260
[ 153.613623] [<c0102cb0>] syscall_call+0x7/0xb
[ 153.613827] =======================

Is it possible that the handler frees the skb even if it is not supposed to do so?
The ipcomp handler is xfrm6_rcv(), which calls xfrm6_rcv_spi(), which, contrary to all other handlers, returns -1 instead of 0 after calling kfree_skb() on the skb. Changing the return value to 0 in xfrm6_input.c:xfrm6_rcv_spi() fixes the problem, but I have no clue at all whether this would be a correct fix.
On 22-02-2007 22:49, Andrew Morton wrote:
>
> Begin forwarded message:
>
> Date: Thu, 22 Feb 2007 07:56:27 -0800
> From: bugme-daemon@bugzilla.kernel.org
> To: bugme-new@lists.osdl.org
> Subject: [Bugme-new] [Bug 8057] New: slab corruption running ip6sic
>
>
> http://bugzilla.kernel.org/show_bug.cgi?id=8057
>
> Summary: slab corruption running ip6sic
> Kernel Version: 2.6.21-rc1
> Status: NEW
> Severity: normal
> Owner: yoshfuji@linux-ipv6.org
> Submitter: snakebyte@gmx.de
>
>
> Most recent kernel where this bug did *NOT* occur: unknown
> Distribution: gentoo
> Hardware Environment: AMD-K6, 400MHz, 288MB Ram
> Software Environment: ip6sic (http://ip6sic.sourceforge.net/)
> Problem Description:
>
> When running ip6sic against the loopback interface i get the following kernel
> messages:
>
> [ 199.514486] Slab corruption: start=d0505554, len=156
> [ 199.514704] Redzone: 0x5a2cf071/0x5a2cf071.
> [ 199.514859] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
...

From bugzilla:
...
> Is it possible that the handler frees the skb even if it is not supposed to do so?
>
>
> ------- Additional Comment #14 From Eric Sesterhenn 2007-02-28 04:33 -------
>
> the ipcomp handler is xfrm6_rcv(), which calls xfrm6_rcv_spi(), which contrary
> to all other handlers returns -1 instead of 0 after calling kfree_skb() on the
> skb. Changing the return value to 0 in xfrm6_input.c:xfrm6_rcv_spi() fixes the
> problem.
> But I got no clue at all if this would be a correct fix

I think your diagnosis is correct (all "return -1" should be
changed to "return 0" in xfrm6_input.c).

Regards,
Jarek P.
On Mon, Mar 12, 2007 at 11:24:03AM +0100, Jarek Poplawski wrote:
...
> I think your diagnosis is correct (all "return -1" should be
> changed to "return 0" in xfrm6_input.c).

Sorry! Of course it should be:

I think your diagnosis is correct (all "return -1" should be
changed to "return 0" in xfrm6_rcv_spi()).

Jarek P.
Created attachment 10715: patch to change the return statements

Here is a proper patch for this.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Created attachment 10717: corrected patch

I forgot a return statement.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Created attachment 11274: fix-slab-corruption-running-ip6sic.patch

To keep bugzilla in sync with the netdev discussion: a patch for this went into -mm.
http://marc.info/?t=117218115200001&r=1&w=2