Bug 8057 - slab corruption running ip6sic
Summary: slab corruption running ip6sic
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV6
Hardware: i386 Linux
Importance: P2 normal
Assignee: Hideaki YOSHIFUJI
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-22 07:56 UTC by Eric Sesterhenn
Modified: 2007-04-25 15:40 UTC
0 users

See Also:
Kernel Version: 2.6.21-rc1
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
.config (35.23 KB, text/plain)
2007-02-22 15:13 UTC, Eric Sesterhenn
Details
Boot log (12.43 KB, text/plain)
2007-02-23 05:42 UTC, Eric Sesterhenn
Details
Packet send by ip6sic (1.03 KB, text/plain)
2007-02-23 05:43 UTC, Eric Sesterhenn
Details
stripped down config (33.33 KB, text/plain)
2007-02-25 15:39 UTC, Eric Sesterhenn
Details
user mode .config (13.78 KB, text/plain)
2007-02-26 05:49 UTC, Eric Sesterhenn
Details
patch to change the return statements (583 bytes, patch)
2007-03-12 04:46 UTC, Eric Sesterhenn
Details | Diff
corrected patch (751 bytes, patch)
2007-03-12 06:28 UTC, Eric Sesterhenn
Details | Diff
fix-slab-corruption-running-ip6sic.patch (455 bytes, patch)
2007-04-25 15:40 UTC, Eric Sesterhenn
Details | Diff

Description Eric Sesterhenn 2007-02-22 07:56:25 UTC
Most recent kernel where this bug did *NOT* occur: unknown
Distribution: gentoo
Hardware Environment: AMD-K6, 400MHz, 288MB Ram
Software Environment: ip6sic (http://ip6sic.sourceforge.net/)
Problem Description:

When running ip6sic against the loopback interface, I get the following kernel
messages:

[  199.514486] Slab corruption: start=d0505554, len=156
[  199.514704] Redzone: 0x5a2cf071/0x5a2cf071.
[  199.514859] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[  199.515301] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[  199.517096] Single bit error detected. Probably bad RAM.
[  199.517255] Run memtest86+ or a similar memory test tool.
[  199.517413] Prev obj: start=d05054ac, len=156
[  199.517568] Redzone: 0x170fc2a5/0x170fc2a5.
[  199.517719] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  199.518088] 000: b4 52 50 d0 0c 52 50 d0 08 c6 4a d0 00 00 00 00
[  199.519851] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e0 a6 85 d0
[  199.521614] Next obj: start=d05055fc, len=156
[  199.521771] Redzone: 0x170fc2a5/0x170fc2a5.
[  199.521922] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  199.522327] 000: 04 54
[  199.868486] Slab corruption: start=d0505554, len=156
[  199.869836] Redzone: 0x5a2cf071/0x5a2cf071.
[  199.870975] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[  199.872820] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[  199.879166] Single bit error detected. Probably bad RAM.
[  199.880347] Run memtest86+ or a similar memory test tool.
[  199.881539] Prev obj: start=d05054ac, len=156
[  199.882679] Redzone: 0x170fc2a5/0x170fc2a5.
[  199.883983] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  199.885807] 000: b4 52 50 d0 0c 52 50 d0 08 c6 4a d0 00 00 00 00
[  199.892047] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e0 a6 85 d0
[  199.898369] Next obj: start=d05055fc, len=156
[  199.899511] Redzone: 0x170fc2a5/0x170fc2a5.
[  199.900642] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  199.902471] 000: 04 54 50 d0 b4 52 50 d0 08 c6 4a d0 00 00 00 00
[  199.908818] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a ec a8 85 d0
[  200.947125] Slab corruption: start=d0505554, len=156
[  200.948445] Redzone: 0x5a2cf071/0x5a2cf071.
[  200.949578] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[  200.951417] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[  200.957775] Single bit error detected. Probably bad RAM.
[  200.958960] Run memtest86+ or a similar memory test tool.
[  200.960155] Prev obj: start=d05054ac, len=156
[  200.961299] Redzone: 0x170fc2a5/0x170fc2a5.
[  200.962429] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  200.964342] 000: b4 52 50 d0 0c 52 50 d0 08 c6 4a d0 00 00 00 00
[  200.970601] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e0 a6 85 d0
[  200.976934] Next obj: start=d05055fc, len=156
[  200.978074] Redzone: 0x170fc2a5/0x170fc2a5.
[  200.979200] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  200.981021] 000: 04 54 50 d0 b4 52 50 d0 08 c6 4a d0 00 00 00 00
[  200.987374] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a ec a8 85 d0

I compiled the kernel with network debugging, rebooted, and started ip6sic again:

[  141.573883] Slab corruption: start=d1a30b1c, len=156
[  141.575258] Redzone: 0x5a2cf071/0x5a2cf071.
[  141.576277] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[  141.577909] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[  141.583277] Single bit error detected. Probably bad RAM.
[  141.584321] Run memtest86+ or a similar memory test tool.
[  141.585442] Prev obj: start=d1a30a74, len=156
[  141.586452] Redzone: 0x170fc2a5/0x170fc2a5.
[  141.587453] Last user: [<c04660cc>](__alloc_skb+0x2c/0x120)
[  141.589059] 000: cc 09 a3 d1 fc fa a2 d1 00 00 00 00 00 00 00 00
[  141.594442] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  141.600358] Next obj: start=d1a30bc4, len=156
[  141.601416] Redzone: 0x170fc2a5/0x170fc2a5.
[  141.602420] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  141.604032] 000: 6c 0c a3 d1 34 05 a3 d1 40 6e 48 d0 00 00 00 00
[  141.609495] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e4 7d 24 d1
[  141.861710] Slab corruption: start=d1a30b1c, len=156
[  141.862832] Redzone: 0x5a2cf071/0x5a2cf071.
[  141.863844] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[  141.865522] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[  141.870941] Single bit error detected. Probably bad RAM.
[  141.871993] Run memtest86+ or a similar memory test tool.
[  141.873052] Prev obj: start=d1a30a74, len=156
[  141.874068] Redzone: 0x170fc2a5/0x170fc2a5.
[  141.875143] Last user: [<c04660cc>](__alloc_skb+0x2c/0x120)
[  141.876758] 000: cc 09 a3 d1 fc fa a2 d1 00 00 00 00 00 00 00 00
[  141.882185] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  141.887653] Next obj: start=d1a30bc4, len=156
[  141.888668] Redzone: 0x170fc2a5/0x170fc2a5.
[  141.889671] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  141.891274] 000: 6c 0c a3 d1 34 05 a3 d1 40 6e 48 d0 00 00 00 00
[  141.896735] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e4 7d 24 d1
[  142.160467] Slab corruption: start=d1a30b1c, len=156
[  142.161618] Redzone: 0x5a2cf071/0x5a2cf071.
[  142.162627] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[  142.164243] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[  142.169686] Single bit error detected. Probably bad RAM.
[  142.170729] Run memtest86+ or a similar memory test tool.
[  142.171784] Prev obj: start=d1a30a74, len=156
[  142.172796] Redzone: 0x170fc2a5/0x170fc2a5.
[  142.173798] Last user: [<c04660cc>](__alloc_skb+0x2c/0x120)
[  142.175463] 000: cc 09 a3 d1 fc fa a2 d1 00 00 00 00 00 00 00 00
[  142.180869] 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  142.186327] Next obj: start=d1a30bc4, len=156
[  142.187343] Redzone: 0x170fc2a5/0x170fc2a5.
[  142.188349] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  142.189952] 000: 6c 0c a3 d1 34 05 a3 d1 40 6e 48 d0 00 00 00 00
[  142.195522] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a e4 7d 24 d1

I ran memtester (userspace memtest), which reported no errors, so I ran
memtest86+. The first 1 1/2 passes gave no errors, but I'll keep it running for
a few more hours.

Steps to reproduce:
ip6sic -i lo -d ::1 -p 1000

In case there are some config options or patches I should try, just say so. I'll
attach a .config in a few hours, but I guess I should let memtest86+ run for a
few more passes.
Comment 1 Anonymous Emailer 2007-02-22 13:49:22 UTC
Reply-To: akpm@linux-foundation.org



Begin forwarded message:

Date: Thu, 22 Feb 2007 07:56:27 -0800
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 8057] New: slab corruption running ip6sic


http://bugzilla.kernel.org/show_bug.cgi?id=8057

           Summary: slab corruption running ip6sic
    Kernel Version: 2.6.21-rc1
            Status: NEW
          Severity: normal
             Owner: yoshfuji@linux-ipv6.org
         Submitter: snakebyte@gmx.de



Comment 2 Eric Sesterhenn 2007-02-22 15:13:31 UTC
Created attachment 10500 [details]
.config
Comment 3 Eric Sesterhenn 2007-02-22 15:15:02 UTC
Memtest86+ ran for 9 hours (7 passes) and detected no errors, so it looks like
the memory is fine. Just to be sure, I retried triggering this, and after ~20
packets of ip6sic-generated data the slab got corrupted again:

[  148.087390] Slab corruption: start=d1a301ec, len=156
[  148.088686] Redzone: 0x5a2cf071/0x5a2cf071.
[  148.089727] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[  148.091387] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[  148.096855] Single bit error detected. Probably bad RAM.
[  148.097930] Run memtest86+ or a similar memory test tool.
[  148.099014] Prev obj: start=d1a30144, len=156
[  148.100054] Redzone: 0x170fc2a5/0x170fc2a5.
[  148.101076] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  148.102797] 000: 7c 68 3c d1 8c 04 a3 d1 08 66 4a d0 00 00 00 00
[  148.108207] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 9c 71 75 d1
[  148.114101] Next obj: start=d1a30294, len=156
[  148.115187] Redzone: 0x170fc2a5/0x170fc2a5.
[  148.116221] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  148.117853] 000: 9c 00 a3 d1 7c 68 3c d1 08 66 4a d0 00 00 00 00
[  148.123317] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 30 6e 4a d0
Comment 4 Hideaki YOSHIFUJI 2007-02-22 18:40:39 UTC
I cannot reproduce this so far.
Anyone else?
Comment 5 Eric Sesterhenn 2007-02-23 05:42:15 UTC
Created attachment 10505 [details]
Boot log
Comment 6 Eric Sesterhenn 2007-02-23 05:43:14 UTC
Created attachment 10506 [details]
Packet send by ip6sic
Comment 7 Eric Sesterhenn 2007-02-23 05:44:46 UTC
Did some more tests today. I was unable to reproduce this with udpsic6,
icmpsic6, or tcpsic6 from the ipsic package.

To get a clean pcap dump, and to be sure the corruption was caused by exactly
two packets, I rebooted and ran

ip6sic -d ::1 -i lo -f 100 -T 0 -I 0 -c 100 -4 100 -6 0 -U 0 -p 2 -w test.pcap

which gave me the following messages. The trylock failure is new; I have never
seen that one before.

[  165.734936] Slab corruption: start=d1a30144, len=156
[  165.735054] BUG: spinlock trylock failure on UP on CPU#0, sshd/3636
[  165.735073]  lock: d1f85180, .magic: dead4ead, .owner: sshd/3636, .owner_cpu: 0
[  165.735091]  [<c01042ba>] show_trace_log_lvl+0x1a/0x40
[  165.735136]  [<c0104a92>] show_trace+0x12/0x20
[  165.735158]  [<c0104b99>] dump_stack+0x19/0x20
[  165.735180]  [<c038542d>] spin_bug+0x8d/0xe0
[  165.735228]  [<c03854bd>] _raw_spin_trylock+0x3d/0x60
[  165.735253]  [<c0539afa>] _spin_trylock+0x1a/0x80
[  165.735289]  [<c0476530>] netpoll_send_skb+0x90/0x160
[  165.735317]  [<c0477504>] netpoll_send_udp+0x204/0x280
[  165.735342]  [<c041bc46>] write_msg+0x46/0x80
[  165.735382]  [<c0119428>] __call_console_drivers+0x48/0x60
[  165.735412]  [<c0119484>] _call_console_drivers+0x44/0x80
[  165.735433]  [<c01195e6>] release_console_sem+0xe6/0x200
[  165.735456]  [<c0119d32>] vprintk+0x1b2/0x380
[  165.735479]  [<c0119f1b>] printk+0x1b/0x20
[  165.735500]  [<c016651f>] check_poison_obj+0x15f/0x1e0
[  165.735533]  [<c0166650>] cache_alloc_debugcheck_after+0xb0/0x180
[  165.735557]  [<c0167b63>] kmem_cache_alloc+0x63/0xe0
[  165.735579]  [<c0465634>] skb_clone+0x34/0x1e0
[  165.735613]  [<c046adf1>] dev_hard_start_xmit+0xb1/0x260
[  165.735639]  [<c047900d>] __qdisc_run+0xad/0x1c0
[  165.735663]  [<c046cab5>] dev_queue_xmit+0x1b5/0x280
[  165.735687]  [<c0489f7b>] ip_output+0x13b/0x240
[  165.735716]  [<c04894c2>] ip_queue_xmit+0x1c2/0x480
[  165.735736]  [<c04990a2>] tcp_transmit_skb+0x482/0x760
[  165.735771]  [<c049a98c>] __tcp_push_pending_frames+0x10c/0x8a0
[  165.735796]  [<c048fd3a>] tcp_sendmsg+0x77a/0xb40
[  165.735817]  [<c04ab72e>] inet_sendmsg+0x2e/0x60
[  165.735854]  [<c045f396>] sock_aio_write+0xf6/0x120
[  165.735876]  [<c016abed>] do_sync_write+0xcd/0x120
[  165.735901]  [<c016b5a5>] vfs_write+0x145/0x160
[  165.735923]  [<c016bb38>] sys_write+0x38/0x80
[  165.735943]  [<c0102cb0>] syscall_call+0x7/0xb
[  165.735964]  =======================
[  165.738660] Redzone: 0x5a2cf071/0x5a2cf071.
[  165.738753] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[  165.739032] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[  165.740223] Single bit error detected. Probably bad RAM.
 0c 0f a3 d1 00 00 00
[  165.742061] 010: 00 00 00 00 68 33 37 d1
[  165.743184] Next obj: start=d1a301ec, len=156
[  165.743323] Redzone: 0x170fc2a5/0x170fc2a5.
[  165.743410] Last user: [<c04660cc>](__alloc_skb+0x2c/0x120)
[  165.743691] 000: 54 05 2b c1 00 00 00 00 00 00 00 00
[  165.744809] 010: 00 00 00 00 00 50 f8 d1 00 00 00 00 60 90 a7 d1


So, another reboot, with just one packet sent this time:
ip6sic -d ::1 -i lo -f 100 -T 0 -I 0 -c 100 -4 100 -6 0 -U 0 -p 1 -w test2.pcap
(test2.pcap is attached) 

[  131.597361] Slab corruption: start=d17b7ddc, len=156
[  131.597503] BUG: spinlock trylock failure on UP on CPU#0, sshd/3642
[  131.597522]  lock: d1dce180, .magic: dead4ead, .owner: sshd/3642, .owner_cpu: 0
[  131.597540]  [<c01042ba>] show_trace_log_lvl+0x1a/0x40
[  131.597583]  [<c0104a92>] show_trace+0x12/0x20
[  131.597604]  [<c0104b99>] dump_stack+0x19/0x20
[  131.597625]  [<c038542d>] spin_bug+0x8d/0xe0
[  131.597667]  [<c03854bd>] _raw_spin_trylock+0x3d/0x60
[  131.597689]  [<c0539afa>] _spin_trylock+0x1a/0x80
[  131.597727]  [<c0476530>] netpoll_send_skb+0x90/0x160
[  131.597754]  [<c0477504>] netpoll_send_udp+0x204/0x280
[  131.597776]  [<c041bc46>] write_msg+0x46/0x80
[  131.597808]  [<c0119428>] __call_console_drivers+0x48/0x60
[  131.597835]  [<c0119484>] _call_console_drivers+0x44/0x80
[  131.597856]  [<c01195e6>] release_console_sem+0xe6/0x200
[  131.597878]  [<c0119d32>] vprintk+0x1b2/0x380
[  131.597899]  [<c0119f1b>] printk+0x1b/0x20
[  131.597917]  [<c016651f>] check_poison_obj+0x15f/0x1e0
[  131.597948]  [<c0166650>] cache_alloc_debugcheck_after+0xb0/0x180
[  131.597971]  [<c0167b63>] kmem_cache_alloc+0x63/0xe0
[  131.597992]  [<c0465634>] skb_clone+0x34/0x1e0
[  131.598024]  [<c046adf1>] dev_hard_start_xmit+0xb1/0x260
[  131.598050]  [<c047900d>] __qdisc_run+0xad/0x1c0
[  131.598073]  [<c046cab5>] dev_queue_xmit+0x1b5/0x280
[  131.598095]  [<c0489f7b>] ip_output+0x13b/0x240
[  131.598124]  [<c04894c2>] ip_queue_xmit+0x1c2/0x480
[  131.598145]  [<c04990a2>] tcp_transmit_skb+0x482/0x760
[  131.598177]  [<c049a98c>] __tcp_push_pending_frames+0x10c/0x8a0
[  131.598202]  [<c048fd3a>] tcp_sendmsg+0x77a/0xb40
[  131.598223]  [<c04ab72e>] inet_sendmsg+0x2e/0x60
[  131.598257]  [<c045f396>] sock_aio_write+0xf6/0x120
[  131.598278]  [<c016abed>] do_sync_write+0xcd/0x120
[  131.598303]  [<c016b5a5>] vfs_write+0x145/0x160
[  131.598325]  [<c016bb38>] sys_write+0x38/0x80
[  131.598345]  [<c0102cb0>] syscall_call+0x7/0xb
[  131.598363]  =======================
[  131.600987] Redzone: 0x5a2cf071/0x5a2cf071.
[  131.601082] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
[  131.601368] 080: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
[  131.602572] Single bit error detected. Probably bad RAM.
[  131.602664] Run memtest86+ or a similar memory test tool.
[  131.602758] Prev obj: start=d17b7d34, len=156
[  131.602848] Redzone: 0x170fc2a5/0x170fc2a5.
[  131.602936] Last user: [<c0465634>](skb_clone+0x34/0x1e0)
[  131.603214] 000: 8c 7c 7b d1 7c 00 2b c1 08 26 4d d0 00 00 00 00
[  131.604499] 010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a d8 ba a7 d1
[  131.605628] Next obj: start=d17b7e84, len=156
[  131.605721] Redzone: 0x170fc2a5/0x170fc2a5.
[  131.605811] Last user: [<c04660cc>](__alloc_skb+0x2c/0x120)
[  131.606103] 000: 4c 77 7b d1 2c 7f 7b d1 00 00 00 00 00 00 00 00
[  131.607244] 010: 00 00 00 00 00 e0 dc d1 00 00 00 00 fc 98 52 d0
Comment 8 Eric Sesterhenn 2007-02-23 15:05:40 UTC
To provide some more information, I ran memtester again, which gave no errors. I
then updated from gcc 4.1.1 (Gentoo 4.1.1-r3) to 4.1.2 (Gentoo 4.1.2), ran make
clean and make mrproper, and completely rebuilt the kernel (same config). I got
no slab warnings during either the gcc build or the kernel build. Replaying the
captured traffic (using tcpreplay) produced slab corruption again.
Comment 9 Eric Sesterhenn 2007-02-23 18:18:37 UTC
I just set up a UML instance on another computer, to _really_ make sure it is
not the RAM, and I also get slab corruption when running ip6sic:

Slab corruption: start=0e0c2124, len=152
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<0819a473>](kfree_skbmem+0x43/0xa0)
080: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Probably bad RAM.
Run a memory test tool.
Prev obj: start=0e0c2080, len=152
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<0819a61f>](skb_clone+0x3f/0x1b0)
000: 3c 2f 0c 0e c8 21 0c 0e 60 56 0c 0e 00 00 00 00
010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a b0 2c 39 08
Next obj: start=0e0c21c8, len=152
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<0819a61f>](skb_clone+0x3f/0x1b0)
000: 80 20 0c 0e 78 29 0c 0e 60 56 0c 0e 00 00 00 00
010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 8c 20 39 08
Slab corruption: start=0e0c2124, len=152
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<0819a473>](kfree_skbmem+0x43/0xa0)
080: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Single bit error detected. Probably bad RAM.
Run a memory test tool.
Prev obj: start=0e0c2080, len=152
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<0819a61f>](skb_clone+0x3f/0x1b0)
000: 3c 2f 0c 0e c8 21 0c 0e 60 56 0c 0e 00 00 00 00
010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a b0 2c 39 08
Next obj: start=0e0c21c8, len=152
Redzone: 0x170fc2a5/0x170fc2a5.
Last user: [<0819a61f>](skb_clone+0x3f/0x1b0)
000: 80 20 0c 0e 78 29 0c 0e 60 56 0c 0e 00 00 00 00
010: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 8c 20 39 08
Comment 10 Eric Sesterhenn 2007-02-25 15:39:17 UTC
Created attachment 10533 [details]
stripped down config

Played around with the .config, and it turns out that with this config,
switching INET6_IPCOMP off makes this disappear and switching it back on makes
it appear again.
Comment 11 Eric Sesterhenn 2007-02-26 05:49:40 UTC
Created attachment 10538 [details]
user mode .config

Just thought I should add the UML .config too; maybe that helps you reproduce
this. Btw: did you have slab debugging enabled when trying?
Comment 12 Eric Sesterhenn 2007-02-27 18:21:07 UTC
Since the toggled bit is always at the same offset of the slab memory block, and
the memory was always last touched by kfree_skbmem(), I guess it makes sense that
this object was allocated as a struct sk_buff. pahole tells me that offset 0x84
matches the atomic_t users with my .config.

Can this be a double kfree_skb(struct sk_buff *skb)?

We free it the first time because the users count dropped to zero, and slab
writes its poison bytes. We then pass it to kfree_skb() a second time, which
does its atomic_dec_and_test(&skb->users) check. That check returns false, since
the byte just got decremented from the 0x6b slab poison value to 0x6a, so we
won't free it a second time.
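
The effect described above can be modelled in userspace. This is only a sketch under stated assumptions: the helper names (poison_object, stale_kfree_skb, count_corrupted) are hypothetical, the object length and the 0x84 offset are taken from the reports in this bug, and the real kfree_skb() path is reduced to the decrement-and-test step.

```c
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b  /* byte the slab debug code writes into freed objects */
#define OBJ_LEN     156   /* object length reported in the dumps above */
#define USERS_OFF   0x84  /* offset of "users" per this comment; depends on .config */

/* Model of atomic_dec_and_test(): decrement, return 1 only if the result is 0. */
static int dec_and_test(int32_t *counter)
{
	*counter -= 1;
	return *counter == 0;
}

/* Model of the first (legitimate) free: the allocator poisons the object. */
static void poison_object(unsigned char *obj)
{
	memset(obj, POISON_FREE, OBJ_LEN);
}

/*
 * Model of a second kfree_skb() on the stale pointer. Returns 1 if the
 * object would be freed again, 0 if the call only decrements the counter.
 */
static int stale_kfree_skb(unsigned char *obj)
{
	int32_t users;
	int last_ref;

	memcpy(&users, obj + USERS_OFF, sizeof(users)); /* reads 0x6b6b6b6b */
	last_ref = dec_and_test(&users);                /* 0x6b6b6b6a, not zero */
	memcpy(obj + USERS_OFF, &users, sizeof(users)); /* write the counter back */
	return last_ref;
}

/* Count bytes that no longer hold the poison value, as a poison check would. */
static int count_corrupted(const unsigned char *obj)
{
	int i, n = 0;

	for (i = 0; i < OBJ_LEN; i++)
		if (obj[i] != POISON_FREE)
			n++;
	return n;
}
```

Calling poison_object() and then stale_kfree_skb() on a 156-byte buffer leaves exactly one byte different from 0x6b, which is the pattern the allocator reports as a single bit error.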
Comment 13 Eric Sesterhenn 2007-02-28 03:21:03 UTC
To verify my hypothesis from the previous comment, I applied the following patch,
which basically just checks whether we try to kfree_skb() a previously freed chunk:

--- linux/net/core/skbuff.c.orig        2007-02-28 11:34:13.865540564 +0100
+++ linux/net/core/skbuff.c     2007-02-28 11:52:45.437717125 +0100
@@ -407,6 +407,10 @@ void kfree_skb(struct sk_buff *skb)
 {
        if (unlikely(!skb))
                return;
+#ifdef CONFIG_DEBUG_SLAB
+       WARN_ON(unlikely((skb->users.counter & 0xFFFF) == 0x6b6b));
+#endif
+
        if (likely(atomic_read(&skb->users) == 1))
                smp_rmb();
        else if (likely(!atomic_dec_and_test(&skb->users)))


With this I get the following:

[  153.609611] BUG: at net/core/skbuff.c:411 kfree_skb()
[  153.609786]  [<c01042ba>] show_trace_log_lvl+0x1a/0x40
[  153.610018]  [<c0104a92>] show_trace+0x12/0x20
[  153.610229]  [<c0104b99>] dump_stack+0x19/0x20
[  153.610438]  [<c0465917>] kfree_skb+0x57/0x80
[  153.610657]  [<c04e4144>] tunnel46_rcv+0x64/0xa0
[  153.610873]  [<c04bf3ca>] ip6_input+0xca/0x2e0
[  153.611093]  [<c04bf898>] ipv6_rcv+0x218/0x320
[  153.611304]  [<c046a997>] netif_receive_skb+0x197/0x2e0
[  153.611516]  [<c046c526>] process_backlog+0x86/0x100
[  153.611725]  [<c046c789>] net_rx_action+0xa9/0x1e0
[  153.611930]  [<c011e1fb>] __do_softirq+0x5b/0xc0
[  153.612147]  [<c0105e48>] do_softirq+0x88/0xe0
[  153.612361]  [<c011e4a4>] local_bh_enable+0xa4/0x160
[  153.612572]  [<c046c958>] dev_queue_xmit+0x98/0x280
[  153.612782]  [<c04e5d8a>] packet_sendmsg+0x1ea/0x240
[  153.612991]  [<c045ffac>] sock_sendmsg+0xcc/0x100
[  153.613202]  [<c0460995>] sys_sendto+0xb5/0xe0
[  153.613412]  [<c04612a2>] sys_socketcall+0x1a2/0x260
[  153.613623]  [<c0102cb0>] syscall_call+0x7/0xb
[  153.613827]  =======================

Is it possible that the handler frees the skb even if it is not supposed to do so?
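
The condition used in the debug patch above can be tried outside the kernel. This is only a sketch: looks_freed() and POISON_WORD are hypothetical names, and it assumes the counter is read as a 32-bit word from memory filled with the 0x6b poison byte.

```c
#include <stdint.h>

/* Four slab poison bytes (0x6b) read back as one 32-bit counter value. */
#define POISON_WORD 0x6b6b6b6bu

/* Same test as the WARN_ON() condition in the patch: low 16 bits are poison. */
static int looks_freed(uint32_t users_counter)
{
	return (users_counter & 0xFFFF) == 0x6b6b;
}
```

One consequence of the mask: after a stale kfree_skb() has decremented the poisoned counter to 0x6b6b6b6a, the low 16 bits are 0x6b6a, so a further stale call on the same object would not match. The check is good enough to catch the first offending caller, which is what the trace shows.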
Comment 14 Eric Sesterhenn 2007-02-28 04:33:10 UTC
The ipcomp handler is xfrm6_rcv(), which calls xfrm6_rcv_spi(), which, contrary
to all other handlers, returns -1 instead of 0 after calling kfree_skb() on the
skb. Changing the return value to 0 in xfrm6_input.c:xfrm6_rcv_spi() fixes the
problem.
But I have no idea whether this would be a correct fix.
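
The return-value problem can be illustrated with a small userspace model. It is a sketch, not the kernel code: struct mock_skb, mock_kfree_skb(), dispatch() and both handlers are hypothetical, and it assumes the caller frees the skb itself whenever no handler returns 0, which is what the kfree_skb() call from tunnel46_rcv() in the trace suggests.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only counts how often it is freed. */
struct mock_skb {
	int free_calls;
};

static void mock_kfree_skb(struct mock_skb *skb)
{
	skb->free_calls++;
}

typedef int (*handler_fn)(struct mock_skb *skb);

/* Handler that drops the packet but reports "not handled" (the -1 case). */
static int handler_free_and_return_minus_one(struct mock_skb *skb)
{
	mock_kfree_skb(skb);
	return -1;
}

/* Same handler with the return value changed to 0 ("handled"). */
static int handler_free_and_return_zero(struct mock_skb *skb)
{
	mock_kfree_skb(skb);
	return 0;
}

/*
 * Model of the dispatching caller: try each handler, stop at the first one
 * that returns 0, otherwise drop the packet here.
 */
static int dispatch(struct mock_skb *skb, handler_fn *handlers, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (handlers[i](skb) == 0)
			return 0;

	mock_kfree_skb(skb); /* no handler took the packet */
	return 0;
}
```

With the -1 handler, dispatch() ends up freeing the same object twice; with the 0 handler it is freed once. That matches the observation that changing the return value makes the corruption go away.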
Comment 15 Jarek Poplawski 2007-03-12 03:19:59 UTC
On 22-02-2007 22:49, Andrew Morton wrote:
> 
> Begin forwarded message:
> 
> Date: Thu, 22 Feb 2007 07:56:27 -0800
> From: bugme-daemon@bugzilla.kernel.org
> To: bugme-new@lists.osdl.org
> Subject: [Bugme-new] [Bug 8057] New: slab corruption running ip6sic
> 
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=8057
> 
>            Summary: slab corruption running ip6sic
>     Kernel Version: 2.6.21-rc1
>             Status: NEW
>           Severity: normal
>              Owner: yoshfuji@linux-ipv6.org
>          Submitter: snakebyte@gmx.de
> 
> 
> Most recent kernel where this bug did *NOT* occur: unknown
> Distribution: gentoo
> Hardware Environment: AMD-K6, 400MHz, 288MB Ram
> Software Environment: ip6sic (http://ip6sic.sourceforge.net/)
> Problem Description:
> 
> When running ip6sic against the loopback interface i get the following kernel
> messages:
> 
> [  199.514486] Slab corruption: start=d0505554, len=156
> [  199.514704] Redzone: 0x5a2cf071/0x5a2cf071.
> [  199.514859] Last user: [<c0465813>](kfree_skbmem+0x33/0x80)
...

From bugzilla:
...
> Is it possible that the handler frees the skb even if it is not supposed to do so?
> 
> 
> ------- Additional Comment #14 From Eric Sesterhenn 2007-02-28 04:33 -------
> 
> the ipcomp handler is xfrm6_rcv(), which calls xfrm6_rcv_spi(), which contrary
> to all other handlers returns -1 instead of 0 after calling kfree_skb() on the
> skb. Changing the return value to 0 in xfrm6_input.c:xfrm6_rcv_spi() fixes the
> problem.
> But I got no clue at all if this would be a correct fix

I think your diagnosis is correct (all "return -1" should be
changed to "return 0" in xfrm6_input.c).

Regards,
Jarek P.

Comment 16 Jarek Poplawski 2007-03-12 03:25:49 UTC
On Mon, Mar 12, 2007 at 11:24:03AM +0100, Jarek Poplawski wrote:
...
> I think your diagnosis is correct (all "return -1" should be
> changed to "return 0" in xfrm6_input.c).

Sorry! Of course it should be:

I think your diagnosis is correct (all "return -1" should be
changed to "return 0" in xfrm6_rcv_spi()).

Jarek P.

Comment 17 Eric Sesterhenn 2007-03-12 04:46:14 UTC
Created attachment 10715 [details]
patch to change the return statements

Here is a proper patch for this.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Comment 18 Eric Sesterhenn 2007-03-12 06:28:41 UTC
Created attachment 10717 [details]
corrected patch

Forgot a return statement.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Comment 19 Eric Sesterhenn 2007-04-25 15:40:31 UTC
Created attachment 11274 [details]
fix-slab-corruption-running-ip6sic.patch

To keep bugzilla in sync with the netdev discussion: a patch for this went into
-mm.
http://marc.info/?t=117218115200001&r=1&w=2
