Bug 6802
Summary: | pktgen cause kernel oops with transmit load balanced bonding | | |
---|---|---|---|
Product: | Networking | Reporter: | Chen-Li Tien (cltien) |
Component: | Other | Assignee: | Arnaldo Carvalho de Melo (acme) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | | |
Severity: | high | | |
Priority: | P2 | | |
Hardware: | i386 | | |
OS: | Linux | | |
Kernel Version: | 2.4.32, 2.6.17.2 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Description
Chen-Li Tien
2006-07-07 07:35:30 UTC
I forgot to mention that the kernel I used has been modified so that only 127.0.x.x addresses are loopback; 127.1.16.1 is the destination machine on the LAN. I ran tcpdump on the destination machine, and it shows the following:

    17:40:57.801701 127.1.18.1.discard > 127.1.16.1.discard: udp 18
    17:40:57.801740 127.1.18.1.discard > 127.1.16.1.discard: udp 18
    17:40:57.801802 127.1.18.1.discard > 127.1.16.1.discard: udp 18
    17:40:57.801841 127.1.18.1.discard > 127.1.16.1.discard: udp 18
    17:40:57.801880 127.1.18.1.discard > 127.1.16.1.discard: udp 18
    17:40:57.801919 127.1.18.1.discard > 127.1.16.1.discard: udp 18
    17:40:57.801958 127.1.18.1.discard > 127.1.16.1.discard: udp 18
    17:40:57.801997 127.1.18.1.discard > 127.1.16.1.discard: udp 18
    17:40:57.802036 127.1.18.1.discard > 127.1.16.1.discard: udp 18
    17:40:57.802075 127.1.18.1.discard > 127.1.16.1.discard: udp 18
    17:40:57.802114 127.1.18.1.discard > 127.1.16.1.discard: udp 18

Is it normal for the packets to show up as discard? However, the result is the same when the packets are sent to eth0.

Chen-Li Tien

Before step 2, you need to load the bonding driver in transmit load balancing mode:

    modprobe bonding mode=balance-tlb

The balance-alb mode has the same problem, but whether it shows up depends on the Ethernet device driver. The default balance-rr mode does not have this problem.
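The difference between the modes can be pictured with a small model. As the reporter's investigation found, the tlb/alb transmit path reads the IPv4 destination address through skb->nh.iph, while round-robin simply rotates through the slaves without looking inside the packet, so an unset header pointer only hurts the former. The C sketch below is hypothetical throughout: fake_iphdr, fake_skb, the function names and the XOR-of-address-bytes hash are assumptions for illustration, not the bonding driver's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel's iphdr and sk_buff;
 * only the fields this sketch needs are modelled. */
struct fake_iphdr {
    uint8_t  ihl_version, tos;
    uint16_t tot_len, id, frag_off;
    uint8_t  ttl, protocol;
    uint16_t check;
    uint32_t saddr, daddr;            /* network byte order */
};

struct fake_skb {
    const struct fake_iphdr *iph;     /* plays the role of skb->nh.iph */
};

/* balance-rr style: rotate through the slaves without reading the packet,
 * so an unset header pointer is never touched. */
int rr_pick_slave(int *next, int nslaves, const struct fake_skb *skb)
{
    (void)skb;                        /* packet contents are ignored */
    int slave = *next;
    *next = (*next + 1) % nslaves;
    return slave;
}

/* balance-tlb style (assumed): XOR the destination-address bytes into a
 * one-byte hash and map it to a slave. This reads through skb->iph, so
 * the header pointer must have been set by whoever built the packet. */
int tlb_pick_slave(int nslaves, const struct fake_skb *skb)
{
    assert(skb->iph != NULL);         /* 2.6.17 had no such check, hence the oops */
    const uint8_t *p = (const uint8_t *)&skb->iph->daddr;
    uint8_t hash = 0;
    for (size_t i = 0; i < sizeof skb->iph->daddr; i++)
        hash ^= p[i];
    return hash % nslaves;
}

/* Store a dotted-quad destination address a.b.c.d in network byte order. */
void set_daddr(struct fake_iphdr *iph, uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    const uint8_t bytes[4] = { a, b, c, d };
    memcpy(&iph->daddr, bytes, sizeof bytes);
}
```

For example, with daddr 192.168.0.1 and two slaves, tlb_pick_slave() XORs 192, 168, 0 and 1 into 105 and returns slave 1, whereas rr_pick_slave() returns 0, 1, 0, ... regardless of the packet, even when the header pointer is NULL.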
Chen-Li Tien

On Fri, 7 Jul 2006 07:37:52 -0700
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=6802
>
> Summary: pktgen cause kernel oops with transmit load balanced bonding
> Kernel Version: 2.4.32, 2.6.17.2
> Status: NEW
> Severity: high
> Owner: acme@conectiva.com.br
> Submitter: cltien@gmail.com
>
>
> Most recent kernel where this bug did not occur:
> not found
> Distribution:
> Fedora Core 4
>
> Hardware Environment:
> i386, ppc
>
> Software Environment:
> gcc
>
> Problem Description:
> If the kernel packet generator (pktgen) is running and its output device is
> set to a bonding interface in balance-tlb or balance-alb mode, the kernel
> oopses.
>
> I only set odev, dst and count (0 for an infinite test) for pktgen. I
> wondered whether I had made a mistake in the pktgen parameters, but there is
> no problem if odev is set to a physical device such as eth0.
>
> My investigation shows that the problem happens when bond_alb_xmit tries to
> access the daddr field of the IP header in skb->nh.iph. If I do the same in
> round-robin mode, it can generate an oops too.
>
> Steps to reproduce:
> 1. Build a kernel with pktgen (CONFIG_NET_PKTGEN) and the bonding driver
>    (CONFIG_BONDING).
> 2. Set up the bonding interface:
>    ifenslave bond0 eth0
> 3. Create a script to start the packet generator. The script I use for
>    the 2.4 kernel series is:
> -----------cut here --------
> #! /bin/sh
>
> modprobe pktgen
>
> PGDEV=/proc/net/pktgen/pg0
>
> function pgset() {
>     local result
>
>     echo $1 > $PGDEV
>
>     result=`cat $PGDEV | fgrep "Result: OK:"`
>     if [ "$result" = "" ]; then
>         cat $PGDEV | fgrep Result:
>     fi
> }
>
> function pg() {
>     echo inject > $PGDEV
>     cat $PGDEV
> }
>
> pgset "odev bond0"
> pgset "dst 127.1.16.1"
> pgset "count 0"
> pg
> -----------cut here --------
>
> 4. If odev is set to eth0, this script runs without problems; the problem
>    only happens when it is set to bond0.
> Please send (via an emailed reply-to-all) a copy of the oops output.
> Thanks.

Output of kernel 2.6.17.4:

    BUG: unable to handle kernel NULL pointer dereference at virtual address 00000010
     printing eip:
    e0988e2e
    *pde = 00000000
    Oops: 0000 [#1]
    Modules linked in: pktgen bonding ip_tables x_tables microcode ata_piix libata
    CPU:    0
    EIP:    0060:[<e0988e2e>]    Not tainted VLI
    EFLAGS: 00010213   (2.6.17.4 #1)
    EIP is at bond_alb_xmit+0xb2/0x1c9 [bonding]
    eax: 00000000   ebx: df526ba0   ecx: 00000005   edx: dd8f0ee8
    esi: df53b6a3   edi: e098ad51   ebp: dca23f18   esp: dca23f00
    ds: 007b   es: 007b   ss: 0068
    Process pktgen/0 (pid: 2104, threadinfo=dca22000 task=ddd10520)
    Stack: df53b6a2 00000001 df526cb8 df526940 dde8e16c df526940 dca23fe4 e097a324
           dd8f0ee8 df526940 5a5a5a5a 5a5a5a5a 5a5a5a5a 5a5a5a5a 5a5a5a5a 5a5a5a5a
           00000000 00000000 dca22000 5a5a5a5a 5a5a5a5a 5a5a5a5a 5a5a5a5a bc5e5016
    Call Trace:
     <c0102a82> show_stack_log_lvl+0x87/0x8f
     <c0102bd3> show_registers+0x112/0x17b
     <c0102d8e> die+0xda/0x19f
     <c010b228> do_page_fault+0x467/0x551
     <c0102727> error_code+0x4f/0x54
     <e097a324> pktgen_thread_worker+0x3b1/0x790 [pktgen]
     <c0100d3d> kernel_thread_helper+0x5/0xb
    Code: 74 69 81 fa dd 86 00 00 74 3f e9 b5 00 00 00 fc 8b 75 e8 bf 50 ad 98 e0 b9 06 00 00 00 f3 a6 0f 84 a3 00 00 00 8b 55 08 8b 42 20 <83> 78 10 ff 0f 84 93 00 00 00 80 78 09 02 0f 84 89 00 00 00 8d
    EIP: [<e0988e2e>] bond_alb_xmit+0xb2/0x1c9 [bonding] SS:ESP 0068:dca23f00
     <0>Kernel panic - not syncing: Fatal exception in interrupt

The script I ran with kernel 2.6 is:

    #! /bin/sh

    modprobe pktgen

    PGDEV=/proc/net/pktgen/bond0
    PGCTL=/proc/net/pktgen/pgctrl

    function pgset() {
        local result

        echo $1 > $PGDEV

        result=`cat $PGDEV | fgrep "Result: OK:"`
        if [ "$result" = "" ]; then
            cat $PGDEV | fgrep Result:
        fi
    }

    function pg() {
        echo start > $PGCTL
        cat $PGDEV
    }

    echo "add_device bond0" > /proc/net/pktgen/kpktgend_0
    pgset "frags 5"          # packet will consist of 5 fragments
    pgset "dst 192.168.0.1"
    pgset "count 0"          # number of packets to send; zero means send
                             # continuously until explicitly stopped
    pg

2006/7/7, Andrew Morton <akpm@osdl.org>:
> On Fri, 7 Jul 2006 07:37:52 -0700
> bugme-daemon@bugzilla.kernel.org wrote:
> > Please send (via an emailed reply-to-all) a copy of the oops output.
> > Thanks.

It seems to happen at the following line (line 1679 in 2.6.17.4) of bond_alb_xmit():

    (skb->nh.iph->daddr == ip_bcast) ||

This is caused by pktgen, which doesn't initialize skb->nh, which the bonding driver uses to check the destination address. I made a patch for 2.6.17.4; 2.4.32 can be fixed in the same way.

--- linux-2.6.17.4/net/core/pktgen.c.orig	2006-07-06 16:02:28.000000000 -0400
+++ linux-2.6.17.4/net/core/pktgen.c	2006-07-10 16:40:47.000000000 -0400
@@ -2149,6 +2149,9 @@
 	skb->mac.raw = ((u8 *) iph) - 14 - pkt_dev->nr_labels*sizeof(u32);
 	skb->dev = odev;
 	skb->pkt_type = PACKET_HOST;
+	skb->mac.raw = eth;
+	skb->nh.iph = iph;
+	skb->h.uh = udph;
 
 	if (pkt_dev->nfrags <= 0)
 		pgh = (struct pktgen_hdr *)skb_put(skb, datalen);

Please help review the code, thanks!