Bug 6421
Summary: | kernel 2.6.10-2.6.16 on alpha: arch/alpha/kernel/io.c, iowrite16_rep() BUG_ON((unsigned long)src & 0x1) triggered | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Thomas Steudten (tomri) |
Component: | Alpha | Assignee: | Richard Henderson (rth) |
Status: | REJECTED UNREPRODUCIBLE | ||
Severity: | blocking | CC: | akpm, bunk, protasnb |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.16 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
tcpdump output at time for unaligned packet
tcpdump output at time for unaligned packet |
Description
Thomas Steudten
2006-04-21 02:35:44 UTC
Still open in 2.6.17-rc2. I'll send more information, when it's available. May 3 09:50:49 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc0010995a3d align to fffffc000069b7b0 length 60 May 4 19:29:29 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc0002f7a58f align to fffffc0006283db8 length 60 May 9 13:09:39 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc001f7c2647 align to fffffc000069ba30 length 60 May 9 20:29:15 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc001fe9418b align to fffffc00135d3b28 length 66 May 10 19:18:29 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc00048d1c65 align to fffffc001ea4f8f0 length 60 May 10 19:18:30 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc00048d1c65 align to fffffc000069ba30 length 60 May 10 19:18:32 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc0003044465 align to fffffc000069ba30 length 60 May 10 19:18:36 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc001ce6cc65 align to fffffc00107e79d0 length 60 May 10 19:18:40 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc00023fdcf3 align to fffffc000069ba30 length 60 May 10 19:18:41 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc000a7a7cf3 align to fffffc000ff93b58 length 60 May 10 19:18:43 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc00038384f3 align to fffffc000069ba30 length 60 May 10 19:32:31 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc000b90ba0f align to fffffc000069ba30 length 60 May 10 21:29:20 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc001a96c18b align to fffffc0002c37b58 length 66 On Thu, May 11, 2006 at 05:39:18AM -0700, bugme-daemon@bugzilla.kernel.org wrote: > ------- Additional Comments From tomri@gmx.net 2006-05-11 05:38 ------- > Still open in 2.6.17-rc2. > I'll send more information, when it's available. > > May 3 09:50:49 merlin kernel: ei_start_xmit(): skb->data unaligned > fffffc0010995a3d align to fffffc000069b7b0 length 60 Please answer my questions from netdev: <tomri@gmx.net> wrote: > > And the output in the kernel ring buffer is: > dmesg | grep xmit > ei_start_xmit(): skb->data unaligned fffffc0019be55d5 align to > fffffc001ef37620 > length 60 Very weird. The length seems to indicate that this packet went through skb_padto. However, I can't see how skb_padto can produce an unaligned skb if the input was aligned (the input coming out of TCP should definitely be aligned). Can you please put some printks in to find out if 1) skb_padto is being called. 2) Is the input to skb_padto aligned? > Here some stack traces with BUG_ON/ WARN_ON added by me in more places to > trace > down the problem: > Kernel bug at net/ipv4/ip_output.c:297 > cc1(2841): Kernel Bug 1 > pc = [<fffffc000062a92c>] ra = [<fffffc00006407c8>] ps = 0000 Not > tainted > pc is at ip_queue_xmit+0x59c/0x690 > ra is at tcp_transmit_skb+0x588/0xbb0 This is pretty meaningless if you don't show us the check that you added to produce this. Thanks, >Can you please put some printks in to find out if >1) skb_padto is being called. >2) Is the input to skb_padto aligned? I got some system freeze without any messages. So please give me some time, to check this. As I said this is happen randomly, but reproducable. Some time more messages per day, and once one message in 1-4 days. > Here some stack traces with BUG_ON/ WARN_ON added by me in more places to trace > down the problem: > Kernel bug at net/ipv4/ip_output.c:297 > cc1(2841): Kernel Bug 1 > pc = [<fffffc000062a92c>] ra = [<fffffc00006407c8>] ps = 0000 Not tainted > pc is at ip_queue_xmit+0x59c/0x690 > ra is at tcp_transmit_skb+0x588/0xbb0 >This is pretty meaningless if you don't show us the check that you added >to produce this. Well, the simple check is: WARN_ON(skb->data & 0x01); BUG_ON(skb->data & 0x01); So, I think this is much better: #uptime 09:33:50 up 1 day, 18:58 drivers/net/8390.c (patch lines): static int ei_start_xmit(struct sk_buff *skb, struct net_device *dev) { long e8390_base = dev->base_addr; struct ei_device *ei_local = (struct ei_device *) netdev_priv(dev); int send_length = skb->len, output_page; unsigned long flags; char scratch[ 128 ]; if ( (unsigned long) skb->data & 0x1) { printk(KERN_WARNING "ei_start_xmit(): entry: skb->data unaligned %p skb %p length %i\n", skb->data, skb, send_length); WARN_ON( (unsigned long) skb->data & 0x1); } if (skb->len < ETH_ZLEN) { if ( (unsigned long) skb->data & 0x1) printk(KERN_WARNING "ei_start_xmit(): padto0: skb->data unaligned %p skb %p length %i\n", skb->data, skb, send_length); skb = skb_padto(skb, ETH_ZLEN); if ( (unsigned long) skb->data & 0x1) printk(KERN_WARNING "ei_start_xmit(): padto1: skb->data unaligned %p skb %p length %i\n", skb->data, skb, send_length); if (skb == NULL) return 0; send_length = ETH_ZLEN; } [..] if ( (unsigned long) skb->data & 0x1) { printk(KERN_WARNING "ei_start_xmit(): skb->data unaligned %p align to %p length %i\n", skb->data, scratch, send_length); if ( send_length >= 128 ) goto normal; memset( scratch, 0, 128); memcpy( scratch, skb->data, send_length < 128 ? send_length : 128); ei_block_output(dev, send_length, scratch, output_page); } [..] This gives: May 14 21:16:11 merlin kernel: ei_start_xmit(): entry: skb->data unaligned fffffc0005c01b61 skb fffffc0019ccf1c8 length 54 May 14 21:16:11 merlin kernel: Badness in ei_start_xmit at drivers/net/8390.c:283 May 14 21:16:11 merlin kernel: fffffc000069b760 0000000000000001 fffffc00005670f8 fffffc000493d800 May 14 21:16:11 merlin kernel: fffffc000d5b7000 0000000000000000 fffffc0019ccf1c8 0000000000000000 May 14 21:16:11 merlin kernel: 0000000000000014 fffffc0004858800 fffffc00005d64b4 fffffc000cf1d438 May 14 21:16:11 merlin kernel: fffffc00005566c0 fffffc000cf1d3e0 fffffc0004858a18 fffffc0019ccf1c8 May 14 21:16:11 merlin kernel: fffffc000493d800 fffffc000493d800 fffffc0000567250 fffffc000493d800 May 14 21:16:11 merlin kernel: fffffc000d5b7000 0000000000000000 fffffc0019ccf1c8 0000000000000000 May 14 21:16:11 merlin kernel: Trace: May 14 21:16:11 merlin kernel: [<fffffc00005670f8>] qdisc_restart+0x88/0x2b0 May 14 21:16:11 merlin kernel: [<fffffc00005d64b4>] packet_rcv_spkt+0x1a4/0x3b0 May 14 21:16:11 merlin kernel: [<fffffc00005566c0>] dev_queue_xmit_nit+0x1a0/0x1f0 May 14 21:16:11 merlin kernel: [<fffffc0000567250>] qdisc_restart+0x1e0/0x2b0 May 14 21:16:11 merlin kernel: [<fffffc0000558f48>] dev_queue_xmit+0xb8/0x3b0 May 14 21:16:11 merlin kernel: [<fffffc0000560508>] neigh_resolve_output+0x128/0x370 May 14 21:16:11 merlin kernel: [<fffffc00005815b4>] ip_output+0x224/0x430 May 14 21:16:11 merlin kernel: [<fffffc00005800bc>] ip_queue_xmit+0x2dc/0x630 May 14 21:16:11 merlin kernel: [<fffffc0000596068>] tcp_transmit_skb+0x588/0xa30 May 14 21:16:11 merlin kernel: [<fffffc000059bb6c>] tcp_v4_send_check+0x11c/0x150 May 14 21:16:11 merlin kernel: [<fffffc0000595f68>] tcp_transmit_skb+0x488/0xa30 May 14 21:16:11 merlin kernel: [<fffffc0000597ab0>] tcp_retransmit_skb+0x730/0x7b0 May 14 21:16:11 merlin kernel: [<fffffc0000597498>] tcp_retransmit_skb+0x118/0x7b0 May 14 21:16:11 merlin kernel: [<fffffc0000597ffc>] tcp_xmit_retransmit_queue+0x1bc/0x3c0 May 14 21:16:11 merlin kernel: [<fffffc00005910e4>] tcp_ack+0x1134/0x1cf0 May 14 21:16:11 merlin kernel: [<fffffc00005941f8>] tcp_rcv_state_process+0x638/0x11d0 May 14 21:16:11 merlin kernel: [<fffffc000059cdb8>] tcp_v4_do_rcv+0x128/0x490 May 14 21:16:11 merlin kernel: [<fffffc000059de00>] tcp_v4_rcv+0xce0/0xd00 May 14 21:16:11 merlin kernel: [<fffffc000057a5dc>] ip_local_deliver+0x1bc/0x410 May 14 21:16:11 merlin kernel: [<fffffc000057a170>] ip_rcv+0x480/0x730 May 14 21:16:11 merlin kernel: [<fffffc0000556ad4>] netif_receive_skb+0x174/0x300 May 14 21:16:11 merlin kernel: [<fffffc0000310000>] __start+0x0/0x20 May 14 21:16:11 merlin kernel: [<fffffc0000556d2c>] process_backlog+0xcc/0x1b0 May 14 21:16:11 merlin kernel: [<fffffc00005556d4>] net_rx_action+0xb4/0x1a0 May 14 21:16:11 merlin kernel: [<fffffc000032ffa0>] __do_softirq+0x90/0x130 May 14 21:16:11 merlin kernel: [<fffffc00003300a4>] do_softirq+0x64/0x70 May 14 21:16:11 merlin kernel: [<fffffc0000310000>] __start+0x0/0x20 May 14 21:16:11 merlin kernel: [<fffffc0000316130>] handle_irq+0x70/0xd0 May 14 21:16:11 merlin kernel: [<fffffc000031f048>] isa_device_interrupt+0x28/0x40 May 14 21:16:11 merlin kernel: [<fffffc000031ef58>] pyxis_device_interrupt+0x68/0x130 May 14 21:16:11 merlin kernel: [<fffffc0000310000>] __start+0x0/0x20 May 14 21:16:11 merlin kernel: [<fffffc00003164c8>] do_entInt+0x118/0x190 May 14 21:16:11 merlin kernel: [<fffffc00003112b0>] ret_from_sys_call+0x0/0x10 May 14 21:16:11 merlin kernel: [<fffffc00003131b8>] cpu_idle+0x48/0x60 May 14 21:16:11 merlin kernel: [<fffffc0000311610>] alpha_switch_to+0x0/0x20 May 14 21:16:11 merlin kernel: [<fffffc00003131a4>] cpu_idle+0x34/0x60 May 14 21:16:11 merlin kernel: [<fffffc0000310098>] rest_init+0x48/0x60 May 14 21:16:11 merlin kernel: [<fffffc000031001c>] __start+0x1c/0x20 May 14 21:16:11 merlin kernel: ei_start_xmit(): padto0: skb->data unaligned fffffc0005c01b61 skb fffffc0019ccf1c8 length 54 May 14 21:16:11 merlin kernel: ei_start_xmit(): padto1: skb->data unaligned fffffc0005c01b61 skb fffffc0019ccf1c8 length 54 May 14 21:16:11 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc0005c01b61 align to fffffc000069b7b0 length 60 May 14 21:16:11 merlin kernel: ei_start_xmit(): entry: skb->data unaligned fffffc0005c01b61 skb fffffc0019ccf1c8 length 54 May 14 21:16:11 merlin kernel: Badness in ei_start_xmit at drivers/net/8390.c:283 May 14 21:16:11 merlin kernel: fffffc000069b9e0 0000000000000001 fffffc00005670f8 fffffc000493d800 May 14 21:16:11 merlin kernel: fffffc000d5b7000 0000000000000000 fffffc0019ccf1c8 0000000000000000 May 14 21:16:11 merlin kernel: 0000000000000014 fffffc0004858800 fffffc00005d64b4 fffffc000c9b3218 May 14 21:16:11 merlin kernel: fffffc00005566c0 fffffc000c9b31c0 fffffc0004858a18 fffffc0019ccf1c8 May 14 21:16:11 merlin kernel: fffffc000493d800 fffffc000493d800 fffffc0000567250 fffffc000493d800 May 14 21:16:11 merlin kernel: fffffc000d5b7000 0000000000000000 fffffc0019ccf1c8 0000000000000000 May 14 21:16:11 merlin kernel: Trace: May 14 21:16:11 merlin kernel: [<fffffc00005670f8>] qdisc_restart+0x88/0x2b0 May 14 21:16:11 merlin kernel: [<fffffc00005d64b4>] packet_rcv_spkt+0x1a4/0x3b0 May 14 21:16:11 merlin kernel: [<fffffc00005566c0>] dev_queue_xmit_nit+0x1a0/0x1f0 May 14 21:16:11 merlin kernel: [<fffffc0000567250>] qdisc_restart+0x1e0/0x2b0 May 14 21:16:11 merlin kernel: [<fffffc0000558f48>] dev_queue_xmit+0xb8/0x3b0 May 14 21:16:11 merlin kernel: [<fffffc0000560508>] neigh_resolve_output+0x128/0x370 May 14 21:16:11 merlin kernel: [<fffffc00005815b4>] ip_output+0x224/0x430 May 14 21:16:11 merlin kernel: [<fffffc00005800bc>] ip_queue_xmit+0x2dc/0x630 May 14 21:16:11 merlin kernel: [<fffffc0000596068>] tcp_transmit_skb+0x588/0xa30 May 14 21:16:11 merlin kernel: [<fffffc000059bb6c>] tcp_v4_send_check+0x11c/0x150 May 14 21:16:11 merlin kernel: [<fffffc0000595f68>] tcp_transmit_skb+0x488/0xa30 May 14 21:16:11 merlin kernel: [<fffffc0000597498>] tcp_retransmit_skb+0x118/0x7b0 May 14 21:16:11 merlin kernel: [<fffffc000059a5fc>] tcp_write_timer+0x2bc/0x7a0 May 14 21:16:11 merlin kernel: [<fffffc0000335634>] run_timer_softirq+0x34/0x260 May 14 21:16:11 merlin kernel: [<fffffc0000335738>] run_timer_softirq+0x138/0x260 May 14 21:16:11 merlin kernel: [<fffffc000059a340>] tcp_write_timer+0x0/0x7a0 May 14 21:16:11 merlin kernel: [<fffffc000032ffa0>] __do_softirq+0x90/0x130 May 14 21:16:11 merlin kernel: [<fffffc00003300a4>] do_softirq+0x64/0x70 May 14 21:16:11 merlin kernel: [<fffffc0000310000>] __start+0x0/0x20 May 14 21:16:11 merlin kernel: [<fffffc0000316130>] handle_irq+0x70/0xd0 May 14 21:16:11 merlin kernel: [<fffffc0000316524>] do_entInt+0x174/0x190 May 14 21:16:11 merlin kernel: [<fffffc00003112b0>] ret_from_sys_call+0x0/0x10 May 14 21:16:11 merlin kernel: [<fffffc00003131b8>] cpu_idle+0x48/0x60 May 14 21:16:11 merlin kernel: [<fffffc0000311610>] alpha_switch_to+0x0/0x20 May 14 21:16:11 merlin kernel: [<fffffc00003131a4>] cpu_idle+0x34/0x60 May 14 21:16:11 merlin kernel: [<fffffc0000310098>] rest_init+0x48/0x60 May 14 21:16:11 merlin kernel: [<fffffc000031001c>] __start+0x1c/0x20 May 14 21:16:11 merlin kernel: May 14 21:16:11 merlin kernel: ei_start_xmit(): padto0: skb->data unaligned fffffc0005c01b61 skb fffffc0019ccf1c8 length 54 May 14 21:16:11 merlin kernel: ei_start_xmit(): padto1: skb->data unaligned fffffc0005c01b61 skb fffffc0019ccf1c8 length 54 May 14 21:16:11 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc0005c01b61 align to fffffc000069ba30 length 60 You see: 1) skb is aligned, skb->data not when ei_start_xmit() is called. 2) skb_padto() is called, but this causes not the problem here. 3) the two stack traces are different. Feel free to contact me, for more details. Another stack trace. skb_padto() is not called. May 16 23:50:34 merlin kernel: ei_start_xmit(): entry: skb->data unaligned fffffc0009b5098b skb fffffc001e763a08 length 66 May 16 23:50:34 merlin kernel: Badness in ei_start_xmit at drivers/net/8390.c:283 May 16 23:50:34 merlin kernel: fffffc0009007600 0000000000000001 fffffc00005670f8 fffffc00199ed800 May 16 23:50:34 merlin kernel: fffffc000ebd4000 0000000000000000 fffffc001e763a08 0000000000000000 May 16 23:50:34 merlin kernel: 0000000000000020 fffffc001a62dc00 fffffc00005d64b4 fffffc000ad5b018 May 16 23:50:34 merlin kernel: fffffc00005566c0 fffffc000ad5afc0 fffffc001a62de18 fffffc001e763a08 May 16 23:50:34 merlin kernel: fffffc00199ed800 fffffc00199ed800 fffffc0000567250 fffffc00199ed800 May 16 23:50:34 merlin kernel: fffffc000ebd4000 0000000000000000 fffffc001e763a08 0000000000000000 May 16 23:50:34 merlin kernel: Trace: May 16 23:50:34 merlin kernel: [<fffffc00005670f8>] qdisc_restart+0x88/0x2b0 May 16 23:50:34 merlin kernel: [<fffffc00005d64b4>] packet_rcv_spkt+0x1a4/0x3b0 May 16 23:50:34 merlin kernel: [<fffffc00005566c0>] dev_queue_xmit_nit+0x1a0/0x1f0 May 16 23:50:34 merlin kernel: [<fffffc0000567250>] qdisc_restart+0x1e0/0x2b0 May 16 23:50:34 merlin kernel: [<fffffc0000558f48>] dev_queue_xmit+0xb8/0x3b0 May 16 23:50:34 merlin kernel: [<fffffc0000560508>] neigh_resolve_output+0x128/0x370 May 16 23:50:34 merlin kernel: [<fffffc00005815b4>] ip_output+0x224/0x430 May 16 23:50:34 merlin kernel: [<fffffc00005800bc>] ip_queue_xmit+0x2dc/0x630 May 16 23:50:34 merlin kernel: [<fffffc0000596068>] tcp_transmit_skb+0x588/0xa30 May 16 23:50:34 merlin kernel: [<fffffc000059bb6c>] tcp_v4_send_check+0x11c/0x150 May 16 23:50:34 merlin kernel: [<fffffc0000595f68>] tcp_transmit_skb+0x488/0xa30 May 16 23:50:34 merlin kernel: [<fffffc0000597ab0>] tcp_retransmit_skb+0x730/0x7b0 May 16 23:50:34 merlin kernel: [<fffffc0000597498>] tcp_retransmit_skb+0x118/0x7b0 May 16 23:50:34 merlin kernel: [<fffffc000059a5fc>] tcp_write_timer+0x2bc/0x7a0 May 16 23:50:34 merlin kernel: [<fffffc0000335634>] run_timer_softirq+0x34/0x260 May 16 23:50:34 merlin kernel: [<fffffc0000335738>] run_timer_softirq+0x138/0x260 May 16 23:50:34 merlin kernel: [<fffffc000059a340>] tcp_write_timer+0x0/0x7a0 May 16 23:50:34 merlin kernel: [<fffffc000032ffa0>] __do_softirq+0x90/0x130 May 16 23:50:34 merlin kernel: [<fffffc00003300a4>] do_softirq+0x64/0x70 May 16 23:50:34 merlin kernel: [<fffffc0000316130>] handle_irq+0x70/0xd0 May 16 23:50:34 merlin kernel: [<fffffc0000316524>] do_entInt+0x174/0x190 May 16 23:50:34 merlin kernel: [<fffffc0000310eec>] entMM+0x9c/0xc0 May 16 23:50:34 merlin kernel: [<fffffc00003112b0>] ret_from_sys_call+0x0/0x10 May 16 23:50:34 merlin kernel: [<fffffc0000453ed0>] number+0x210/0x330 May 16 23:50:34 merlin kernel: [<fffffc0000454438>] vsnprintf+0x448/0x820 May 16 23:50:34 merlin kernel: [<fffffc0000453f50>] number+0x290/0x330 May 16 23:50:34 merlin kernel: [<fffffc00004549cc>] sprintf+0x5c/0x70 May 16 23:50:34 merlin kernel: [<fffffc000039e228>] __d_lookup+0xa8/0x1a0 May 16 23:50:34 merlin kernel: [<fffffc00003c7dc8>] do_task_stat+0x538/0x6b0 May 16 23:50:34 merlin kernel: [<fffffc0000391490>] do_lookup+0x50/0x280 May 16 23:50:34 merlin kernel: [<fffffc0000313544>] get_wchan+0x74/0xb0 May 16 23:50:34 merlin kernel: [<fffffc000040e8e0>] sys_semtimedop+0x810/0x880 May 16 23:50:34 merlin kernel: [<fffffc00003c36b8>] proc_info_read+0xb8/0x120 May 16 23:50:34 merlin kernel: [<fffffc000040e8e0>] sys_semtimedop+0x810/0x880 May 16 23:50:34 merlin kernel: [<fffffc000037d38c>] vfs_read+0x10c/0x200 May 16 23:50:34 merlin kernel: [<fffffc000037d5c8>] sys_read+0x68/0xb0 May 16 23:50:34 merlin kernel: [<fffffc0000311294>] entSys+0xa4/0xc0 May 16 23:50:34 merlin kernel: May 16 23:50:34 merlin kernel: ei_start_xmit(): skb->data unaligned fffffc0009b5098b align to fffffc0009007650 length 66 On Tue, May 16, 2006 at 11:44:17PM -0700, bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=6421 > > ------- Additional Comments From tomri@gmx.net 2006-05-16 23:43 ------- > Another stack trace. skb_padto() is not called. > > May 16 23:50:34 merlin kernel: ei_start_xmit(): entry: skb->data unaligned > fffffc0009b5098b skb fffffc001e763a08 length 66 > May 16 23:50:34 merlin kernel: Badness in ei_start_xmit at > drivers/net/8390.c:283 ... > May 16 23:50:34 merlin kernel: Trace: > May 16 23:50:34 merlin kernel: [<fffffc00005670f8>] qdisc_restart+0x88/0x2b0 > May 16 23:50:34 merlin kernel: [<fffffc00005d64b4>] > packet_rcv_spkt+0x1a4/0x3b0 > May 16 23:50:34 merlin kernel: [<fffffc00005566c0>] > dev_queue_xmit_nit+0x1a0/0x1f0 > May 16 23:50:34 merlin kernel: [<fffffc0000567250>] qdisc_restart+0x1e0/0x2b0 > May 16 23:50:34 merlin kernel: [<fffffc0000558f48>] dev_queue_xmit+0xb8/0x3b0 > May 16 23:50:34 merlin kernel: [<fffffc0000560508>] > neigh_resolve_output+0x128/0x370 > May 16 23:50:34 merlin kernel: [<fffffc00005815b4>] ip_output+0x224/0x430 > May 16 23:50:34 merlin kernel: [<fffffc00005800bc>] ip_queue_xmit+0x2dc/0x630 > May 16 23:50:34 merlin kernel: [<fffffc0000596068>] > tcp_transmit_skb+0x588/0xa30 > May 16 23:50:34 merlin kernel: [<fffffc000059bb6c>] > tcp_v4_send_check+0x11c/0x150 > May 16 23:50:34 merlin kernel: [<fffffc0000595f68>] > tcp_transmit_skb+0x488/0xa30 > May 16 23:50:34 merlin kernel: [<fffffc0000597ab0>] > tcp_retransmit_skb+0x730/0x7b0 OK, I think I have a suspect in sight. The unaligned packet is probably caused by tcp_trim_head. To fix it we could just copy the data instead of trimming if it is not paged. However, I'd like you to confirm that this is indeed the cause with a tcpdump. So please take a tcpdump and show us the relevant packets that show up about the time when you get the unaligned warnings. What I'm expecting is a partial odd ACK just before the error. Thanks, > So please take a tcpdump and show us the relevant packets that
> show up about the time when you get the unaligned warnings.
> What I'm expecting is a partial odd ACK just before the error.
Ok, see the attachment (ALL.txt.gz) for the tcpdump output. If you would like
the hexdump or more, please give the tcpdump filter rule to me.
Thanks.
Created attachment 8214 [details]
tcpdump output at time for unaligned packet
Created attachment 8215 [details]
tcpdump output at time for unaligned packet
On Sat, May 27, 2006 at 10:31:19AM -0700, bugme-daemon@bugzilla.kernel.org wrote: > > Ok, see the attachment (ALL.txt.gz) for the tcpdump output. If you would like > the hexdump or more, please give the tcpdump filter rule to me. Thanks a lot. So it wasn't as uncommon as I thought. In fact, partial odd acks like yours happen all the time as part of FIN handling. So this is something that we want to address. BTW, we should also fix arch/alpha/kernel/io.c to not crash on unaligned data. Most other common architectures will grin and bear it. [TCP]: Avoid skb_pull if possible when trimming head Trimming the head of an skb by calling skb_pull can cause the packet to become unaligned if the length pulled is odd. Since the length is entirely arbitrary for a FIN packet carrying data, this is actually quite common. Unaligned data is not the end of the world, but we should avoid it if it's easily done. In this case it is trivial. Since we're discarding all of the head data it doesn't matter whether we move skb->data forward or back. However, it is still possible to have unaligned skb->data in general. So network drivers should be prepared to handle it instead of crashing. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cheers, In article <20060603095442.GA373@gondor.apana.org.au> (at Sat, 3 Jun 2006 19:54:42 +1000), Herbert Xu <herbert@gondor.apana.org.au> says: > [TCP]: Avoid skb_pull if possible when trimming head > > Trimming the head of an skb by calling skb_pull can cause the packet > to become unaligned if the length pulled is odd. Since the length is > entirely arbitrary for a FIN packet carrying data, this is actually > quite common. I think that people will start thinking why we cannot skb_pull(skb, len) if skb_headlen(skb) == len; some comment needed... --yoshfuji On Sat, Jun 03, 2006 at 07:51:23PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote: > > I think that people will start thinking why we cannot > skb_pull(skb, len) if skb_headlen(skb) == len; some comment needed... Good idea. Here is a better one. [TCP]: Avoid skb_pull if possible when trimming head Trimming the head of an skb by calling skb_pull can cause the packet to become unaligned if the length pulled is odd. Since the length is entirely arbitrary for a FIN packet carrying data, this is actually quite common. Unaligned data is not the end of the world, but we should avoid it if it's easily done. In this case it is trivial. Since we're discarding all of the head data it doesn't matter whether we move skb->data forward or back. However, it is still possible to have unaligned skb->data in general. So network drivers should be prepared to handle it instead of crashing. This patch also adds an unlikely marking on len < headlen since partial ACKs on head data are extremely rare in the wild. As the return value of __pskb_trim_head is no longer ever NULL that has been removed. Signed-off-by: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Thanks, Reply-To: davem@davemloft.net From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sat, 3 Jun 2006 21:07:11 +1000 > On Sat, Jun 03, 2006 at 07:51:23PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote: > > > > I think that people will start thinking why we cannot > > skb_pull(skb, len) if skb_headlen(skb) == len; some comment needed... > > Good idea. Here is a better one. > > [TCP]: Avoid skb_pull if possible when trimming head Applied, thanks a lot Herbert. arch/alpha/kernel/io.c iowrite16_rep() should handle unaligned input as users such as ne2000 may feed it unaligned input that could come from the network stack. In general, the network stack tries to output aligned data as much as possible, but unaligned data is sometimes unavoidable for performance reasons. It is preferable for iowrite16_rep to do some shuffling for odd-aligned data rather than for its user to copy the data beforehand. Thomas, is this bug now resolved? Hi Andrew I hope this is fixed. I need to switch to another NIC card due to this bug, so I haven't track this bug any more. However linux on alpha seems to be end-of-support in context to the x86 features. Regards Thomas Thomas, Any chance you can test the patch with your old card and confirm that the problem is gone? If this is too much trouble we can close this bug I guess until someone runs into the problem. Thanks! Hi Natalie Oh, the alpha is gone, so there's no way for me, to confirm this. Regards Thomas Oh well... rest in peace poor alpha :( We can close the bug I guess. Thanks. |