Bug 6421

Summary: kernel 2.6.10-2.6.16 on alpha: arch/alpha/kernel/io.c, iowrite16_rep() BUG_ON((unsigned long)src & 0x1) triggered
Product: Platform Specific/Hardware Reporter: Thomas Steudten (tomri)
Component: AlphaAssignee: Richard Henderson (rth)
Status: REJECTED UNREPRODUCIBLE    
Severity: blocking CC: akpm, bunk, protasnb
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.16 Subsystem:
Regression: --- Bisected commit-id:
Attachments: tcpdump output at time for unaligned packet
tcpdump output at time for unaligned packet

Description Thomas Steudten 2006-04-21 02:35:44 UTC
Most recent kernel where this bug did not occur: 2.6.10 (i think so)


Distribution: self build (RH9, FC4/5 on alpha mix) 
Hardware Environment: alpha sx164pc board with ne200 ISA card.
Software Environment:
Problem Description:

Since 2.6.10 the BUG_ON() in arch/alpha/kernel/io.c iowrite16_rep() 
is triggered randomly, but reproducable on linux alpha platform with
ne2000/8390 ISA network card.

ne.c:v1.10 9/23/94 Donald Becker (becker@scyld.com)
Last modified Nov 1, 2000 by Paul Gortmaker
NE*000 ethercard probe at 0x320: 00 80 29 63 4a f6
eth%d: NE2000 found at 0x320, using IRQ 5.

I
Comment 1 Thomas Steudten 2006-05-11 05:38:36 UTC
Still open in 2.6.17-rc2.
I'll send more information, when it's available.

May  3 09:50:49 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc0010995a3d align to fffffc000069b7b0 length 60
May  4 19:29:29 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc0002f7a58f align to fffffc0006283db8 length 60
May  9 13:09:39 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc001f7c2647 align to fffffc000069ba30 length 60
May  9 20:29:15 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc001fe9418b align to fffffc00135d3b28 length 66
May 10 19:18:29 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc00048d1c65 align to fffffc001ea4f8f0 length 60
May 10 19:18:30 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc00048d1c65 align to fffffc000069ba30 length 60
May 10 19:18:32 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc0003044465 align to fffffc000069ba30 length 60
May 10 19:18:36 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc001ce6cc65 align to fffffc00107e79d0 length 60
May 10 19:18:40 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc00023fdcf3 align to fffffc000069ba30 length 60
May 10 19:18:41 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc000a7a7cf3 align to fffffc000ff93b58 length 60
May 10 19:18:43 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc00038384f3 align to fffffc000069ba30 length 60
May 10 19:32:31 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc000b90ba0f align to fffffc000069ba30 length 60
May 10 21:29:20 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc001a96c18b align to fffffc0002c37b58 length 66
Comment 2 Herbert Xu 2006-05-11 06:40:58 UTC
On Thu, May 11, 2006 at 05:39:18AM -0700, bugme-daemon@bugzilla.kernel.org wrote:
> ------- Additional Comments From tomri@gmx.net  2006-05-11 05:38 -------
> Still open in 2.6.17-rc2.
> I'll send more information, when it's available.
> 
> May  3 09:50:49 merlin kernel: ei_start_xmit(): skb->data unaligned
> fffffc0010995a3d align to fffffc000069b7b0 length 60

Please answer my questions from netdev:

<tomri@gmx.net> wrote:
>
> And the output in the kernel ring buffer is:
> dmesg | grep xmit
> ei_start_xmit(): skb->data unaligned fffffc0019be55d5 align to
> fffffc001ef37620
> length 60

Very weird.  The length seems to indicate that this packet went through
skb_padto.  However, I can't see how skb_padto can produce an unaligned
skb if the input was aligned (the input coming out of TCP should definitely
be aligned).

Can you please put some printks in to find out if

1) skb_padto is being called.
2) Is the input to skb_padto aligned?

> Here some stack traces with BUG_ON/ WARN_ON added by me in more places to
> trace
> down the problem:
> Kernel bug at net/ipv4/ip_output.c:297
> cc1(2841): Kernel Bug 1
> pc = [<fffffc000062a92c>]  ra = [<fffffc00006407c8>]  ps = 0000    Not
> tainted
> pc is at ip_queue_xmit+0x59c/0x690
> ra is at tcp_transmit_skb+0x588/0xbb0

This is pretty meaningless if you don't show us the check that you added
to produce this.

Thanks,
Comment 3 Thomas Steudten 2006-05-11 06:58:48 UTC
>Can you please put some printks in to find out if

>1) skb_padto is being called.
>2) Is the input to skb_padto aligned?

I got some system freeze without any messages. So please give me some time, to
check this. As I said this is happen randomly, but reproducable. Some time more
messages per day, and once one message in 1-4 days.

> Here some stack traces with BUG_ON/ WARN_ON added by me in more places to trace
> down the problem:
> Kernel bug at net/ipv4/ip_output.c:297
> cc1(2841): Kernel Bug 1
> pc = [<fffffc000062a92c>]  ra = [<fffffc00006407c8>]  ps = 0000    Not tainted
> pc is at ip_queue_xmit+0x59c/0x690
> ra is at tcp_transmit_skb+0x588/0xbb0

>This is pretty meaningless if you don't show us the check that you added
>to produce this.

Well, the simple check is:
WARN_ON(skb->data & 0x01);
BUG_ON(skb->data & 0x01);
Comment 4 Thomas Steudten 2006-05-15 00:41:49 UTC
So, I think this is much better:
#uptime
09:33:50 up 1 day, 18:58

drivers/net/8390.c (patch lines):
static int ei_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        long e8390_base = dev->base_addr;
        struct ei_device *ei_local = (struct ei_device *) netdev_priv(dev);
        int send_length = skb->len, output_page;
        unsigned long flags;
        char scratch[ 128 ];

            if ( (unsigned long) skb->data & 0x1)
            {
                    printk(KERN_WARNING "ei_start_xmit(): entry: skb->data
unaligned %p skb %p length %i\n", skb->data, skb, send_length);
                   WARN_ON( (unsigned long) skb->data & 0x1);
            }
        if (skb->len < ETH_ZLEN) {
            if ( (unsigned long) skb->data & 0x1)
                    printk(KERN_WARNING "ei_start_xmit(): padto0: skb->data
unaligned %p skb %p length %i\n", skb->data, skb, send_length);
                skb = skb_padto(skb, ETH_ZLEN);
            if ( (unsigned long) skb->data & 0x1)
                    printk(KERN_WARNING "ei_start_xmit(): padto1: skb->data
unaligned %p skb %p length %i\n", skb->data, skb, send_length);
                if (skb == NULL)
                        return 0;
                send_length = ETH_ZLEN;
        }

[..]
  if ( (unsigned long) skb->data & 0x1)
        {
                printk(KERN_WARNING "ei_start_xmit(): skb->data unaligned %p
align to %p length %i\n", skb->data, scratch, send_length);
                if ( send_length >= 128 ) goto normal;
                memset( scratch, 0, 128);
                memcpy( scratch, skb->data, send_length < 128 ? send_length : 128);
                ei_block_output(dev, send_length, scratch, output_page);
        }
[..]

 This gives:

May 14 21:16:11 merlin kernel: ei_start_xmit(): entry: skb->data unaligned
fffffc0005c01b61 skb fffffc0019ccf1c8 length 54
May 14 21:16:11 merlin kernel: Badness in ei_start_xmit at drivers/net/8390.c:283
May 14 21:16:11 merlin kernel: fffffc000069b760 0000000000000001
fffffc00005670f8 fffffc000493d800
May 14 21:16:11 merlin kernel:        fffffc000d5b7000 0000000000000000
fffffc0019ccf1c8 0000000000000000
May 14 21:16:11 merlin kernel:        0000000000000014 fffffc0004858800
fffffc00005d64b4 fffffc000cf1d438
May 14 21:16:11 merlin kernel:        fffffc00005566c0 fffffc000cf1d3e0
fffffc0004858a18 fffffc0019ccf1c8
May 14 21:16:11 merlin kernel:        fffffc000493d800 fffffc000493d800
fffffc0000567250 fffffc000493d800
May 14 21:16:11 merlin kernel:        fffffc000d5b7000 0000000000000000
fffffc0019ccf1c8 0000000000000000
May 14 21:16:11 merlin kernel: Trace:
May 14 21:16:11 merlin kernel: [<fffffc00005670f8>] qdisc_restart+0x88/0x2b0
May 14 21:16:11 merlin kernel: [<fffffc00005d64b4>] packet_rcv_spkt+0x1a4/0x3b0
May 14 21:16:11 merlin kernel: [<fffffc00005566c0>] dev_queue_xmit_nit+0x1a0/0x1f0
May 14 21:16:11 merlin kernel: [<fffffc0000567250>] qdisc_restart+0x1e0/0x2b0
May 14 21:16:11 merlin kernel: [<fffffc0000558f48>] dev_queue_xmit+0xb8/0x3b0
May 14 21:16:11 merlin kernel: [<fffffc0000560508>] neigh_resolve_output+0x128/0x370
May 14 21:16:11 merlin kernel: [<fffffc00005815b4>] ip_output+0x224/0x430
May 14 21:16:11 merlin kernel: [<fffffc00005800bc>] ip_queue_xmit+0x2dc/0x630
May 14 21:16:11 merlin kernel: [<fffffc0000596068>] tcp_transmit_skb+0x588/0xa30
May 14 21:16:11 merlin kernel: [<fffffc000059bb6c>] tcp_v4_send_check+0x11c/0x150
May 14 21:16:11 merlin kernel: [<fffffc0000595f68>] tcp_transmit_skb+0x488/0xa30
May 14 21:16:11 merlin kernel: [<fffffc0000597ab0>] tcp_retransmit_skb+0x730/0x7b0
May 14 21:16:11 merlin kernel: [<fffffc0000597498>] tcp_retransmit_skb+0x118/0x7b0
May 14 21:16:11 merlin kernel: [<fffffc0000597ffc>]
tcp_xmit_retransmit_queue+0x1bc/0x3c0
May 14 21:16:11 merlin kernel: [<fffffc00005910e4>] tcp_ack+0x1134/0x1cf0
May 14 21:16:11 merlin kernel: [<fffffc00005941f8>]
tcp_rcv_state_process+0x638/0x11d0
May 14 21:16:11 merlin kernel: [<fffffc000059cdb8>] tcp_v4_do_rcv+0x128/0x490
May 14 21:16:11 merlin kernel: [<fffffc000059de00>] tcp_v4_rcv+0xce0/0xd00
May 14 21:16:11 merlin kernel: [<fffffc000057a5dc>] ip_local_deliver+0x1bc/0x410
May 14 21:16:11 merlin kernel: [<fffffc000057a170>] ip_rcv+0x480/0x730
May 14 21:16:11 merlin kernel: [<fffffc0000556ad4>] netif_receive_skb+0x174/0x300
May 14 21:16:11 merlin kernel: [<fffffc0000310000>] __start+0x0/0x20
May 14 21:16:11 merlin kernel: [<fffffc0000556d2c>] process_backlog+0xcc/0x1b0
May 14 21:16:11 merlin kernel: [<fffffc00005556d4>] net_rx_action+0xb4/0x1a0
May 14 21:16:11 merlin kernel: [<fffffc000032ffa0>] __do_softirq+0x90/0x130
May 14 21:16:11 merlin kernel: [<fffffc00003300a4>] do_softirq+0x64/0x70
May 14 21:16:11 merlin kernel: [<fffffc0000310000>] __start+0x0/0x20
May 14 21:16:11 merlin kernel: [<fffffc0000316130>] handle_irq+0x70/0xd0
May 14 21:16:11 merlin kernel: [<fffffc000031f048>] isa_device_interrupt+0x28/0x40
May 14 21:16:11 merlin kernel: [<fffffc000031ef58>]
pyxis_device_interrupt+0x68/0x130
May 14 21:16:11 merlin kernel: [<fffffc0000310000>] __start+0x0/0x20
May 14 21:16:11 merlin kernel: [<fffffc00003164c8>] do_entInt+0x118/0x190
May 14 21:16:11 merlin kernel: [<fffffc00003112b0>] ret_from_sys_call+0x0/0x10
May 14 21:16:11 merlin kernel: [<fffffc00003131b8>] cpu_idle+0x48/0x60
May 14 21:16:11 merlin kernel: [<fffffc0000311610>] alpha_switch_to+0x0/0x20
May 14 21:16:11 merlin kernel: [<fffffc00003131a4>] cpu_idle+0x34/0x60
May 14 21:16:11 merlin kernel: [<fffffc0000310098>] rest_init+0x48/0x60
May 14 21:16:11 merlin kernel: [<fffffc000031001c>] __start+0x1c/0x20

May 14 21:16:11 merlin kernel: ei_start_xmit(): padto0: skb->data unaligned
fffffc0005c01b61 skb fffffc0019ccf1c8 length 54
May 14 21:16:11 merlin kernel: ei_start_xmit(): padto1: skb->data unaligned
fffffc0005c01b61 skb fffffc0019ccf1c8 length 54
May 14 21:16:11 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc0005c01b61 align to fffffc000069b7b0 length 60
May 14 21:16:11 merlin kernel: ei_start_xmit(): entry: skb->data unaligned
fffffc0005c01b61 skb fffffc0019ccf1c8 length 54
May 14 21:16:11 merlin kernel: Badness in ei_start_xmit at drivers/net/8390.c:283
May 14 21:16:11 merlin kernel: fffffc000069b9e0 0000000000000001
fffffc00005670f8 fffffc000493d800
May 14 21:16:11 merlin kernel:        fffffc000d5b7000 0000000000000000
fffffc0019ccf1c8 0000000000000000
May 14 21:16:11 merlin kernel:        0000000000000014 fffffc0004858800
fffffc00005d64b4 fffffc000c9b3218
May 14 21:16:11 merlin kernel:        fffffc00005566c0 fffffc000c9b31c0
fffffc0004858a18 fffffc0019ccf1c8
May 14 21:16:11 merlin kernel:        fffffc000493d800 fffffc000493d800
fffffc0000567250 fffffc000493d800
May 14 21:16:11 merlin kernel:        fffffc000d5b7000 0000000000000000
fffffc0019ccf1c8 0000000000000000
May 14 21:16:11 merlin kernel: Trace:
May 14 21:16:11 merlin kernel: [<fffffc00005670f8>] qdisc_restart+0x88/0x2b0
May 14 21:16:11 merlin kernel: [<fffffc00005d64b4>] packet_rcv_spkt+0x1a4/0x3b0
May 14 21:16:11 merlin kernel: [<fffffc00005566c0>] dev_queue_xmit_nit+0x1a0/0x1f0
May 14 21:16:11 merlin kernel: [<fffffc0000567250>] qdisc_restart+0x1e0/0x2b0
May 14 21:16:11 merlin kernel: [<fffffc0000558f48>] dev_queue_xmit+0xb8/0x3b0
May 14 21:16:11 merlin kernel: [<fffffc0000560508>] neigh_resolve_output+0x128/0x370
May 14 21:16:11 merlin kernel: [<fffffc00005815b4>] ip_output+0x224/0x430
May 14 21:16:11 merlin kernel: [<fffffc00005800bc>] ip_queue_xmit+0x2dc/0x630
May 14 21:16:11 merlin kernel: [<fffffc0000596068>] tcp_transmit_skb+0x588/0xa30
May 14 21:16:11 merlin kernel: [<fffffc000059bb6c>] tcp_v4_send_check+0x11c/0x150
May 14 21:16:11 merlin kernel: [<fffffc0000595f68>] tcp_transmit_skb+0x488/0xa30
May 14 21:16:11 merlin kernel: [<fffffc0000597498>] tcp_retransmit_skb+0x118/0x7b0
May 14 21:16:11 merlin kernel: [<fffffc000059a5fc>] tcp_write_timer+0x2bc/0x7a0
May 14 21:16:11 merlin kernel: [<fffffc0000335634>] run_timer_softirq+0x34/0x260
May 14 21:16:11 merlin kernel: [<fffffc0000335738>] run_timer_softirq+0x138/0x260
May 14 21:16:11 merlin kernel: [<fffffc000059a340>] tcp_write_timer+0x0/0x7a0
May 14 21:16:11 merlin kernel: [<fffffc000032ffa0>] __do_softirq+0x90/0x130
May 14 21:16:11 merlin kernel: [<fffffc00003300a4>] do_softirq+0x64/0x70
May 14 21:16:11 merlin kernel: [<fffffc0000310000>] __start+0x0/0x20
May 14 21:16:11 merlin kernel: [<fffffc0000316130>] handle_irq+0x70/0xd0
May 14 21:16:11 merlin kernel: [<fffffc0000316524>] do_entInt+0x174/0x190
May 14 21:16:11 merlin kernel: [<fffffc00003112b0>] ret_from_sys_call+0x0/0x10
May 14 21:16:11 merlin kernel: [<fffffc00003131b8>] cpu_idle+0x48/0x60
May 14 21:16:11 merlin kernel: [<fffffc0000311610>] alpha_switch_to+0x0/0x20
May 14 21:16:11 merlin kernel: [<fffffc00003131a4>] cpu_idle+0x34/0x60
May 14 21:16:11 merlin kernel: [<fffffc0000310098>] rest_init+0x48/0x60
May 14 21:16:11 merlin kernel: [<fffffc000031001c>] __start+0x1c/0x20
May 14 21:16:11 merlin kernel:
May 14 21:16:11 merlin kernel: ei_start_xmit(): padto0: skb->data unaligned
fffffc0005c01b61 skb fffffc0019ccf1c8 length 54
May 14 21:16:11 merlin kernel: ei_start_xmit(): padto1: skb->data unaligned
fffffc0005c01b61 skb fffffc0019ccf1c8 length 54
May 14 21:16:11 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc0005c01b61 align to fffffc000069ba30 length 60


You see:
1) skb is aligned, skb->data not when ei_start_xmit() is called.
2) skb_padto() is called, but this causes not the problem here.
3) the two stack traces are different.

Feel free to contact me, for more details.
Comment 5 Thomas Steudten 2006-05-16 23:43:00 UTC
Another stack trace. skb_padto() is not called.

May 16 23:50:34 merlin kernel: ei_start_xmit(): entry: skb->data unaligned
fffffc0009b5098b skb fffffc001e763a08 length 66
May 16 23:50:34 merlin kernel: Badness in ei_start_xmit at drivers/net/8390.c:283
May 16 23:50:34 merlin kernel: fffffc0009007600 0000000000000001
fffffc00005670f8 fffffc00199ed800
May 16 23:50:34 merlin kernel:        fffffc000ebd4000 0000000000000000
fffffc001e763a08 0000000000000000
May 16 23:50:34 merlin kernel:        0000000000000020 fffffc001a62dc00
fffffc00005d64b4 fffffc000ad5b018
May 16 23:50:34 merlin kernel:        fffffc00005566c0 fffffc000ad5afc0
fffffc001a62de18 fffffc001e763a08
May 16 23:50:34 merlin kernel:        fffffc00199ed800 fffffc00199ed800
fffffc0000567250 fffffc00199ed800
May 16 23:50:34 merlin kernel:        fffffc000ebd4000 0000000000000000
fffffc001e763a08 0000000000000000
May 16 23:50:34 merlin kernel: Trace:
May 16 23:50:34 merlin kernel: [<fffffc00005670f8>] qdisc_restart+0x88/0x2b0
May 16 23:50:34 merlin kernel: [<fffffc00005d64b4>] packet_rcv_spkt+0x1a4/0x3b0
May 16 23:50:34 merlin kernel: [<fffffc00005566c0>] dev_queue_xmit_nit+0x1a0/0x1f0
May 16 23:50:34 merlin kernel: [<fffffc0000567250>] qdisc_restart+0x1e0/0x2b0
May 16 23:50:34 merlin kernel: [<fffffc0000558f48>] dev_queue_xmit+0xb8/0x3b0
May 16 23:50:34 merlin kernel: [<fffffc0000560508>] neigh_resolve_output+0x128/0x370
May 16 23:50:34 merlin kernel: [<fffffc00005815b4>] ip_output+0x224/0x430
May 16 23:50:34 merlin kernel: [<fffffc00005800bc>] ip_queue_xmit+0x2dc/0x630
May 16 23:50:34 merlin kernel: [<fffffc0000596068>] tcp_transmit_skb+0x588/0xa30
May 16 23:50:34 merlin kernel: [<fffffc000059bb6c>] tcp_v4_send_check+0x11c/0x150
May 16 23:50:34 merlin kernel: [<fffffc0000595f68>] tcp_transmit_skb+0x488/0xa30
May 16 23:50:34 merlin kernel: [<fffffc0000597ab0>] tcp_retransmit_skb+0x730/0x7b0
May 16 23:50:34 merlin kernel: [<fffffc0000597498>] tcp_retransmit_skb+0x118/0x7b0
May 16 23:50:34 merlin kernel: [<fffffc000059a5fc>] tcp_write_timer+0x2bc/0x7a0
May 16 23:50:34 merlin kernel: [<fffffc0000335634>] run_timer_softirq+0x34/0x260
May 16 23:50:34 merlin kernel: [<fffffc0000335738>] run_timer_softirq+0x138/0x260
May 16 23:50:34 merlin kernel: [<fffffc000059a340>] tcp_write_timer+0x0/0x7a0
May 16 23:50:34 merlin kernel: [<fffffc000032ffa0>] __do_softirq+0x90/0x130
May 16 23:50:34 merlin kernel: [<fffffc00003300a4>] do_softirq+0x64/0x70
May 16 23:50:34 merlin kernel: [<fffffc0000316130>] handle_irq+0x70/0xd0
May 16 23:50:34 merlin kernel: [<fffffc0000316524>] do_entInt+0x174/0x190
May 16 23:50:34 merlin kernel: [<fffffc0000310eec>] entMM+0x9c/0xc0
May 16 23:50:34 merlin kernel: [<fffffc00003112b0>] ret_from_sys_call+0x0/0x10
May 16 23:50:34 merlin kernel: [<fffffc0000453ed0>] number+0x210/0x330
May 16 23:50:34 merlin kernel: [<fffffc0000454438>] vsnprintf+0x448/0x820
May 16 23:50:34 merlin kernel: [<fffffc0000453f50>] number+0x290/0x330
May 16 23:50:34 merlin kernel: [<fffffc00004549cc>] sprintf+0x5c/0x70
May 16 23:50:34 merlin kernel: [<fffffc000039e228>] __d_lookup+0xa8/0x1a0
May 16 23:50:34 merlin kernel: [<fffffc00003c7dc8>] do_task_stat+0x538/0x6b0
May 16 23:50:34 merlin kernel: [<fffffc0000391490>] do_lookup+0x50/0x280
May 16 23:50:34 merlin kernel: [<fffffc0000313544>] get_wchan+0x74/0xb0
May 16 23:50:34 merlin kernel: [<fffffc000040e8e0>] sys_semtimedop+0x810/0x880
May 16 23:50:34 merlin kernel: [<fffffc00003c36b8>] proc_info_read+0xb8/0x120
May 16 23:50:34 merlin kernel: [<fffffc000040e8e0>] sys_semtimedop+0x810/0x880
May 16 23:50:34 merlin kernel: [<fffffc000037d38c>] vfs_read+0x10c/0x200
May 16 23:50:34 merlin kernel: [<fffffc000037d5c8>] sys_read+0x68/0xb0
May 16 23:50:34 merlin kernel: [<fffffc0000311294>] entSys+0xa4/0xc0
May 16 23:50:34 merlin kernel:
May 16 23:50:34 merlin kernel: ei_start_xmit(): skb->data unaligned
fffffc0009b5098b align to fffffc0009007650 length 66
Comment 6 Herbert Xu 2006-05-17 06:29:58 UTC
On Tue, May 16, 2006 at 11:44:17PM -0700, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=6421
>
> ------- Additional Comments From tomri@gmx.net  2006-05-16 23:43 -------
> Another stack trace. skb_padto() is not called.
> 
> May 16 23:50:34 merlin kernel: ei_start_xmit(): entry: skb->data unaligned
> fffffc0009b5098b skb fffffc001e763a08 length 66
> May 16 23:50:34 merlin kernel: Badness in ei_start_xmit at
> drivers/net/8390.c:283

...

> May 16 23:50:34 merlin kernel: Trace:
> May 16 23:50:34 merlin kernel: [<fffffc00005670f8>] qdisc_restart+0x88/0x2b0
> May 16 23:50:34 merlin kernel: [<fffffc00005d64b4>]
> packet_rcv_spkt+0x1a4/0x3b0
> May 16 23:50:34 merlin kernel: [<fffffc00005566c0>]
> dev_queue_xmit_nit+0x1a0/0x1f0
> May 16 23:50:34 merlin kernel: [<fffffc0000567250>] qdisc_restart+0x1e0/0x2b0
> May 16 23:50:34 merlin kernel: [<fffffc0000558f48>] dev_queue_xmit+0xb8/0x3b0
> May 16 23:50:34 merlin kernel: [<fffffc0000560508>]
> neigh_resolve_output+0x128/0x370
> May 16 23:50:34 merlin kernel: [<fffffc00005815b4>] ip_output+0x224/0x430
> May 16 23:50:34 merlin kernel: [<fffffc00005800bc>] ip_queue_xmit+0x2dc/0x630
> May 16 23:50:34 merlin kernel: [<fffffc0000596068>]
> tcp_transmit_skb+0x588/0xa30
> May 16 23:50:34 merlin kernel: [<fffffc000059bb6c>]
> tcp_v4_send_check+0x11c/0x150
> May 16 23:50:34 merlin kernel: [<fffffc0000595f68>]
> tcp_transmit_skb+0x488/0xa30
> May 16 23:50:34 merlin kernel: [<fffffc0000597ab0>]
> tcp_retransmit_skb+0x730/0x7b0

OK, I think I have a suspect in sight.  The unaligned packet is
probably caused by tcp_trim_head.  To fix it we could just copy
the data instead of trimming if it is not paged.

However, I'd like you to confirm that this is indeed the cause
with a tcpdump.

So please take a tcpdump and show us the relevant packets that
show up about the time when you get the unaligned warnings.
What I'm expecting is a partial odd ACK just before the error.

Thanks,
Comment 7 Thomas Steudten 2006-05-27 10:29:02 UTC
> So please take a tcpdump and show us the relevant packets that
> show up about the time when you get the unaligned warnings.
> What I'm expecting is a partial odd ACK just before the error.

Ok, see the attachment (ALL.txt.gz) for the tcpdump output. If you would like
the hexdump or more, please give the tcpdump filter rule to me.

Thanks.
Comment 8 Thomas Steudten 2006-05-27 10:30:18 UTC
Created attachment 8214 [details]
tcpdump output at time for unaligned packet
Comment 9 Thomas Steudten 2006-05-27 10:32:40 UTC
Created attachment 8215 [details]
tcpdump output at time for unaligned packet
Comment 10 Herbert Xu 2006-06-03 02:52:11 UTC
On Sat, May 27, 2006 at 10:31:19AM -0700, bugme-daemon@bugzilla.kernel.org wrote:
> 
> Ok, see the attachment (ALL.txt.gz) for the tcpdump output. If you would like
> the hexdump or more, please give the tcpdump filter rule to me.

Thanks a lot.  So it wasn't as uncommon as I thought.  In fact, partial
odd acks like yours happen all the time as part of FIN handling.  So this
is something that we want to address.

BTW, we should also fix arch/alpha/kernel/io.c to not crash on
unaligned data.  Most other common architectures will grin and
bear it.

[TCP]: Avoid skb_pull if possible when trimming head

Trimming the head of an skb by calling skb_pull can cause the packet
to become unaligned if the length pulled is odd.  Since the length is
entirely arbitrary for a FIN packet carrying data, this is actually
quite common.

Unaligned data is not the end of the world, but we should avoid it if
it's easily done.  In this case it is trivial.  Since we're discarding
all of the head data it doesn't matter whether we move skb->data forward
or back.

However, it is still possible to have unaligned skb->data in general.
So network drivers should be prepared to handle it instead of crashing.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
Comment 11 Hideaki YOSHIFUJI 2006-06-03 03:48:06 UTC
In article <20060603095442.GA373@gondor.apana.org.au> (at Sat, 3 Jun 2006 19:54:42 +1000), Herbert Xu <herbert@gondor.apana.org.au> says:

> [TCP]: Avoid skb_pull if possible when trimming head
> 
> Trimming the head of an skb by calling skb_pull can cause the packet
> to become unaligned if the length pulled is odd.  Since the length is
> entirely arbitrary for a FIN packet carrying data, this is actually
> quite common.

I think that people will start thinking why we cannot
skb_pull(skb, len) if skb_headlen(skb) == len; some comment needed...

--yoshfuji
Comment 12 Herbert Xu 2006-06-03 04:04:31 UTC
On Sat, Jun 03, 2006 at 07:51:23PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
> 
> I think that people will start thinking why we cannot
> skb_pull(skb, len) if skb_headlen(skb) == len; some comment needed...

Good idea.  Here is a better one.

[TCP]: Avoid skb_pull if possible when trimming head

Trimming the head of an skb by calling skb_pull can cause the packet
to become unaligned if the length pulled is odd.  Since the length is
entirely arbitrary for a FIN packet carrying data, this is actually
quite common.

Unaligned data is not the end of the world, but we should avoid it if
it's easily done.  In this case it is trivial.  Since we're discarding
all of the head data it doesn't matter whether we move skb->data forward
or back.

However, it is still possible to have unaligned skb->data in general.
So network drivers should be prepared to handle it instead of crashing.

This patch also adds an unlikely marking on len < headlen since partial
ACKs on head data are extremely rare in the wild.  As the return value
of __pskb_trim_head is no longer ever NULL that has been removed.

Signed-off-by: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>

Thanks,
Comment 13 Anonymous Emailer 2006-06-04 21:45:48 UTC
Reply-To: davem@davemloft.net

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 3 Jun 2006 21:07:11 +1000

> On Sat, Jun 03, 2006 at 07:51:23PM +0900, YOSHIFUJI Hideaki / ?$B5HF#1QL@ wrote:
> > 
> > I think that people will start thinking why we cannot
> > skb_pull(skb, len) if skb_headlen(skb) == len; some comment needed...
> 
> Good idea.  Here is a better one.
> 
> [TCP]: Avoid skb_pull if possible when trimming head

Applied, thanks a lot Herbert.

Comment 14 Herbert Xu 2006-06-09 23:04:58 UTC
arch/alpha/kernel/io.c iowrite16_rep() should handle unaligned input as users
such as ne2000 may feed it unaligned input that could come from the network
stack.  In general, the network stack tries to output aligned data as much as
possible, but unaligned data is sometimes unavoidable for performance reasons.

It is preferable for iowrite16_rep to do some shuffling for odd-aligned data
rather than for its user to copy the data beforehand.
Comment 15 Andrew Morton 2007-01-30 23:56:49 UTC
Thomas, is this bug now resolved?
Comment 16 Thomas Steudten 2007-02-08 02:30:05 UTC
Hi Andrew

I hope this is fixed. I need to switch to another NIC card due to this bug, so I
haven't track this bug any more.
However linux on alpha seems to be end-of-support in context to the x86 features.

Regards
Thomas 

Comment 17 Natalie Protasevich 2007-06-16 09:38:31 UTC
Thomas,
Any chance you can test the patch with your old card and confirm that the problem is gone? If this is too much trouble we can close this bug I guess until someone runs into the problem.
Thanks!
Comment 18 Thomas Steudten 2007-06-16 10:31:06 UTC
Hi Natalie

Oh, the alpha is gone, so there's no way for me, to confirm this.

Regards
Thomas
Comment 19 Natalie Protasevich 2007-06-16 10:36:58 UTC
Oh well... rest in peace poor alpha :(
We can close the bug I guess.
Thanks.