We are hitting kernel panics on Dell R610 servers with e1000e NICs. It usually appears under high network traffic (around 100 Mbit/s), but that is not a hard rule; it has happened even under low traffic.

Servers are used as a reverse HTTP proxy (varnish).

On 6 identical servers this panic happens approximately 2 times a day, depending on network load. The machine completely freezes until the management watchdog reboots it.

We had to put a serial console on these servers to catch the oops. Is there anything else we can do to debug this?

The RIP is always the same:

RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290

The rest of the oops always differs a little; here is an example:

RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290
RSP: 0018:ffffc90000003a40  EFLAGS: 00010246
RAX: ffff8807e7420678 RBX: ffff8807e74205c0 RCX: 0000000000000000
RDX: 000000004598a105 RSI: 0000000000000000 RDI: ffff8807e74205c0
RBP: ffffc90000003a80 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8807e74205c0 R14: ffff8807e7420678 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffc90000000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81608000, task ffffffff81631440)
Stack:
 ffffc90000003a60 0000000000000000 4598a105e74205c0 000000004598a101
<0> 000000000000050e ffff8807e74205c0 0000000000000003 0000000000000000
<0> ffffc90000003b40 ffffffff8141ae4a ffff8807e7420678 0000000000000000
Call Trace:
 <IRQ>
 [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0
 [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0
 [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220
 [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90
 [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0
 [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0
 [<ffffffff81406b70>] ? ip_local_deliver_finish+0x0/0x120
 [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120
 [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90
 [<ffffffff81406971>] ip_rcv_finish+0x141/0x340
 [<ffffffff8140701f>] ip_rcv+0x24f/0x350
 [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0
 [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50
 [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40
 [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60
 [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330
 [<ffffffff813410a1>] e1000_clean+0x81/0x2a0
 [<ffffffff81054ce1>] ? ktime_get+0x11/0x50
 [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130
 [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210
 [<ffffffff81041bd7>] __do_softirq+0xb7/0x160
 [<ffffffff8100c27c>] call_softirq+0x1c/0x30
 [<ffffffff8100e04d>] do_softirq+0x3d/0x80
 [<ffffffff81041b0b>] irq_exit+0x7b/0x90
 [<ffffffff8100d613>] do_IRQ+0x73/0xe0
 [<ffffffff8100bb13>] ret_from_intr+0x0/0xa
 <EOI>
 [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271
 [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271
 [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0
 [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0
 [<ffffffff81468db6>] ? rest_init+0x66/0x70
 [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340
 [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90
 [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100
Code: 00 eb 28 8b 83 d0 03 00 00 41 39 44 24 40 0f 89 00 01 00 00 41 0f b6 cd 41 bd 2f 00 00 00 83 e1 03 0f 84 fc 00 00 00 4d 8b 24 24 <49> 8b 04 24 4d 39 f4 0f 18 08 0f 84 d9 00 00 00 4c 3b a3 b8 01
RIP  [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290
 RSP <ffffc90000003a40>
CR2: 0000000000000000
---[ end trace d97d99c9ae1d52cc ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G      D    2.6.31 #2
Call Trace:
 <IRQ>  [<ffffffff8103cab0>] panic+0xa0/0x170
 [<ffffffff8100bb13>] ? ret_from_intr+0x0/0xa
 [<ffffffff8103c74e>] ? print_oops_end_marker+0x1e/0x20
 [<ffffffff8100f38e>] oops_end+0x9e/0xb0
 [<ffffffff81025b9a>] no_context+0x15a/0x250
 [<ffffffff81025e2b>] __bad_area_nosemaphore+0xdb/0x1c0
 [<ffffffff813e89e9>] ? dev_hard_start_xmit+0x269/0x2f0
 [<ffffffff81025fae>] bad_area_nosemaphore+0xe/0x10
 [<ffffffff8102639f>] do_page_fault+0x17f/0x260
 [<ffffffff8147eadf>] page_fault+0x1f/0x30
 [<ffffffff814203cc>] ? tcp_xmit_retransmit_queue+0x8c/0x290
 [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0
 [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0
 [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220
 [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90
 [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0
 [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0
 [<ffffffff81406b70>] ? ip_local_deliver_finish+0x0/0x120
 [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120
 [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90
 [<ffffffff81406971>] ip_rcv_finish+0x141/0x340
 [<ffffffff8140701f>] ip_rcv+0x24f/0x350
 [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0
 [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50
 [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40
 [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60
 [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330
 [<ffffffff813410a1>] e1000_clean+0x81/0x2a0
 [<ffffffff81054ce1>] ? ktime_get+0x11/0x50
 [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130
 [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210
 [<ffffffff81041bd7>] __do_softirq+0xb7/0x160
 [<ffffffff8100c27c>] call_softirq+0x1c/0x30
 [<ffffffff8100e04d>] do_softirq+0x3d/0x80
 [<ffffffff81041b0b>] irq_exit+0x7b/0x90
 [<ffffffff8100d613>] do_IRQ+0x73/0xe0
 [<ffffffff8100bb13>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271
 [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271
 [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0
 [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0
 [<ffffffff81468db6>] ? rest_init+0x66/0x70
 [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340
 [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90
 [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100
On Mon, 26 Oct 2009 08:41:32 -0700 Stephen Hemminger <shemminger@linux-foundation.org> wrote:

>
> Begin forwarded message:
>
> Date: Mon, 26 Oct 2009 12:47:22 GMT
> From: bugzilla-daemon@bugzilla.kernel.org
> To: shemminger@linux-foundation.org
> Subject: [Bug 14470] New: freez in TCP stack

Stephen, please retain the bugzilla and reporter email cc's when forwarding a report to a mailing list.

> http://bugzilla.kernel.org/show_bug.cgi?id=14470
>
>            Summary: freez in TCP stack
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.31
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: kolo@albatani.cz
>         Regression: No
>
>
> We are hiting kernel panics on Dell R610 servers with e1000e NICs; it apears
> usualy under a high network trafic ( around 100Mbit/s) but it is not a rule it
> has happened even on low trafic.
>
> Servers are used as reverse http proxy (varnish).
>
> On 6 equal servers this panic happens aprox 2 times a day depending on network
> load. Machine completly freezes till the management watchdog reboots.

Twice a day on six separate machines. That ain't no hardware glitch.

Vaclav, are you able to say whether this is a regression? Did those machines run 2.6.30 (for example)?

Thanks.

> We had to put serial console on these servers to catch the oops. Is there
> anything else We can do to debug this?
> The RIP is always the same:
>
> RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290
>
> rest of the oops always differs a litle ... here is an example:
>
> [quoted oops snipped; identical to the report above]
Andrew Morton wrote:
> On Mon, 26 Oct 2009 08:41:32 -0700
> Stephen Hemminger <shemminger@linux-foundation.org> wrote:
>
>> [forwarded bug report and oops snipped]
>
> Twice a day on six separate machines. That ain't no hardware glitch.
>
> Vaclav, are you able to say whether this is a regression? Did those
> machines run 2.6.30 (for example)?

Code: 00 eb 28 8b 83 d0 03 00 00
41 39 44 24 40       cmp    %eax,0x40(%r12)
0f 89 00 01 00 00    jns    ...
41 0f b6 cd          movzbl %r13b,%ecx
41 bd 2f 00 00 00    mov    $0x2f000000,%r13d
83 e1 03             and    $0x3,%ecx
0f 84 fc 00 00 00    je     ...
4d 8b 24 24          mov    (%r12),%r12      skb = skb->next
<49> 8b 04 24        mov    (%r12),%rax      << NULL POINTER dereference >>
4d 39 f4             cmp    %r14,%r12
0f 18 08             prefetcht0 (%rax)
0f 84 d9 00 00 00    je     ...
4c 3b a3 b8 01       cmp    ...

The crash is in:

void tcp_xmit_retransmit_queue(struct sock *sk)
{
	...
<< HERE >>	tcp_for_write_queue_from(skb, sk) {
		...
	}
}

Some skb in sk_write_queue has a NULL ->next pointer.

The strange thing is that R14 and RAX = ffff8807e7420678 (&sk->sk_write_queue). R14 is the stable value during the loop, while RAX is a scratch register.

I don't have a full disassembly for this function, but I guess we just entered the loop (or RAX should be really different at this point).

So, maybe the list head itself is corrupted (sk->sk_write_queue->next = NULL), or is it a retransmit_skb_hint problem? (Do we forget to set it to NULL in some cases?)
Eric Dumazet wrote:
> Andrew Morton wrote:
>> [quoted bug report and oops snipped]
>
> [annotated disassembly snipped]
>
> So, maybe list head itself is corrupted (sk->sk_write_queue->next = NULL)
>
> or, retransmit_skb_hint problem ? (we forget to set it to NULL in some cases ?)

David, what do you think of the following patch?

I wonder if we should reorganize the code to add sanity checks in tcp_unlink_write_queue() that the skb we delete from the queue is not still referenced.

[PATCH] tcp: clear retrans hints in tcp_send_synack()

There is a small possibility the skb we unlink from the write queue
is still referenced by retrans hints.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fcd278a..b22a72d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2201,6 +2201,7 @@ int tcp_send_synack(struct sock *sk)
 		struct sk_buff *nskb = skb_copy(skb, GFP_ATOMIC);
 		if (nskb == NULL)
 			return -ENOMEM;
+		tcp_clear_all_retrans_hints(tcp_sk(sk));
 		tcp_unlink_write_queue(skb, sk);
 		skb_header_release(nskb);
 		__tcp_add_write_queue_head(sk, nskb);
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 29 Oct 2009 06:59:41 +0100

> David, what do you think of following patch ?
>
> I wonder if we should reorganize code to add sanity checks in
> tcp_unlink_write_queue() that the skb we delete from queue is not
> still referenced.
>
> [PATCH] tcp: clear retrans hints in tcp_send_synack()
>
> There is a small possibility the skb we unlink from write queue
> is still referenced by retrans hints.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Yes, the first thing I thought of when I saw this crash was the hints.

I'll think this over.
Reply-To: v.bilek@1art.cz

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14470
>
> --- Comment #1 from Andrew Morton <akpm@linux-foundation.org> 2009-10-28 22:13:50 ---
>
> [quoted bug report and oops snipped]
>
> Twice a day on six separate machines. That ain't no hardware glitch.
>
> Vaclav, are you able to say whether this is a regression? Did those
> machines run 2.6.30 (for example)?
>
> Thanks.

Can't say if it was the same bug we hit running on 2.6.30, but the symptoms were the same (high net load; total freeze).
I got it from 2.6.29 onward on several HP DL180s; 2.6.28-6 works well.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 29 Oct 2009 06:59:41 +0100

> [PATCH] tcp: clear retrans hints in tcp_send_synack()
>
> There is a small possibility the skb we unlink from write queue
> is still referenced by retrans hints.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

So, this would only be true if we were dealing with a data packet here.
We're not; this is a SYN+ACK which happens to be cloned in the write
queue.

The hint SKB pointers can only point to real data packets.

And we're only dealing with data packets once we enter established state,
and when we enter established, by definition we have unlinked and freed up
any SYN and SYN+ACK SKBs in the write queue.
On Thu, 29 Oct 2009, Eric Dumazet wrote: > Andrew Morton a écrit : > > On Mon, 26 Oct 2009 08:41:32 -0700 > > Stephen Hemminger <shemminger@linux-foundation.org> wrote: > > > >> > >> Begin forwarded message: > >> > >> Date: Mon, 26 Oct 2009 12:47:22 GMT > >> From: bugzilla-daemon@bugzilla.kernel.org > >> To: shemminger@linux-foundation.org > >> Subject: [Bug 14470] New: freez in TCP stack > >> > > > > Stephen, please retain the bugzilla and reporter email cc's when > > forwarding a report to a mailing list. > > > > > >> http://bugzilla.kernel.org/show_bug.cgi?id=14470 > >> > >> Summary: freez in TCP stack > >> Product: Networking > >> Version: 2.5 > >> Kernel Version: 2.6.31 > >> Platform: All > >> OS/Version: Linux > >> Tree: Mainline > >> Status: NEW > >> Severity: high > >> Priority: P1 > >> Component: IPV4 > >> AssignedTo: shemminger@linux-foundation.org > >> ReportedBy: kolo@albatani.cz > >> Regression: No > >> > >> > >> We are hiting kernel panics on Dell R610 servers with e1000e NICs; it > apears > >> usualy under a high network trafic ( around 100Mbit/s) but it is not a > rule it > >> has happened even on low trafic. > >> > >> Servers are used as reverse http proxy (varnish). > >> > >> On 6 equal servers this panic happens aprox 2 times a day depending on > network > >> load. Machine completly freezes till the management watchdog reboots. > >> > > > > Twice a day on six separate machines. That ain't no hardware glitch. > > > > Vaclav, are you able to say whether this is a regression? Did those > > machines run 2.6.30 (for example)? > > > > Thanks. > > > >> We had to put serial console on these servers to catch the oops. Is there > >> anything else We can do to debug this? > >> The RIP is always the same: > >> > >> RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] > >> tcp_xmit_retransmit_queue+0x8c/0x290 > >> > >> rest of the oops always differs a litle ... 
here is an example: > >> > >> RIP: 0010:[<ffffffff814203cc>] [<ffffffff814203cc>] > >> tcp_xmit_retransmit_queue+0x8c/0x290 > >> RSP: 0018:ffffc90000003a40 EFLAGS: 00010246 > >> RAX: ffff8807e7420678 RBX: ffff8807e74205c0 RCX: 0000000000000000 > >> RDX: 000000004598a105 RSI: 0000000000000000 RDI: ffff8807e74205c0 > >> RBP: ffffc90000003a80 R08: 0000000000000003 R09: 0000000000000000 > >> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 > >> R13: ffff8807e74205c0 R14: ffff8807e7420678 R15: 0000000000000000 > >> FS: 0000000000000000(0000) GS:ffffc90000000000(0000) > knlGS:0000000000000000 > >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > >> CR2: 0000000000000000 CR3: 0000000001001000 CR4: 00000000000006f0 > >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > >> Process swapper (pid: 0, threadinfo ffffffff81608000, task > ffffffff81631440) > >> Stack: > >> ffffc90000003a60 0000000000000000 4598a105e74205c0 000000004598a101 > >> <0> 000000000000050e ffff8807e74205c0 0000000000000003 0000000000000000 > >> <0> ffffc90000003b40 ffffffff8141ae4a ffff8807e7420678 0000000000000000 > >> Call Trace: > >> <IRQ> > >> [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 > >> [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 > >> [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 > >> [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 > >> [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 > >> [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 > >> [<ffffffff81406b70>] ? 
ip_local_deliver_finish+0x0/0x120 > >> [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 > >> [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 > >> [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 > >> [<ffffffff8140701f>] ip_rcv+0x24f/0x350 > >> [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 > >> [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 > >> [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 > >> [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 > >> [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 > >> [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 > >> [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 > >> [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 > >> [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 > >> [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 > >> [<ffffffff8100c27c>] call_softirq+0x1c/0x30 > >> [<ffffffff8100e04d>] do_softirq+0x3d/0x80 > >> [<ffffffff81041b0b>] irq_exit+0x7b/0x90 > >> [<ffffffff8100d613>] do_IRQ+0x73/0xe0 > >> [<ffffffff8100bb13>] ret_from_intr+0x0/0xa > >> <EOI> > >> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 > >> [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 > >> [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 > >> [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 > >> [<ffffffff81468db6>] ? rest_init+0x66/0x70 > >> [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 > >> [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 > >> [<ffffffff8169fe32>] ? 
x86_64_start_kernel+0xd2/0x100 > >> Code: 00 eb 28 8b 83 d0 03 00 00 41 39 44 24 40 0f 89 00 01 00 00 41 0f b6 > cd > >> 41 bd 2f 00 00 00 83 e1 03 0f 84 fc 00 00 00 4d 8b 24 24 <49> 8b 04 24 4d > 39 f4 > >> 0f 18 08 0f 84 d9 00 00 00 4c 3b a3 b8 01 > >> RIP [<ffffffff814203cc>] tcp_xmit_retransmit_queue+0x8c/0x290 > >> RSP <ffffc90000003a40> > >> CR2: 0000000000000000 > >> ---[ end trace d97d99c9ae1d52cc ]--- > >> Kernel panic - not syncing: Fatal exception in interrupt > >> Pid: 0, comm: swapper Tainted: G D 2.6.31 #2 > >> Call Trace: > >> <IRQ> [<ffffffff8103cab0>] panic+0xa0/0x170 > >> [<ffffffff8100bb13>] ? ret_from_intr+0x0/0xa > >> [<ffffffff8103c74e>] ? print_oops_end_marker+0x1e/0x20 > >> [<ffffffff8100f38e>] oops_end+0x9e/0xb0 > >> [<ffffffff81025b9a>] no_context+0x15a/0x250 > >> [<ffffffff81025e2b>] __bad_area_nosemaphore+0xdb/0x1c0 > >> [<ffffffff813e89e9>] ? dev_hard_start_xmit+0x269/0x2f0 > >> [<ffffffff81025fae>] bad_area_nosemaphore+0xe/0x10 > >> [<ffffffff8102639f>] do_page_fault+0x17f/0x260 > >> [<ffffffff8147eadf>] page_fault+0x1f/0x30 > >> [<ffffffff814203cc>] ? tcp_xmit_retransmit_queue+0x8c/0x290 > >> [<ffffffff8141ae4a>] tcp_ack+0x170a/0x1dd0 > >> [<ffffffff8141c362>] tcp_rcv_state_process+0x122/0xab0 > >> [<ffffffff81422c6c>] tcp_v4_do_rcv+0xac/0x220 > >> [<ffffffff813fd02f>] ? nf_iterate+0x5f/0x90 > >> [<ffffffff81424b26>] tcp_v4_rcv+0x586/0x6b0 > >> [<ffffffff813fd0c5>] ? nf_hook_slow+0x65/0xf0 > >> [<ffffffff81406b70>] ? 
ip_local_deliver_finish+0x0/0x120 > >> [<ffffffff81406bcf>] ip_local_deliver_finish+0x5f/0x120 > >> [<ffffffff8140715b>] ip_local_deliver+0x3b/0x90 > >> [<ffffffff81406971>] ip_rcv_finish+0x141/0x340 > >> [<ffffffff8140701f>] ip_rcv+0x24f/0x350 > >> [<ffffffff813e7ced>] netif_receive_skb+0x20d/0x2f0 > >> [<ffffffff813e7e90>] napi_skb_finish+0x40/0x50 > >> [<ffffffff813e82f4>] napi_gro_receive+0x34/0x40 > >> [<ffffffff8133e0c8>] e1000_receive_skb+0x48/0x60 > >> [<ffffffff81342342>] e1000_clean_rx_irq+0xf2/0x330 > >> [<ffffffff813410a1>] e1000_clean+0x81/0x2a0 > >> [<ffffffff81054ce1>] ? ktime_get+0x11/0x50 > >> [<ffffffff813eaf1c>] net_rx_action+0x9c/0x130 > >> [<ffffffff81046940>] ? get_next_timer_interrupt+0x1d0/0x210 > >> [<ffffffff81041bd7>] __do_softirq+0xb7/0x160 > >> [<ffffffff8100c27c>] call_softirq+0x1c/0x30 > >> [<ffffffff8100e04d>] do_softirq+0x3d/0x80 > >> [<ffffffff81041b0b>] irq_exit+0x7b/0x90 > >> [<ffffffff8100d613>] do_IRQ+0x73/0xe0 > >> [<ffffffff8100bb13>] ret_from_intr+0x0/0xa > >> <EOI> [<ffffffff81296e6c>] ? acpi_idle_enter_bm+0x245/0x271 > >> [<ffffffff81296e62>] ? acpi_idle_enter_bm+0x23b/0x271 > >> [<ffffffff813c7a08>] ? cpuidle_idle_call+0x98/0xf0 > >> [<ffffffff8100a104>] ? cpu_idle+0x94/0xd0 > >> [<ffffffff81468db6>] ? rest_init+0x66/0x70 > >> [<ffffffff816a082f>] ? start_kernel+0x2ef/0x340 > >> [<ffffffff8169fd54>] ? x86_64_start_reservations+0x84/0x90 > >> [<ffffffff8169fe32>] ? x86_64_start_kernel+0xd2/0x100 > >> > > > Code: 00 eb 28 8b 83 d0 03 00 00 > 41 39 44 24 40 cmp %eax,0x40(%r12) > 0f 89 00 01 00 00 jns ... > 41 0f b6 cd movzbl %r13b,%ecx > 41 bd 2f 00 00 00 mov $0x2f000000,%r13d > 83 e1 03 and $0x3,%ecx > 0f 84 fc 00 00 00 je ... > 4d 8b 24 24 mov (%r12),%r12 skb = skb->next > <>49 8b 04 24 mov (%r12),%rax << NULL POINTER dereference >> > 4d 39 f4 cmp %r14,%r12 > 0f 18 08 prefetcht0 (%rax) > 0f 84 d9 00 00 00 je ... 
> 4c 3b a3 b8 01		cmp
>
> crash is in
>
> void tcp_xmit_retransmit_queue(struct sock *sk)
> {
> 	...
> 	<< HERE >> tcp_for_write_queue_from(skb, sk) {
> 	...
> 	}
> }
>
> Some skb in sk_write_queue has a NULL ->next pointer
>
> Strange thing is R14 and RAX = ffff8807e7420678 (&sk->sk_write_queue).
> R14 is the stable value during the loop, while RAX is a scratch register.
>
> I dont have full disassembly for this function, but I guess we just
> entered the loop (or RAX should be really different at this point)
>
> So, maybe the list head itself is corrupted (sk->sk_write_queue->next == NULL)

One more alternative along those lines could perhaps be:

We enter with an empty write_queue there and with the hint being NULL, so
we take the else branch... and skb_peek then gives us the NULL ptr.
However, I cannot see how this could happen, as all branches trap with
return before they reach tcp_xmit_retransmit_queue.

> or, retransmit_skb_hint problem ? (we forget to set it to NULL in some
> cases ?)

...I don't understand how a stale reference would yield a consistent NULL
ptr crash there rather than hard-to-track corruption most of the time and
random crashes here and there. Or perhaps we were just very lucky to
immediately get only those reports which point to the right track :-).

...I tried to find what is wrong with it but sadly came up with only
ah-this-is-it-oh-wait-it's-ok type of things.
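Eric's reading can be modeled in miniature: tcp_for_write_queue_from() walks a circular list whose terminating sentinel is the head of sk_write_queue itself, so a NULL ->next anywhere in the chain gets dereferenced before the wrap-around test can stop the walk. A userspace sketch with made-up `node`/`queue` types (not the kernel's sk_buff/sk_buff_head):

```c
#include <stddef.h>

/* Toy model of the circular write queue: the list head doubles as
 * the sentinel that terminates the traversal. */
struct node  { struct node *next; };
struct queue { struct node *next; };  /* head acts as sentinel */

/* Walk from 'skb' until we wrap around to the head sentinel, as
 * tcp_for_write_queue_from() does.  Returns the number of nodes
 * visited, or -1 on a NULL ->next -- the spot where the real code
 * dereferences the pointer and oopses instead. */
static int walk_from(const struct queue *q, const struct node *skb)
{
    int n = 0;
    while (skb != (const struct node *)q) {
        n++;
        if (skb->next == NULL)
            return -1;      /* kernel faults here; we bail out */
        skb = skb->next;
    }
    return n;
}
```

The point of the sketch is that nothing in the traversal guards against a NULL link; correctness relies entirely on every queue operation keeping the circular invariant intact.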
> ...I don't understand how a stale reference would yield to a consistent
> NULL ptr crash there rather than hard to track corruption for most of the
> times and random crashes then here and there. Or perhaps we were just very
> lucky to immediately get only those reports which point out to the right
> track :-).

When a skb is freed and re-allocated, we clear most of its fields in
__alloc_skb():

	memset(skb, 0, offsetof(struct sk_buff, tail));

Then, if this skb is freed again and not queued anywhere, its skb->next
stays NULL. So if we have a stale reference to a freed skb, we can:

- Get a NULL pointer, or a poisoned value (if SLUB_DEBUG)

Here is a debug patch to check we don't have stale pointers; maybe this
will help?

[PATCH] tcp: check stale pointers in tcp_unlink_write_queue()

In order to track some obscure bug, we check in tcp_unlink_write_queue()
that we don't have stale references to the unlinked skb.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/tcp.h     |    4 ++++
 net/ipv4/tcp.c        |    2 +-
 net/ipv4/tcp_input.c  |    4 ++--
 net/ipv4/tcp_output.c |    8 ++++----
 4 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 740d09b..09da342 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1357,6 +1357,10 @@ static inline void tcp_insert_write_queue_before(struct sk_buff *new,
 
 static inline void tcp_unlink_write_queue(struct sk_buff *skb, struct sock *sk)
 {
+	WARN_ON(skb == tcp_sk(sk)->retransmit_skb_hint);
+	WARN_ON(skb == tcp_sk(sk)->lost_skb_hint);
+	WARN_ON(skb == tcp_sk(sk)->scoreboard_skb_hint);
+	WARN_ON(skb == sk->sk_send_head);
 	__skb_unlink(skb, &sk->sk_write_queue);
 }
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e0cfa63..328bdb1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1102,11 +1102,11 @@ out:
 
 do_fault:
 	if (!skb->len) {
-		tcp_unlink_write_queue(skb, sk);
 		/* It is the one place in all of TCP, except connection
 		 * reset, where we can be unlinking the send_head.
 		 */
 		tcp_check_send_head(sk, skb);
+		tcp_unlink_write_queue(skb, sk);
 		sk_wmem_free_skb(sk, skb);
 	}
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ba0eab6..fccc6e9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3251,13 +3251,13 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 		if (!fully_acked)
 			break;
 
-		tcp_unlink_write_queue(skb, sk);
-		sk_wmem_free_skb(sk, skb);
 		tp->scoreboard_skb_hint = NULL;
 		if (skb == tp->retransmit_skb_hint)
 			tp->retransmit_skb_hint = NULL;
 		if (skb == tp->lost_skb_hint)
 			tp->lost_skb_hint = NULL;
+		tcp_unlink_write_queue(skb, sk);
+		sk_wmem_free_skb(sk, skb);
 	}
 
 	if (likely(between(tp->snd_up, prior_snd_una, tp->snd_una)))
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 616c686..196171d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1791,6 +1791,10 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 
 	tcp_highest_sack_combine(sk, next_skb, skb);
 
+	/* changed transmit queue under us so clear hints */
+	tcp_clear_retrans_hints_partial(tp);
+	if (next_skb == tp->retransmit_skb_hint)
+		tp->retransmit_skb_hint = skb;
 	tcp_unlink_write_queue(next_skb, sk);
 
 	skb_copy_from_linear_data(next_skb, skb_put(skb, next_skb_size),
@@ -1813,10 +1817,6 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
 	 */
 	TCP_SKB_CB(skb)->sacked |= TCP_SKB_CB(next_skb)->sacked &
 				   TCPCB_EVER_RETRANS;
 
-	/* changed transmit queue under us so clear hints */
-	tcp_clear_retrans_hints_partial(tp);
-	if (next_skb == tp->retransmit_skb_hint)
-		tp->retransmit_skb_hint = skb;
 
 	tcp_adjust_pcount(sk, next_skb, tcp_skb_pcount(next_skb));
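Eric's explanation of why the crash is a consistent NULL rather than random garbage is easy to verify in miniature: ->next is the first field of struct sk_buff and lies below 'tail', so the partial memset in __alloc_skb() zeroes it on every reallocation. A toy layout (hypothetical `fake_skb`, not the real struct sk_buff) demonstrating the effect:

```c
#include <string.h>
#include <stddef.h>

/* Toy layout mirroring the idea (not the exact fields) of struct
 * sk_buff: ->next comes first, and the allocation-time clear stops
 * at 'tail'. */
struct fake_skb {
    struct fake_skb *next;
    struct fake_skb *prev;
    unsigned int     len;
    unsigned char   *tail;    /* the clear stops here */
    unsigned char   *end;
};

/* Partial clear as done by __alloc_skb():
 *   memset(skb, 0, offsetof(struct sk_buff, tail));
 * After this, any stale pointer to the recycled skb reads
 * next == NULL -- hence the reproducible NULL dereference. */
static void recycle(struct fake_skb *skb)
{
    memset(skb, 0, offsetof(struct fake_skb, tail));
}
```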
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
>
> One more alternative along those lines could perhaps be:
>
> We enter with empty write_queue there and with the hint being null, so we
> take the else branch... and skb_peek then gives us the NULL ptr. However,
> I cannot see how this could happen as all branches trap with return
> before they reach tcp_xmit_retransmit_queue.

Why don't we add a WARN_ON in there and see if it triggers?

Thanks,
Confirmed regression ... 2.6.28.6 runs stable.
I'm having the exact same issue; it looks like 2.6.28.9 is working fine, though. 2.6.29+, I believe, has the problem with crashes under high network load.
On Thu, 29 Oct 2009, David Miller wrote:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 29 Oct 2009 06:59:41 +0100
>
> > [PATCH] tcp: clear retrans hints in tcp_send_synack()
> >
> > There is a small possibility the skb we unlink from write queue
> > is still referenced by retrans hints.
> >
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> So, this would only be true if we were dealing with a data
> packet here. We're not, this is a SYN+ACK which happens to
> be cloned in the write queue.
>
> The hint SKBs pointers can only point to real data packets.
>
> And we're only dealing with data packets once we enter established
> state, and when we enter established by definition we have unlinked
> and freed up any SYN and SYN+ACK SKBs in the write queue.

How about this then... Does the original reporter have NFS in use?

[PATCH] tcp: clear hints to avoid a stale one (nfs only affected?)

Eric Dumazet mentioned in the context of another problem: "Well, it seems
NFS reuses its socket, so maybe we miss some cleaning as spotted in this
old patch". I've not checked under which conditions that actually happens,
but if true, we need to make sure we don't accidentally leave stale hints
behind when the write queue had to be purged (whether reusing with NFS can
actually happen if purging took place is something I'm not sure of).
...At least it compiles.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/net/tcp.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..6b13faa 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1228,6 +1228,7 @@ static inline void tcp_write_queue_purge(struct sock *sk)
 	while ((skb = __skb_dequeue(&sk->sk_write_queue)) != NULL)
 		sk_wmem_free_skb(sk, skb);
 	sk_mem_reclaim(sk);
+	tcp_clear_all_retrans_hints(tcp_sk(sk));
 }
 
 static inline struct sk_buff *tcp_write_queue_head(struct sock *sk)
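The one-line patch above is easy to reason about with a toy model: a cached hint pointer into the queue outlives a purge unless the purge clears it, and that leftover pointer is exactly what a later retransmit walk would chase. A sketch with made-up `pkt`/`conn` types (illustrative only, not the kernel structures):

```c
#include <stdlib.h>

/* Toy model of the purge-vs-hint interaction: 'retrans_hint' caches
 * a position inside 'queue' and dangles if a purge forgets it. */
struct pkt  { struct pkt *next; };
struct conn { struct pkt *queue; struct pkt *retrans_hint; };

static void purge(struct conn *c, int clear_hints)
{
    while (c->queue) {
        struct pkt *p = c->queue;
        c->queue = p->next;
        free(p);
    }
    if (clear_hints)               /* models the tcp_clear_all_retrans_hints() */
        c->retrans_hint = NULL;    /* call the patch adds to the purge path   */
}
```

Without the `clear_hints` step the hint still holds the address of freed memory after `purge()` returns, which is precisely the stale-reference scenario discussed in the thread.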
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 26 Nov 2009 23:54:53 +0200 (EET)

> How about this then... Does the original reporter have NFS in use?
>
> [PATCH] tcp: clear hints to avoid a stale one (nfs only affected?)

I must be getting old and senile, but I specifically remembered that we
prevented a socket from ever being bound again once it has been bound one
time, specifically so we didn't have to deal with issues like this.

I really don't think it's valid for NFS to reuse the socket structure like
this over and over again. And that's why only NFS can reproduce this; the
interfaces provided to userland can't actually go through this sequence
after a socket goes down one time all the way to close.

Do we really want to audit each and every odd member of the socket
structure, from the generic portion all the way down to the INET and TCP
specifics, to figure out what needs to get zero'd out?

So much relies upon the one-time full zero-out during sock allocation.

Let's fix NFS instead.
David Miller a écrit :
> I must be getting old and senile, but I specifically remembered that
> we prevented a socket from ever being bound again once it has been
> bound one time specifically so we didn't have to deal with issues
> like this.
>
> I really don't think it's valid for NFS to reuse the socket structure
> like this over and over again. And that's why only NFS can reproduce
> this, the interfaces provided userland can't actually go through this
> sequence after a socket goes down one time all the way to close.
>
> Do we really want to audit each and every odd member of the socket
> structure from the generic portion all the way down to INET and
> TCP specifics to figure out what needs to get zero'd out?

An audit is always welcomed, we might find bugs :)

> So much relies upon the one-time full zero out during sock allocation.
>
> Let's fix NFS instead.

bugzilla reference: http://bugzilla.kernel.org/show_bug.cgi?id=14580

Trond said:

	NFS MUST reuse the same port because on most servers, the replay
	cache is keyed to the port number. In other words, when we replay
	an RPC call, the server will only recognise it as a replay if it
	originates from the same port.

See http://www.connectathon.org/talks96/werme1.html

Please note the socket stays bound to a given local port. We want to
connect() it to a possibly different target, that's all. In the NFS case
the 'other target' is in fact the same target, but that is a special case
of a more general one.

Hmm... if an application wants to keep a local port for itself (not
allowing another one to grab this (ephemeral?) port during the
close()/socket()/bind() window), this is the only way. The TCP state
machine allows this, IMHO.

Google for "tcp AF_UNSPEC connect" to find many references and man pages
for this stuff:
http://kerneltrap.org/Linux/Connect_Specification_versus_Man_Page

How do other Unixes/OSes handle this? How many applications use this trick?
We do not use NFS, only the varnish HTTP reverse proxy: http://varnish.projects.linpro.no/
(In reply to comment #16)
> we donot use NFS
> only varnish http reverse proxy: http://varnish.projects.linpro.no/

(In reply to comment #9)
> [Eric's debug patch "tcp: check stale pointers in tcp_unlink_write_queue()",
> quoted in full -- see comment #9 above.]

Any instructions on which release (2.6.31.6?) to apply that patch against?
I have 2.6.32 running on roughly 50 servers. So far there haven't been any crashes due to TCP. There were a couple of changes to tcp_* in 2.6.31.10 & 2.6.32. One was the following:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bbf31bf18d34caa87dd01f08bf713635593697f2

Has anyone given 2.6.32 or 2.6.31.10 a try yet? In the past 14 hours or so, there have been 0 crashes due to TCP.
Still no problems with 2.6.32, I'm going to put this kernel on 50 more servers today to test it out. I'll let you know how it goes.
2.6.32 on an HP DL180G5: got a crash today, same trace.
(In reply to comment #20)
> 2.6.32 HP DL180G5
> got crash today
> same trace

Yup, it just happened to me as well. I had 2 boxes hit the problem.
Have the same problem on ~10 boxes. Upgrading to 2.6.32.3 didn't help; they keep crashing at about one server per day.
(In reply to comment #22) > Have the same problem on ~10 boxes. Upgraded to 2.6.32.3, didn't help, they > keep crashing like 1 server per day. Just curious, what kind of boxes are you running, and what kind of network cards?
They're Supermicro boxes with nvidia cards.
Dell R610; Intel 82571EB and 82575 NICs.
I confirm this bug. I have 2 Ubuntu 2.6.31-17 servers and they freeze around once a day under high load, ending with the same stack trace. It happens with servers behind an LVS (Keepalived) load balancer. Server configuration: INTEL MB DQ965GF/GUARDFISH/uATX/A,R,IG (Intel Corporation 82566DM Gigabit integrated) with another Intel Corporation 82541PI Gigabit network adapter. I tried both with the same results.
I've moved the disks to completely different hardware (Tyan Transport GT20 B2925G20V4H, no e1000s, nvidia chipset and ethernet) and it still hangs. This is what I found on the screen (typed by hand from the screen; does it help without more detail?):

tcp_rcv_state_process
tcp_v4_do_rcv
tcp_v4_rcv
ip_local_deliver_finish
nf_hook_slow
ip_local_deliver_finish
ip_local_deliver_finish
ip_local_deliver
ip_rcv_finish
ip_rcv
netif_receive_skb
process_backlog
net_rx_action
__do_softirq
call_softirq
do_softirq
irq_exit
do_IRQ
ret_from_intr
native_safe_halt
default_idle
c1e_idle
cpu_idle
start_secondary

Maybe the fact that we all (?) have this problem on servers behind a load balancer is important?

I've moved back to 2.6.28-6 as recommended and it seems stable...
I don't use any balancer in front, just nginx... It seems I have no option other than to go back to 2.6.28 until this problem is fixed in 2.6.31+.

We thought it was because of netconsole + nvidia and removed netconsole from the kernel; it's still happening with 2.6.32.x.

Jan 31 14:23:21 bstorage37-i [133369.201250] BUG: unable to handle kernel NULL pointer dereference at (null)
Jan 31 14:23:21 bstorage37-i [133369.201274] IP: [<c060164a>] tcp_xmit_retransmit_queue+0x1b2/0x1dc
Jan 31 14:23:21 bstorage37-i [133369.201295] *pdpt = 0000000021b03001 *pde = 0000000000000000
Jan 31 14:23:21 bstorage37-i [133369.201311] Thread overran stack, or stack corrupted
Jan 31 14:23:21 bstorage37-i [133369.201323] Oops: 0000 [#1] SMP
Jan 31 14:23:21 bstorage37-i [133369.201336] last sysfs file: /sys/devices/pci0000:00/0000:00:0f.0/0000:07:00.0/0000:08:01.0/0000:09:00.0/class
Jan 31 14:23:21 bstorage37-i [133369.201363] Pid: 0, comm: swapper Not tainted (2.6.31.6-v03 #2) H8DMU
Jan 31 14:23:21 bstorage37-i [133369.201377] EIP: 0060:[<c060164a>] EFLAGS: 00010246 CPU: 0
Jan 31 14:23:21 bstorage37-i [133369.201390] EIP is at tcp_xmit_retransmit_queue+0x1b2/0x1dc
Jan 31 14:23:21 bstorage37-i [133369.201401] EAX: dc5d08fc EBX: dc5d0880 ECX: 19dc6948 EDX: dc5d08fc
Jan 31 14:23:21 bstorage37-i [133369.201413] ESI: 00000000 EDI: 00000000 EBP: c0805d28 ESP: c0805d0c
Jan 31 14:23:21 bstorage37-i [133369.201426] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Jan 31 14:23:21 bstorage37-i [133369.201438] Process swapper (pid: 0, ti=c0804000 task=c080b5a0 task.ti=c0804000)
Jan 31 14:23:21 bstorage37-i [133369.201453] Stack:
Jan 31 14:23:21 bstorage37-i [133369.201457] 00000202
I confirm this bug for two Supermicro AMD servers with nvidia MCP55 NICs, running 2.6.32.2. Both run Apache httpd behind an LVS load balancer without NFS. Other network traffic would be ssh and snmp; eth0 is shared with an IPMI device. They have also both crashed using 2.6.32.8, but I did not have an opportunity to look at the console before rebooting, so I don't know for sure it was in tcp_xmit_retransmit_queue.
I confirm this bug for 6 Supermicro AMD servers with Intel 80003ES2LAN Gigabit Ethernet NICs, kernel version 2.6.29.
This problem disappeared when I downgraded the kernel to 2.6.26.8. Supermicro, Intel NIC, netconsole enabled. So it's not hardware-related; waiting for a kernel fix.
Just an update: both our Supermicros with nVidia NICs seem to stay up with 2.6.28.6.
Is anyone trying the latest 2.6.33? I'm testing it on one machine and it has been stable for 7 days now. It's this package: http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.33/ (amd64 version).
I'm installing 2.6.33 on ~7 boxes; let's see how stable it is.
*** Bug 15487 has been marked as a duplicate of this bug. ***
Any news about 2.6.33? I compared net/ipv4/tcp_output.c between 2.6.32.8 and 2.6.34-rc1 and found no differences in tcp_xmit_retransmit_queue(). All I noticed were changes around the new sysctl variable tcp_cookie_size, and there is no information in the kernel changelogs about corrections in this area.
I'm not 100% sure, but it seems that disabling SACK hides the problem on 2.6.32.8 (and maybe earlier): net.ipv4.tcp_sack=0 (by default it is enabled). Without SACK, I haven't noticed this bug yet on 27 servers with 2.6.32.8 and 4 servers with 2.6.31.2 (all highly loaded web servers).
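For anyone wanting to try the same mitigation, the setting can be applied at runtime with `sysctl -w net.ipv4.tcp_sack=0` and made persistent as a sysctl configuration fragment (a workaround sketch only; SACK is normally beneficial and should be re-enabled once the bug is fixed):

```
# /etc/sysctl.conf (or a file under /etc/sysctl.d/)
# Workaround for the tcp_xmit_retransmit_queue panics reported in this
# thread: disable TCP selective acknowledgements.  Revert once fixed.
net.ipv4.tcp_sack = 0
```

Verify the current value first with `sysctl net.ipv4.tcp_sack`; the default is 1.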
Today, 2.6.33 server crashed again with the usual symptoms. Returning back to 2.6.28-6
Petr, did you have tcp_sack=0 when it crashed?
No, I had the default /proc/sys/net/ipv4/tcp_sack = 1. I was starting to think that 2.6.33 was going to be stable (it survived 14 days; 2.6.32 always lasted only a few days), so I didn't try that recommendation. OK, it's an idea. I'm not going back to an older version; I'll try that old freezing 2.6.31 with tcp_sack 0 and we'll see.
We're seeing the same issue on 2.6.30.8 through 2.6.33 (mostly 2.6.33 now) on systems pushing 1 Gbit/s+; these are SuperMicro X8DTU and Dell PE850 systems. This oops is from the SuperMicro flavor with igb (/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/irq is the igb interrupt), with tcp_sack = 1. Posting it because it differs slightly from the one above:

Mar 16 02:26:04 mobile02 kernel: [48936.189104] BUG: unable to handle kernel NULL pointer dereference at (null)
Mar 16 02:26:04 mobile02 kernel: [48936.189144] IP: [<ffffffff816cf308>] tcp_xmit_retransmit_queue+0x68/0x270
Mar 16 02:26:04 mobile02 kernel: [48936.189181] PGD c35c08067 PUD c35c09067 PMD 0
Mar 16 02:26:04 mobile02 kernel: [48936.189211] Oops: 0000 [#1] SMP
Mar 16 02:26:04 mobile02 kernel: [48936.189236] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.1/irq
Mar 16 02:26:04 mobile02 kernel: [48936.189280] CPU 15
Mar 16 02:26:04 mobile02 kernel: [48936.189329] Pid: 0, comm: swapper Not tainted 2.6.33 #1 X8DTU/X8DTU
Mar 16 02:26:04 mobile02 kernel: [48936.189360] RIP: 0010:[<ffffffff816cf308>] [<ffffffff816cf308>] tcp_xmit_retransmit_queue+0x68/0x270
Mar 16 02:26:04 mobile02 kernel: [48936.189409] RSP: 0018:ffff8806555c3b10 EFLAGS: 00010246
Mar 16 02:26:04 mobile02 kernel: [48936.189435] RAX: 000000000adbc3a1 RBX: ffff880b7b649980 RCX: ffff880b7b649c98
Mar 16 02:26:04 mobile02 kernel: [48936.189464] RDX: 0000000000000002 RSI: 000000000000050e RDI: ffff880b7b649980
Mar 16 02:26:04 mobile02 kernel: [48936.189494] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000003
Mar 16 02:26:04 mobile02 kernel: [48936.189523] R10: 000000000adb9981 R11: 0000000000000000 R12: 0000000000000006
Mar 16 02:26:04 mobile02 kernel: [48936.189576] R13: 0000000000000000 R14: ffff880b7b649a48 R15: 0000000000000000
Mar 16 02:26:04 mobile02 kernel: [48936.189607] FS:  0000000000000000(0000) GS:ffff8806555c0000(0000) knlGS:0000000000000000
Mar 16 02:26:04 mobile02 kernel: [48936.189652] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 16 02:26:04 mobile02 kernel: [48936.189689] CR2: 0000000000000000 CR3: 0000000c35c07000 CR4: 00000000000006e0
Mar 16 02:26:04 mobile02 kernel: [48936.189721] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 16 02:26:04 mobile02 kernel: [48936.189750] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 16 02:26:04 mobile02 kernel: [48936.189781] Process swapper (pid: 0, threadinfo ffff880c3ce14000, task ffff88063cdd0cc0)
Mar 16 02:26:04 mobile02 kernel: [48936.189828] Stack:
Mar 16 02:26:04 mobile02 kernel: [48936.189847]  ffff880a8dba6c00 ffff880b7b649c98 0adbc3a1fdbf4c00 000000000000050e
Mar 16 02:26:04 mobile02 kernel: [48936.189881] <0> ffff880b7b649980 0000000000000006 0000000000000000 0000000000000000
Mar 16 02:26:04 mobile02 kernel: [48936.189933] <0> 0000000000000000 ffffffff816c9469 0000000000000005 0000000000000000
Mar 16 02:26:04 mobile02 kernel: [48936.190002] Call Trace:
Mar 16 02:26:04 mobile02 kernel: [48936.190023]  <IRQ>
Mar 16 02:26:04 mobile02 kernel: [48936.190046]  [<ffffffff816c9469>] ? tcp_ack+0x1389/0x2020
Mar 16 02:26:04 mobile02 kernel: [48936.190075]  [<ffffffff816ca625>] ? tcp_validate_incoming+0x105/0x330
Mar 16 02:26:04 mobile02 kernel: [48936.190104]  [<ffffffff816cb60e>] ? tcp_rcv_state_process+0x7e/0xc70
Mar 16 02:26:04 mobile02 kernel: [48936.190133]  [<ffffffff816d2d61>] ? tcp_v4_do_rcv+0xa1/0x230
Mar 16 02:26:04 mobile02 kernel: [48936.190160]  [<ffffffff816d34fa>] ? tcp_v4_rcv+0x60a/0x7e0
Mar 16 02:26:04 mobile02 kernel: [48936.190188]  [<ffffffff816b37ea>] ? ip_local_deliver_finish+0x8a/0x1a0
Mar 16 02:26:04 mobile02 kernel: [48936.190216]  [<ffffffff816b325d>] ? ip_rcv_finish+0x18d/0x3b0
Mar 16 02:26:04 mobile02 kernel: [48936.190243]  [<ffffffff816b36d7>] ? ip_rcv+0x257/0x2e0
Mar 16 02:26:04 mobile02 kernel: [48936.190272]  [<ffffffff8166b080>] ? napi_skb_finish+0x40/0x50
Mar 16 02:26:04 mobile02 kernel: [48936.190330]  [<ffffffff814e3150>] ? igb_poll+0x7d0/0xe50
Mar 16 02:26:04 mobile02 kernel: [48936.190356]  [<ffffffff8166b663>] ? net_rx_action+0x83/0x120
Mar 16 02:26:04 mobile02 kernel: [48936.190387]  [<ffffffff81044fe7>] ? __do_softirq+0xa7/0x130
Mar 16 02:26:04 mobile02 kernel: [48936.190417]  [<ffffffff8105e851>] ? ktime_get+0x61/0xe0
Mar 16 02:26:04 mobile02 kernel: [48936.190445]  [<ffffffff8100330c>] ? call_softirq+0x1c/0x30
Mar 16 02:26:04 mobile02 kernel: [48936.190472]  [<ffffffff8100524d>] ? do_softirq+0x4d/0x80
Mar 16 02:26:04 mobile02 kernel: [48936.190499]  [<ffffffff81044cd5>] ? irq_exit+0x75/0x90
Mar 16 02:26:04 mobile02 kernel: [48936.190525]  [<ffffffff810047ee>] ? do_IRQ+0x6e/0xf0
Mar 16 02:26:04 mobile02 kernel: [48936.190585]  [<ffffffff817a5ed3>] ? ret_from_intr+0x0/0xa
Mar 16 02:26:04 mobile02 kernel: [48936.190611]  <EOI>
Mar 16 02:26:04 mobile02 kernel: [48936.190635]  [<ffffffff8163b3b0>] ? menu_reflect+0x0/0x20
Mar 16 02:26:04 mobile02 kernel: [48936.190665]  [<ffffffff813936a8>] ? acpi_idle_enter_c1+0x8a/0xf3
Mar 16 02:26:04 mobile02 kernel: [48936.190693]  [<ffffffff81393672>] ? acpi_idle_enter_c1+0x54/0xf3
Mar 16 02:26:04 mobile02 kernel: [48936.190724]  [<ffffffff8163b4d8>] ? menu_select+0x108/0x290
Mar 16 02:26:04 mobile02 kernel: [48936.190751]  [<ffffffff8163a5da>] ? cpuidle_idle_call+0xba/0x120
Mar 16 02:26:04 mobile02 kernel: [48936.190779]  [<ffffffff810016fa>] ? cpu_idle+0xaa/0x110
Mar 16 02:26:04 mobile02 kernel: [48936.190807] Code: 05 00 00 4c 8d b3 c8 00 00 00 39 c2 89 54 24 14 78 04 89 44 24 14 48 8d 8b 18 03 00 00 45 31 ed 45 31 ff 48 89 4c 24 08 0f 1f 00 <48> 8b 45 00 49 39 ee 0f 18 08 74 62 48 3b ab 00 02 00 00 48 8d
Mar 16 02:26:04 mobile02 kernel: [48936.191007] RIP  [<ffffffff816cf308>] tcp_xmit_retransmit_queue+0x68/0x270
Mar 16 02:26:04 mobile02 kernel: [48936.191038]  RSP <ffff8806555c3b10>
Mar 16 02:26:04 mobile02 kernel: [48936.191060] CR2: 0000000000000000
Mar 16 02:26:04 mobile02 kernel: [48936.191412] ---[ end trace 3f1fda40fce80ab1 ]---
Mar 16 02:26:04 mobile02 kernel: [48936.191478] Kernel panic - not syncing: Fatal exception in interrupt
Mar 16 02:26:04 mobile02 kernel: [48936.191546] Pid: 0, comm: swapper Tainted: G      D    2.6.33 #1
Mar 16 02:26:04 mobile02 kernel: [48936.191616] Call Trace:
Mar 16 02:26:04 mobile02 kernel: [48936.191677]  <IRQ>  [<ffffffff817a2f4d>] ? panic+0x86/0x159
Mar 16 02:26:04 mobile02 kernel: [48936.191792]  [<ffffffff81002dd3>] ? apic_timer_interrupt+0x13/0x20
Mar 16 02:26:04 mobile02 kernel: [48936.191863]  [<ffffffff8136c060>] ? vgacon_cursor+0x0/0x240
Mar 16 02:26:04 mobile02 kernel: [48936.191930]  [<ffffffff8104037e>] ? kmsg_dump+0x7e/0x140
Mar 16 02:26:04 mobile02 kernel: [48936.191999]  [<ffffffff810066d5>] ? oops_end+0x95/0xa0
Mar 16 02:26:04 mobile02 kernel: [48936.193254]  [<ffffffff81023530>] ? no_context+0x100/0x270
Mar 16 02:26:04 mobile02 kernel: [48936.193321]  [<ffffffff810237f5>] ? __bad_area_nosemaphore+0x155/0x230
Mar 16 02:26:04 mobile02 kernel: [48936.193392]  [<ffffffff8167f497>] ? sch_direct_xmit+0x77/0x1d0
Mar 16 02:26:04 mobile02 kernel: [48936.193461]  [<ffffffff8166bffd>] ? dev_queue_xmit+0x13d/0x5a0
Mar 16 02:26:04 mobile02 kernel: [48936.193539]  [<ffffffff817a60df>] ? page_fault+0x1f/0x30
Mar 16 02:26:04 mobile02 kernel: [48936.193646]  [<ffffffff816cf308>] ? tcp_xmit_retransmit_queue+0x68/0x270
Mar 16 02:26:04 mobile02 kernel: [48936.193714]  [<ffffffff816c9469>] ? tcp_ack+0x1389/0x2020
Mar 16 02:26:04 mobile02 kernel: [48936.193782]  [<ffffffff816ca625>] ? tcp_validate_incoming+0x105/0x330
Mar 16 02:26:04 mobile02 kernel: [48936.193850]  [<ffffffff816cb60e>] ? tcp_rcv_state_process+0x7e/0xc70
Mar 16 02:26:04 mobile02 kernel: [48936.193918]  [<ffffffff816d2d61>] ? tcp_v4_do_rcv+0xa1/0x230
Mar 16 02:26:04 mobile02 kernel: [48936.193984]  [<ffffffff816d34fa>] ? tcp_v4_rcv+0x60a/0x7e0
Mar 16 02:26:04 mobile02 kernel: [48936.194054]  [<ffffffff816b37ea>] ? ip_local_deliver_finish+0x8a/0x1a0
Mar 16 02:26:04 mobile02 kernel: [48936.194125]  [<ffffffff816b325d>] ? ip_rcv_finish+0x18d/0x3b0
Mar 16 02:26:04 mobile02 kernel: [48936.194191]  [<ffffffff816b36d7>] ? ip_rcv+0x257/0x2e0
Mar 16 02:26:04 mobile02 kernel: [48936.194257]  [<ffffffff8166b080>] ? napi_skb_finish+0x40/0x50
Mar 16 02:26:04 mobile02 kernel: [48936.194348]  [<ffffffff814e3150>] ? igb_poll+0x7d0/0xe50
Mar 16 02:26:04 mobile02 kernel: [48936.194414]  [<ffffffff8166b663>] ? net_rx_action+0x83/0x120
Mar 16 02:26:04 mobile02 kernel: [48936.194482]  [<ffffffff81044fe7>] ? __do_softirq+0xa7/0x130
Mar 16 02:26:04 mobile02 kernel: [48936.194579]  [<ffffffff8105e851>] ? ktime_get+0x61/0xe0
Mar 16 02:26:04 mobile02 kernel: [48936.194649]  [<ffffffff8100330c>] ? call_softirq+0x1c/0x30
Mar 16 02:26:04 mobile02 kernel: [48936.194719]  [<ffffffff8100524d>] ? do_softirq+0x4d/0x80
On Wed, 02 Dec 2009 22:24:46 -0800 (PST) David Miller <davem@davemloft.net> wrote:

> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Thu, 26 Nov 2009 23:54:53 +0200 (EET)
>
> > [PATCH] tcp: clear hints to avoid a stale one (nfs only affected?)
>
> Ok, since Linus just released 2.6.32 I'm tossing this into net-next-2.6
> so it gets wider exposure.
>
> I still want to see test results from the bug reporter, and if it fixes
> things we can toss this into -stable too.

Despite my request to take this to email, quite a few people have been jumping onto this report via bugzilla:
http://bugzilla.kernel.org/show_bug.cgi?id=14470

Bit of a pita, but it'd be worth someone taking a look to ensure that we're all talking about the same bug.
Confirmed the same bug with 2.6.33, tcp_sack = 1:

Mar 19 23:42:55 bstorage25-i [644676.050103] EIP: [<c141c0f1>]
Mar 19 23:42:55 bstorage25-i tcp_xmit_retransmit_queue+0x1a8/0x1d2
Mar 19 23:42:55 bstorage25-i SS:ESP 0068:c1631d28
Mar 19 23:42:55 bstorage25-i [644676.050103] CR2: 0000000000000000
Mar 19 23:42:55 bstorage25-i [644676.052710] ---[ end trace b193123ded1c81f0 ]---
Mar 19 23:42:55 bstorage25-i [644676.052943] Kernel panic - not syncing: Fatal exception in interrupt
Mar 19 23:42:55 bstorage25-i [644676.053147] Pid: 0, comm: swapper Tainted: G      D    2.6.33-v01 #1
Mar 19 23:42:55 bstorage25-i [644676.053350] Call Trace:

I'm going to set tcp_sack = 0 and see whether anything changes.
On Thu, 18 Mar 2010, Andrew Morton wrote:

> On Wed, 02 Dec 2009 22:24:46 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
> > From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> > Date: Thu, 26 Nov 2009 23:54:53 +0200 (EET)
> >
> > > [PATCH] tcp: clear hints to avoid a stale one (nfs only affected?)
> >
> > Ok, since Linus just released 2.6.32 I'm tossing this into net-next-2.6
> > so it gets wider exposure.
> >
> > I still want to see test results from the bug reporter, and if it fixes
> > things we can toss this into -stable too.
>
> Despite my request to take this to email, quite a few people have been
> jumping onto this report via bugzilla:
> http://bugzilla.kernel.org/show_bug.cgi?id=14470
>
> Bit of a pita, but it'd be worth someone taking a look to ensure that
> we're all talking about the same bug.

Could someone try this debug patch:
http://marc.info/?l=linux-kernel&m=126624014117610&w=2
It should prevent the crash as well.
I've rolled a new kernel out to over 200 servers with tcp_sack set to 0, and we haven't had any stability issues in over 72 hours. By now we would normally have had at least 10 servers panic. I'll take a look at that patch and may give it a try if I have time.
The issue is still happening in stable 2.6.33.1. Is there a patch in the works for one of the next kernel revisions?
(In reply to comment #46)
> The issue is still happening in stable 2.6.33.1
>
> Is there a patch that's in the works for one of the next kernel revisions?

Bill,

Have you tried disabling tcp_sack?

echo 0 > /proc/sys/net/ipv4/tcp_sack

... or add it to sysctl.conf. Or here is a little patch to keep it disabled by default:

--- linux-2.6.33.1/net/ipv4/tcp_input.c.old	2010-03-19 18:39:43.000000000 -0500
+++ linux-2.6.33.1/net/ipv4/tcp_input.c	2010-03-19 18:39:29.000000000 -0500
@@ -74,7 +74,7 @@
 int sysctl_tcp_timestamps __read_mostly = 1;
 int sysctl_tcp_window_scaling __read_mostly = 1;
-int sysctl_tcp_sack __read_mostly = 1;
+int sysctl_tcp_sack __read_mostly = 0;
 int sysctl_tcp_fack __read_mostly = 1;
 int sysctl_tcp_reordering __read_mostly = TCP_FASTRETRANS_THRESH;
 int sysctl_tcp_ecn __read_mostly = 2;
Won't turning off TCP selective acknowledgment result in other performance penalties?
Same problem on 2.6.31.12 and 2.6.32.8 on 4 machines. Disabling SACK hides the problem.
Same problem here:
* 2.6.32.13-grsec - PANICs
* 2.6.32.9-grsec - PANICs
* 2.6.27.4-grsec - OK
Hardware is Supermicro X7DVL with Intel Xeon CPUs and Intel Ethernet controllers. Disabling SACK resolves the problem.
The problem seems to be solved by:

commit dc6330590fbd5fad17f06663c5f0bed834054b2b
Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Date:   Mon Jul 19 01:16:18 2010 +0000

    tcp: fix crash in tcp_xmit_retransmit_queue

    commit 45e77d314585869dfe43c82679f7e08c9b35b898 upstream.

    It can happen that there are no packets in queue while calling
    tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
    NULL and that gets deref'ed to get sacked into a local var.

    There is no work to do if no packets are outstanding so we just
    exit early.

    This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
    guard to make joining diff nicer).

    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
    Reported-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de>
    Tested-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

It is in 2.6.32.17 and in all other active branches after Jul 19 (you can search the corresponding changelogs). I re-enabled SACK and have had no crashes for several weeks now. Maybe someone else can test too.