Bug 60779

Summary: tcp_fastretrans_alert triggered on tcp_input.c
Product: Networking
Reporter: Otavio Cipriani (otavio.n.cipriani)
Component: IPV4
Assignee: Stephen Hemminger (stephen)
Status: NEW
Severity: normal
CC: cristian.ciupitu, gabriel, hamer.mk, igor, leon+kernel, luka.kacil, phmagic, root, witold.baryluk+kernel, www
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.10.7
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Call trace
A few more Call traces
Even more call traces
stack trace with IPv6 receive path
Even more stack traces
another calltrace 3.11 (ubuntu 13.10)

Description Otavio Cipriani 2013-08-21 22:55:42 UTC
Created attachment 107276 [details]
Call trace

Unfortunately I do not know how to reproduce the bug. Nothing seems to be affected.

The only parameters I changed were:
net.core.rmem_max = 4194304;
net.core.wmem_max = 1048576.
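
For anyone comparing buffer limits across affected machines, here is a minimal sketch that reads the two sysctls named above from /proc/sys. The helper names (sysctl_path, read_sysctl) are hypothetical and only for illustration; the sketch assumes a Linux system where dotted sysctl names map directly to paths under /proc/sys.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name (e.g. net.core.rmem_max) to its /proc/sys file.

    Assumption: no component of the name itself contains a dot.
    """
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # The two parameters mentioned in this report.
    for key in ("net.core.rmem_max", "net.core.wmem_max"):
        print(f"{key} = {read_sysctl(key)}")
```

Running this on each affected system would make it easier to see whether the warning correlates with non-default buffer sizes; the reports so far do not establish that it does.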
Comment 1 www 2013-09-05 03:21:51 UTC
Created attachment 107422 [details]
A few more Call traces
Comment 2 www 2013-09-05 03:23:44 UTC
My system has IPv6 enabled, with static IPv4 and IPv6 addresses.
net.core.rmem_max is 212992
and net.core.wmem_max is 212992.
Comment 3 Luka 2013-09-29 20:41:53 UTC
Created attachment 109971 [details]
Even more call traces
Comment 4 Luka 2013-09-29 20:42:40 UTC
Comment on attachment 109971 [details]
Even more call traces

The same issue here. I have IPv6 disabled (ipv6.disable=1).
Comment 5 Witold Baryluk 2013-10-01 02:41:08 UTC
Created attachment 110091 [details]
stack trace with IPv6 receive path
Comment 6 Witold Baryluk 2013-10-01 02:41:24 UTC
I believe I have the same issue. I am using native IPv6 on my network.

Running:

Linux version 3.11-trunk-amd64 (debian-kernel@lists.debian.org) (gcc version 4.8.1 (Debian 4.8.1-9) ) #1 SMP Debian 3.11-1~exp1 (2013-09-12)

Hi,

from time to time, I experience hard crashes / hangs of my computer. I
was previously attributing this to the fglrx driver, but I switched a few weeks
ago to the open source driver completely, and it still happens sometimes. I
also removed the virtualbox modules and the hangs are still happening. It is also
probably not related to my usage of ZFS-on-linux, as the problem happens
even when I do not have the modules loaded or file systems mounted.

I think it happens when I am running boinc-client, which runs the
einstein@home workload (GPU disabled). When I manually disable it in
/etc/default/boinc or in the boinc-manager GUI, I can run the machine for days
without issues. Otherwise it crashes after a few minutes or maybe hours.

Sometimes I am able to unmount/kill/reboot the machine using SysRq
from a USB or PS/2 keyboard, but often not. The screen is black. The machine
is also not reachable from the network. It almost always happens when I am
away from the computer. It sometimes happens even when I am not logged into
a desktop environment (like MATE), so it is unlikely to be
graphics card related (I previously had issues with flash or
chromium crashing the system).

I do not have many logs, but a few times I was able to reboot the
computer from the keyboard, and I think I identified the
problem. Two log files contained exactly the same kernel WARNING, attached
below.

It is an amd64 system with an Intel Core i7-3930K and 32GB RAM, on an ASRock X79
Extreme11, BIOS P2.80.

It happens both with and without the Xen hypervisor (most of the time I run
without it).

The problem also appears on older kernels, like 3.10 and 3.2. I
am using 3.11 as it finally allows me to run the open
source drivers for the Radeon 7970. Previous kernels also had issues loading
firmware for my sound card, so this is the one I am sticking to. I believe the
problem also happens on custom built kernels (actually it crashes faster,
I think).

I am using native IPv6 on one ethernet connection.

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether bc:5f:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.x/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2001:xxx:xx:xxx:xxxx:xxxx:xxxx:xxxx/64 scope global temporary dynamic
       valid_lft 603591sec preferred_lft 84591sec
    inet6 2001:xxx:xx:xxx:xxxx:xxxx:xxxx:xxxx/64 scope global dynamic
       valid_lft 2591713sec preferred_lft 604513sec
    inet6 fe80::be5f:xxxx:xxxx:xxxx/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:5f:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff

Addresses are provided via a router advertising an IPv6 subnet and routing via
some tunnels. I also run the rdnssd daemon on this
machine, and a local bind9 server over IPv6 and IPv4.
Comment 7 Witold Baryluk 2013-10-01 02:43:06 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=989251 looks to be a similar issue
Comment 8 Alexander Bezrukov 2013-10-22 05:25:03 UTC
Created attachment 111881 [details]
Even more stack traces

I seem to also have hit this bug. The kernel is vanilla 3.10.17, configured without ipv6 support. The system is a web-server under a very slight load (at the moment).

CONFIG_NO_HZ_IDLE=y
# CONFIG_IPV6 is not set

Until yesterday I was running 3.10.12 and had never seen these symptoms. Since I updated to 3.10.17, there have been 8 occurrences in 1 day. Attached is an excerpt from the dmesg output.
Comment 9 Igor Novgorodov 2013-10-28 16:11:02 UTC
Same here, kernel 3.10.10

[3696421.517564] ------------[ cut here ]------------
[3696421.517574] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xc44/0xc70()
[3696421.517575] Modules linked in:
[3696421.517578] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.10-NAS #1
[3696421.517579] Hardware name: System manufacturer System Product Name/P8H77-I, BIOS 1004 06/08/2013
[3696421.517580]  ffffffff817addc0 0000000000000000 ffffffff8107855a ffff8800a7dd5b00
[3696421.517582]  000000000000452e 0000000000000000 0000000000000000 0000000000000000
[3696421.517584]  ffffffff81677434 9e61baaa00000000 0000000100000286 ffff8800a7dd5b00
[3696421.517586] Call Trace:
[3696421.517587]  <IRQ>  [<ffffffff817addc0>] ? dump_stack+0xd/0x17
[3696421.517594]  [<ffffffff8107855a>] ? warn_slowpath_common+0x6a/0xa0
[3696421.517596]  [<ffffffff81677434>] ? tcp_fastretrans_alert+0xc44/0xc70
[3696421.517597]  [<ffffffff81677e68>] ? tcp_ack+0x958/0xe20
[3696421.517599]  [<ffffffff816788f1>] ? tcp_rcv_established+0x271/0x6e0
[3696421.517602]  [<ffffffff81681816>] ? tcp_v4_do_rcv+0x156/0x320
[3696421.517605]  [<ffffffff817b462a>] ? common_interrupt+0x6a/0x6a
[3696421.517607]  [<ffffffff81683437>] ? tcp_v4_rcv+0x737/0x750
[3696421.517610]  [<ffffffff8163b84d>] ? nf_hook_slow+0xdd/0x120
[3696421.517612]  [<ffffffff81660ec0>] ? ip_rcv_finish+0x2b0/0x2b0
[3696421.517614]  [<ffffffff81660f58>] ? ip_local_deliver_finish+0x98/0x1d0
[3696421.517616]  [<ffffffff8161a162>] ? __netif_receive_skb_core+0x432/0x560
[3696421.517618]  [<ffffffff8161a393>] ? process_backlog+0x93/0x160
[3696421.517620]  [<ffffffff8161aa51>] ? net_rx_action+0x81/0x130
[3696421.517623]  [<ffffffff8107f416>] ? __do_softirq+0xd6/0x1a0
[3696421.517625]  [<ffffffff8107f615>] ? irq_exit+0x95/0xa0
[3696421.517628]  [<ffffffff8103838b>] ? do_IRQ+0x5b/0xd0
[3696421.517631]  [<ffffffff817b462a>] ? common_interrupt+0x6a/0x6a
[3696421.517631]  <EOI>  [<ffffffff8159c066>] ? cpuidle_enter_state+0x56/0xe0
[3696421.517635]  [<ffffffff8159c062>] ? cpuidle_enter_state+0x52/0xe0
[3696421.517638]  [<ffffffff810b6f68>] ? __tick_nohz_idle_enter+0x368/0x440
[3696421.517640]  [<ffffffff8159d3d1>] ? ladder_select_state+0x31/0x1e0
[3696421.517641]  [<ffffffff8159c18e>] ? cpuidle_idle_call+0x9e/0x140
[3696421.517644]  [<ffffffff8103f499>] ? arch_cpu_idle+0x9/0x30
[3696421.517646]  [<ffffffff810ae816>] ? cpu_startup_entry+0x76/0x160
[3696421.517647] ---[ end trace 2e556a8d5507b1af ]---
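
Since several reporters here count occurrences by hand, a small sketch like the following could tally WARNING headers in saved dmesg output. It assumes the 3.10-era header format seen in the trace above ("WARNING: at file:line function+offset/size()"); other kernel versions print this line differently, and the names WARNING_RE and count_warnings are hypothetical.

```python
import re
from collections import Counter

# Matches the header format from the trace above, e.g.
# "WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xc44/0xc70()"
WARNING_RE = re.compile(
    r"WARNING: at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\(\)"
)


def count_warnings(log_text):
    """Count WARNING headers, keyed by (source file, line number, function)."""
    counts = Counter()
    for match in WARNING_RE.finditer(log_text):
        key = (match["file"], int(match["line"]), match["func"])
        counts[key] += 1
    return counts


# Header line copied from the trace in this comment.
SAMPLE = (
    "[3696421.517574] WARNING: at net/ipv4/tcp_input.c:2776 "
    "tcp_fastretrans_alert+0xc44/0xc70()\n"
)

if __name__ == "__main__":
    for (src, line, func), n in count_warnings(SAMPLE).items():
        print(f"{n} x {func} at {src}:{line}")
```

Keying on the source line as well as the function keeps reports from different kernel versions apart, because the line number of the check in tcp_input.c can differ between releases.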
Comment 10 Leonid Evdokimov 2013-11-21 10:56:43 UTC
Created attachment 115481 [details]
another calltrace 3.11 (ubuntu 13.10)