Bug 60779
Summary: | tcp_fastretrans_alert triggered on tcp_input.c | ||
---|---|---|---|
Product: | Networking | Reporter: | Otavio Cipriani (otavio.n.cipriani) |
Component: | IPV4 | Assignee: | Stephen Hemminger (stephen) |
Status: | NEW --- | ||
Severity: | normal | CC: | cristian.ciupitu, gabriel, hamer.mk, igor, leon+kernel, luka.kacil, phmagic, root, witold.baryluk+kernel, www |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 3.10.7 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
Call trace
A few more Call traces
Even more call traces
stack trace with IPv6 receive path
Even more stack traces
another calltrace 3.11 (ubuntu 13.10) |
Created attachment 107422 [details]
A few more Call traces
My system has IPv6 enabled, with static IPv4 and IPv6 addresses. net.core.rmem_max is 212992 and net.core.wmem_max is 212992.
Created attachment 109971 [details]
Even more call traces
Comment on attachment 109971 [details]
Even more call traces
The same issue here. I have ipv6 disabled (ipv6.disable=1)
Created attachment 110091 [details]
stack trace with IPv6 receive path
I believe I have the same issue. I am using native IPv6 on my network. Running:
Linux version 3.11-trunk-amd64 (debian-kernel@lists.debian.org) (gcc version 4.8.1 (Debian 4.8.1-9) ) #1 SMP Debian 3.11-1~exp1 (2013-09-12)
Hi, from time to time I experience hard crashes / hangs of my computer. I previously attributed this to the fglrx driver, but I switched completely to the open source driver a few weeks ago, and it still happens sometimes. I also removed the virtualbox modules, and the hangs are still happening. It is also rather not related to my usage of ZFS-on-Linux, as the problem happens even when I do not have the modules loaded or the file systems mounted.
I think it happens when I am running boinc-client, which runs the einstein@home workload (GPU disabled). When I manually disable it in /etc/default/boinc or in the boinc-manager GUI, I can run the machine for days without issues. Otherwise it crashes after a few minutes or maybe hours.
Sometimes I am able to unmount/kill/reboot the machine using SysRq from a USB or PS/2 keyboard, but often not. The screen is black. The machine is also not reachable from the network. Almost always it happens when I am away from the computer. It sometimes happens even when I am not logged into a desktop manager (like MATE), so it can hardly be graphics card related (I previously had issues with flash or chromium crashing the system).
I do not have many logs, but a few times I was able to reboot the computer from the keyboard, and I think I identified the problem. Two log files contained exactly the same kernel WARNING, attached below.
It is an amd64 system with an Intel Core i7-3930K and 32GB RAM, on an ASRock X79 Extreme11, BIOS P2.80. It happens both with and without the Xen hypervisor (most of the time I run without it). The problem also appears on older kernels, like 3.10 and 3.2. I am using 3.11 as it finally allows me to run the open source drivers for the Radeon 7970. Previous kernels also had issues loading firmware for my sound card, so this is the one I am sticking to.
I believe the problem also happens on custom-built kernels (actually, I think it crashes faster). I am using native IPv6 on one ethernet connection.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether bc:5f:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.x/24 brd 10.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2001:xxx:xx:xxx:xxxx:xxxx:xxxx:xxxx/64 scope global temporary dynamic
       valid_lft 603591sec preferred_lft 84591sec
    inet6 2001:xxx:xx:xxx:xxxx:xxxx:xxxx:xxxx/64 scope global dynamic
       valid_lft 2591713sec preferred_lft 604513sec
    inet6 fe80::be5f:xxxx:xxxx:xxxx/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether bc:5f:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
Addresses are provided via a router advertising an IPv6 subnet and routing via some tunnels. I also run the rdnssd daemon on this machine, and a local bind9 server over IPv6 and IPv4.
https://bugzilla.redhat.com/show_bug.cgi?id=989251 looks to be a similar issue.
Created attachment 111881 [details]
Even more stack traces
I seem to also have hit this bug. The kernel is vanilla 3.10.17, configured without ipv6 support. The system is a web-server under a very slight load (at the moment).
CONFIG_NO_HZ_IDLE=y
# CONFIG_IPV6 is not set
Until yesterday I was running 3.10.12 and never saw these symptoms. Since I updated to 3.10.17, there have been 8 occurrences in 1 day. Attached is an excerpt from the dmesg output.
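For anyone who wants to compare occurrence counts between kernel versions, here is a minimal sketch that tallies WARNING headers in saved dmesg text. It assumes the header format seen in the traces on this bug (`WARNING: at <file>:<line> <function>+0x.../0x...()`); other kernel versions print this line differently. The names `WARNING_RE` and `count_warnings` are made up for illustration.

```python
import re
from collections import Counter

# Assumed header format, as printed by the 3.10-era kernels in this report:
#   WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xc44/0xc70()
WARNING_RE = re.compile(
    r"WARNING: at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\(\)"
)


def count_warnings(dmesg_text):
    """Count WARNING headers, keyed by (source file, line number, function)."""
    counts = Counter()
    for match in WARNING_RE.finditer(dmesg_text):
        key = (match.group("file"), int(match.group("line")), match.group("func"))
        counts[key] += 1
    return counts


# Header line copied from a trace posted on this bug.
sample = (
    "[3696421.517564] ------------[ cut here ]------------\n"
    "[3696421.517574] WARNING: at net/ipv4/tcp_input.c:2776 "
    "tcp_fastretrans_alert+0xc44/0xc70()\n"
)

for (path, line, func), n in count_warnings(sample).most_common():
    print(f"{n:4d}  {path}:{line}  {func}")
```

The regex uses `finditer` over the whole text rather than splitting on newlines, so it also works on dmesg output that was pasted as a single line.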
Same here, kernel 3.10.10
[3696421.517564] ------------[ cut here ]------------
[3696421.517574] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0xc44/0xc70()
[3696421.517575] Modules linked in:
[3696421.517578] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.10-NAS #1
[3696421.517579] Hardware name: System manufacturer System Product Name/P8H77-I, BIOS 1004 06/08/2013
[3696421.517580] ffffffff817addc0 0000000000000000 ffffffff8107855a ffff8800a7dd5b00
[3696421.517582] 000000000000452e 0000000000000000 0000000000000000 0000000000000000
[3696421.517584] ffffffff81677434 9e61baaa00000000 0000000100000286 ffff8800a7dd5b00
[3696421.517586] Call Trace:
[3696421.517587] <IRQ> [<ffffffff817addc0>] ? dump_stack+0xd/0x17
[3696421.517594] [<ffffffff8107855a>] ? warn_slowpath_common+0x6a/0xa0
[3696421.517596] [<ffffffff81677434>] ? tcp_fastretrans_alert+0xc44/0xc70
[3696421.517597] [<ffffffff81677e68>] ? tcp_ack+0x958/0xe20
[3696421.517599] [<ffffffff816788f1>] ? tcp_rcv_established+0x271/0x6e0
[3696421.517602] [<ffffffff81681816>] ? tcp_v4_do_rcv+0x156/0x320
[3696421.517605] [<ffffffff817b462a>] ? common_interrupt+0x6a/0x6a
[3696421.517607] [<ffffffff81683437>] ? tcp_v4_rcv+0x737/0x750
[3696421.517610] [<ffffffff8163b84d>] ? nf_hook_slow+0xdd/0x120
[3696421.517612] [<ffffffff81660ec0>] ? ip_rcv_finish+0x2b0/0x2b0
[3696421.517614] [<ffffffff81660f58>] ? ip_local_deliver_finish+0x98/0x1d0
[3696421.517616] [<ffffffff8161a162>] ? __netif_receive_skb_core+0x432/0x560
[3696421.517618] [<ffffffff8161a393>] ? process_backlog+0x93/0x160
[3696421.517620] [<ffffffff8161aa51>] ? net_rx_action+0x81/0x130
[3696421.517623] [<ffffffff8107f416>] ? __do_softirq+0xd6/0x1a0
[3696421.517625] [<ffffffff8107f615>] ? irq_exit+0x95/0xa0
[3696421.517628] [<ffffffff8103838b>] ? do_IRQ+0x5b/0xd0
[3696421.517631] [<ffffffff817b462a>] ? common_interrupt+0x6a/0x6a
[3696421.517631] <EOI> [<ffffffff8159c066>] ? cpuidle_enter_state+0x56/0xe0
[3696421.517635] [<ffffffff8159c062>] ? cpuidle_enter_state+0x52/0xe0
[3696421.517638] [<ffffffff810b6f68>] ? __tick_nohz_idle_enter+0x368/0x440
[3696421.517640] [<ffffffff8159d3d1>] ? ladder_select_state+0x31/0x1e0
[3696421.517641] [<ffffffff8159c18e>] ? cpuidle_idle_call+0x9e/0x140
[3696421.517644] [<ffffffff8103f499>] ? arch_cpu_idle+0x9/0x30
[3696421.517646] [<ffffffff810ae816>] ? cpu_startup_entry+0x76/0x160
[3696421.517647] ---[ end trace 2e556a8d5507b1af ]---
Created attachment 115481 [details]
another calltrace 3.11 (ubuntu 13.10)
|
Created attachment 107276 [details]
Call trace
Unfortunately I do not know how to reproduce the bug. Nothing seems to be affected. The only parameters I changed were: net.core.rmem_max = 4194304; net.core.wmem_max = 1048576.
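Since several reporters quote their net.core.rmem_max and net.core.wmem_max values, here is a minimal sketch for reading the values currently in effect, so they can be pasted into a comment. It relies on the standard Linux mapping of a sysctl name to a file under /proc/sys (dots become path separators). The function names and the `proc_sys` parameter are made up for illustration; the parameter exists only so the code can be pointed at a different directory.

```python
from pathlib import Path


def read_sysctl(name, proc_sys="/proc/sys"):
    """Read an integer sysctl such as "net.core.rmem_max" from its /proc/sys file."""
    path = Path(proc_sys).joinpath(*name.split("."))
    return int(path.read_text().strip())


def report_buffer_limits(proc_sys="/proc/sys"):
    """Return the two socket buffer limits mentioned in this report."""
    names = ("net.core.rmem_max", "net.core.wmem_max")
    return {name: read_sysctl(name, proc_sys) for name in names}


# Usage on a Linux machine:
#   for name, value in report_buffer_limits().items():
#       print(f"{name} = {value}")
```

This only reads the values; it does not change them and says nothing about whether they are related to the warning.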