Bug 195835 - regression: tcp_fastretrans_alert
Summary: regression: tcp_fastretrans_alert
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: x86-64 Linux
Importance: P1 high
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-21 08:53 UTC by nanericwang
Modified: 2017-08-21 23:57 UTC
CC List: 7 users

See Also:
Kernel Version: 4.11.2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg output (11.14 KB, text/plain)
2017-05-22 14:39 UTC, Oleksandr Natalenko

Description nanericwang 2017-05-21 08:53:17 UTC
The issue can be reproduced on 4.11 when running netfilter and the Transmission BitTorrent client. 4.10.x does not have this issue.

[May21 16:38] ------------[ cut here ]------------
[  +0.000044] WARNING: CPU: 3 PID: 0 at net/ipv4/tcp_input.c:2819 tcp_fastretrans_alert+0x7ff/0xa10
[  +0.000006] Modules linked in: xt_TPROXY iptable_mangle xt_owner xt_REDIRECT nf_nat_redirect xt_set af_packet iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_set_hash_net i
[  +0.000100] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.11.2 #2
[  +0.000004] Hardware name: BIOSTAR Group A68N-5000/A68N-5000, BIOS 4.6.5 04/22/2014
[  +0.000002] Call Trace:
[  +0.000007]  <IRQ>
[  +0.000010]  ? dump_stack+0x46/0x5e
[  +0.000009]  ? __warn+0xb4/0xd0
[  +0.000006]  ? tcp_fastretrans_alert+0x7ff/0xa10
[  +0.000006]  ? tcp_ack+0xe95/0x1448
[  +0.000009]  ? __ip_flush_pending_frames.isra.51+0x70/0x70
[  +0.000007]  ? tcp_rcv_established+0x1ae/0x6b0
[  +0.000007]  ? tcp_v4_do_rcv+0x12b/0x1e8
[  +0.000006]  ? tcp_v4_rcv+0x7c4/0x880
[  +0.000011]  ? nf_nat_ipv4_fn+0x53/0x150 [nf_nat_ipv4]
[  +0.000007]  ? ip_local_deliver_finish+0x49/0x110
[  +0.000006]  ? ip_local_deliver+0x66/0xd8
[  +0.000006]  ? inet_del_offload+0x30/0x30
[  +0.000008]  ? ip_sabotage_in+0x26/0x30 [br_netfilter]
[  +0.000009]  ? nf_hook_slow+0x21/0xa0
[  +0.000005]  ? ip_rcv+0x2e8/0x370
[  +0.000006]  ? ip_local_deliver_finish+0x110/0x110
[  +0.000008]  ? __netif_receive_skb_core+0x363/0x700
[  +0.000007]  ? netif_receive_skb_internal+0x2a/0x98
[  +0.000014]  ? br_pass_frame_up+0x60/0xd8 [bridge]
[  +0.000012]  ? br_handle_frame_finish+0x132/0x330 [bridge]
[  +0.000015]  ? nf_ct_get_tuple+0x56/0x90 [nf_conntrack]
[  +0.000011]  ? br_pass_frame_up+0xd8/0xd8 [bridge]
[  +0.000007]  ? br_nf_hook_thresh+0xa7/0xb8 [br_netfilter]
[  +0.000009]  ? ipt_do_table+0x2b5/0x3e0 [ip_tables]
[  +0.000007]  ? br_nf_pre_routing_finish+0x173/0x2f8 [br_netfilter]
[  +0.000010]  ? br_pass_frame_up+0xd8/0xd8 [bridge]
[  +0.000007]  ? nf_nat_ipv4_in+0x23/0x70 [nf_nat_ipv4]
[  +0.000007]  ? br_nf_pre_routing+0x2c4/0x420 [br_netfilter]
[  +0.000007]  ? br_nf_forward_arp+0x250/0x250 [br_netfilter]
[  +0.000007]  ? nf_hook_slow+0x21/0xa0
[  +0.000010]  ? br_handle_frame+0x1ae/0x2c8 [bridge]
[  +0.000010]  ? br_pass_frame_up+0xd8/0xd8 [bridge]
[  +0.000010]  ? br_handle_local_finish+0x38/0x38 [bridge]
[  +0.000006]  ? __netif_receive_skb_core+0x1d7/0x700
[  +0.000006]  ? netif_receive_skb_internal+0x2a/0x98
[  +0.000006]  ? napi_gro_receive+0x6a/0xb0
[  +0.000012]  ? rtl8169_poll+0x2c9/0x690 [r8169]
[  +0.000007]  ? net_rx_action+0x1c6/0x2b8
[  +0.000007]  ? __do_softirq+0xd8/0x200
[  +0.000005]  ? irq_exit+0xa3/0xa8
[  +0.000007]  ? do_IRQ+0x45/0xc0
[  +0.000009]  ? common_interrupt+0x7f/0x7f
[  +0.000003]  </IRQ>
[  +0.000013]  ? acpi_idle_do_entry+0x2b/0xed0 [processor]
[  +0.000007]  ? acpi_idle_enter+0xe1/0x288 [processor]
[  +0.000009]  ? cpuidle_enter_state+0x128/0x1f0
[  +0.000008]  ? do_idle+0x17a/0x1d8
[  +0.000006]  ? cpu_startup_entry+0x68/0x70
[  +0.000009]  ? start_secondary+0x138/0x158
[  +0.000005]  ? start_cpu+0x14/0x14
[  +0.000008] ---[ end trace ab2d840ff39c9793 ]---
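
For anyone comparing their own logs against the trace above, here is a minimal sketch (not part of the original report) that pulls the source file, line number, and function out of kernel WARNING lines. The names `WARN_RE` and `find_warnings` are hypothetical, and the pattern is assumed from the WARNING line format shown in this report, with or without a dmesg timestamp prefix.

```python
import re

# Location part of a kernel WARNING line, as it appears in the traces in this report.
# The function-name class also allows dots (e.g. compiler-generated ".isra.N" suffixes).
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def find_warnings(log_text):
    """Return one dict per WARNING line found in log_text."""
    return [m.groupdict() for m in WARN_RE.finditer(log_text)]


# Sample input: the WARNING line from the description above.
sample = (
    "[  +0.000044] WARNING: CPU: 3 PID: 0 at net/ipv4/tcp_input.c:2819 "
    "tcp_fastretrans_alert+0x7ff/0xa10\n"
)
hits = find_warnings(sample)
for w in hits:
    print(w["file"], w["line"], w["func"])
```

In practice the text would come from a saved `dmesg` dump rather than an inline string.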
Comment 1 Oleksandr Natalenko 2017-05-22 14:39:29 UTC
Created attachment 256665
dmesg output

Confirming with v4.11.x (checked with .0 and .2).
Comment 2 Oleksandr Natalenko 2017-05-22 19:15:41 UTC
If it helps, here are the sysctl settings I have changed:

===
» cat /etc/sysctl.d/*
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_local_port_range = 1026 59999
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1
net.ipv6.route.max_size = 16384
net.ipv4.ip_dynaddr = 1
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_rmem = 4096 262143 4194304
net.ipv4.tcp_wmem = 4096 262143 4194304
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_retries2 = 5
net.core.rmem_max = 4194304
net.core.rmem_default = 262143
net.core.wmem_max = 4194304
net.core.wmem_default = 262143
net.core.bpf_jit_enable = 1
net.ipv4.tcp_ecn = 1
vm.min_free_kbytes = 131072
===
Comment 3 David Karlsson 2017-06-17 12:57:15 UTC
Since this bug is still present in 4.11.5 I figure I'll help with the bug reporting.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 373 at net/ipv4/tcp_input.c:2820 tcp_fastretrans_alert+0x83f/0xa80
Modules linked in: ip6table_nat nf_nat_ipv6 ip6t_REJECT nf_reject_ipv6 ip6table_mangle xt_recent ipt_REJECT nf_reject_ipv4 xt_comment xt_hashlimit xt_addrtype xt_mark xt_nat ip6table_raw xt_multiport nf_conntrack_ipv6
 iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32c_generic btrfs xor adt7475 hwmon_vid gpio_ich iTCO_wdt evdev iTCO_vendor_support mac_hid raid6_pq nouveau psm
CPU: 0 PID: 373 Comm: irq/19-enp5s4 Not tainted 4.11.5-2-ck-core2 #1
Hardware name: System manufacturer System Product Name/P5B-Deluxe, BIOS 1238    09/30/2008
Call Trace:
 <IRQ>
 dump_stack+0x63/0x81
 __warn+0xcb/0xf0
 warn_slowpath_null+0x1d/0x20
 tcp_fastretrans_alert+0x83f/0xa80
 tcp_ack+0xdbd/0x1450
 tcp_rcv_established+0x137/0x700
 ? sk_filter_trim_cap+0x7e/0x280
 tcp_v4_do_rcv+0x11d/0x220
 tcp_v4_rcv+0x9aa/0xd20
 ip_local_deliver_finish+0x68/0x200
 ip_local_deliver+0x5c/0x110
 ? inet_del_offload+0x40/0x40
 ip_rcv_finish+0x227/0x400
 ip_rcv+0x258/0x3b0
 ? ip_local_deliver_finish+0x200/0x200
 __netif_receive_skb_core+0x3b6/0xaa0
 ? tcp4_gro_receive+0xf3/0x1d0
 __netif_receive_skb+0x18/0x60
 netif_receive_skb_internal+0x81/0xd0
 napi_gro_receive+0x124/0x160
 skge_poll+0x3c9/0x8d0 [skge]
 net_rx_action+0x12c/0x3a0
 __do_softirq+0xd1/0x317
 ? irq_finalize_oneshot.part.2+0xe0/0xe0
 do_softirq_own_stack+0x1c/0x30
 </IRQ>
 do_softirq.part.4+0x4e/0x50
 __local_bh_enable_ip+0x88/0xa0
 irq_forced_thread_fn+0x59/0x70
 irq_thread+0x12f/0x1c0
 ? wake_threads_waitq+0x30/0x30
 kthread+0x124/0x140
 ? irq_thread_dtor+0xc0/0xc0
 ? kthread_create_on_node+0x70/0x70
 ret_from_fork+0x2c/0x40
---[ end trace 626ef6824790ee27 ]---

And this is what's in my sysctl:
vm.dirty_ratio=40
vm.swappiness=10
net.ipv4.tcp_fin_timeout=15
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_rfc1337=1
net.ipv4.tcp_max_tw_buckets=1440000
net.ipv4.tcp_max_syn_backlog=8096
net.ipv4.tcp_wmem=4096 524288 11960320
net.ipv4.tcp_rmem=4096 524288 11960320
net.ipv4.tcp_mtu_probing=1
net.ipv4.tcp_ecn=1
net.ipv4.tcp_slow_start_after_idle=0
net.ipv4.tcp_tw_reuse=1
net.core.netdev_max_backlog=5000
net.core.somaxconn=1000
net.core.optmem_max=25165824
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=11960320
net.core.wmem_max=11960320
net.ipv4.udp_rmem_min=8192
net.ipv4.udp_wmem_min=8192
net.ipv4.tcp_synack_retries=2
fs.inotify.max_user_watches = 100000
Comment 4 Gluzskiy Alexandr 2017-07-05 02:42:05 UTC
Same problem on 4.11.8.

[ 1441.130236] WARNING: CPU: 0 PID: 0 at net/ipv4/tcp_input.c:2820 tcp_fastretrans_alert+0x8db/0xac0
[ 1441.130388] Modules linked in: netconsole veth ccm sch_pie sctp act_mirred ifb sch_ingress nf_conntrack_netlink nfnetlink cls_u32 sch_sfq sch_htb sit tunnel4 ip_tunnel nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw ip6_tables iptable_raw ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat iptable_nat nf_nat_ipv4 nf_nat xt_TCPMSS xt_sctp ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_recent xt_conntrack nf_conntrack xt_multiport iptable_filter iptable_mangle ip_tables x_tables radeon ath9k led_class ath9k_common ath9k_hw i2c_algo_bit ttm drm_kms_helper mac80211 sch_fq_codel ath cfbfillrect cfg80211 syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm br_netfilter bridge snd_hda_codec_via snd_hda_codec_generic snd_hda_intel snd_hda_codec stp backlight
[ 1441.131643]  llc rfkill r8169 xhci_pci snd_usb_audio snd_hwdep snd_usbmidi_lib snd_hda_core snd_rawmidi ohci_pci xhci_hcd snd_seq_device ohci_hcd parport_pc vhost_net snd_pcm i2c_piix4 snd_timer tun mii btrfs acpi_cpufreq button asus_atk0110 snd soundcore vhost processor tap xor kvm_amd kvm irqbypass gspca_zc3xx gspca_main v4l2_common k10temp hwmon raid6_pq uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_core i2c_core parport fbcon nfsd auth_rpcgss oid_registry nfs_acl bitblit lockd softcursor fb grace sunrpc fbdev font ipv6 autofs4
[ 1441.148700] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.8 #5
[ 1441.152945] Hardware name: System manufacturer System Product Name/M4A77TD, BIOS 2104    06/28/2010
[ 1441.157251] Call Trace:
[ 1441.161496]  <IRQ>
[ 1441.165683]  ? dump_stack+0x46/0x6d
[ 1441.169853]  ? __warn+0xb4/0xe0
[ 1441.174022]  ? tcp_fastretrans_alert+0x8db/0xac0
[ 1441.178238]  ? tcp_ack+0xd58/0x1300
[ 1441.182411]  ? tcp_rcv_established+0xf4/0x6a0
[ 1441.186550]  ? tcp_v4_do_rcv+0x115/0x200
[ 1441.190633]  ? tcp_v4_rcv+0xaef/0xb80
[ 1441.194722]  ? nf_nat_ipv4_fn+0x53/0x1a0 [nf_nat_ipv4]
[ 1441.198818]  ? ip_local_deliver_finish+0x87/0x1e0
[ 1441.202898]  ? ip_local_deliver+0x3d/0xc0
[ 1441.206960]  ? inet_del_offload+0x40/0x40
[ 1441.211011]  ? ip_rcv+0x281/0x380
[ 1441.215058]  ? ip_local_deliver_finish+0x1e0/0x1e0
[ 1441.219141]  ? __netif_receive_skb_core+0x49b/0x9c0
[ 1441.223230]  ? netif_receive_skb_internal+0x1a/0x80
[ 1441.227307]  ? ifb_ri_tasklet+0x16a/0x240 [ifb]
[ 1441.231361]  ? tasklet_action+0x8c/0xa0
[ 1441.235395]  ? __do_softirq+0xd4/0x200
[ 1441.239428]  ? irq_exit+0xe7/0x100
[ 1441.243450]  ? do_IRQ+0x45/0xc0
[ 1441.247472]  ? common_interrupt+0x89/0x89
[ 1441.251506]  </IRQ>
[ 1441.255512]  ? amd_e400_idle+0x1a/0x40
[ 1441.259503]  ? amd_e400_idle+0x18/0x40
[ 1441.263451]  ? do_idle+0x159/0x1a0
[ 1441.267373]  ? cpu_startup_entry+0x58/0x60
[ 1441.271340]  ? start_kernel+0x3d4/0x3dc
[ 1441.275331]  ? start_cpu+0x14/0x14
[ 1441.279504] ---[ end trace fa5d523840b633b2 ]---

IP-related sysctl settings:

net.netfilter.nf_conntrack_tcp_timeout_established = 6000
net.core.rmem_max = 134217728
net.core.rmem_max = 134217728
net.core.optmem_max = 25165824
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.udp_mem = 4096 65536 67108864
net.core.netdev_max_backlog = 250000
net.core.somaxconn = 32000
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 20
net.ipv4.tcp_fastopen = 1
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_rfc1337 = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_synack_retries = 2
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.all.force_igmp_version = 0
net.ipv4.conf.all.accept_redirects = 1
net.ipv4.conf.all.secure_redirects = 1
net.ipv4.conf.eth_net.accept_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.eth_net.rp_filter = 1
net.ipv4.ip_forward = 1
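
Since several reporters have posted their sysctl settings, here is a rough sketch (not from any of the reporters) for finding the settings that two or more lists share with identical values. The helper names `parse_sysctl` and `common_settings` are hypothetical, and the sample input uses only two lines each from the lists in comments 2 and 3. Sharing a setting does not by itself show that it is related to the warning.

```python
def parse_sysctl(text):
    """Parse 'key = value' or 'key=value' lines into a dict; later duplicates win."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalise whitespace so "4096  87380" and "4096 87380" compare equal.
        settings[key.strip()] = " ".join(value.split())
    return settings


def common_settings(*configs):
    """Return the key/value pairs present with identical values in every config."""
    if not configs:
        return {}
    shared = set(configs[0].items())
    for cfg in configs[1:]:
        shared &= set(cfg.items())
    return dict(sorted(shared))


# Two-line excerpts from the sysctl lists in comments 2 and 3.
a = parse_sysctl("net.ipv4.tcp_mtu_probing = 1\nnet.ipv4.tcp_timestamps = 0\n")
b = parse_sysctl("net.ipv4.tcp_mtu_probing=1\nnet.ipv4.tcp_timestamps=1\n")
print(common_settings(a, b))
```

Feeding in the full lists from each comment would give the complete overlap.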
Comment 5 Oleksandr Natalenko 2017-07-07 22:09:30 UTC
Same with v4.12.
Comment 6 Bryan Seitz 2017-07-29 23:11:56 UTC
I too am seeing this on Fedora 26:4.11.11-300.fc26.x86_64

Is there a workaround for this, or is it mostly harmless?

Thanks!
Comment 7 David Karlsson 2017-07-30 17:54:48 UTC
(In reply to Bryan Seitz from comment #6)
> I too am seeing this on Fedora 26:4.11.11-300.fc26.x86_64
> 
> Is there a workaround for this, or is it mostly harmless?
> 
> Thanks!

It causes my system to hang after 10+ hours or so. The only way to avoid it is to either not run Transmission at all or to use an older kernel, like 4.9.
Comment 8 Bryan Seitz 2017-08-21 21:03:45 UTC
Does anyone know if this will ever be fixed?
Comment 9 Stephen Hemminger 2017-08-21 23:57:07 UTC
Bugzilla is not used directly for tracking Linux networking bugs.
Bugs reported here are forwarded to netdev@vger.kernel.org.
