Created attachment 280139 [details]
iptables-script

I recently triggered something similar to https://www.spinics.net/lists/netdev/msg533254.html with 4.19.12-200.fc28.x86_64, just by calling my "iptables.sh" while connlimit and some other rules were in place; the crash was triggered by "iptables -t filter -P INPUT DROP". I guess that should not happen, and it may point to a general problem that explains the random instability of the whole 4.19.x series on most of my setups: every single machine using "connlimit" freezes randomly. With the Fedora kernel kernel-core-4.19.12-200.fc28.x86_64 it became, for the first time, more or less easy to reproduce with a simple iptables script. That does not completely explain what happens when machines crash after hours, but it is a fact that only machines using the iptables connlimit match are affected by the random crashes.

See the attached shell script, the stack trace, and the contents of "/etc/sysconfig/iptables" and "/etc/ipset/ipset".

* the ssh port is 10022 on the VMware guest
* execute "iptables-debug.sh"
* fire some connections at port 10022 while ssh is still connected
* call "iptables-debug.sh" repeatedly

The Fedora 4.20.0 build 4.20.0-1.fc30.x86_64 has the same problem.

[Tue Dec 25 10:36:09 2018] RIP: 0010:rb_erase+0x216/0x370
[Tue Dec 25 10:36:09 2018] Code: e9 6b fe ff ff 4d 89 48 10 e9 91 fe ff ff c3 48 89 06 48 89 d0 48 8b 52 10 e9 b1 fe ff ff 48 8b 07 48 89 c1 48 83 e1 fc 74 53 <48> 3b 79 10 0f 84 94 00 00 00 4c 89 41 08 4d 85 c0 75 4c a8 01 0f
[Tue Dec 25 10:36:09 2018] RSP: 0018:ffffa012c1533d28 EFLAGS: 00010282
[Tue Dec 25 10:36:09 2018] RAX: 92c28cd212bb14de RBX: ffff8a61e5b88c00 RCX: 92c28cd212bb14dc
[Tue Dec 25 10:36:09 2018] RDX: 0000000000000000 RSI: ffff8a61e2f5c3b0 RDI: ffff8a61e5b88c00
[Tue Dec 25 10:36:09 2018] RBP: ffff8a61e595ad68 R08: 0000000000000000 R09: ffffffffc01d73de
[Tue Dec 25 10:36:09 2018] R10: ffff8a61ddc3e000 R11: 0000000000000001 R12: ffff8a61e2f5c3b0
[Tue Dec 25 10:36:09 2018] R13: ffff8a61e2f5c808 R14: ffff8a61e2f5c000 R15: ffff8a61e5b88c20
[Tue Dec 25 10:36:09 2018] FS: 00007f0584f23740(0000) GS:ffff8a61e5e00000(0000) knlGS:0000000000000000
[Tue Dec 25 10:36:09 2018] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Dec 25 10:36:09 2018] CR2: 00005572afa8d000 CR3: 000000001eef8004 CR4: 00000000001606f0
[Tue Dec 25 10:36:09 2018] Call Trace:
[Tue Dec 25 10:36:09 2018]  nf_conncount_destroy+0x58/0xc0 [nf_conncount]
[Tue Dec 25 10:36:09 2018]  cleanup_match+0x45/0x70
[Tue Dec 25 10:36:09 2018]  cleanup_entry+0x3e/0xc0
[Tue Dec 25 10:36:09 2018]  __do_replace+0x1ca/0x230
[Tue Dec 25 10:36:09 2018]  do_ipt_set_ctl+0x146/0x1a2
[Tue Dec 25 10:36:09 2018]  nf_setsockopt+0x44/0x70
[Tue Dec 25 10:36:09 2018]  __sys_setsockopt+0x82/0xe0
[Tue Dec 25 10:36:09 2018]  __x64_sys_setsockopt+0x20/0x30
[Tue Dec 25 10:36:09 2018]  do_syscall_64+0x5b/0x160
[Tue Dec 25 10:36:09 2018]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Dec 25 10:36:09 2018] RIP: 0033:0x7f0583e4e4ea

The kernel should *really* be able to present the stack trace on machines running a desktop environment too, instead of a frozen screen with no information, and the stack trace should repeat the entry point at the bottom, because it is not helpful when you get a screenshot from "VMware HA" taken just before a guest system was hard-restarted and every relevant piece of information is far above the viewport.
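For reference, a stripped-down sketch of the steps above. This is only an illustration of the described trigger, not the attached iptables-debug.sh; port 10022 and the limit values just mirror my setup:

#!/bin/bash
# minimal reproducer sketch (illustrative, not the attached script)
IPT=/usr/sbin/iptables

$IPT -t filter -F INPUT
$IPT -t filter -A INPUT -p tcp --dport 10022 -m connlimit \
     --connlimit-above 5 --connlimit-mask 32 -j DROP
$IPT -t filter -A INPUT -p tcp --dport 10022 -j ACCEPT
$IPT -t filter -P INPUT DROP

# now open a few connections to port 10022 and run this script again:
# replacing the table tears down the connlimit match (cleanup_entry ->
# nf_conncount_destroy), and rb_erase trips over the corrupted rbtree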
Duplicate of Bug 202013
Patches for 4.20 have been posted: https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=83718
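For anyone who wants to test them on a stable tree before they land there, patchwork can usually serve the whole series as a single mbox (this assumes the standard patchwork series-mbox endpoint for series 83718):

# sketch: apply the posted series on top of a stable checkout
cd linux-stable
git checkout -b conncount-fixes v4.20
curl -s https://patchwork.ozlabs.org/series/83718/mbox/ | git am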
If you use the above URL from Comment 2, you have to click the minus sign to the right of:

Archived = No
State = Action Required

otherwise you won't see any patches. Or just use this URL, which has that done already:

https://patchwork.ozlabs.org/project/netfilter-devel/list/?series=83718&archive=both&state=*

Note: the patches are already part of the v5.0-rc1 kernel.
OK, the Bugzilla comment system stripped the "=*" from the end of the above URL, so you either have to add it back manually or just remember to click the minus sign to the right of "State = Action Required".
Hopefully there is a 4.19 update soon. Given that the kernel doesn't show anything when you are running KDE, it was impossible to guess where the random freezes were coming from, and on firewall devices you need to stick with the no longer supported 4.18.20 for way too long. That is a really bad kernel as well, because it was a mistake to backport the Spectre regressions and then switch to EOL while the filesystem corruption problem was also already known :-(
The changelog of 4.20.1 doesn't look like this is fixed, and my hopes for 4.19.14, which is expected soon, are hence not very high. Terrible.

https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.20.1
WTF - still not fixed in any stable kernel?

https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.19.16
https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.20.3
Also following this. This is impacting my ability to run 4.19.x or newer kernels with connection rate limiting on my webservers. Absolutely a P1 bug for me.
Well, the Fedora 4.19.16 build is done. At first, despite the kernel.org changelog not mentioning the fix, it looked as if the problem was gone in my nested VM setup, which means no more than "no stack trace after one connection and calling iptables.sh". On the host machine I removed the comment from these two lines:

# $IPTABLES -t filter -A RATELIMIT -p tcp -m connlimit --connlimit-above 50 --connlimit-mask 32 -j DROP
# $IPTABLES -t filter -A RATELIMIT -p tcp -m connlimit --connlimit-above 150 --connlimit-mask 24 -j DROP

It took a few hours and the machine froze again, just as it has done from the first moment I tried a 4.19.2 kernel months ago.

https://patchwork.ozlabs.org/project/netfilter-devel/list/?state=*

There is a patchset with state "Accepted" from 2018-12-29. Are you frankly kidding? This, combined with the filesystem corruption bug that is fixed in the meantime, is what qualifies 4.19 as an LTS kernel? It is by far the worst kernel of the last 5 years, one where a RAID10 scrub also froze machines randomly, again and again.
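(For clarity, the nested-VM check above, "no stack trace after one connection and calling iptables.sh", was nothing more elaborate than a loop like this; the guest address is a placeholder for my setup:)

GUEST=192.168.122.10          # placeholder for the VMware guest
for i in $(seq 1 20); do
    nc -z -w 1 $GUEST 10022   # open and close one connection to sshd
    ./iptables-debug.sh       # replace the ruleset while connlimit is active
    dmesg | tail -n 3         # watch for an rb_erase/nf_conncount splat
done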
It's terrible when you have to stick with 4.18.20 in such setups, a kernel which also got the since-replaced Spectre fixes backported before 4.18 went EOL, which was another bad idea.

[root@firewall:~]$ firewall_status | grep conn
1   298K   15M   RETURN     all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/24 > 15
1   0      0     LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   104    5376  LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   4508   229K  LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   226    11780 LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   0      0     LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   5      300   LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   11     660   LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   1706   89668 LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   55164  4292K LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   0      0     LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   0      0     LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   0      0     LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   0      0     LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   0      0     LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   6      360   LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   25     1500  LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   0      0     LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   11     660   LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   0      0     LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
1   0      0     LD_C_HST   all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 50
4   369    22140 LD_C_32    all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/32 > 120
5   196K   9465K LD_C_24    all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/24 > 250
6   3807   194K  LD_C_16    all  -- * * 0.0.0.0/0  0.0.0.0/0  #conn src/16 > 500
1   910    58240 LD_C_MAIL  tcp  -- * * 0.0.0.0/0  0.0.0.0/0  multiport dports 25,465,587,110,143,993,995 #conn src/32 > 75
1   2603   133K  LD_C_MX    tcp  -- * * 0.0.0.0/0  0.0.0.0/0  tcp dpt:25 #conn src/32 > 10
The fix looks to be in 4.20.5 and 4.19.18:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.20.5&qt=grep&q=conncount
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v4.19.18&qt=grep&q=conncount

Will begin a download, test, and run now.
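If you prefer a local check over the web changelog, grepping a stable tree for changes to the conncount source works as well (tags and file path as in mainline):

git log --oneline v4.19.17..v4.19.18 -- net/netfilter/nf_conncount.c
git log --oneline v4.20.4..v4.20.5 -- net/netfilter/nf_conncount.c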
No issues so far with 4.20.5 after 8 days. Issue resolved for me.