I have `oops=panic` on /proc/cmdline, but kdump wasn't set up correctly (`crashkernel=128M` was too low, 256M was needed), so I don't know why the panic happened. However, on the next boot I read the previous boot's last log lines (via `journalctl -b -1`), which ended with these:

```
May 27 18:22:00 i87k kernel: nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=151.101.112.133 DST=192.168.0.63 LEN=60 TOS=0x00 PREC=0x20 TTL=55 ID=0 DF PROTO=TCP SPT=443 DPT=55408 SEQ=1298638150 ACK=6832714 WINDOW=27320 RES=0x00 ACK SYN URGP=0 OPT (020405620402080A35735DFC2CCE4B1E01030309)
May 27 18:22:00 i87k kernel: nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=151.101.112.133 DST=192.168.0.63 LEN=60 TOS=0x00 PREC=0x20 TTL=55 ID=0 DF PROTO=TCP SPT=443 DPT=55406 SEQ=2698278138 ACK=1746385854 WINDOW=27320 RES=0x00 ACK SYN URGP=0 OPT (020405620402080A6A2DE5892CCE4B1E01030309)
May 27 18:22:00 i87k kernel: nf_ct_proto_6: invalid packet ignored in state ESTABLISHED IN= OUT= SRC=151.101.112.133 DST=192.168.0.63 LEN=60 TOS=0x00 PREC=0x20 TTL=55 ID=0 DF PROTO=TCP SPT=443 DPT=55430 SEQ=3980092654 ACK=3018519461 WINDOW=27320 RES=0x00 ACK SYN URGP=0 OPT (020405620402080A0904C1332CCE4C1701030309)
```

So the kernel panic happened seconds (or maybe minutes, depending on how much of the log was lost due to no sync) after these messages. The messages appeared either because the packets were malformed (intentionally?) or, more likely, because they were resent due to congestion, since that GitHub IP (151.101.112.133) was losing about 40% of the packets (per `ping`) at the time.

I've gathered some info since the crash happened here: https://gist.github.com/howaboutsynergy/c69f4a44ad10f7cce48c1544266e43f6

If I encounter this issue again I will have a crash dump and thus a stack trace. But until then, is there some way I can test whether resending the same(?) packets (due to congestion, possibly) could cause the kernel to oops (or panic)? Maybe `CONFIG_NET_PKTGEN`? I've never used it before! I'll have a read of `Documentation/networking/pktgen.txt`.

I'm on ArchLinux and was using stable kernel 5.1.5-g835365932f0d, though not a pristine one.
The commit for this message, `nf_ct_proto_6: invalid packet ignored in state ESTABLISHED`, is from 2012: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1a4ac9870fb82eed56623d0f69ec59aa5bef85fe

I'm attempting to add those people to CC; maybe they have some ideas?
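The `OPT (...)` field in the logged packets is the raw TCP options block in hex, so it can be decoded to see what kind of segment conntrack was complaining about. Below is a minimal sketch, assuming the standard option kinds (2 = MSS, 3 = window scale, 4 = SACK permitted, 8 = timestamps). The function name `parse_tcp_options` is made up for illustration and is not from any kernel tool.

```python
# Hypothetical helper: decode the hex string printed after "OPT" in the log.
OPTION_NAMES = {2: "MSS", 3: "WSCALE", 4: "SACK_PERMITTED", 8: "TIMESTAMPS"}


def parse_tcp_options(hex_string):
    """Return a list of (name, payload_bytes) tuples for a TCP options block."""
    data = bytes.fromhex(hex_string)
    options = []
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == 0:  # End of option list
            options.append(("EOL", b""))
            break
        if kind == 1:  # No-operation padding, single byte
            options.append(("NOP", b""))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option header")
        length = data[i + 1]  # length covers the kind and length bytes too
        if length < 2 or i + length > len(data):
            raise ValueError("bad option length")
        name = OPTION_NAMES.get(kind, "kind-%d" % kind)
        options.append((name, data[i + 2:i + length]))
        i += length
    return options


# Options of the first logged packet (DPT=55408).
opts = parse_tcp_options("020405620402080A35735DFC2CCE4B1E01030309")
for name, payload in opts:
    print(name, payload.hex())

mss = int.from_bytes(opts[0][1], "big")
wscale = opts[4][1][0]
print("MSS =", mss, "window scale shift =", wscale)
```

MSS, SACK-permitted and window-scale options are only sent on segments with SYN set, which matches the `ACK SYN` flags in the log: these look like SYN-ACKs arriving for connections that conntrack already considered `ESTABLISHED`. That fits the retransmission guess but does not prove it.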
I can't use `CONFIG_NET_PKTGEN` because it only deals with UDP packets. I'd need something to emulate congestion... maybe some firewall rules that drop packets? Hmm...
I tried using some `iptables` rules with `hashlimit` to drop or accept TCP packets in `ESTABLISHED` state, to cause the source to resend them. It didn't work as well as I expected: I wasn't able to reproduce the issue (or any of the dmesg messages), though I did get a 260K file to take about 3 minutes to download. Oh well, I just hope that whatever caused the kernel oops (or panic) doesn't allow for anything worse (like RCE) than just that.

GitHub Support seems content with the fact that I'm no longer experiencing the slowdown or packet loss. No word on what might have caused those dmesg messages, or anything else really.

I won't be trying anything else to reproduce this. But if it happens again, my kdump[1](https://github.com/howaboutsynergy/q1q/blob/9f656ef9f31f227cc0951f7dc06761b660a56fdd/OSes/archlinux/etc/systemd/system/kdump.service.hostspecific%3Di87k#L1)[2](https://github.com/howaboutsynergy/q1q/blob/9f656ef9f31f227cc0951f7dc06761b660a56fdd/OSes/archlinux/etc/systemd/system/kdump-save.service#L1) should be ready to catch it, and then I'll update this.
Created attachment 283021: some traffic shaping script

I tried using this traffic shaping script but still couldn't reproduce the issue with it! I guess whatever happened has to happen upstream of me, with something like 30% of packets lost, for there to be any chance of reproducing this.
I got the crash dump! The system froze again (with a corrupted screen), just like the first time. It looks like it was caused by `rustc` (again, just like the first time!) while compiling itself. This is the info:

```
$ crash_kernel_read
Not already root, re-executing myself as root by using sudo(required!)...
/usr/bin/makedumpfile is owned by makedumpfile 1.6.5-1

crash 7.2.6
Copyright (C) 2002-2019 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [720MB]: patching 75391 gdb minimal_symbol values

KERNEL: /usr/lib/modules/5.1.7-g2f7d9d47575e/build/vmlinux
DUMPFILE: /var/crash/crashdump-2019-06-08-09:08:23 [PARTIAL DUMP]
CPUS: 6
DATE: Sat Jun 8 09:07:50 2019
UPTIME: 00:40:33
LOAD AVERAGE: 4.74, 5.80, 5.45
TASKS: 728
NODENAME: i87k
RELEASE: 5.1.7-g2f7d9d47575e
VERSION: #31 SMP Fri Jun 7 00:10:52 CEST 2019
MACHINE: x86_64 (3700 Mhz)
MEMORY: 31.9 GB
PANIC: "Oops: 0000 [#1] SMP PTI" (check log for details)
PID: 25124
COMMAND: "rustc"
TASK: ffff968413869e40 [THREAD_INFO: ffff968413869e40]
CPU: 2
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 25124 TASK: ffff968413869e40 CPU: 2 COMMAND: "rustc"
 #0 [ffffb6eecf1a7660] machine_kexec at ffffffffae03400b
 #1 [ffffb6eecf1a76a8] __crash_kexec at ffffffffae124bd8
 #2 [ffffb6eecf1a7770] crash_kexec at ffffffffae125a08
 #3 [ffffb6eecf1a7788] oops_end at ffffffffae011866
 #4 [ffffb6eecf1a77a8] no_context at ffffffffae03bdb7
 #5 [ffffb6eecf1a7848] do_page_fault at ffffffffae03c7bb
 #6 [ffffb6eecf1a7870] page_fault at ffffffffae800dfe
    [exception RIP: compaction_alloc+1339]
    RIP: ffffffffae1b47eb RSP: ffffb6eecf1a7928 RFLAGS: 00010286
    RAX: 0000000000000001 RBX: ffffb6eecf1a7b00 RCX: 0000000000000001
    RDX: 80000000000ffe00 RSI: 0000000000000000 RDI: 000000000000003c
    RBP: 80000000000ffe00 R8: 0000000000000000 R9: 0000000000000034
    R10: ffffe632c3ff8000 R11: ffffb6eecf1a7980 R12: 8000000000100000
    R13: 0000000000000001 R14: ffffe632c3ff8000 R15: ffff96850dfded00
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
 #7 [ffffb6eecf1a79c8] migrate_pages at ffffffffae1fa9a2
 #8 [ffffb6eecf1a7a48] compact_zone at ffffffffae1b62f0
 #9 [ffffb6eecf1a7ae8] compact_zone_order at ffffffffae1b67c3
#10 [ffffb6eecf1a7ba8] try_to_compact_pages at ffffffffae1b6fa4
#11 [ffffb6eecf1a7bf8] __alloc_pages_direct_compact at ffffffffae193f72
#12 [ffffb6eecf1a7c50] __alloc_pages_slowpath at ffffffffae1945f0
#13 [ffffb6eecf1a7d40] __alloc_pages_nodemask at ffffffffae194fa8
#14 [ffffb6eecf1a7da0] do_huge_pmd_anonymous_page at ffffffffae1fd37c
#15 [ffffb6eecf1a7df0] __handle_mm_fault at ffffffffae1c0e0d
#16 [ffffb6eecf1a7ea0] handle_mm_fault at ffffffffae1c152f
#17 [ffffb6eecf1a7ec8] __do_page_fault at ffffffffae03c522
#18 [ffffb6eecf1a7f28] do_page_fault at ffffffffae03c7bb
#19 [ffffb6eecf1a7f50] page_fault at ffffffffae800dfe
    RIP: 00007cd640afabbc RSP: 00007cd633fedc50 RFLAGS: 00010206
    RAX: 00007cd5b02fe020 RBX: 00007cd602d01020 RCX: 00007cd5b00fe010
    RDX: 00000000001fffff RSI: 0000000000d02000 RDI: 0000000000000010
    RBP: 000000000000ffff R8: 0000000000000067 R9: 00007cd601fff010
    R10: 0000000000000000 R11: 0000000000000246 R12: 00007cd5fd1f81b0
    R13: 00007cd602d01020 R14: 00000000fffffdd2 R15: 00007cd601f67120
    ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
crash>
```

```
...
[ 2389.137459] i2c i2c-1: NAK from device addr 0x50 msg #0
[ 2389.141642] i2c i2c-3: NAK from device addr 0x50 msg #0
[ 2432.660116] gpg-agent[1125]: handler 0x74e5c195d700 for fd 10 started
[ 2432.730809] gpg-agent[1125]: handler 0x74e5c195d700 for fd 10 terminated
[ 2434.126414] BUGGY: unable to handle kernel paging request at ffffe632c3ff8030
[ 2434.126415] #PF error: [normal kernel read fault]
[ 2434.126416] PGD 82dfd5067 P4D 82dfd5067 PUD 82dfd4067 PMD 0
[ 2434.126418] Oops: 0000 [#1] SMP PTI
[ 2434.126419] CPU: 2 PID: 25124 Comm: rustc Kdump: loaded Tainted: G U 5.1.7-g2f7d9d47575e #31
[ 2434.126420] Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 1002 07/02/2018
[ 2434.126422] RIP: 0010:compaction_alloc+0x53b/0x890
[ 2434.126423] Code: 1f 41 83 c5 01 4c 39 f5 0f 82 5e 01 00 00 4c 89 34 24 eb 76 49 89 ea 49 c1 e2 06 4c 03 15 75 f3 d0 00 4d 89 d6 4d 85 f6 74 44 <41> 8b 46 30 25 80 00 00 f0 3d 00 00 00 f0 0f 84 ff 00 00 00 80 7b
[ 2434.126423] RSP: 0000:ffffb6eecf1a7928 EFLAGS: 00010286
[ 2434.126424] RAX: 0000000000000001 RBX: ffffb6eecf1a7b00 RCX: 0000000000000001
[ 2434.126425] RDX: 80000000000ffe00 RSI: 0000000000000000 RDI: 000000000000003c
[ 2434.126425] RBP: 80000000000ffe00 R08: 0000000000000000 R09: 0000000000000034
[ 2434.126426] R10: ffffe632c3ff8000 R11: ffffb6eecf1a7980 R12: 8000000000100000
[ 2434.126426] R13: 0000000000000001 R14: ffffe632c3ff8000 R15: ffff96850dfded00
[ 2434.126427] FS: 00007cd633fff700(0000) GS:ffff9684eda80000(0000) knlGS:0000000000000000
[ 2434.126428] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2434.126428] CR2: ffffe632c3ff8030 CR3: 00000007ae5a8005 CR4: 00000000003606e0
[ 2434.126429] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2434.126429] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2434.126430] Call Trace:
[ 2434.126432] migrate_pages+0x112/0xa00
[ 2434.126433] ? isolate_freepages_block+0x330/0x330
[ 2434.126434] ? move_freelist_tail+0xd0/0xd0
[ 2434.126435] compact_zone+0x6b0/0xab0
[ 2434.126436] compact_zone_order+0xd3/0x110
[ 2434.126437] ? psi_task_change+0xe2/0x210
[ 2434.126438] try_to_compact_pages+0x164/0x220
[ 2434.126439] __alloc_pages_direct_compact+0x82/0x170
[ 2434.126440] __alloc_pages_slowpath+0x430/0xb70
[ 2434.126441] __alloc_pages_nodemask+0x278/0x2c0
[ 2434.126442] do_huge_pmd_anonymous_page+0x12c/0x5e0
[ 2434.126444] __handle_mm_fault+0xbed/0x1250
[ 2434.126445] handle_mm_fault+0xbf/0x1e0
[ 2434.126446] __do_page_fault+0x242/0x490
[ 2434.126448] ? page_fault+0x8/0x30
[ 2434.126449] do_page_fault+0x1b/0x5e
[ 2434.126449] page_fault+0x1e/0x30
[ 2434.126450] RIP: 0033:0x7cd640afabbc
[ 2434.126451] Code: 74 dc 41 f7 d6 eb 31 49 c1 e8 39 48 8d 46 f0 48 21 d0 44 88 04 31 44 88 44 01 10 48 8b 44 24 40 48 c1 e6 05 c4 c1 7e 6f 45 00 <c5> fe 7f 04 30 4c 89 cd 66 45 85 f6 74 a6 48 8b 7c 24 60 c5 f8 77
[ 2434.126452] RSP: 002b:00007cd633fedc50 EFLAGS: 00010206
[ 2434.126452] RAX: 00007cd5b02fe020 RBX: 00007cd602d01020 RCX: 00007cd5b00fe010
[ 2434.126453] RDX: 00000000001fffff RSI: 0000000000d02000 RDI: 0000000000000010
[ 2434.126453] RBP: 000000000000ffff R08: 0000000000000067 R09: 00007cd601fff010
[ 2434.126454] R10: 0000000000000000 R11: 0000000000000246 R12: 00007cd5fd1f81b0
[ 2434.126454] R13: 00007cd602d01020 R14: 00000000fffffdd2 R15: 00007cd601f67120
[ 2434.126455] Modules linked in: xt_comment msr xt_TCPMSS iptable_mangle iptable_security iptable_nat nf_nat iptable_raw nf_log_ipv4 nf_log_common xt_owner xt_LOG xt_connlimit nf_conncount xt_conntrack nf_conntrack nf_defrag_ipv4 xt_hashlimit xt_multiport xt_addrtype snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit drm_kms_helper snd_hda_intel snd_hda_codec syscopyarea sysfillrect sysimgblt ghash_clmulni_intel snd_hwdep fb_sys_fops iTCO_wdt intel_cstate snd_hda_core drm iTCO_vendor_support intel_uncore snd_pcm intel_rapl_perf snd_timer pcspkr mei_me mq_deadline snd drm_panel_orientation_quirks e1000e soundcore i2c_i801 mei xhci_pci xhci_hcd
[ 2434.126466] CR2: ffffe632c3ff8030
```

What other info should I provide?
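The faulting address can be cross-checked against the registers in the oops. Reading the kernel `Code:` bytes by hand (not checked with a disassembler), the sequence before the fault appears to be `mov r10, rbp; shl r10, 6; add r10, [rip+...]; mov r14, r10; test r14, r14; je ...`, and the faulting instruction `<41> 8b 46 30` appears to be `mov eax, [r14+0x30]`. The sketch below only redoes that arithmetic. It assumes `RBP` holds a page frame number, that the shift by 6 means a 64-byte `struct page`, and that the value added from memory is the base of the page array. None of that is confirmed by the source.

```python
# Register values copied from the oops; the interpretation is an assumption.
RBP = 0x80000000000FFE00  # assumed: page frame number being looked up
R14 = 0xFFFFE632C3FF8000  # assumed: resulting struct page pointer
CR2 = 0xFFFFE632C3FF8030  # faulting address reported by the kernel

MASK64 = (1 << 64) - 1    # registers wrap at 64 bits
STRUCT_PAGE_SHIFT = 6     # "shl r10, 6", i.e. 64 bytes per entry (assumed)
FIELD_OFFSET = 0x30       # displacement in "mov eax, [r14+0x30]"


def page_pointer(pfn, base):
    """Emulate base + (pfn << 6) with 64-bit wraparound."""
    return (base + ((pfn << STRUCT_PAGE_SHIFT) & MASK64)) & MASK64


# The top bit of RBP is shifted out, so only the low part contributes.
scaled = (RBP << STRUCT_PAGE_SHIFT) & MASK64
# Work backwards from R14 to the base that must have been added.
implied_base = (R14 - scaled) & MASK64

print("scaled pfn    =", hex(scaled))        # 0x3ff8000
print("implied base  =", hex(implied_base))  # 0xffffe632c0000000
print("page pointer  =", hex(page_pointer(RBP, implied_base)))
print("fault address =", hex(R14 + FIELD_OFFSET))
print("top bit of RBP set:", bool(RBP >> 63))
```

Under those assumptions the numbers are self-consistent: `R14 + 0x30` equals `CR2`, and `PMD 0` in the oops says nothing is mapped at that address. The value in `RBP` (and `RDX`) has its top bit set, which looks odd for a page frame number on a 31.9 GB machine. That is an observation from the dump, not a diagnosis.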
The `BUGGY` text is from a local patch:
`2600_whichbug.patch:+ pr_alert("BUGGY: unable to handle kernel %s at %px\n",`

More info here: https://gist.github.com/howaboutsynergy/c69f4a44ad10f7cce48c1544266e43f6#gistcomment-2938304
I'll make a new bug, since this is unrelated to what I thought it was (net packets).

Done as: https://bugzilla.kernel.org/show_bug.cgi?id=203849