Created attachment 289439 [details] blurry kernel panic :( I get random kernel panic on one of my systems. It is running Arch Linux with Kernel 5.6.15-arch1-1 with an AMD Ryzen 5 2400G and "RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)" (1462:7a40) The machine uses a bridge, vlan and docker. I could only take a photo of the panic as netconsole does not log it. BUG: kernel NULL pointer dereference, address: 00000000000010 #PF supervisor read access in kernel mode #PD: error_code(0x0000) - not-present page PGD 3e65a6067 P4D 3e65a6067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.15-arch1-1 #1 Hardware name: Micro-Star International Co., ltd. MS-7A40/B450I GAMING PLUS AC (MS-7A40), BIOS A.60 03/06/2019 RIP: 0010:__cgroup_bpf_run_filter_skb+0xd9/0x230 Code: 00 48 01 c8 48 89 43 50 41 83 ff 01 0f 84 c2 00 00 00 e8 ca f2 ec ff e8 85 31 f2 ff 44 89 fa 48 8d 84 d5 30 06 00 00 48 8b 00 <48> 8b 78 10 4c 8d 78 10 48 85 ff 0f 84 29 01 00 00 bd 01 00 00 00 RSP: 0018:ffffad9b402e4758 EFLAGS: 00010206 RAX: 0000000000000000 RBX: ffff93d66b09c900 RCX: 00000000000000f0 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001 RBP: ffff93d67abe8000 R08: 0000000000000000 R09: 0000000000000001 R10: 000000000000b302 R11: ffff93d66b09c9d4 R12: 0000000000000014 R13: 0000000000000014 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS: ffff93d6908c0000(0000) knlGS: 0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 00000003ecfa0000 CR4: 00000000003406e0 Call Trace: <IRQ> sk_filter_trim_cap+0x12f/0x270 ? tcp_v4_inbound_md5_hash+0x5e/0x170 tcp_v4_rcv+0xb18/0xd80 ip_protocol_deliver_rcu+0x2b/0x1e0 ip_local_deliver_finish+0x55&0x70 ip_local_deliver+0x155/0x130 ? ip_protocol_deliver_rcu+0x1e0/0x1e0 ip_sabotage_in+0x5a/0x70 [br_netfilter] nf_hook_slow+0x3f/0xb0 ip_rcv+0xd1/0x110 ? ip_rev_finish_core.constprop.0+0x470/0x470 __netif_receive_skb_one_core+0x80/0x90 netif_receive_skb+0x162/0x1b0 br_pass_frame_up+0xf0/0x1d0 [bridge] ...
Created attachment 289443 [details] next kernel panic with the same output next crash a day later...
I can confirm the same on an Intel NUC running 5.6.15-arch1 Call Trace: <IRQ> sk_filter_trim_cap+0x12f/0x270 ? tcp_v4_inbound_md5_hash+0x5e/0x170 tcp_v4_rcv+0xb18/0xd80 ip_protocol_deliver_rcu+0x2b/0x1e0 ip_local_deliver_finish+0x55/0x70 ip_local_deliver+0x115/0x130 ? ip_protocol_deliver_rcu+0x1e0/0x1e0 ip_sublist_rcv_finish+0x3d/0x50 ip_sublist_rcv+0x182/0x210 ? ip_rev_finish_core.constprop.0+0x470/0x470 ip_list_rcv+0xf6/0x11a __netif_receive_skb_list_core+0x263/0x290 netif_receive_skb_list_internal+0x1c8/0x300 napi_complete_done+0x93/0x140 e1000e_poll+0xca/0x2c0 [e1000e] net_rx_action+0x168/0x400 __do_softirq+0x10f/0x358 irq_exit+0xab/0x120 do_IRQ+0x89/0x140 common_interrupt+0xf/0xf </IRQ> RIP: 0033:0x55a466def20c an other time, RIP was: 0010:cpuidle_enter_state+0xb6/0x420 Computer running 24/7 and crashes about 3-4 times a week. Crashburn does not seem to save logs/core dump. I only have photos of the screen.
I wanted to fill in a few more details. It is an Intel NUC6i3SYB Ethernet interface is the integrated e1000e I219-V / 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection I219-V (rev 21) let me know if I can help you debug this
Created attachment 289567 [details] screenshot of kernel panic with 5.4.45-1 LTS kernel The behavior can also be observed with recent 5.4.45 LTS kernel.
https://www.spinics.net/lists/netdev/msg658421.html this thread looks related.
I have 3 servers that recently upgraded to kernel 5.6.15. Since the upgrade, two have crashed randomly, one is stable. The two that crashed are running docker and systemd.unified_cgroup_hierarchy=0. Reverting to kernel 5.6.13 resolved the crashes. As above, I have nothing in logs to explain the crashes.
Two reports [1][1] [2][2] that reverting [3][3] resolved the issue. The same commit was introduced in 5.4.42 by [4][4]. [1]: https://bbs.archlinux.org/viewtopic.php?pid=1908840#p1908840 [2]: https://bugs.archlinux.org/task/66824#comment190126 [3]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e2d928d5ee43f372618a9f98b0c73674717f2a2c [4]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e26e2a3febcb7ad7a93c75314591aef2101d7e47
Possible workaround based on [0] is: cd /usr/lib/systemd/system sed -i.bak 's/^IPAddressDeny/#IPAddressDeny/g' *.service Just posting it also here so that people coming here from search engine can find it here. [0]: https://www.spinics.net/lists/netdev/msg658503.html
Also complete panic from my machine: "BUG: kernel NULL pointer dereference, address: 0000000000000010 "#PF: supervisor read access in kernel mode "#PF: error_code(0x0000) - not-present page PGD 0 P4D 0 ) Oops: 0000 [#1] SMP PTI CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G U 5.4.44-1-lts #1 Hardware name: ASUSTeK COMPUTER INC. P10S WS/P10S WS, BIOS 3402 07/12/2018 RIP: 0010:__cgroup_bpf_run_filter_skb+0xca/0x1d0# Code: 48 89 83 c8 00 00 00 48 01 c8 48 89 14 24 48 89 43 50 41 83 ff 01 0f 84 92 00 00 00 44 89 fa 48 8d 84 d6 30 06 00 00 48 8b 00 <48> 8b 78 10 4c 8d 78 10 48 85 ff 0f 84 df 00 00 00 bd 01 00 00 00 RSP: 0000:ffffc9000003cc10 EFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff8882a6c16300 RCX: 0000000000000114 RDX: 0000000000000000 RSI: ffff888820bec000 RDI: ffff8881ec9af1c0 RBP: ffff8882a6c16300 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000014 R13: 0000000000000014 R14: ffff88882054e862 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88884ea80000(0000) knlGS:0000000000000000rK CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000214c0a005 CR4: 00000000003606e0 Call Trace: <IRQ> sk_filter_trim_cap+0x123/0x220 ? tcp_v4_inbound_md5_hash+0x56/0x170 tcp_v4_rcv+0x990/0xbc0 ip_protocol_deliver_rcu+0x2b/0x1b0 ip_local_deliver_finish+0x44/0x50 ip_local_deliver+0xf5/0x100[ ? ip_protocol_deliver_rcu+0x1b0/0x1b0 ip_rcv+0xc5/0xd0 ? ip_rcv_finish_core.constprop.0+0x390/0x390 __netif_receive_skb_one_core+0x87/0xa0 netif_receive_skb_internal+0x52/0xc0 napi_gro_receive+0x114/0x140 igb_poll+0x56e/0xf40 [igb] net_rx_action+0x16d/0x3f0 ? rcu_core+0xde/0x4e0 __do_softirq+0xe9/0x2dc irq_exit+0xa2/0xe0 do_IRQ+0x58/0xe0 common_interrupt+0xf/0xf </IRQ> RIP: 0010:cpuidle_enter_state+0xea/0x400 Code: ed 8d 7e e8 58 39 99 ff 48 89 04 24 0f 1f 44 00 00 31 ff e8 08 47 99 ff 80 7c 24 0b 00 0f 85 12 02 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 1e 02 00 00 49 63 cc 48 8b 34 24 48 8d 04 49 48 c1 RSP: 0000:ffffc900000c7e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd8 RAX: ffff88884eaaa500 RBX: ffffffff822bba00 RCX: 000000000000001f RDX: 0000000000000000 RSI: 000000002819aad9 RDI: 0000000000000000 RBP: ffff88884eab4548 R08: 00002728d4ce484e R09: 0000000000000005 R10: 0000000000000031 R11: ffff88884eaa92c0 R12: 0000000000000001 R13: ffffffff822bba60 R14: ffffffff822bba78 R15: 00002728d4c27551 cpuidle_enter+0x29/0x40w6 do_idle+0x1c4/0x240 cpu_startup_entry+0x19/0x20 start_secondary+0x176/0x1d0 secondary_startup_64+0xb6/0xc0 Modules linked in: rpcsec_gss_krb5 veth tun xt_nat xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_nat nf_nat br_netfilter bridge stp llc xt_tcpudp xt_state msr xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter intel_rapl_msr snd_hda_codec_hdmi intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec eeepc_wmi x86_pkg_temp_thermal asus_wmi intel_powerclamp battery sparse_keymap coretemp iTCO_wdt iTCO_vendor_support mei_hdcp rfkill snd_hda_core wmi_bmof snd_hwdep mxm_wmi kvm snd_pcm snd_timer igb irqbypass intel_cstate intel_uncore snd intel_rapl_perf pcspkr i2c_i801 soundcore dca mei_me joydev mousedev mei input_leds wmi evdev mac_hid nfsd nfs_acl lockd auth_rpcgss grace sunrpc ip_tables x_tables btrfs libcrc32c crc32c_generic xor raid6_pq dm_crypt dm_mod hid_logitech_hidpp hid_logitech_dj hid_generic sd_mod usbhid hid crct10dif_pclmul crc32_pclmulFn crc32c_intel ghash_clmulni_intel aesni_intel ahci libahci crypto_simd cryptd glue_helper libata xhci_pci xhci_hcd scsi_mod i915 intel_gtt i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm agpgartYD CR2: 0000000000000010
(In reply to wolf from comment #8) > Possible workaround based on [0] is: Our machine with this workaround crashed after 3 days.
Contrary to the "Kernel Version" tag above this bug has been introduced with kernel 5.6.14. Have been hit by this, too, on a debian system. Machine is running docker and several other services. Reproducible stops working after several hours. Tested all kernels up to 5.7.1, all showing the same behaviour. Downgrading to 5.6.13 solves the problem. Trying 5.7.1 with e2d928d5ee43f372618a9f98b0c73674717f2a2c reverted now.
I can confirm Andreas Kurth observation as well. My kernel crashed regularly with 5.7.2 but reverting to 5.6.13 does not trigger this problem. I can add that I do not have anything related to IPAddressDeny however VMs I run on the crashing host do show bpf-prog running. They are part of a kubernetes cluster.
We have the same problem with all kernels past 5.6.13 on two different servers. On both servers VMs are running virtualized by QEMU/KVM and some Docker containers too. Here's the dmesg of a crash on one of the servers: [106909.018431][ T4042] BUG: kernel NULL pointer dereference, address: 0000000000000010 [106909.020070][ T4042] #PF: supervisor read access in kernel mode [106909.021678][ T4042] #PF: error_code(0x0000) - not-present page [106909.023550][ T4042] PGD 0 P4D 0 [106909.025622][ T4042] Oops: 0000 [#1] SMP PTI [106909.027662][ T4042] CPU: 6 PID: 4042 Comm: vhost-3998 Not tainted 5.7.1-serverv4 #336 [106909.029720][ T4042] Hardware name: Supermicro A1SAi/A1SRi, BIOS 2.2 11/22/2019 [106909.031802][ T4042] RIP: 0010:__cgroup_bpf_run_filter_skb+0xe7/0x3d0 [106909.033912][ T4042] Code: 89 87 c8 00 00 00 48 01 c8 48 89 5c 24 18 49 89 47 50 41 83 fd 01 0f 84 8b 01 00 00 44 89 ea 48 8d 84 d6 30 06 00 00 48 8b 00 <4c> 8b 70 10 4c 8d 68 10 4d 85 f6 0f 84 b7 02 00 00 49 8d 47 30 bb [106909.039815][ T4042] RSP: 0018:ffffa90f0160f7f8 EFLAGS: 00010297 [106909.042990][ T4042] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000003c [106909.046252][ T4042] RDX: 0000000000000000 RSI: ffff9b3d6b34d000 RDI: ffff9b3d6cbbb480 [106909.049508][ T4042] RBP: ffff9b3d69267000 R08: 000000000000b403 R09: 0000000000000000 [106909.052783][ T4042] R10: 0000000000000000 R11: 00000000712aa8c0 R12: 0000000000000006 [106909.056057][ T4042] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9b3d69267000 [106909.059322][ T4042] FS: 0000000000000000(0000) GS:ffff9b3d6fd80000(0000) knlGS:0000000000000000 [106909.062623][ T4042] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [106909.065923][ T4042] CR2: 0000000000000010 CR3: 000000043ef76000 CR4: 00000000001026e0 [106909.069247][ T4042] Call Trace: [106909.072534][ T4042] sk_filter_trim_cap+0xd7/0x210 [106909.075806][ T4042] ? tcp_v4_inbound_md5_hash+0x4c/0x160 [106909.079059][ T4042] tcp_v4_rcv+0xb52/0xd60 [106909.082281][ T4042] ip_protocol_deliver_rcu+0x2b/0x1b0 [106909.085493][ T4042] ip_local_deliver_finish+0x44/0x50 [106909.088692][ T4042] ip_local_deliver+0xe5/0xf0 [106909.091877][ T4042] ? ip_protocol_deliver_rcu+0x1b0/0x1b0 [106909.095103][ T4042] ip_rcv+0xbc/0xd0 [106909.098290][ T4042] ? ip_rcv_finish_core.isra.0+0x410/0x410 [106909.101492][ T4042] __netif_receive_skb_one_core+0x87/0xa0 [106909.104720][ T4042] netif_receive_skb+0x46/0x150 [106909.107911][ T4042] br_pass_frame_up+0x133/0x150 [106909.111047][ T4042] ? fdb_find_rcu+0xf2/0x170 [106909.114128][ T4042] br_handle_frame_finish+0x316/0x440 [106909.117193][ T4042] br_handle_frame+0x241/0x370 [106909.120242][ T4042] ? br_pass_frame_up+0x150/0x150 [106909.123235][ T4042] __netif_receive_skb_core+0x2ee/0xef0 [106909.126201][ T4042] __netif_receive_skb_one_core+0x3d/0xa0 [106909.129139][ T4042] netif_receive_skb+0x46/0x150 [106909.132041][ T4042] ? __build_skb+0x1f/0x50 [106909.134910][ T4042] tun_sendmsg+0x459/0x85e [106909.137791][ T4042] vhost_tx_batch.isra.0+0x5d/0xd0 [vhost_net] [106909.140730][ T4042] handle_tx_copy+0x15a/0x550 [vhost_net] [106909.143658][ T4042] handle_tx+0xa5/0xe0 [vhost_net] [106909.146553][ T4042] vhost_worker+0xac/0x100 [vhost] [106909.149459][ T4042] kthread+0xfb/0x130 [106909.152321][ T4042] ? __vhost_add_used_n+0x210/0x210 [vhost] [106909.155170][ T4042] ? kthread_park+0x90/0x90 [106909.157948][ T4042] ret_from_fork+0x35/0x40 The other one crashes too but with completely other output but the crashes are reproducible with 5.6.14+: 8959.334989][ T264] ata6.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x6 frozen [ 8959.335005][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335017][ T264] ata6.00: cmd 60/08:00:58:4c:bb/00:00:03:00:00/40 tag 0 ncq dma 4096 in [ 8959.335017][ T264] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335025][ T264] ata6.00: status: { DRDY } [ 8959.335032][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335042][ T264] ata6.00: cmd 60/08:08:b0:4c:98/00:00:07:00:00/40 tag 1 ncq dma 4096 in [ 8959.335042][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335048][ T264] ata6.00: status: { DRDY } [ 8959.335054][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335063][ T264] ata6.00: cmd 60/08:10:f0:b5:0a/00:00:09:00:00/40 tag 2 ncq dma 4096 in [ 8959.335063][ T264] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335068][ T264] ata6.00: status: { DRDY } [ 8959.335073][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335084][ T264] ata6.00: cmd 60/18:18:78:fe:1f/00:00:00:00:00/40 tag 3 ncq dma 12288 in [ 8959.335084][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335091][ T264] ata6.00: status: { DRDY } [ 8959.335096][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335106][ T264] ata6.00: cmd 60/08:20:f0:54:bf/00:00:0c:00:00/40 tag 4 ncq dma 4096 in [ 8959.335106][ T264] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335111][ T264] ata6.00: status: { DRDY } [ 8959.335116][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335124][ T264] ata6.00: cmd 60/30:28:98:fe:1f/00:00:00:00:00/40 tag 5 ncq dma 24576 in [ 8959.335124][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335130][ T264] ata6.00: status: { DRDY } [ 8959.335135][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335145][ T264] ata6.00: cmd 60/20:30:d0:fe:1f/00:00:00:00:00/40 tag 6 ncq dma 16384 in [ 8959.335145][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335150][ T264] ata6.00: status: { DRDY } [ 8959.335155][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335165][ T264] ata6.00: cmd 60/20:38:c0:cc:54/00:00:07:00:00/40 tag 7 ncq dma 16384 in [ 8959.335165][ T264] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335170][ T264] ata6.00: status: { DRDY } [ 8959.335175][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335183][ T264] ata6.00: cmd 60/08:40:10:ff:1f/00:00:00:00:00/40 tag 8 ncq dma 4096 in [ 8959.335183][ T264] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335187][ T264] ata6.00: status: { DRDY } [ 8959.335191][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335198][ T264] ata6.00: cmd 60/08:48:a8:4c:98/00:00:07:00:00/40 tag 9 ncq dma 4096 in [ 8959.335198][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335202][ T264] ata6.00: status: { DRDY } [ 8959.335206][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335213][ T264] ata6.00: cmd 60/28:50:20:ff:1f/00:00:00:00:00/40 tag 10 ncq dma 20480 in [ 8959.335213][ T264] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335217][ T264] ata6.00: status: { DRDY } [ 8959.335221][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335228][ T264] ata6.00: cmd 60/08:58:f8:b5:0a/00:00:09:00:00/40 tag 11 ncq dma 4096 in [ 8959.335228][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335233][ T264] ata6.00: status: { DRDY } [ 8959.335237][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335244][ T264] ata6.00: cmd 60/20:60:58:ff:1f/00:00:00:00:00/40 tag 12 ncq dma 16384 in [ 8959.335244][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335249][ T264] ata6.00: status: { DRDY } [ 8959.335253][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335261][ T264] ata6.00: cmd 60/20:68:70:fd:1f/00:00:00:00:00/40 tag 13 ncq dma 16384 in [ 8959.335261][ T264] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335265][ T264] ata6.00: status: { DRDY } [ 8959.335269][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335276][ T264] ata6.00: cmd 60/08:70:80:4c:98/00:00:07:00:00/40 tag 14 ncq dma 4096 in [ 8959.335276][ T264] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335280][ T264] ata6.00: status: { DRDY } [ 8959.335284][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335291][ T264] ata6.00: cmd 60/28:78:98:fd:1f/00:00:00:00:00/40 tag 15 ncq dma 20480 in [ 8959.335291][ T264] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335295][ T264] ata6.00: status: { DRDY } [ 8959.335300][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335307][ T264] ata6.00: cmd 60/18:80:c8:fd:1f/00:00:00:00:00/40 tag 16 ncq dma 12288 in [ 8959.335307][ T264] res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335311][ T264] ata6.00: status: { DRDY } [ 8959.335316][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335323][ T264] ata6.00: cmd 60/08:88:e8:b5:0a/00:00:09:00:00/40 tag 17 ncq dma 4096 in [ 8959.335323][ T264] res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335327][ T264] ata6.00: status: { DRDY } [ 8959.335331][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335338][ T264] ata6.00: cmd 60/08:90:e8:fd:1f/00:00:00:00:00/40 tag 18 ncq dma 4096 in [ 8959.335338][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335342][ T264] ata6.00: status: { DRDY } [ 8959.335346][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335354][ T264] ata6.00: cmd 60/08:98:e8:54:bf/00:00:0c:00:00/40 tag 19 ncq dma 4096 in [ 8959.335354][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335358][ T264] ata6.00: status: { DRDY } [ 8959.335362][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335369][ T264] ata6.00: cmd 60/08:a0:f8:fd:1f/00:00:00:00:00/40 tag 20 ncq dma 4096 in [ 8959.335369][ T264] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335373][ T264] ata6.00: status: { DRDY } [ 8959.335377][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335383][ T264] ata6.00: cmd 60/18:a8:a8:cc:54/00:00:07:00:00/40 tag 21 ncq dma 12288 in [ 8959.335383][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335387][ T264] ata6.00: status: { DRDY } [ 8959.335392][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335399][ T264] ata6.00: cmd 60/08:b0:20:fe:1f/00:00:00:00:00/40 tag 22 ncq dma 4096 in [ 8959.335399][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335403][ T264] ata6.00: status: { DRDY } [ 8959.335407][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335413][ T264] ata6.00: cmd 60/20:b8:88:4c:98/00:00:07:00:00/40 tag 23 ncq dma 16384 in [ 8959.335413][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335417][ T264] ata6.00: status: { DRDY } [ 8959.335421][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335429][ T264] ata6.00: cmd 60/30:c0:80:ff:1f/00:00:00:00:00/40 tag 24 ncq dma 24576 in [ 8959.335429][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335433][ T264] ata6.00: status: { DRDY } [ 8959.335438][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335445][ T264] ata6.00: cmd 60/08:c8:f8:54:bf/00:00:0c:00:00/40 tag 25 ncq dma 4096 in [ 8959.335445][ T264] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335449][ T264] ata6.00: status: { DRDY } [ 8959.335453][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335459][ T264] ata6.00: cmd 60/28:d0:b8:ff:1f/00:00:00:00:00/40 tag 26 ncq dma 20480 in [ 8959.335459][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335463][ T264] ata6.00: status: { DRDY } [ 8959.335467][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335474][ T264] ata6.00: cmd 60/08:d8:e0:cc:54/00:00:07:00:00/40 tag 27 ncq dma 4096 in [ 8959.335474][ T264] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335478][ T264] ata6.00: status: { DRDY } [ 8959.335483][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335489][ T264] ata6.00: cmd 60/18:e0:e8:ff:1f/00:00:00:00:00/40 tag 28 ncq dma 12288 in [ 8959.335489][ T264] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335494][ T264] ata6.00: status: { DRDY } [ 8959.335498][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335505][ T264] ata6.00: cmd 60/08:e8:00:4c:bb/00:00:03:00:00/40 tag 29 ncq dma 4096 in [ 8959.335505][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335508][ T264] ata6.00: status: { DRDY } [ 8959.335512][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335520][ T264] ata6.00: cmd 60/20:f0:10:4c:bb/00:00:03:00:00/40 tag 30 ncq dma 16384 in [ 8959.335520][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335524][ T264] ata6.00: status: { DRDY } [ 8959.335528][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8959.335535][ T264] ata6.00: cmd 60/20:f8:50:fe:1f/00:00:00:00:00/40 tag 31 ncq dma 16384 in [ 8959.335535][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8959.335539][ T264] ata6.00: status: { DRDY } [ 8959.335546][ T264] ata6: hard resetting link [ 8959.810909][ T264] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 8959.816788][ T264] ata6.00: configured for UDMA/133 [ 8959.816866][ T264] ata6: EH complete [ 8992.114805][ T264] ata6.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x6 frozen [ 8992.114815][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114821][ T264] ata6.00: cmd 60/10:00:70:d4:23/00:00:0a:00:00/40 tag 0 ncq dma 8192 in [ 8992.114821][ T264] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114830][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114835][ T264] ata6.00: cmd 60/10:08:88:d4:23/00:00:0a:00:00/40 tag 1 ncq dma 8192 in [ 8992.114835][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114839][ T264] ata6.00: status: { DRDY } [ 8992.114842][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114850][ T264] ata6.00: cmd 60/28:10:a0:d4:23/00:00:0a:00:00/40 tag 2 ncq dma 20480 in [ 8992.114850][ T264] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114855][ T264] ata6.00: status: { DRDY } [ 8992.114858][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114862][ T264] ata6.00: cmd 60/18:18:d0:d4:23/00:00:0a:00:00/40 tag 3 ncq dma 12288 in [ 8992.114862][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114867][ T264] ata6.00: status: { DRDY } [ 8992.114870][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114878][ T264] ata6.00: cmd 60/08:20:f0:d4:23/00:00:0a:00:00/40 tag 4 ncq dma 4096 in [ 8992.114878][ T264] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114881][ T264] ata6.00: status: { DRDY } [ 8992.114886][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114891][ T264] ata6.00: cmd 60/10:28:10:d5:23/00:00:0a:00:00/40 tag 5 ncq dma 8192 in [ 8992.114891][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114895][ T264] ata6.00: status: { DRDY } [ 8992.114898][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114902][ T264] ata6.00: cmd 60/28:30:28:d5:23/00:00:0a:00:00/40 tag 6 ncq dma 20480 in [ 8992.114902][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114908][ T264] ata6.00: status: { DRDY } [ 8992.114911][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114918][ T264] ata6.00: cmd 60/18:38:58:d5:23/00:00:0a:00:00/40 tag 7 ncq dma 12288 in [ 8992.114918][ T264] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114924][ T264] ata6.00: status: { DRDY } [ 8992.114927][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114933][ T264] ata6.00: cmd 60/18:40:c8:d5:23/00:00:0a:00:00/40 tag 8 ncq dma 12288 in [ 8992.114933][ T264] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114936][ T264] ata6.00: status: { DRDY } [ 8992.114942][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114947][ T264] ata6.00: cmd 60/28:48:e8:d5:23/00:00:0a:00:00/40 tag 9 ncq dma 20480 in [ 8992.114947][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114951][ T264] ata6.00: status: { DRDY } [ 8992.114954][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114959][ T264] ata6.00: cmd 60/10:50:18:d6:23/00:00:0a:00:00/40 tag 10 ncq dma 8192 in [ 8992.114959][ T264] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114964][ T264] ata6.00: status: { DRDY } [ 8992.114967][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114974][ T264] ata6.00: cmd 60/18:58:30:d6:23/00:00:0a:00:00/40 tag 11 ncq dma 12288 in [ 8992.114974][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114980][ T264] ata6.00: status: { DRDY } [ 8992.114983][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.114987][ T264] ata6.00: cmd 60/10:60:50:d6:23/00:00:0a:00:00/40 tag 12 ncq dma 8192 in [ 8992.114987][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.114992][ T264] ata6.00: status: { DRDY } [ 8992.114998][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115003][ T264] ata6.00: cmd 60/08:68:90:d6:23/00:00:0a:00:00/40 tag 13 ncq dma 4096 in [ 8992.115003][ T264] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115008][ T264] ata6.00: status: { DRDY } [ 8992.115010][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115015][ T264] ata6.00: cmd 60/28:70:a0:d6:23/00:00:0a:00:00/40 tag 14 ncq dma 20480 in [ 8992.115015][ T264] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115020][ T264] ata6.00: status: { DRDY } [ 8992.115023][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115031][ T264] ata6.00: cmd 60/20:78:68:d6:7c/00:00:0a:00:00/40 tag 15 ncq dma 16384 in [ 8992.115031][ T264] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115033][ T264] ata6.00: status: { DRDY } [ 8992.115039][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115044][ T264] ata6.00: cmd 60/10:80:d0:d6:23/00:00:0a:00:00/40 tag 16 ncq dma 8192 in [ 8992.115044][ T264] res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115050][ T264] ata6.00: status: { DRDY } [ 8992.115055][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115060][ T264] ata6.00: cmd 60/08:88:f0:d6:23/00:00:0a:00:00/40 tag 17 ncq dma 4096 in [ 8992.115060][ T264] res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115065][ T264] ata6.00: status: { DRDY } [ 8992.115068][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115073][ T264] ata6.00: cmd 60/28:90:00:d7:23/00:00:0a:00:00/40 tag 18 ncq dma 20480 in [ 8992.115073][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115078][ T264] ata6.00: status: { DRDY } [ 8992.115081][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115089][ T264] ata6.00: cmd 60/08:98:30:d7:23/00:00:0a:00:00/40 tag 19 ncq dma 4096 in [ 8992.115089][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115091][ T264] ata6.00: status: { DRDY } [ 8992.115097][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115101][ T264] ata6.00: cmd 60/10:a0:50:d7:23/00:00:0a:00:00/40 tag 20 ncq dma 8192 in [ 8992.115101][ T264] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115104][ T264] ata6.00: status: { DRDY } [ 8992.115107][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115112][ T264] ata6.00: cmd 60/10:a8:50:d7:7c/00:00:0a:00:00/40 tag 21 ncq dma 8192 in [ 8992.115112][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115114][ T264] ata6.00: status: { DRDY } [ 8992.115123][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115127][ T264] ata6.00: cmd 60/08:b0:68:d7:7c/00:00:0a:00:00/40 tag 22 ncq dma 4096 in [ 8992.115127][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115130][ T264] ata6.00: status: { DRDY } [ 8992.115137][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115142][ T264] ata6.00: cmd 60/28:b8:78:d7:7c/00:00:0a:00:00/40 tag 23 ncq dma 20480 in [ 8992.115142][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115147][ T264] ata6.00: status: { DRDY } [ 8992.115150][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115154][ T264] ata6.00: cmd 60/20:c0:70:d7:23/00:00:0a:00:00/40 tag 24 ncq dma 16384 in [ 8992.115154][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115160][ T264] ata6.00: status: { DRDY } [ 8992.115164][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115170][ T264] ata6.00: cmd 60/20:c8:98:d7:23/00:00:0a:00:00/40 tag 25 ncq dma 16384 in [ 8992.115170][ T264] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115175][ T264] ata6.00: status: { DRDY } [ 8992.115178][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115183][ T264] ata6.00: cmd 60/08:d0:00:d4:23/00:00:0a:00:00/40 tag 26 ncq dma 4096 in [ 8992.115183][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115188][ T264] ata6.00: status: { DRDY } [ 8992.115191][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115198][ T264] ata6.00: cmd 60/28:d8:10:d4:23/00:00:0a:00:00/40 tag 27 ncq dma 20480 in [ 8992.115198][ T264] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115200][ T264] ata6.00: status: { DRDY } [ 8992.115206][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115211][ T264] ata6.00: cmd 60/20:e0:c0:d7:23/00:00:0a:00:00/40 tag 28 ncq dma 16384 in [ 8992.115211][ T264] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115214][ T264] ata6.00: status: { DRDY } [ 8992.115219][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115224][ T264] ata6.00: cmd 60/08:e8:f8:d7:23/00:00:0a:00:00/40 tag 29 ncq dma 4096 in [ 8992.115224][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115228][ T264] ata6.00: status: { DRDY } [ 8992.115239][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115244][ T264] ata6.00: cmd 60/20:f0:18:58:22/00:00:0b:00:00/40 tag 30 ncq dma 16384 in [ 8992.115244][ T264] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115246][ T264] ata6.00: status: { DRDY } [ 8992.115252][ T264] ata6.00: failed command: READ FPDMA QUEUED [ 8992.115259][ T264] ata6.00: cmd 60/28:f8:40:d4:23/00:00:0a:00:00/40 tag 31 ncq dma 20480 in [ 8992.115259][ T264] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 8992.115262][ T264] ata6.00: status: { DRDY } [ 8992.115268][ T264] ata6: hard resetting link [ 8992.590750][ T264] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 8992.596667][ T264] ata6.00: configured for UDMA/133 [ 8992.596741][ T264] ata6: EH complete I know, the second log snippet is completely unrelated to the first one but both servers are crashing with 5.6.14+ and are stable with all kernel version before that.
While observing the issue I noticed that the crash might be related to me using the NFS4 server on the host. Sometimes the host would crash after I shutdown my client. Today it crashed after using NFS and then taking a small break. So I tried to stress nfs a bit. To my luck I could get a reproduction very quickly. It appears to crash within a few cycles of the script loop. #!/bin/bash while [ 1 ] do date mount /mnt/nfs ls -l /mnt/nfs umount /mnt/nfs echo sleeping... sleep 1 done the share options in fstab are nfs4 with user,intr,async,noauto Anyone else running NFS4 on their affected machines?
Our machine which is affected by this is used as an NFS server.
There is a patch on the netdev mailing list that appears to alleviate the problem partially (although not a full fix - AIUI the problem is wrong refcounting somewhere exacerbated by 090e28b229af92dc5b40786ca673999d59e73056; refcounting tends to be rather hard to debug): https://www.spinics.net/lists/netdev/msg660252.html It would be great if you could try the patch and then report if you can still repro (or not). Alternatively, just send the repro over to the mailing list, it might help. For me, it seems to happen sporadically when I use Docker. Haven't figured out a reliable reproduction yet.
I have two affected servers, both running nfsd and docker containers. One of the servers crashes in hours, the other runs about a day before crashing. The server that crashes quickly is both a busier docker server and nfs server than the the slower crasher; however, the nfs load difference is the most significant. The fast crasher also has a number of automount clients, with much more frequent mount/umount cycles. I found the following snippet in the log of the "fast crasher" about 130 minutes prior to a crash, so I don't know if it is directly related or not. Jun 07 06:24:09 kernel: NFSD: Using nfsdcld client tracking operations. Jun 07 06:24:09 kernel: NFSD: starting 90-second grace period (net f00000a8) Jun 07 06:24:12 kernel: ------------[ cut here ]------------ Jun 07 06:24:12 kernel: WARNING: CPU: 1 PID: 2018 at fs/nfsd/nfs4state.c:4873 nfsd4_process_open2+0x1291/0x1610 [nfsd] Jun 07 06:24:12 kernel: Modules linked in: [modules list removed] Jun 07 06:24:12 kernel: [modules list removed] Jun 07 06:24:12 kernel: CPU: 1 PID: 2018 Comm: nfsd Tainted: G OE 5.6.15-200.fc31.x86_64 #1 Jun 07 06:24:12 kernel: Hardware name: Supermicro X10SAE/X10SAE, BIOS 3.1 02/23/2018 Jun 07 06:24:12 kernel: RIP: 0010:nfsd4_process_open2+0x1291/0x1610 [nfsd] Jun 07 06:24:12 kernel: Code: 9a c0 c6 07 00 0f 1f 40 00 8b 54 24 48 85 d2 0f 85 b0 02 00 00 8b 05 22 19 e9 ff 4c 8b 64 24 58 89 c1 83 e1 10 e9 f4 f8 ff ff <0f> 0b e9 5a f4 ff ff 0f 0b e9 1b f2 ff ff 48 c7 44 24 50 00 00 00 Jun 07 06:24:12 kernel: RSP: 0018:ffffbd97811f7cc0 EFLAGS: 00010246 Jun 07 06:24:12 kernel: RAX: 0000000000000000 RBX: ffff98fcc16611e0 RCX: ffff98fcb10b6a50 Jun 07 06:24:12 kernel: RDX: 0000000000000001 RSI: ffff98fcb7b05498 RDI: ffff98fcb7b05444 Jun 07 06:24:12 kernel: RBP: ffffbd97811f7da0 R08: ffff98fcc1b020e0 R09: ffff98fcb10b6a50 Jun 07 06:24:12 kernel: R10: ffff98fcc1b02098 R11: 0000000000000000 R12: ffff98fcb7b05444 Jun 07 06:24:12 kernel: R13: ffff98fd39c04398 R14: ffff98fcb7b05440 R15: ffff98fcb7b05440 Jun 07 06:24:12 kernel: FS: 0000000000000000(0000) GS:ffff98fd3fa40000(0000) knlGS:0000000000000000 Jun 07 06:24:12 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jun 07 06:24:12 kernel: CR2: 00007f41dfc0e790 CR3: 000000014660a004 CR4: 00000000001606e0 Jun 07 06:24:12 kernel: Call Trace: Jun 07 06:24:12 kernel: ? nfsd_permission+0x63/0xe0 [nfsd] Jun 07 06:24:12 kernel: ? fh_verify+0x133/0x550 [nfsd] Jun 07 06:24:12 kernel: nfsd4_open+0x35b/0x730 [nfsd] Jun 07 06:24:12 kernel: nfsd4_proc_compound+0x3f2/0x760 [nfsd] Jun 07 06:24:12 kernel: ? nfs4svc_decode_compoundargs+0x407/0x4c0 [nfsd] Jun 07 06:24:12 kernel: nfsd_dispatch+0xc1/0x200 [nfsd] Jun 07 06:24:12 kernel: svc_process_common+0x459/0x6d0 [sunrpc] Jun 07 06:24:12 kernel: ? svc_recv+0x313/0x9c0 [sunrpc] Jun 07 06:24:12 kernel: ? nfsd_svc+0x2d0/0x2d0 [nfsd] Jun 07 06:24:12 kernel: svc_process+0xd4/0x110 [sunrpc] Jun 07 06:24:12 kernel: nfsd+0xe3/0x140 [nfsd] Jun 07 06:24:12 kernel: kthread+0xf9/0x130 Jun 07 06:24:12 kernel: ? nfsd_destroy+0x50/0x50 [nfsd] Jun 07 06:24:12 kernel: ? kthread_park+0x90/0x90 Jun 07 06:24:12 kernel: ret_from_fork+0x35/0x40 Jun 07 06:24:12 kernel: ---[ end trace 1d0afbeba344bcd4 ]---
I'm running the patched kernel (https://www.spinics.net/lists/netdev/msg660252.html) now. The Arch Linux stock kernel (5.7.6) crashed within 30 seconds. The patched vanilla 5.7.6 is running for 45 minutes already.
One full week without any crash running 5.4.47 with the patch (https://www.spinics.net/lists/netdev/msg660252.html). Without the patch, the kernel crash in less than 12h.
The patch referenced above seems to be successful. Is there anyone who can address when the patch might make it into the kernel?
Patch is being discussed here: https://patchwork.ozlabs.org/project/netdev/patch/20200616180352.18602-1-xiyou.wangcong@gmail.com/ Sorry for my outdated information earlier.
https://patchwork.ozlabs.org/project/netdev/patch/20200702185256.17917-1-xiyou.wangcong@gmail.com/ Seems like it will be merged into mainline soon.
"group: fix cgroup_sk_alloc() for sk_clone_lock()" was merged into 5.8-rc5 about a few hours after my previous comment: https://lore.kernel.org/lkml/CAHk-=wgX5+Q_trdMPaaQZmko0Og_eqAYoyMa_8S3ie+1Us6rkw@mail.gmail.com/
Any chance it will make it into the current 5.7? It's kind of a major issue to leave for an additional few weeks.
It should be in 5.7.10, which will probably be released in a few more days. It's currently in the stable queue[1]. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.7/cgroup-fix-cgroup_sk_alloc-for-sk_clone_lock.patch
Seems that the version 5.4.53 includes the fix: 94886c86e833dbc8995202b6c6aaff592b7abd24 https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.4.53
Fix is in 5.7.10. I tested for multiple hours with the mount/umount script and no crashes. Marking the bug as resolved. Thanks to all involved.