Bug 208107 - hard lock with CONFIG_CGROUP_NET_PRIO enabled
Summary: hard lock with CONFIG_CGROUP_NET_PRIO enabled
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-06-08 21:53 UTC by Cameron Berkenpas
Modified: 2020-06-24 22:44 UTC

See Also:
Kernel Version: 5.4.42
Subsystem:
Regression: No
Bisected commit-id:


Attachments
The patch fixes the issue (614 bytes, patch)
2020-06-16 16:48 UTC, Cameron Berkenpas

Description Cameron Berkenpas 2020-06-08 21:53:11 UTC
A bug was introduced with commit fc800ec491c39e42b65df72dc9ede3bb2d4a3755: my NFS server (which supports Kerberos) locks up the system after a client has mounted volumes exported by the NFS server.

Unsetting CONFIG_CGROUP_NET_PRIO or reverting fc800ec491c39e42b65df72dc9ede3bb2d4a3755 resolves the issue.

I get nothing on the console or in the logs once this occurs. The system is locked until it finally restarts on its own or through user intervention. The client machine I used for testing has CONFIG_CGROUP_NET_PRIO enabled in the kernel, but it causes no issues there.

I suspect the Kerberos functionality has no impact on this issue, but I'm presently unable to disable it.

I'm not actually using the functionality provided by CONFIG_CGROUP_NET_PRIO, so I've disabled it for now.

All kernels since v5.4.42 are affected. Disabling this feature or reverting fc800ec491c39e42b65df72dc9ede3bb2d4a3755 allows 5.4.42+ to work. I'm currently running stable with v5.4.45.

The 5.5 and 5.6 series are unsurprisingly affected, and I suspect both series would work with either of the fixes above.

Some info about the NFS server:
Debian 10 Buster
AMD EPYC 3251 8-core processor
128GB memory
Intel X540 10GbE PCIe NIC (only 1 port is used)
2x Intel Corporation I350 Gigabit onboard NICs (both are used)

All NICs are used in individual bridge interfaces, for a total of 3 interfaces (br0, br1, br2), to facilitate virtualization.

More info furnished upon request.
Comment 1 Cong Wang 2020-06-09 17:49:10 UTC
Do you have LOCKDEP enabled, that is CONFIG_LOCKDEP=y in your kernel config? It helps a lot to debug deadlocks. If not, can you enable it and test again?

How reproducible is this? If you can find a minimum reproducer, that would help a lot to narrow down the problem.

Thanks.
Comment 2 Cong Wang 2020-06-09 19:01:18 UTC
And make sure you have lockup detectors enabled:

CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y
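
One way to confirm that these options are actually set is to scan the kernel config file. The following is a minimal sketch, assuming the config is available as plain text (for example a file such as /boot/config-<version>; the location varies by distribution). The function names are illustrative only and are not part of any kernel tooling.

```python
import re

# Options requested in this comment.
REQUIRED = (
    "CONFIG_LOCKUP_DETECTOR",
    "CONFIG_SOFTLOCKUP_DETECTOR",
    "CONFIG_HARDLOCKUP_DETECTOR",
)

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kernel_config(text):
    """Map each option name to its value; '# ... is not set' maps to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = None
    return options


def missing_options(text, required=REQUIRED):
    """Return the required options that are not built in (=y)."""
    options = parse_kernel_config(text)
    return [name for name in required if options.get(name) != "y"]
```

Passing the contents of the config file to missing_options() returns an empty list when all three detectors are built in.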
Comment 3 Cameron Berkenpas 2020-06-09 19:07:14 UTC
LOCKDEP support is definitely already enabled:
CONFIG_LOCKDEP_SUPPORT=y

It's very reproducible, but slightly intermittent. Sometimes the box 
will stay up even 30 minutes... But usually I'm able to get it to lock 
up in under a minute.

I am mounting the volumes via automount, and I'm using NFSv4. The 
problem seems to occur no matter if it's 4.0, 4.1, or 4.2. I haven't 
tried NFSv3 as it doesn't appear to be an issue caused by NFS at this point.

How I reproduce:
I simply rm the kerberos file for my user from /tmp, mount the volumes 
by trying to access them as a regular user (I'm using automounter), and 
if it doesn't lock up within around 10 seconds after mounting the 
volumes, I umount them all, and repeat the process. 95% of the time, I'm 
able to reproduce the crash after repeating this process 1-3 times.
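
The cycle described above could be scripted on the client roughly as follows. This is only a sketch under assumptions: the mount paths and the ticket cache path (something like /tmp/krb5cc_<uid>) are placeholders to be supplied by the caller, unmounting normally needs root, and the run/sleep parameters exist only so the loop can be exercised without touching a real system.

```python
import subprocess
import time


def reproduce(mount_paths, ticket_cache, cycles=3, settle_seconds=10,
              run=subprocess.run, sleep=time.sleep):
    """Drive the mount/unmount cycle; return the number of cycles completed.

    If the server locks up, the commands are expected to hang or fail rather
    than return cleanly, so finishing every cycle only means "no lockup yet".
    """
    completed = 0
    for _ in range(cycles):
        # Drop the Kerberos ticket cache (probably unnecessary, per the report).
        run(["rm", "-f", ticket_cache], check=False)
        # Accessing each path makes the automounter mount the volume.
        for path in mount_paths:
            run(["ls", path], check=False)
        # Give the server time to lock up.
        sleep(settle_seconds)
        # Unmount everything before the next attempt.
        for path in mount_paths:
            run(["umount", path], check=False)
        completed += 1
    return completed
```

A kernel that survives all cycles should still be left running for at least an hour before being treated as good, since the lockup is intermittent.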

Removing the kerberos file is probably unnecessary.

If I get a crash, clearly the kernel is bad... If I don't, I try to 
leave the box up for at least an hour... My initial bisection attempt 
failed because I wasn't able to reproduce the issue within the first 10 
minutes for what turned out to be a bad revision.

Even though it doesn't lock up nearly as fast ~5% of the time, the 
box will still eventually lock up given enough time.

Let me know if there are any other kernel options you'd like me to check 
for.

Thanks!

Comment 4 Cameron Berkenpas 2020-06-11 17:53:59 UTC
I enabled those options, but I still get no output on the console or in the logs.

I also sent you my kernel config directly in case it would be faster to figure out what, if anything, is still missing.
Comment 5 Cameron Berkenpas 2020-06-13 00:00:44 UTC
I set up kernel crash dumps.

Here's the relevant output of the dmesg captured from the crash:
[  457.038422] BUG: kernel NULL pointer dereference, address: 0000000000000010
[  457.038479] #PF: supervisor read access in kernel mode
[  457.038513] #PF: error_code(0x0000) - not-present page
[  457.038547] PGD 0 P4D 0
[  457.038568] Oops: 0000 [#1] SMP
[  457.038592] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.4.46-broken #2
[  457.038640] Hardware name: Supermicro Super Server/M11SDV-8C+-LN4F, BIOS 1.0 01/30/2019
[  457.038696] RIP: 0010:__cgroup_bpf_run_filter_skb+0xe7/0x3e0
[  457.038735] Code: 4e 70 41 2b 4e 74 48 89 5c 24 20 48 01 c8 41 83 fd 01 49 89 46 50 0f 84 96 01 00 00 44 89 ea 48 8d 84 d6 18 06 00 00 48 8b 00 <4c> 8b 78 10 4c 8d 68 10 4d 85 ff 0f 84 c8 02 00 00 49 8d 46 30 bb
[  457.038845] RSP: 0018:ffff99b39ecc5780 EFLAGS: 00010297
[  457.038878] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000013c
[  457.038922] RDX: 0000000000000000 RSI: ffff99b36aebc000 RDI: ffff99b33fad3400
[  457.038966] RBP: ffff99b33fad3400 R08: 0000000000000001 R09: ffff99b24684e500
[  457.039009] R10: 0000000000000000 R11: ffff99b397d800a0 R12: ffff99b33fad3400
[  457.039053] R13: 0000000000000000 R14: ffff99b24684e500 R15: ffff99b35e4f60e2
[  457.039097] FS:  0000000000000000(0000) GS:ffff99b39ecc0000(0000) knlGS:0000000000000000
[  457.039146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  457.039182] CR2: 0000000000000010 CR3: 0000001d9e26b000 CR4: 00000000003406e0
[  457.039225] Call Trace:
[  457.039245]  <IRQ>
[  457.039265]  ? ixgbe_xmit_frame_ring+0x509/0xea0
[  457.039299]  sk_filter_trim_cap+0x10c/0x230
[  457.039331]  ? tcp_v4_inbound_md5_hash+0x58/0x190
[  457.039362]  tcp_v4_rcv+0xa66/0xc50
[  457.039391]  ip_protocol_deliver_rcu+0x2c/0x1c0
[  457.039423]  ip_local_deliver_finish+0x44/0x50
[  457.039453]  ip_local_deliver+0xe0/0xf0
[  457.039481]  ? ip_protocol_deliver_rcu+0x1c0/0x1c0
[  457.039516]  ip_sabotage_in+0x55/0x60
[  457.041021]  nf_hook_slow+0x52/0xd0
[  457.042513]  ip_rcv+0x9c/0xe0
[  457.043994]  ? ip_rcv_finish_core.isra.22+0x3b0/0x3b0
[  457.045474]  __netif_receive_skb_one_core+0x85/0xa0
[  457.046939]  netif_receive_skb_internal+0x2f/0xa0
[  457.048401]  netif_receive_skb+0x1b/0xb0
[  457.049816]  br_pass_frame_up+0x104/0x110
[  457.051188]  ? br_handle_local_finish+0x20/0x20
[  457.052544]  br_handle_frame_finish+0x2b3/0x420
[  457.053896]  ? br_nf_forward_finish+0x129/0x1b0
[  457.055240]  ? br_dev_queue_push_xmit+0x150/0x150
[  457.056546]  ? br_pass_frame_up+0x110/0x110
[  457.057808]  br_nf_hook_thresh+0xda/0xf0
[  457.059049]  ? br_pass_frame_up+0x110/0x110
[  457.060286]  br_nf_pre_routing_finish+0x142/0x340
[  457.061524]  ? br_pass_frame_up+0x110/0x110
[  457.062725]  ? nf_nat_ipv4_in+0x2d/0x80 [nf_nat]
[  457.063885]  br_nf_pre_routing+0x224/0x4e8
[  457.065023]  ? br_nf_forward_ip+0x480/0x480
[  457.066146]  br_handle_frame+0x1d4/0x370
[  457.067266]  ? br_pass_frame_up+0x110/0x110
[  457.068381]  __netif_receive_skb_core+0x283/0xc50
[  457.069507]  __netif_receive_skb_one_core+0x3c/0xa0
[  457.070610]  netif_receive_skb_internal+0x2f/0xa0
[  457.071696]  napi_gro_receive+0xed/0x150
[  457.072761]  ixgbe_poll+0x6f1/0x1280
[  457.073791]  ? enqueue_entity+0x410/0x8f0
[  457.074786]  ? check_preempt_curr+0x7a/0x90
[  457.075757]  net_rx_action+0x136/0x370
[  457.076716]  __do_softirq+0xda/0x2d1
[  457.077667]  irq_exit+0xa5/0xb0
[  457.078605]  do_IRQ+0x59/0xf0
[  457.079529]  common_interrupt+0xf/0xf
[  457.080455]  </IRQ>
[  457.081360] RIP: 0010:cpuidle_enter_state+0xb4/0x440
[  457.082269] Code: 24 0f 1f 44 00 00 31 ff e8 69 7c 90 ff 80 7c 24 13 00 74 12 9c 58 f6 c4 02 0f 85 5c 03 00 00 31 ff e8 30 90 96 ff fb 45 85 e4 <0f> 88 8f 02 00 00 49 63 cc 48 8b 34 24 48 2b 74 24 08 48 8d 04 49
[  457.084188] RSP: 0018:ffff99b398aabe78 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffda
[  457.085182] RAX: ffff99b39ece8f00 RBX: ffffffffb52c0b80 RCX: 000000000000001f
[  457.086190] RDX: 0000006a699c1a5b RSI: 000000003333348b RDI: 0000000000000000
[  457.087203] RBP: ffff99b393419400 R08: 0000000000000002 R09: 0000000000028780
[  457.088217] R10: 00000159023d5e8a R11: ffff99b39ece7fc0 R12: 0000000000000002
[  457.089237] R13: ffffffffb52c0c58 R14: 0000000000000002 R15: 0000000000000000
[  457.090265]  ? cpuidle_enter_state+0x97/0x440
[  457.091294]  cpuidle_enter+0x35/0x50
[  457.092328]  do_idle+0x1f8/0x230
[  457.093353]  cpu_startup_entry+0x20/0x30
[  457.094381]  start_secondary+0x143/0x170
[  457.095407]  secondary_startup_64+0xa4/0xb0
[  457.096436] Modules linked in: vhost_net vhost tap xt_conntrack nft_counter nft_chain_nat xt_MASQUERADE nf_nat nft_compat nf_tables nfnetlink ipmi_si ipmi_devintf ipmi_msghandler btrfs zstd_decompress zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear raid1 md_mod
[  457.099853] CR2: 0000000000000010
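
As a cross-check of the trace above: the byte marked with angle brackets in the first "Code:" line is where the fault occurred, and the bytes 4c 8b 78 10 read (by my decoding, which is an interpretation and not part of the report) as a load from offset 0x10 of the address in RAX into R15. With RAX equal to zero, that gives the faulting address reported in CR2. A small sketch that checks this arithmetic against the register dump, using an illustrative helper name:

```python
import re


def parse_hex_fields(oops_text, names):
    """Return {name: int} for 'NAME: <16 hex digits>' fields in an oops dump.

    Only the first occurrence of each name is used, which in this trace is
    the register set captured at the faulting instruction.
    """
    values = {}
    for name in names:
        m = re.search(r"\b%s: ([0-9a-f]{16})\b" % re.escape(name), oops_text)
        if m:
            values[name] = int(m.group(1), 16)
    return values


# Lines copied from the trace above (timestamps dropped).
OOPS = """
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000013c
RDX: 0000000000000000 RSI: ffff99b36aebc000 RDI: ffff99b33fad3400
CR2: 0000000000000010 CR3: 0000001d9e26b000 CR4: 00000000003406e0
"""

# Displacement byte of the faulting instruction '4c 8b 78 10'
# (assumed decoding: load from RAX + 0x10 into R15).
DISPLACEMENT = 0x10

regs = parse_hex_fields(OOPS, ["RAX", "CR2"])
fault_address = regs["RAX"] + DISPLACEMENT
print(hex(fault_address), fault_address == regs["CR2"])
```

The match suggests that a NULL pointer was loaded into RAX just before the faulting instruction inside __cgroup_bpf_run_filter_skb, and that a field at offset 0x10 of it was then read.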


Here's the result of the crash command:
crash /usr/lib/debug/lib/modules/5.4.46-broken/vmlinux /var/crash/202006121154/dump.202006121154

crash 7.2.5
Copyright (C) 2002-2019  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [814MB]: patching 103638 gdb minimal_symbol values

WARNING: could not find MAGIC_START!
crash: page excluded: kernel virtual address: ffffffffb40f8560 type: "framepointer check"
crash: page excluded: kernel virtual address: ffffffffb485ebb0 type: "gdb_readmem_callback"

crash: recursive temporary file usage
Comment 6 Cameron Berkenpas 2020-06-16 16:48:36 UTC
Created attachment 289707
The patch fixes the issue

This is the patch Cong Wang provided to me. Apparently this was a known upstream issue.

My box has been up for ~4 days now using this patch with this feature enabled in the kernel without issue.
