[Thu Nov 5 13:14:27 2020] ------------[ cut here ]------------
[Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
[Thu Nov 5 13:14:27 2020] Modules linked in: tcp_diag udp_diag raw_diag inet_diag netlink_diag nfnetlink_queue xt_nat macvlan veth ip6table_filter ip6_tables nf_conntrack_netlink nfnetlink bridge stp llc 8021q iptable_raw nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_NFQUEUE ipt_REJECT nf_reject_ipv4 xt_mark xt_tcpudp xt_conntrack iptable_filter xt_MASQUERADE xt_addrtype iptable_nat nf_nat iptable_mangle ip_tables x_tables sch_fq_codel ext4 mbcache jbd2 dm_crypt encrypted_keys wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic ipv6 libchacha amdgpu snd_hda_codec_hdmi nf_conntrack_sip nf_conntrack nf_defrag_ipv6 mfd_core snd_hda_intel gpu_sched nf_defrag_ipv4 snd_intel_dspcfg ttm kvm_amd snd_hda_codec drm_kms_helper efi_pstore tun snd_hda_core syscopyarea kvm sysfillrect sysimgblt snd_pcm fb_sys_fops irqbypass snd_timer efivars drm snd ccp k10temp tcp_cubic backlight soundcore
[Thu Nov 5 13:14:27 2020]  sha1_generic cp210x input_leds evdev led_class usbserial bfq acpi_cpufreq button
[Thu Nov 5 13:14:27 2020] CPU: 4 PID: 133 Comm: kworker/u16:9 Not tainted 5.9.5 #85
[Thu Nov 5 13:14:27 2020] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 5220 09/12/2019
[Thu Nov 5 13:14:27 2020] Workqueue: netns cleanup_net
[Thu Nov 5 13:14:27 2020] RIP: 0010:page_counter_uncharge+0x34/0x40
[Thu Nov 5 13:14:27 2020] Code: 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0 48 89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0b 48 8b 7f 28 48 85 ff 75 dc f3 c3 <0f> 0b eb f1 0f 1f 84 00 00 00 00 00 48 8b 17 48 39 d6 72 41 41 54
[Thu Nov 5 13:14:27 2020] RSP: 0018:ffffa225007c7d30 EFLAGS: 00010082
[Thu Nov 5 13:14:27 2020] RAX: fffffffffffffffe RBX: ffff93a9cd087000 RCX: fffffffffffffffe
[Thu Nov 5 13:14:27 2020] RDX: 0000000000000200 RSI: fffffffffffffffe RDI: ffff93a9cd087248
[Thu Nov 5 13:14:27 2020] RBP: 0000000000000002 R08: 0000000000000002 R09: fffffffffffffffe
[Thu Nov 5 13:14:27 2020] R10: 0000000000000246 R11: 0000000000000000 R12: 0000000000000518
[Thu Nov 5 13:14:27 2020] R13: 0000000000000488 R14: ffffffffb96869f5 R15: ffff93a9cb992c00
[Thu Nov 5 13:14:27 2020] FS:  0000000000000000(0000) GS:ffff93a9cef00000(0000) knlGS:0000000000000000
[Thu Nov 5 13:14:27 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Nov 5 13:14:27 2020] CR2: 00007f362472c008 CR3: 00000003f108a000 CR4: 00000000003506e0
[Thu Nov 5 13:14:27 2020] Call Trace:
[Thu Nov 5 13:14:27 2020]  __memcg_kmem_uncharge+0x4a/0x80
[Thu Nov 5 13:14:27 2020]  drain_obj_stock+0x72/0x90
[Thu Nov 5 13:14:27 2020]  refill_obj_stock+0x95/0xb0
[Thu Nov 5 13:14:27 2020]  kmem_cache_free+0x194/0x390
[Thu Nov 5 13:14:27 2020]  __sk_destruct+0x125/0x180
[Thu Nov 5 13:14:27 2020]  inet_release+0x48/0x90
[Thu Nov 5 13:14:27 2020]  sock_release+0x26/0x80
[Thu Nov 5 13:14:27 2020]  ops_exit_list+0x2e/0x60
[Thu Nov 5 13:14:27 2020]  cleanup_net+0x1eb/0x310
[Thu Nov 5 13:14:27 2020]  process_one_work+0x1b1/0x310
[Thu Nov 5 13:14:27 2020]  worker_thread+0x4b/0x400
[Thu Nov 5 13:14:27 2020]  ? process_one_work+0x310/0x310
[Thu Nov 5 13:14:27 2020]  kthread+0x112/0x130
[Thu Nov 5 13:14:27 2020]  ? __kthread_bind_mask+0x90/0x90
[Thu Nov 5 13:14:27 2020]  ret_from_fork+0x22/0x30
[Thu Nov 5 13:14:27 2020] ---[ end trace a17bbc8650d8c295 ]---
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 05 Nov 2020 21:18:05 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=210075
>
>             Bug ID: 210075
>            Summary: [Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at
>                     mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.9.5
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: vladi@aresgate.net
>         Regression: No

I'm assuming this is a bug in the networking code. I've seen a number
of possibly-related emails fly past - is it familiar to anyone?

> [Thu Nov 5 13:14:27 2020] ------------[ cut here ]------------
> [Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57
> page_counter_uncharge+0x34/0x40
> [...]
> [Thu Nov 5 13:14:27 2020] ---[ end trace a17bbc8650d8c295 ]---
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.
Looking to jump back into some kernel hacking again, so I thought I'd take a
quick, rusty look.

I'm pattern matching a bit, but I wonder whether f2fe7b09 ("mm: memcg/slab:
charge individual slab objects instead of pages") may have had a role in this
bug, as it adds an obj_cgroup_uncharge() invocation to memcg_slab_free_hook(),
which is invoked from kmem_cache_free().

sk_prot_free() also invokes mem_cgroup_sk_free() before kmem_cache_free(), so
perhaps an uncharge is getting doubled up here? I traced through
mem_cgroup_sk_free() (which invokes css_put()) but couldn't see where it would
result in an additional uncharge, so I may be barking up the wrong tree.

I'd be more than happy to have a deeper look at this if vladi has some code
that reproduces this plus a .config, if that'd be helpful.

Best, Lorenzo
Created attachment 293573 [details]
Kernel Config
I'm not sure what reproduces this; I see it when the server boots. 5.9.x has
not been stable for me (possibly because of this, I'm not sure), so I went
back to 5.8. Let me know how else I can help. Thanks!
[Cc Roman and Shakeel]

On Fri 06-11-20 21:13:00, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Thu, 05 Nov 2020 21:18:05 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=210075
> >
> >             Bug ID: 210075
> >            Summary: [Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at
> >                     mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
> >     Kernel Version: 5.9.5
> > [...]
>
> I'm assuming this is a bug in the networking code. I've seen a number
> of possibly-related emails fly past - is it familiar to anyone?

Looks similar to 8de15e920dc8 ("mm: memcg: link page counters to root if
use_hierarchy is false"). The path is different so the underlying reason
might be something else.
> > [...]
On Mon, Nov 9, 2020 at 12:16 AM Michal Hocko <mhocko@suse.com> wrote:
>
> [Cc Roman and Shakeel]
>
> On Fri 06-11-20 21:13:00, Andrew Morton wrote:
> > (switched to email. Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> >
> > On Thu, 05 Nov 2020 21:18:05 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=210075
> > > [...]
> >
> > I'm assuming this is a bug in the networking code. I've seen a number
> > of possibly-related emails fly past - is it familiar to anyone?
>
> Looks similar to 8de15e920dc8 ("mm: memcg: link page counters to root if
> use_hierarchy is false"). The path is different so the underlying reason
> might be something else.

The commit 8de15e920dc8 is not in 5.9.5. Is the issue reproducible and
bisectable?
Issue seems gone with 5.9.9.