[Thu Nov 5 13:14:27 2020] ------------[ cut here ]------------
[Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
[Thu Nov 5 13:14:27 2020] Modules linked in: tcp_diag udp_diag raw_diag inet_diag netlink_diag nfnetlink_queue xt_nat macvlan veth ip6table_filter ip6_tables nf_conntrack_netlink nfnetlink bridge stp llc 8021q iptable_raw nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_NFQUEUE ipt_REJECT nf_reject_ipv4 xt_mark xt_tcpudp xt_conntrack iptable_filter xt_MASQUERADE xt_addrtype iptable_nat nf_nat iptable_mangle ip_tables x_tables sch_fq_codel ext4 mbcache jbd2 dm_crypt encrypted_keys wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic ipv6 libchacha amdgpu snd_hda_codec_hdmi nf_conntrack_sip nf_conntrack nf_defrag_ipv6 mfd_core snd_hda_intel gpu_sched nf_defrag_ipv4 snd_intel_dspcfg ttm kvm_amd snd_hda_codec drm_kms_helper efi_pstore tun snd_hda_core syscopyarea kvm sysfillrect sysimgblt snd_pcm fb_sys_fops irqbypass snd_timer efivars drm snd ccp k10temp tcp_cubic backlight soundcore
[Thu Nov 5 13:14:27 2020]  sha1_generic cp210x input_leds evdev led_class usbserial bfq acpi_cpufreq button
[Thu Nov 5 13:14:27 2020] CPU: 4 PID: 133 Comm: kworker/u16:9 Not tainted 5.9.5 #85
[Thu Nov 5 13:14:27 2020] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 5220 09/12/2019
[Thu Nov 5 13:14:27 2020] Workqueue: netns cleanup_net
[Thu Nov 5 13:14:27 2020] RIP: 0010:page_counter_uncharge+0x34/0x40
[Thu Nov 5 13:14:27 2020] Code: 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0 48 89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0b 48 8b 7f 28 48 85 ff 75 dc f3 c3 <0f> 0b eb f1 0f 1f 84 00 00 00 00 00 48 8b 17 48 39 d6 72 41 41 54
[Thu Nov 5 13:14:27 2020] RSP: 0018:ffffa225007c7d30 EFLAGS: 00010082
[Thu Nov 5 13:14:27 2020] RAX: fffffffffffffffe RBX: ffff93a9cd087000 RCX: fffffffffffffffe
[Thu Nov 5 13:14:27 2020] RDX: 0000000000000200 RSI: fffffffffffffffe RDI: ffff93a9cd087248
[Thu Nov 5 13:14:27 2020] RBP: 0000000000000002 R08: 0000000000000002 R09: fffffffffffffffe
[Thu Nov 5 13:14:27 2020] R10: 0000000000000246 R11: 0000000000000000 R12: 0000000000000518
[Thu Nov 5 13:14:27 2020] R13: 0000000000000488 R14: ffffffffb96869f5 R15: ffff93a9cb992c00
[Thu Nov 5 13:14:27 2020] FS:  0000000000000000(0000) GS:ffff93a9cef00000(0000) knlGS:0000000000000000
[Thu Nov 5 13:14:27 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Nov 5 13:14:27 2020] CR2: 00007f362472c008 CR3: 00000003f108a000 CR4: 00000000003506e0
[Thu Nov 5 13:14:27 2020] Call Trace:
[Thu Nov 5 13:14:27 2020]  __memcg_kmem_uncharge+0x4a/0x80
[Thu Nov 5 13:14:27 2020]  drain_obj_stock+0x72/0x90
[Thu Nov 5 13:14:27 2020]  refill_obj_stock+0x95/0xb0
[Thu Nov 5 13:14:27 2020]  kmem_cache_free+0x194/0x390
[Thu Nov 5 13:14:27 2020]  __sk_destruct+0x125/0x180
[Thu Nov 5 13:14:27 2020]  inet_release+0x48/0x90
[Thu Nov 5 13:14:27 2020]  sock_release+0x26/0x80
[Thu Nov 5 13:14:27 2020]  ops_exit_list+0x2e/0x60
[Thu Nov 5 13:14:27 2020]  cleanup_net+0x1eb/0x310
[Thu Nov 5 13:14:27 2020]  process_one_work+0x1b1/0x310
[Thu Nov 5 13:14:27 2020]  worker_thread+0x4b/0x400
[Thu Nov 5 13:14:27 2020]  ? process_one_work+0x310/0x310
[Thu Nov 5 13:14:27 2020]  kthread+0x112/0x130
[Thu Nov 5 13:14:27 2020]  ? __kthread_bind_mask+0x90/0x90
[Thu Nov 5 13:14:27 2020]  ret_from_fork+0x22/0x30
[Thu Nov 5 13:14:27 2020] ---[ end trace a17bbc8650d8c295 ]---
(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 05 Nov 2020 21:18:05 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=210075
>
>             Bug ID: 210075
>            Summary: [Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at
>                     mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.9.5
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: vladi@aresgate.net
>         Regression: No

I'm assuming this is a bug in the networking code. I've seen a number
of possibly-related emails fly past - is it familiar to anyone?

> [Thu Nov 5 13:14:27 2020] ------------[ cut here ]------------
> [Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57
> page_counter_uncharge+0x34/0x40
> [...]
> [Thu Nov 5 13:14:27 2020] ---[ end trace a17bbc8650d8c295 ]---
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.
Looking to jump back into some kernel hacking again, so I thought I'd take a
quick, rusty look.

I'm pattern matching a bit, but I wonder whether f2fe7b09 ("mm: memcg/slab:
charge individual slab objects instead of pages") may have had a role in this
bug, as it adds an obj_cgroup_uncharge() invocation to memcg_slab_free_hook(),
which is invoked from kmem_cache_free().

sk_prot_free() also invokes mem_cgroup_sk_free() before kmem_cache_free(), so
perhaps an uncharge is getting doubled up here? I traced through
mem_cgroup_sk_free() (which invokes css_put()) but couldn't see where it would
result in an additional uncharge, so I may be barking up the wrong tree.

I'd be more than happy to have a deeper look at this if vladi has some code
that reproduces this plus a .config, if that'd be helpful.

Best, Lorenzo
Created attachment 293573 [details]
Kernel Config
I'm not sure what reproduces this; I see it when the server boots. 5.9.x has
not been stable for me (possibly because of this, I'm not sure), so I went
back to 5.8. Let me know how else I can help. Thanks!
[Cc Roman and Shakeel]

On Fri 06-11-20 21:13:00, Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Thu, 05 Nov 2020 21:18:05 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=210075
> >
> >             Bug ID: 210075
> >            Summary: [Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at
> >                     mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
> >     Kernel Version: 5.9.5
> > [...]
>
> I'm assuming this is a bug in the networking code. I've seen a number
> of possibly-related emails fly past - is it familiar to anyone?

Looks similar to 8de15e920dc8 ("mm: memcg: link page counters to root if
use_hierarchy is false"). The path is different so the underlying reason
might be something else.
> > [...]
On Mon, Nov 9, 2020 at 12:16 AM Michal Hocko <mhocko@suse.com> wrote:
>
> [Cc Roman and Shakeel]
>
> On Fri 06-11-20 21:13:00, Andrew Morton wrote:
> > (switched to email. Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> >
> > On Thu, 05 Nov 2020 21:18:05 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=210075
> > > [...]
> >
> > I'm assuming this is a bug in the networking code. I've seen a number
> > of possibly-related emails fly past - is it familiar to anyone?
>
> Looks similar to 8de15e920dc8 ("mm: memcg: link page counters to root if
> use_hierarchy is false"). The path is different so the underlying reason
> might be something else.

The commit 8de15e920dc8 is not in 5.9.5. Is the issue reproducible and
bisectable?
Issue seems gone with 5.9.9.