We have been getting memleaks in offset_ctx->xa in our networking tests:

unreferenced object 0xffff8881004cd080 (size 576):
  comm "systemd", pid 1, jiffies 4294893373 (age 1992.864s)
  hex dump (first 32 bytes):
    00 00 06 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    38 5c 7c 02 81 88 ff ff 98 d0 4c 00 81 88 ff ff  8\|.......L.....
  backtrace:
    [<000000000f554608>] xas_alloc+0x306/0x430
    [<0000000075537d52>] xas_create+0x4b4/0xc80
    [<00000000a927aab2>] xas_store+0x73/0x1680
    [<0000000020a61203>] __xa_alloc+0x1d8/0x2d0
    [<00000000ae300af2>] __xa_alloc_cyclic+0xf1/0x310
    [<000000001032332c>] simple_offset_add+0xd8/0x170
    [<0000000073229fad>] shmem_mknod+0xbf/0x180
    [<00000000242520ce>] vfs_mknod+0x3b0/0x5c0
    [<000000001ef218dd>] unix_bind+0x2c2/0xdb0
    [<0000000009b9a8dd>] __sys_bind+0x127/0x1e0
    [<000000003c949fbb>] __x64_sys_bind+0x6e/0xb0
    [<00000000b8a767c7>] do_syscall_64+0x3d/0x90
    [<000000006132ae0d>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

The memleak trace points to some syscall performed by systemd, and none of our tests do anything more advanced with it than 'systemctl restart ovs-vswitchd'. Basically, it is a setup with Fedora and an upstream kernel that executes a bunch of network offload tests with Open vSwitch, iproute2 tc, Linux bridge, etc.

It looks like these may be caused by recent commit 6faddda69f62 ("libfs: Add directory operations for stable offsets"), but we don't have a proper reproducer; we just sometimes get memleak complaints during/after the regression run.
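The backtrace above runs from bind() on an AF_UNIX socket through vfs_mknod() and shmem_mknod() into simple_offset_add(), so one way to exercise that path in isolation is to bind and unlink socket files in a tmpfs directory. The following is only a minimal sketch of that idea, not a confirmed reproducer: the function name bind_and_unlink, the iteration count, and the use of /dev/shm as the tmpfs mount are all assumptions made for illustration.

```python
import os
import socket
import tempfile


def bind_and_unlink(directory, count):
    """Create and remove `count` AF_UNIX socket files in `directory`.

    Each bind() to a filesystem path makes the kernel create a socket
    inode; when `directory` is on tmpfs this goes through shmem_mknod(),
    the caller of simple_offset_add() in the backtrace.
    Returns the number of sockets that were bound successfully.
    """
    bound = 0
    for i in range(count):
        path = os.path.join(directory, "sock-%d" % i)
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.bind(path)
            bound += 1
        finally:
            sock.close()
            # Closing the socket does not remove the file; unlink it explicitly.
            if os.path.exists(path):
                os.unlink(path)
    return bound


if __name__ == "__main__":
    # Assumption: /dev/shm is a tmpfs mount, as on most distributions.
    workdir = tempfile.mkdtemp(prefix="unix-bind-", dir="/dev/shm")
    try:
        print(bind_and_unlink(workdir, 10000))
    finally:
        os.rmdir(workdir)
```

Running this and then triggering a kmemleak scan would show whether plain create/unlink cycles are enough to leak, or whether something more specific in the systemd workload is needed.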
(In reply to vladbu from comment #0)
> We have been getting memleaks in offset_ctx->xa in our networking tests:
> [...]
> It looks like these may be caused by recent commit 6faddda69f62 ("libfs: Add
> directory operations for stable offsets"), but we don't have a proper
> reproducer; we just sometimes get memleak complaints during/after the
> regression run.

Does reverting 6faddda69f62 fix this issue?
(In reply to Bagas Sanjaya from comment #1)
> Does reverting 6faddda69f62 fix this issue?

Unless I'm missing something, neither the simple_offset_add() function nor the offset_ctx->xa xarray existed before 6faddda69f62, so reverting it would obviously fix the issue.
Reverting 6faddda69f62 has unwanted side-effects, not the least of which is re-introducing a bug in the way tmpfs handles directory offsets.
Here is a copy of ovs-vswitchd.service from github:

> [Unit]
> Description=Open vSwitch Forwarding Unit
> After=ovsdb-server.service network-pre.target systemd-udev-settle.service
> Before=network.target network.service
> Requires=ovsdb-server.service
> ReloadPropagatedFrom=ovsdb-server.service
> AssertPathIsReadWrite=/run/openvswitch/db.sock
> PartOf=openvswitch.service
>
> [Service]
> Type=forking
> PIDFile=/run/openvswitch/ovs-vswitchd.pid
> Restart=on-failure
> Environment=XDG_RUNTIME_DIR=/run/openvswitch
> EnvironmentFile=/etc/openvswitch/default.conf
> EnvironmentFile=-/etc/sysconfig/openvswitch
> EnvironmentFile=-/run/openvswitch.useropts
> LimitSTACK=2M
> @begin_dpdk@
> ExecStartPre=-/bin/sh -c '/usr/bin/chown :$${OVS_USER_ID##*:} /dev/hugepages'
> ExecStartPre=-/usr/bin/chmod 0775 /dev/hugepages
> @end_dpdk@
> ExecStart=/usr/share/openvswitch/scripts/ovs-ctl \
>           --no-ovsdb-server --no-monitor --system-id=random \
>           ${OVS_USER_OPT} \
>           start $OPTIONS
> ExecStop=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server stop
> ExecReload=/usr/share/openvswitch/scripts/ovs-ctl --no-ovsdb-server \
>           --no-monitor --system-id=random \
>           ${OVS_USER_OPT} \
>           restart $OPTIONS
> TimeoutSec=300

Vlad, have you tried running any of these ovs-ctl commands individually? My working theory is that one of these is doing something unexpected with a tmpfs file which exposes a missing kfree() in the shmemfs code that 6faddda69f62 converted to use simple directory offsets.
Ilya Maximets has broken down ovs-ctl even further:

https://mail.openvswitch.org/pipermail/ovs-discuss/2023-November/052783.html

One or more of these individual steps could be the culprit.
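To narrow down which step is responsible, it may help to trigger a kmemleak scan after each individual step and keep only the reports that mention simple_offset_add. Below is a rough sketch of such a helper. It assumes a kernel built with CONFIG_DEBUG_KMEMLEAK, debugfs mounted at /sys/kernel/debug, and root privileges; the function names are made up for this example.

```python
# Assumption: debugfs is mounted here and kmemleak is enabled in the kernel.
KMEMLEAK_PATH = "/sys/kernel/debug/kmemleak"


def trigger_scan(path=KMEMLEAK_PATH):
    """Ask kmemleak to run a memory scan now (requires root)."""
    with open(path, "w") as f:
        f.write("scan\n")


def read_report(path=KMEMLEAK_PATH):
    """Return the current kmemleak report as one string."""
    with open(path) as f:
        return f.read()


def split_records(report):
    """Split a kmemleak report into one string per unreferenced object."""
    records = []
    current = []
    for line in report.splitlines():
        if line.startswith("unreferenced object"):
            # A new record begins; flush the previous one, if any.
            if current:
                records.append("\n".join(current))
            current = [line]
        elif current:
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def records_with_symbol(report, symbol):
    """Return only the records whose text mentions `symbol`."""
    return [r for r in split_records(report) if symbol in r]
```

A possible workflow: run one ovs-ctl step, call trigger_scan(), then check records_with_symbol(read_report(), "simple_offset_add"). Since kmemleak may not report a freshly allocated object on the first scan, it is probably worth scanning more than once before moving on to the next step.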
(In reply to Chuck Lever from comment #4)
> Vlad, have you tried running any of these ovs-ctl commands individually? My
> working theory is that one of these is doing something unexpected with a
> tmpfs file which exposes a missing kfree() in the shmemfs code that
> 6faddda69f62 converted to use simple directory offsets.

Just starting, restarting, reloading, or stopping the ovs-vswitchd service doesn't reproduce the issue for me.
At this point I cannot proceed without a reproducer. Vlad, please continue trying to identify a trigger for this issue. I expect as v6.6 is adopted by leading-edge distros, someone else might encounter this issue and be able to provide a more specific trigger.
(In reply to Chuck Lever from comment #7)
> At this point I cannot proceed without a reproducer. Vlad, please continue
> trying to identify a trigger for this issue.

Vlad, any progress?

FWIW, I noticed that Chuck wrote a change that contains a Fixes: tag pointing to the culprit of this regression:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=master&id=796432efab1e372d404e7a71cc6891a53f105051

I wonder if that might in some way be relevant to this problem.
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #8)
> Vlad, any progress?

No. As Chuck suggested in his previous comment, as v6.6 is adopted by leading-edge distros, someone else might encounter this issue and be able to provide a more specific trigger.
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #8)
> FWIW, I noticed that Chuck wrote a change that contains a fixes tag pointing
> to the culprit of this regression (
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=master&id=796432efab1e372d404e7a71cc6891a53f105051
> ). I wonder if that in some way might be relevant for this problem.

That's possible, but not probable. Worth a try, though.