Not necessarily ppc64le specific, but so far I'm only seeing this on my Talos II system with XFS and none of my amd64 systems. 5.10.10 is the only 5.10.x series kernel tested so far. Issue may have started earlier. These are the only leaks so far with the box up for several hours. Not sure how serious this is.

Specs:
2x 18 Core POWER9
512 GB memory

The XFS filesystems on this system:

df -h / /home
Filesystem      Size  Used Avail Use% Mounted on
/dev/md1        1.8T   59G  1.8T   4% /
/dev/bcache0     14T  634G   14T   5% /home

cat /sys/kernel/debug/kmemleak
unreferenced object 0xc0000011c63af400 (size 512):
  comm "worker", pid 7351, jiffies 4295245272 (age 21394.586s)
  hex dump (first 32 bytes):
    c0 e0 58 3c 00 00 00 c0 08 f4 3a c6 11 00 00 c0  ..X<......:.....
    08 f4 3a c6 11 00 00 c0 18 2a 3a c6 11 00 00 c0  ..:......*:.....
  backtrace:
    [<000000005f1fe84c>] blkg_alloc+0x58/0x260
    [<00000000bb469d61>] blkg_create+0x3b0/0x570
    [<000000007d35bf0d>] bio_associate_blkg_from_css+0x318/0x480
    [<00000000a4cfa6ed>] bio_associate_blkg+0x44/0xb0
    [<0000000014c40666>] cached_dev_submit_bio+0x140/0x1090
    [<000000001e375f40>] submit_bio_noacct+0x12c/0x5e0
    [<000000005d621ecf>] submit_bio+0x5c/0x270
    [<000000000d4d6bf5>] iomap_readahead+0xdc/0x230
    [<00000000b0093137>] xfs_vm_readahead+0x28/0x40
    [<00000000c7837a39>] read_pages+0xcc/0x370
    [<00000000ace2d2cc>] page_cache_ra_unbounded+0x1a4/0x280
    [<00000000ea5f8116>] generic_file_buffered_read+0x4cc/0xbd0
    [<00000000ce5a2b3b>] xfs_file_buffered_aio_read+0x70/0x130
    [<0000000019bddea7>] xfs_file_read_iter+0xa0/0x150
    [<0000000082a5c085>] new_sync_read+0x14c/0x1d0
    [<00000000abee86d0>] vfs_read+0x1a0/0x210
unreferenced object 0xc00000001772a840 (size 64):
  comm "worker", pid 7351, jiffies 4295245272 (age 21394.586s)
  hex dump (first 32 bytes):
    dc 5f 00 00 00 00 00 00 50 97 80 00 00 00 00 c0  ._......P.......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000006e666d7e>] percpu_ref_init+0x7c/0x150
    [<00000000ac923962>] blkg_alloc+0x84/0x260
    [<00000000bb469d61>] blkg_create+0x3b0/0x570
    [<000000007d35bf0d>] bio_associate_blkg_from_css+0x318/0x480
    [<00000000a4cfa6ed>] bio_associate_blkg+0x44/0xb0
    [<0000000014c40666>] cached_dev_submit_bio+0x140/0x1090
    [<000000001e375f40>] submit_bio_noacct+0x12c/0x5e0
    [<000000005d621ecf>] submit_bio+0x5c/0x270
    [<000000000d4d6bf5>] iomap_readahead+0xdc/0x230
    [<00000000b0093137>] xfs_vm_readahead+0x28/0x40
    [<00000000c7837a39>] read_pages+0xcc/0x370
    [<00000000ace2d2cc>] page_cache_ra_unbounded+0x1a4/0x280
    [<00000000ea5f8116>] generic_file_buffered_read+0x4cc/0xbd0
    [<00000000ce5a2b3b>] xfs_file_buffered_aio_read+0x70/0x130
    [<0000000019bddea7>] xfs_file_read_iter+0xa0/0x150
    [<0000000082a5c085>] new_sync_read+0x14c/0x1d0
This also has not happened (yet) on my 2nd ppc64le box. Exact same kernel configuration.

Specs:
Raptor CS Blackbird motherboard
1x 8 core POWER9
64 GB of memory

df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb3       470G  149G  321G  32% /
At first glance, the allocation in question (blkg_alloc) is in the block/cgroup code, not XFS. What leads you to believe that this is unique to XFS?
My poor interpretation of the stack traces, apparently. My bad! I stopped and started one of the LXC containers and another 2 leaks were detected. Which product/component would this fall under? Virtualization?
Depends on where the leaks are. Are they all from blkg_alloc? It'd be block layer. I'm not sure which (if any) component is appropriate; perhaps IO/storage. -Eric
Yes, they're all blkg_alloc.
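For anyone wanting to check this mechanically rather than by eye, here is a rough sketch that parses kmemleak output and tests whether every reported object has a given function somewhere in its backtrace. It looks at the whole backtrace rather than only the top frame, because in the second object above the top frame is percpu_ref_init, called from blkg_alloc. The helper names (backtraces, all_pass_through) and the trimmed SAMPLE are made up for illustration; the frame format is assumed to match the report in this thread.

```python
import re

# Matches the symbol in one kmemleak backtrace frame, e.g.
# "[<000000005f1fe84c>] blkg_alloc+0x58/0x260" -> "blkg_alloc"
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+([A-Za-z_][\w.]*)\+0x")


def backtraces(report):
    """Return one list of symbol names per 'unreferenced object' record."""
    records = report.split("unreferenced object")[1:]
    return [FRAME.findall(record) for record in records]


def all_pass_through(report, symbol):
    """True if every reported object has `symbol` somewhere in its backtrace."""
    traces = backtraces(report)
    return bool(traces) and all(symbol in trace for trace in traces)


# Trimmed copy of the two objects reported above (illustrative only).
SAMPLE = """\
unreferenced object 0xc0000011c63af400 (size 512):
  comm "worker", pid 7351, jiffies 4295245272 (age 21394.586s)
  backtrace:
    [<000000005f1fe84c>] blkg_alloc+0x58/0x260
    [<00000000bb469d61>] blkg_create+0x3b0/0x570
unreferenced object 0xc00000001772a840 (size 64):
  comm "worker", pid 7351, jiffies 4295245272 (age 21394.586s)
  backtrace:
    [<000000006e666d7e>] percpu_ref_init+0x7c/0x150
    [<00000000ac923962>] blkg_alloc+0x84/0x260
"""

if __name__ == "__main__":
    print(all_pass_through(SAMPLE, "blkg_alloc"))
```

In practice the report argument would be the text read from /sys/kernel/debug/kmemleak (root required).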
The issue appears to be transient. Leaks are detected, and after a clear and a re-scan, leaks are no longer present. I'm thinking these are false positives.
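One caveat, as far as I understand the kernel's kmemleak documentation: writing "clear" to /sys/kernel/debug/kmemleak marks the currently reported objects so that they are not reported again, so an empty report after clear plus scan does not by itself show the earlier reports were false positives. A possibly more informative check is to save the report, trigger another scan without clearing, save the report again, and compare. The sketch below does the comparison by object address; the function names are hypothetical, and a matching address is only a hint, since memory can be freed and reallocated between scans.

```python
import re

# Captures the address from lines such as
# "unreferenced object 0xc0000011c63af400 (size 512):"
OBJECT = re.compile(r"unreferenced object (0x[0-9a-f]+)")


def leaked_addresses(report):
    """Set of object addresses listed in one kmemleak report."""
    return set(OBJECT.findall(report))


def persistent_leaks(before, after):
    """Addresses reported in both snapshots, sorted for stable output."""
    return sorted(leaked_addresses(before) & leaked_addresses(after))


if __name__ == "__main__":
    # Illustrative snapshots; real ones would be two saved copies of
    # /sys/kernel/debug/kmemleak taken before and after a new scan.
    before = (
        "unreferenced object 0xc0000011c63af400 (size 512):\n"
        "unreferenced object 0xc00000001772a840 (size 64):\n"
    )
    after = "unreferenced object 0xc0000011c63af400 (size 512):\n"
    for address in persistent_leaks(before, after):
        print(address)
```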
It also seems to me that you are using bcache. Did you try to bisect the problem? Or did you try to reproduce it on a memory-backed null_blk device with XFS, to make sure it is not the device, before coming to the conclusion that XFS is the problem?
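A rough sketch of what such a reproduction setup might look like, written as a dry-run helper that only builds and prints the commands instead of running them. The 4 GB size and the /mnt/nulltest mount point are arbitrary assumptions, and /dev/nullb0 is assumed to be the first device the null_blk module creates; nr_devices, memory_backed and gb are null_blk module parameters. Since bcache is not in this stack, a leak that still shows up here would point away from bcache as well.

```python
import shlex


def null_blk_xfs_plan(size_gb=4, mountpoint="/mnt/nulltest"):
    """Build (but do not run) the commands for an XFS-on-null_blk test setup."""
    device = "/dev/nullb0"  # assumed name of the first null_blk device
    return [
        # Load null_blk with one memory-backed device of the requested size.
        ["modprobe", "null_blk", "nr_devices=1", "memory_backed=1", f"gb={size_gb}"],
        # Create an XFS filesystem on it (-f overwrites any existing signature).
        ["mkfs.xfs", "-f", device],
        ["mkdir", "-p", mountpoint],
        ["mount", device, mountpoint],
    ]


if __name__ == "__main__":
    # Print the plan for review; running the commands needs root.
    for command in null_blk_xfs_plan():
        print(shlex.join(command))
```

After mounting, the same read workload could be run against the mount point and /sys/kernel/debug/kmemleak checked again.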