Bug 217466

Summary: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
Product: File System
Reporter: a1bert (a1bert)
Component: btrfs
Assignee: BTRFS virtual assignee (fs_btrfs)
Status: NEW
Severity: low
CC: bagasdotme, dsterba, forza, regressions
Priority: P3
Hardware: Intel
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg

Description a1bert 2023-05-20 09:47:41 UTC
Created attachment 304291 [details]
dmesg

After updating from 6.2.x to 6.3.x, vmalloc error messages started appearing in dmesg.



# free 
               total        used        free      shared  buff/cache   available
Mem:        16183724     1473068      205664       33472    14504992    14335700
Swap:       16777212      703596    16073616


(zswap enabled)
Comment 1 Bagas Sanjaya 2023-05-20 13:30:39 UTC
(In reply to a1bert from comment #0)
> after updating from 6.2.x to 6.3.x, vmalloc error messages started to appear
> in the dmesg

What is your computer's setup? Can you bisect this to find the culprit?
Comment 2 a1bert 2023-05-21 10:38:28 UTC
It is a small home server/gateway (NAS/lxc/qemu/DVR/nfs/backup):


/dev/md0 on / type ext4 (rw,noatime,nodiratime,errors=remount-ro)
/dev/mapper/sopa-motion on /data/motion type xfs (rw,noatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=128,swidth=256,noquota)
/dev/sdb3 on /mnt/raid1 type btrfs (rw,noatime,compress=zstd:15,space_cache,skip_balance,subvolid=5,subvol=/)
/dev/sdb3 on /data/backup type btrfs (rw,noatime,compress=zstd:15,space_cache,skip_balance,subvolid=12088,subvol=/@backup)
/dev/sdb3 on /data/sopa type btrfs (rw,noatime,compress=zstd:15,space_cache,skip_balance,subvolid=8323,subvol=/sopa)
/dev/sdb3 on /data/www type btrfs (rw,noatime,compress=zstd:15,space_cache,skip_balance,subvolid=8163,subvol=/www)
/dev/sdb3 on /data/tftp type btrfs (rw,noatime,compress=zstd:15,space_cache,skip_balance,subvolid=8164,subvol=/tftp)
/dev/sdb3 on /data/media type btrfs (rw,noatime,compress=zstd:15,space_cache,skip_balance,subvolid=8165,subvol=/media)
/dev/sdb3 on /data/nfs type btrfs (rw,noatime,compress=zstd:15,space_cache,skip_balance,subvolid=1021,subvol=/nfs)
/dev/sdb3 on /data/lxc type btrfs (rw,noatime,compress=zstd:15,space_cache,skip_balance,subvolid=9264,subvol=/lxc)
/dev/sdb3 on /data/libvirt type btrfs (rw,noatime,compress=zstd:15,space_cache,skip_balance,subvolid=8126,subvol=/libvirt)
/dev/sdb3 on /home type btrfs (rw,noatime,compress=zstd:15,space_cache,skip_balance,subvolid=12042,subvol=/@home)


btrfs sub list /home | wc -l
495


Overall:
    Device size:		   4.49TiB
    Device allocated:		   4.40TiB
    Device unallocated:		  93.94GiB
    Device missing:		     0.00B
    Device slack:		     0.00B
    Used:			   4.23TiB
    Free (estimated):		 134.98GiB	(min: 134.98GiB)
    Free (statfs, df):		 134.98GiB
    Data ratio:			      2.00
    Metadata ratio:		      2.00
    Global reserve:		 512.00MiB	(used: 0.00B)
    Multiple profiles:		        no

Data,RAID1: Size:2.19TiB, Used:2.11TiB (96.08%)
   /dev/sdb3	   2.19TiB
   /dev/sda3	   2.19TiB

Metadata,RAID1: Size:7.00GiB, Used:6.26GiB (89.41%)
   /dev/sdb3	   7.00GiB
   /dev/sda3	   7.00GiB

System,RAID1: Size:32.00MiB, Used:416.00KiB (1.27%)
   /dev/sdb3	  32.00MiB
   /dev/sda3	  32.00MiB

Unallocated:
   /dev/sdb3	  46.97GiB
   /dev/sda3	  46.97GiB


(sorry, I cannot bisect)
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-05-21 10:56:50 UTC
(In reply to a1bert from comment #2)

> (sorry, I cannot bisect)

With a bit of luck the btrfs maintainer (who is known to look into bugzilla reports) might have an idea about the cause, or how to find it without a bisection; or somebody else might run into this and bisect the problem.

But be warned: if neither happens, there is a decent chance that this won't be fixed.
Comment 4 Bagas Sanjaya 2023-05-21 11:04:41 UTC
(In reply to a1bert from comment #2)
> (sorry, I cannot bisect)

See Documentation/admin-guide/bug-bisect.rst for how to perform bisection.

Remember: if you'd like to see a regression like this fixed, you'll have to bisect.
Comment 5 David Sterba 2023-05-22 13:51:39 UTC
This is not a regression. Memory allocation failures can happen for various reasons and depend on the actual state of the system: how fragmented memory is, and whether virtual mapping slots are available. What could differ between 6.2 and 6.3 is some internal memory allocator strategy, or even an indirect change, but that is speculative territory.

The report is from inside zstd_alloc_workspace, which calls kmalloc (with a fallback to vmalloc). The allocated size is 2097152 bytes (exactly 2 MiB), and per comment 2 you're using compress=zstd:15. Level 15 is indeed the most memory-hungry. The workspaces are either preallocated or allocated on demand, and the on-demand path can produce these warnings.

If the on-demand allocation fails, the thread waits until a workspace is free, so the fix is to suppress the allocation warnings.
Comment 6 Forza 2023-05-22 17:01:34 UTC
Hi!

I reported a similar issue only a couple of weeks ago. At that time I could reliably trigger the vmalloc error within a few minutes by running bees, and only on kernel 6.3.x. This system has 24 GB of RAM: even if I load several VMs, large databases, compilations, etc. at the same time as bees, all is good on kernels < 6.3, but on 6.3 running bees without any other service produces this vmalloc error, even though there are 15-20 GB of free RAM and even after I trigger /proc/sys/vm/compact_memory.

I am not saying it is a fault in Btrfs; the problem is probably somewhere else in the kernel. I was able to reproduce the issue in a QEMU VM.

https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/T/
Comment 7 David Sterba 2023-05-22 18:59:02 UTC
Thanks for the pointer. The cause and symptoms are the same but only in a different place (ioctl). We can add the NOWARN flag but something might be going on in MM that would be of interest of the developers.