Created attachment 306882 [details]
dmesg showing lockup

Exercising xattrs with stress-ng on an ext4 filesystem created with the ea_inode option causes a kernel hang.

Kernel: 6.11.0-7, AMD64 virtual machine
8-thread virtual machine (important: multiple CPU threads are needed to trigger the regression)
20GB virtio drive on /dev/vdb, 1 partition /dev/vdb1

sudo mkfs.ext4 /dev/vdb1 -O ea_inode
sudo mount /dev/vdb1 /mnt

git clone https://github.com/ColinIanKing/stress-ng
cd stress-ng
make clean; make -j $(nproc)

sudo ./stress-ng --xattr 8 -t 120 --vmstat 1 --file-path /mnt

Wait a couple of minutes and you will see (from the --vmstat output of stress-ng) that the number of running processes is not 8 as expected. stress-ng cannot be stopped with ^C because of the kernel lockup; switch to another tty and check dmesg. I get the following:

[ 247.028846] INFO: task jbd2/vdb1-8:1548 blocked for more than 122 seconds.
[ 247.030830] Not tainted 6.11.0-7-generic #7-Ubuntu
[ 247.032667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.034170] task:jbd2/vdb1-8 state:D stack:0 pid:1548 tgid:1548 ppid:2 flags:0x00004000
[ 247.034176] Call Trace:
[ 247.034178]  <TASK>
[ 247.034182]  __schedule+0x277/0x6c0
[ 247.034199]  schedule+0x29/0xd0
[ 247.034203]  jbd2_journal_wait_updates+0x77/0xf0
[ 247.034207]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 247.034213]  jbd2_journal_commit_transaction+0x290/0x1a10
[ 247.034223]  kjournald2+0xa8/0x250
[ 247.034228]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 247.034233]  ? __pfx_kjournald2+0x10/0x10
[ 247.034236]  kthread+0xe1/0x110
[ 247.034241]  ? __pfx_kthread+0x10/0x10
[ 247.034244]  ret_from_fork+0x44/0x70
[ 247.034247]  ? __pfx_kthread+0x10/0x10
[ 247.034251]  ret_from_fork_asm+0x1a/0x30
[ 247.034257]  </TASK>

NOTE: this works fine on Linux 6.8.0, so this looks like a regression in 6.11.0.

Attached is the full kernel log.
Oops, typo. The reproducer should be:

sudo ./stress-ng --xattr 8 -t 120 --vmstat 1 --temp-path /mnt
Seems to occur on 6.10 but not 6.9, so it's a 6.10 regression.
OK, so you've isolated it to something between 6.9 and 6.10. Any chance you could do further bisection work? Thanks!!
I'm doing some bisect builds, but my system is a bit slow, so it may take a few days.
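Roughly along these lines (a sketch only; the good/bad points are the v6.9 and v6.10 tags from the range identified above):

  git bisect start
  git bisect bad v6.10
  git bisect good v6.9
  # at each step: build the kernel, boot it, run the stress-ng
  # reproducer above, then mark the result:
  git bisect good    # or: git bisect bad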
Thanks! I'm traveling for the next two weeks, so anything you can do to narrow it down will be much appreciated.
(In reply to Colin Ian King from comment #4)
> I'm doing some bisect builds, but my system is a bit slow, so may take a
> few days.

Use ext4 in RAM, e.g. backed by tmpfs. For building the kernel you could use ccache. To avoid reboots, keep your root fs on another filesystem, e.g. xfs, and build ext4 as a module. Since this isn't a hardware issue, you could even perform the whole test in a VM.
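A loop-mounted ext4 image on tmpfs could look roughly like this (a minimal sketch; the image path and size are just placeholders, assuming /dev/shm is tmpfs on your distro):

  truncate -s 1G /dev/shm/ext4.img                  # sparse image file on tmpfs
  mkfs.ext4 -F -O ea_inode /dev/shm/ext4.img        # -F since it's not a block device
  sudo mount -o loop /dev/shm/ext4.img /mnt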
$ git bisect bad
0a46ef234756dca04623b7591e8ebb3440622f0b is the first bad commit
commit 0a46ef234756dca04623b7591e8ebb3440622f0b (HEAD)
Author: Jan Kara <jack@suse.cz>
Date:   Thu Mar 21 17:26:50 2024 +0100

    ext4: do not create EA inode under buffer lock

    ext4_xattr_set_entry() creates new EA inodes while holding buffer lock
    on the external xattr block. This is problematic as it nests all the
    allocation locking (which acquires locks on other buffers) under the
    buffer lock. This can even deadlock when the filesystem is corrupted
    and e.g. quota file is setup to contain xattr block as data block.
    Move the allocation of EA inode out of ext4_xattr_set_entry() into the
    callers.

    Reported-by: syzbot+a43d4f48b8397d0e41a9@syzkaller.appspotmail.com
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20240321162657.27420-2-jack@suse.cz
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

 fs/ext4/xattr.c | 113 ++++++++++++++++++++++++++++++++++---------------------------------------
 1 file changed, 53 insertions(+), 60 deletions(-)
And reverting 0a46ef234756dca04623b7591e8ebb3440622f0b on 6.10.10 makes the test run OK; without the revert the kernel hangs.
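For reference, the revert can be re-tested with something like this (a minimal sketch; it assumes the commit still reverts cleanly on a 6.10.10 tree):

  git revert 0a46ef234756dca04623b7591e8ebb3440622f0b
  # rebuild and boot the patched kernel, then re-run the reproducer:
  sudo ./stress-ng --xattr 8 -t 120 --vmstat 1 --temp-path /mnt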
Jan, please take a look.
Yeah, I'm already watching it: when you narrowed the range down to 6.9-6.10, I started suspecting this commit :) I'll try to reproduce it and debug what's going on.
I think I see the problem, and AFAICT commit 0a46ef234756dca just makes the latent deadlock easier to hit. The problem is shown by these two stacktraces from your dmesg:

Task 1
[ 247.045575]  __wait_on_freeing_inode+0xba/0x140
[ 247.045584]  find_inode_fast+0xa4/0xe0
[ 247.045588]  iget_locked+0x71/0x200
[ 247.045597]  __ext4_iget+0x148/0x1080
[ 247.045615]  ext4_xattr_inode_cache_find+0xe2/0x220
[ 247.045621]  ext4_xattr_inode_lookup_create+0x122/0x240
[ 247.045626]  ext4_xattr_block_set+0xc2/0xeb0
[ 247.045633]  ext4_xattr_set_handle+0x4ba/0x650
[ 247.045641]  ext4_xattr_set+0x80/0x160

Task 2
[ 247.043719]  mb_cache_entry_wait_unused+0x9a/0xd0
[ 247.043729]  ext4_evict_ea_inode+0x64/0xb0
[ 247.043733]  ext4_evict_inode+0x35c/0x6d0
[ 247.043739]  evict+0x108/0x2c0
[ 247.043745]  iput+0x14a/0x260
[ 247.043749]  ext4_xattr_ibody_set+0x175/0x1d0
[ 247.043754]  ext4_xattr_set_handle+0x297/0x650
[ 247.043762]  ext4_xattr_set+0x80/0x160

These two tasks are deadlocked against each other: one has dropped the last reference to the xattr inode and, while trying to remove it from memory, waits for the corresponding mbcache entry to become unused, while the other task holds the mbcache entry reference and is waiting for the inode to be evicted from memory. Commit 0a46ef234756dca removed the synchronization on the buffer lock for one of the hot paths, so hitting this race is now much more likely. I just have to make up my mind about how to best fix this ABBA deadlock.