Bug 216961

Summary: Severe IO scheduling starvation issues with btrfs
Product: File System
Component: btrfs
Assignee: Josef Bacik (josef)
Reporter: Wyatt Childers (kernel.dbwta)
Status: NEW
Severity: high
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 6.0.12-300.fc37.x86_64
Subsystem:
Regression: No
Bisected commit-id:
CC: jd.girard, josef, ngompa13, oleksandr, pmacedo, regressions

Description Wyatt Childers 2023-01-23 17:44:13 UTC
There seems to be an issue with btrfs in Kernel 6 (first seen in at least 6.0.12-300.fc37.x86_64, possibly earlier) and still observed in 6.1 (6.1.5-200.fc37.x86_64).

This manifests as prolonged user space hangs. I first noticed it in internal tooling for my employer, where simple IO operations like a trivial cp command that typically completes in under 100ms (when run with many other concurrent IO operations of a similar kind) suffer "scheduling starvation", stalling in a disk sleep (D) state for nearly a minute.

This also seems to manifest as hangs in KWin that correlate with btrfs usage during operations as routine and trivial (even on high-performance systems) as system updates: https://bugs.kde.org/show_bug.cgi?id=463353

I would also not be surprised if this behavior is responsible for things like this reddit thread: https://www.reddit.com/r/btrfs/comments/102or0t/btrfs_on_luks_whole_system_freezing_under_high_io/
Comment 1 Wyatt Childers 2023-01-23 17:57:47 UTC
As additional context, here are some sample kernel traces (echo w > /proc/sysrq-trigger):
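(For reference, the full capture sequence is roughly the following; reading the traces back via journalctl is just how my system is set up, any kernel log reader works, and kernel.sysrq must permit the 'w' function:

# dump stack traces of all tasks stuck in uninterruptible (D) sleep to the kernel log
echo w > /proc/sysrq-trigger
# then read the traces back out of the kernel log, e.g.:
journalctl -k --since "10 minutes ago"
)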

Dec 16 14:10:02 localhost kernel: sysrq: Show Blocked State
Dec 16 14:10:02 localhost kernel: task:kworker/u64:6   state:D stack:    0 pid:2673507 ppid:     2 flags:0x00004000
Dec 16 14:10:02 localhost kernel: Workqueue: btrfs-worker btrfs_work_helper
Dec 16 14:10:02 localhost kernel: Call Trace:
Dec 16 14:10:02 localhost kernel:  <TASK>
Dec 16 14:10:02 localhost kernel:  __schedule+0x332/0x12c0
Dec 16 14:10:02 localhost kernel:  ? __sbitmap_get_word+0x37/0x80
Dec 16 14:10:02 localhost kernel:  schedule+0x5d/0xe0
Dec 16 14:10:02 localhost kernel:  io_schedule+0x42/0x70
Dec 16 14:10:02 localhost kernel:  blk_mq_get_tag+0x10c/0x290
Dec 16 14:10:02 localhost kernel:  ? dequeue_task_stop+0x70/0x70
Dec 16 14:10:02 localhost kernel:  __blk_mq_alloc_requests+0x183/0x2b0
Dec 16 14:10:02 localhost kernel:  blk_mq_submit_bio+0x2b7/0x5b0
Dec 16 14:10:02 localhost kernel:  __submit_bio+0xf5/0x180
Dec 16 14:10:02 localhost kernel:  submit_bio_noacct_nocheck+0x20d/0x2a0
Dec 16 14:10:02 localhost kernel:  btrfs_submit_bio+0x17a/0x3d0
Dec 16 14:10:02 localhost kernel:  btrfs_work_helper+0x14b/0x380
Dec 16 14:10:02 localhost kernel:  process_one_work+0x1c7/0x380
Dec 16 14:10:02 localhost kernel:  worker_thread+0x4d/0x380
Dec 16 14:10:02 localhost kernel:  ? _raw_spin_lock_irqsave+0x23/0x50
Dec 16 14:10:02 localhost kernel:  ? rescuer_thread+0x380/0x380
Dec 16 14:10:02 localhost kernel:  kthread+0xe9/0x110
Dec 16 14:10:02 localhost kernel:  ? kthread_complete_and_exit+0x20/0x20
Dec 16 14:10:02 localhost kernel:  ret_from_fork+0x22/0x30
Dec 16 14:10:02 localhost kernel:  </TASK>
Dec 16 14:10:02 localhost kernel: task:cp              state:D stack:    0 pid:2422766 ppid:2410777 flags:0x00000002
Dec 16 14:10:02 localhost kernel: Call Trace:
Dec 16 14:10:02 localhost kernel:  <TASK>
Dec 16 14:10:02 localhost kernel:  __schedule+0x332/0x12c0
Dec 16 14:10:02 localhost kernel:  ? btrfs_delayed_update_inode+0x102/0x1f0
Dec 16 14:10:02 localhost kernel:  schedule+0x5d/0xe0
Dec 16 14:10:02 localhost kernel:  io_schedule+0x42/0x70
Dec 16 14:10:02 localhost kernel:  folio_wait_bit_common+0x12d/0x3a0
Dec 16 14:10:02 localhost kernel:  ? filemap_alloc_folio+0xc0/0xc0
Dec 16 14:10:02 localhost kernel:  folio_wait_writeback+0x28/0x80
Dec 16 14:10:02 localhost kernel:  __filemap_fdatawait_range+0x7f/0x100
Dec 16 14:10:02 localhost kernel:  filemap_fdatawait_range+0xe/0x20
Dec 16 14:10:02 localhost kernel:  btrfs_wait_ordered_range+0x7a/0x120
Dec 16 14:10:02 localhost kernel:  btrfs_remap_file_range+0x139/0x550
Dec 16 14:10:02 localhost kernel:  do_clone_file_range+0xe9/0x230
Dec 16 14:10:02 localhost kernel:  vfs_clone_file_range+0x4d/0x140
Dec 16 14:10:02 localhost kernel:  ovl_copyfile+0x15d/0x180 [overlay]
Dec 16 14:10:02 localhost kernel:  ovl_remap_file_range+0x6e/0xa0 [overlay]
Dec 16 14:10:02 localhost kernel:  do_clone_file_range+0xe9/0x230
Dec 16 14:10:02 localhost kernel:  vfs_clone_file_range+0x4d/0x140
Dec 16 14:10:02 localhost kernel:  ioctl_file_clone+0x49/0xb0
Dec 16 14:10:02 localhost kernel:  do_vfs_ioctl+0x77/0x950
Dec 16 14:10:02 localhost kernel:  ? security_file_ioctl+0x3c/0x60
Dec 16 14:10:02 localhost kernel:  __x64_sys_ioctl+0x6e/0xd0
Dec 16 14:10:02 localhost kernel:  do_syscall_64+0x5b/0x80
Dec 16 14:10:02 localhost kernel:  ? syscall_exit_to_user_mode+0x17/0x40
Dec 16 14:10:02 localhost kernel:  ? do_syscall_64+0x67/0x80
Dec 16 14:10:02 localhost kernel:  ? syscall_exit_to_user_mode+0x17/0x40
Dec 16 14:10:02 localhost kernel:  ? do_syscall_64+0x67/0x80
Dec 16 14:10:02 localhost kernel:  ? do_user_addr_fault+0x1ef/0x690
Dec 16 14:10:02 localhost kernel:  ? do_syscall_64+0x67/0x80
Dec 16 14:10:02 localhost kernel:  ? exc_page_fault+0x70/0x170
Dec 16 14:10:02 localhost kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Dec 16 14:10:02 localhost kernel: RIP: 0033:0x7f07d2d7dbaf
Dec 16 14:10:02 localhost kernel: RSP: 002b:00007ffc4ba5bf40 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 16 14:10:02 localhost kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f07d2d7dbaf
Dec 16 14:10:02 localhost kernel: RDX: 0000000000000003 RSI: 0000000040049409 RDI: 0000000000000004
Dec 16 14:10:02 localhost kernel: RBP: 00007ffc4ba5d8cc R08: 0000000000000001 R09: 00000000000001a4
Dec 16 14:10:02 localhost kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc4ba5c3e0
Dec 16 14:10:02 localhost kernel: R13: 0000000000000001 R14: 0000000000000014 R15: 00007ffc4ba5c7b0
Dec 16 14:10:02 localhost kernel:  </TASK>
Dec 16 14:10:02 localhost kernel: task:cp              state:D stack:    0 pid:2422815 ppid:2419287 flags:0x00000002
Dec 16 14:10:02 localhost kernel: Call Trace:
Dec 16 14:10:02 localhost kernel:  <TASK>
Dec 16 14:10:02 localhost kernel:  __schedule+0x332/0x12c0
Dec 16 14:10:02 localhost kernel:  ? btrfs_dirty_inode+0x91/0xe0
Dec 16 14:10:02 localhost kernel:  schedule+0x5d/0xe0
Dec 16 14:10:02 localhost kernel:  io_schedule+0x42/0x70
Dec 16 14:10:02 localhost kernel:  folio_wait_bit_common+0x12d/0x3a0
Dec 16 14:10:02 localhost kernel:  ? filemap_alloc_folio+0xc0/0xc0
Dec 16 14:10:02 localhost kernel:  folio_wait_writeback+0x28/0x80
Dec 16 14:10:02 localhost kernel:  __filemap_fdatawait_range+0x7f/0x100
Dec 16 14:10:02 localhost kernel:  filemap_fdatawait_range+0xe/0x20
Dec 16 14:10:02 localhost kernel:  btrfs_wait_ordered_range+0x7a/0x120
Dec 16 14:10:02 localhost kernel:  btrfs_remap_file_range+0x139/0x550
Dec 16 14:10:02 localhost kernel:  do_clone_file_range+0xe9/0x230
Dec 16 14:10:02 localhost kernel:  vfs_clone_file_range+0x4d/0x140
Dec 16 14:10:02 localhost kernel:  ovl_copyfile+0x15d/0x180 [overlay]
Dec 16 14:10:02 localhost kernel:  ovl_remap_file_range+0x6e/0xa0 [overlay]
Dec 16 14:10:02 localhost kernel:  do_clone_file_range+0xe9/0x230
Dec 16 14:10:02 localhost kernel:  vfs_clone_file_range+0x4d/0x140
Dec 16 14:10:02 localhost kernel:  ioctl_file_clone+0x49/0xb0
Dec 16 14:10:02 localhost kernel:  do_vfs_ioctl+0x77/0x950
Dec 16 14:10:02 localhost kernel:  __x64_sys_ioctl+0x6e/0xd0
Dec 16 14:10:02 localhost kernel:  do_syscall_64+0x5b/0x80
Dec 16 14:10:02 localhost kernel:  ? exc_page_fault+0x70/0x170
Dec 16 14:10:02 localhost kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Dec 16 14:10:02 localhost kernel: RIP: 0033:0x7f1e8dc22baf
Dec 16 14:10:02 localhost kernel: RSP: 002b:00007ffd65479d50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 16 14:10:02 localhost kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1e8dc22baf
Dec 16 14:10:02 localhost kernel: RDX: 0000000000000003 RSI: 0000000040049409 RDI: 0000000000000004
Dec 16 14:10:02 localhost kernel: RBP: 00007ffd6547b8cc R08: 0000000000000001 R09: 00000000000001a4
Dec 16 14:10:02 localhost kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd6547a1f0
Dec 16 14:10:02 localhost kernel: R13: 0000000000000001 R14: 0000000000000014 R15: 00007ffd6547a5c0
Dec 16 14:10:02 localhost kernel:  </TASK>
Dec 16 14:10:02 localhost kernel: task:cp              state:D stack:    0 pid:2422818 ppid:2421995 flags:0x00000002
Dec 16 14:10:02 localhost kernel: Call Trace:
Dec 16 14:10:02 localhost kernel:  <TASK>
Dec 16 14:10:02 localhost kernel:  __schedule+0x332/0x12c0
Dec 16 14:10:02 localhost kernel:  schedule+0x5d/0xe0
Dec 16 14:10:02 localhost kernel:  io_schedule+0x42/0x70
Dec 16 14:10:02 localhost kernel:  folio_wait_bit_common+0x12d/0x3a0
Dec 16 14:10:02 localhost kernel:  ? filemap_alloc_folio+0xc0/0xc0
Dec 16 14:10:02 localhost kernel:  folio_wait_writeback+0x28/0x80
Dec 16 14:10:02 localhost kernel:  __filemap_fdatawait_range+0x7f/0x100
Dec 16 14:10:02 localhost kernel:  filemap_fdatawait_range+0xe/0x20
Dec 16 14:10:02 localhost kernel:  btrfs_wait_ordered_range+0x7a/0x120
Dec 16 14:10:02 localhost kernel:  btrfs_remap_file_range+0x139/0x550
Dec 16 14:10:02 localhost kernel:  do_clone_file_range+0xe9/0x230
Dec 16 14:10:02 localhost kernel:  vfs_clone_file_range+0x4d/0x140
Dec 16 14:10:02 localhost kernel:  ovl_copyfile+0x15d/0x180 [overlay]
Dec 16 14:10:02 localhost kernel:  ovl_remap_file_range+0x6e/0xa0 [overlay]
Dec 16 14:10:02 localhost kernel:  do_clone_file_range+0xe9/0x230
Dec 16 14:10:02 localhost kernel:  vfs_clone_file_range+0x4d/0x140
Dec 16 14:10:02 localhost kernel:  ioctl_file_clone+0x49/0xb0
Dec 16 14:10:02 localhost kernel:  do_vfs_ioctl+0x77/0x950
Dec 16 14:10:02 localhost kernel:  __x64_sys_ioctl+0x6e/0xd0
Dec 16 14:10:02 localhost kernel:  do_syscall_64+0x5b/0x80
Dec 16 14:10:02 localhost kernel:  ? syscall_exit_to_user_mode+0x17/0x40
Dec 16 14:10:02 localhost kernel:  ? do_syscall_64+0x67/0x80
Dec 16 14:10:02 localhost kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Dec 16 14:10:02 localhost kernel: RIP: 0033:0x7ffa2a678baf
Dec 16 14:10:02 localhost kernel: RSP: 002b:00007ffe77c03de0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 16 14:10:02 localhost kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ffa2a678baf
Dec 16 14:10:02 localhost kernel: RDX: 0000000000000003 RSI: 0000000040049409 RDI: 0000000000000004
Dec 16 14:10:02 localhost kernel: RBP: 00007ffe77c058cc R08: 0000000000000001 R09: 00000000000001a4
Dec 16 14:10:02 localhost kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe77c04280
Dec 16 14:10:02 localhost kernel: R13: 0000000000000001 R14: 0000000000000014 R15: 00007ffe77c04650
Dec 16 14:10:02 localhost kernel:  </TASK>
Dec 16 14:10:02 localhost kernel: task:cp              state:D stack:    0 pid:2422821 ppid:2420167 flags:0x00000002
Dec 16 14:10:02 localhost kernel: Call Trace:
Dec 16 14:10:02 localhost kernel:  <TASK>
Dec 16 14:10:02 localhost kernel:  __schedule+0x332/0x12c0
Dec 16 14:10:02 localhost kernel:  ? _raw_spin_unlock_irqrestore+0x23/0x40
Dec 16 14:10:02 localhost kernel:  schedule+0x5d/0xe0
Dec 16 14:10:02 localhost kernel:  io_schedule+0x42/0x70
Dec 16 14:10:02 localhost kernel:  folio_wait_bit_common+0x12d/0x3a0
Dec 16 14:10:02 localhost kernel:  ? filemap_alloc_folio+0xc0/0xc0
Dec 16 14:10:02 localhost kernel:  folio_wait_writeback+0x28/0x80
Dec 16 14:10:02 localhost kernel:  __filemap_fdatawait_range+0x7f/0x100
Dec 16 14:10:02 localhost kernel:  filemap_fdatawait_range+0xe/0x20
Dec 16 14:10:02 localhost kernel:  btrfs_wait_ordered_range+0x7a/0x120
Dec 16 14:10:02 localhost kernel:  btrfs_remap_file_range+0x139/0x550
Dec 16 14:10:02 localhost kernel:  do_clone_file_range+0xe9/0x230
Dec 16 14:10:02 localhost kernel:  vfs_clone_file_range+0x4d/0x140
Dec 16 14:10:02 localhost kernel:  ovl_copyfile+0x15d/0x180 [overlay]
Dec 16 14:10:02 localhost kernel:  ovl_remap_file_range+0x6e/0xa0 [overlay]
Dec 16 14:10:02 localhost kernel:  do_clone_file_range+0xe9/0x230
Dec 16 14:10:02 localhost kernel:  vfs_clone_file_range+0x4d/0x140
Dec 16 14:10:02 localhost kernel:  ioctl_file_clone+0x49/0xb0
Dec 16 14:10:02 localhost kernel:  do_vfs_ioctl+0x77/0x950
Dec 16 14:10:02 localhost kernel:  ? security_file_ioctl+0x3c/0x60
Dec 16 14:10:02 localhost kernel:  __x64_sys_ioctl+0x6e/0xd0
Dec 16 14:10:02 localhost kernel:  do_syscall_64+0x5b/0x80
Dec 16 14:10:02 localhost kernel:  ? do_syscall_64+0x67/0x80
Dec 16 14:10:02 localhost kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Dec 16 14:10:02 localhost kernel: RIP: 0033:0x7fafe34febaf
Dec 16 14:10:02 localhost kernel: RSP: 002b:00007fff79bee930 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 16 14:10:02 localhost kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fafe34febaf
Dec 16 14:10:02 localhost kernel: RDX: 0000000000000003 RSI: 0000000040049409 RDI: 0000000000000004
Dec 16 14:10:02 localhost kernel: RBP: 00007fff79bef8f6 R08: 0000000000000001 R09: 00000000000001a4
Dec 16 14:10:02 localhost kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 00007fff79beedd0
Dec 16 14:10:02 localhost kernel: R13: 0000000000000001 R14: 0000000000000002 R15: 00007fff79bef1a0
Dec 16 14:10:02 localhost kernel:  </TASK>
Dec 16 14:10:02 localhost kernel: task:cp              state:D stack:    0 pid:2422911 ppid:2401445 flags:0x00000002
Dec 16 14:10:02 localhost kernel: Call Trace:
Dec 16 14:10:02 localhost kernel:  <TASK>
Dec 16 14:10:02 localhost kernel:  __schedule+0x332/0x12c0
Dec 16 14:10:02 localhost kernel:  ? __btrfs_end_transaction+0xf8/0x240
Dec 16 14:10:02 localhost kernel:  schedule+0x5d/0xe0
Dec 16 14:10:02 localhost kernel:  io_schedule+0x42/0x70
Dec 16 14:10:02 localhost kernel:  folio_wait_bit_common+0x12d/0x3a0
Dec 16 14:10:02 localhost kernel:  ? filemap_alloc_folio+0xc0/0xc0
Dec 16 14:10:02 localhost kernel:  folio_wait_writeback+0x28/0x80
Dec 16 14:10:02 localhost kernel:  __filemap_fdatawait_range+0x7f/0x100
Dec 16 14:10:02 localhost kernel:  filemap_fdatawait_range+0xe/0x20
Dec 16 14:10:02 localhost kernel:  btrfs_wait_ordered_range+0x7a/0x120
Dec 16 14:10:02 localhost kernel:  btrfs_remap_file_range+0x139/0x550
Dec 16 14:10:02 localhost kernel:  do_clone_file_range+0xe9/0x230
Dec 16 14:10:02 localhost kernel:  vfs_clone_file_range+0x4d/0x140
Dec 16 14:10:02 localhost kernel:  ovl_copyfile+0x15d/0x180 [overlay]
Dec 16 14:10:02 localhost kernel:  ovl_remap_file_range+0x6e/0xa0 [overlay]
Dec 16 14:10:02 localhost kernel:  do_clone_file_range+0xe9/0x230
Dec 16 14:10:02 localhost kernel:  vfs_clone_file_range+0x4d/0x140
Dec 16 14:10:02 localhost kernel:  ioctl_file_clone+0x49/0xb0
Dec 16 14:10:02 localhost kernel:  do_vfs_ioctl+0x77/0x950
Dec 16 14:10:02 localhost kernel:  __x64_sys_ioctl+0x6e/0xd0
Dec 16 14:10:02 localhost kernel:  do_syscall_64+0x5b/0x80
Dec 16 14:10:02 localhost kernel:  ? exc_page_fault+0x70/0x170
Dec 16 14:10:02 localhost kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Dec 16 14:10:02 localhost kernel: RIP: 0033:0x7ff280aeebaf
Dec 16 14:10:02 localhost kernel: RSP: 002b:00007ffdbfa595c0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Dec 16 14:10:02 localhost kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff280aeebaf
Dec 16 14:10:02 localhost kernel: RDX: 0000000000000003 RSI: 0000000040049409 RDI: 0000000000000004
Dec 16 14:10:02 localhost kernel: RBP: 00007ffdbfa5a8cc R08: 0000000000000001 R09: 00000000000001a4
Dec 16 14:10:02 localhost kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdbfa59a60
Dec 16 14:10:02 localhost kernel: R13: 0000000000000001 R14: 0000000000000014 R15: 00007ffdbfa59e30
Dec 16 14:10:02 localhost kernel:  </TASK>
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-01-26 11:13:25 UTC
(In reply to Wyatt Childers from comment #0)
> There seems to be an issue with btrfs in Kernel 6 (first seen in at least
> 6.0.12-300.fc37.x86_64, possibly earlier) and still observed in 6.1
> (6.1.5-200.fc37.x86_64).

Does that mean 5.19.y worked fine, or did the early 6.0.y versions work fine, too?
Comment 3 Wyatt Childers 2023-01-26 16:19:24 UTC
> Does that mean 5.19.y worked fine, or did the early 6.0.y versions work fine,
> too?

I unfortunately don't have that information. There was a delay between moving to kernel 6.0 and the investigation that led to discovering this issue.

I'm quite confident the problem was not present in the 5.19.y series (though that's based on memory, not hard data or testing).
Comment 4 pmacedo 2023-02-07 12:47:31 UTC
This matches my experience on Debian with the kernel from their unstable repository (6.1.4-1). I rolled back to the current stable kernel from Bullseye (5.10.0-20-amd64, which according to the changelog is based on 5.10.158-2) about a week ago after finding this bug, and the long IO stalls have disappeared.

Looks like Debian had 6.0 in backports, so I'll give it a try and report back.
Comment 5 pmacedo 2023-02-07 19:22:55 UTC
From my testing, the regression seems to have landed between 5.19 and 6.0 - on 6.0 I've seen mkdir take up to 40 seconds, while on 5.19 it never seems to go above half a second.

My setup is raid6 for data and raid1c3 for metadata. In my case the stalls seem to happen more frequently after a large number of deletions; btrfs-transaction appears very busy according to iotop, while everything else just stalls.
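(To see this yourself, an iotop invocation along these lines works; the flags are illustrative, -o limits output to tasks actually doing IO, and -b/-n make it usable non-interactively:

iotop -o -b -n 3
)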
Comment 6 Josef Bacik 2023-02-16 21:52:35 UTC
Can I get the output of the following commands?

btrfs filesystem show <mountpoint>
lsblk -o name,sched,mountpoints
Comment 7 pmacedo 2023-02-17 01:23:36 UTC
For the affected filesystem, filtering out unrelated ones:

root@zeus:~# btrfs filesystem show /gigante/
Label: 'gigante'  uuid: a767e5e1-ee9e-4769-ac9b-c14fdc5b7502
        Total devices 13 FS bytes used 104.61TiB
        devid    1 size 16.37TiB used 13.23TiB path /dev/sdp
        devid    2 size 16.37TiB used 13.23TiB path /dev/sdq
        devid    3 size 16.37TiB used 13.24TiB path /dev/sda
        devid    4 size 16.37TiB used 13.20TiB path /dev/sdf
        devid    5 size 9.10TiB used 9.09TiB path /dev/sdn
        devid    6 size 9.10TiB used 9.09TiB path /dev/sdk
        devid    7 size 9.10TiB used 9.09TiB path /dev/sdo
        devid    8 size 9.10TiB used 9.09TiB path /dev/sdj
        devid    9 size 9.10TiB used 9.09TiB path /dev/sdl
        devid   10 size 9.10TiB used 9.09TiB path /dev/sdd
        devid   11 size 9.10TiB used 9.09TiB path /dev/sdm
        devid   12 size 9.10TiB used 9.09TiB path /dev/sdb
        devid   13 size 9.10TiB used 9.09TiB path /dev/sdi

root@zeus:~# lsblk -o name,sched,mountpoints |egrep -i "sdp|sdq|sda|sdf|sdn|sdk|sdo|sdj|sdl|sdd|sdm|sdb|sdi"
sda                       bfq
sdb                       bfq
sdd                       bfq
sdf                       bfq
sdi                       bfq
sdj                       bfq
sdk                       bfq
sdl                       bfq
sdm                       bfq
sdn                       bfq
sdo                       bfq
sdp                       bfq         /gigante
sdq                       bfq
root@zeus:~#
Comment 8 Josef Bacik 2023-02-17 01:32:37 UTC
Ok, this matches other reports I’ve seen. Please change the IO scheduler to mq-deadline and see if the problem goes away.
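For reference, a minimal sketch of switching at runtime (sda is just an example device; the change does not persist across reboots and needs to be repeated for each member device of the filesystem):

# the active scheduler is shown in brackets
cat /sys/block/sda/queue/scheduler
# switch to mq-deadline (as root)
echo mq-deadline > /sys/block/sda/queue/scheduler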
Comment 9 Wyatt Childers 2023-02-20 20:14:56 UTC
$> btrfs filesystem show /
Label: none  uuid: d9ab45ee-2acf-40eb-961a-4098ef36aa4e
        Total devices 1 FS bytes used 396.21GiB
        devid    1 size 476.85GiB used 440.02GiB path /dev/sdc2

$> btrfs filesystem show /a
Label: none  uuid: 1015de5d-d62d-4719-8dc0-858126326045
        Total devices 1 FS bytes used 199.28GiB
        devid    1 size 307.40GiB used 202.02GiB path /dev/sdb2

$> btrfs filesystem show /b
Label: none  uuid: cc9a3fc6-0f75-4966-969e-2678c04ada0d
        Total devices 1 FS bytes used 147.55GiB
        devid    1 size 472.53GiB used 150.01GiB path /dev/sda2

$> btrfs filesystem show /c
Label: none  uuid: e341a7af-43ee-4c55-81bb-79fe2604658d
        Total devices 1 FS bytes used 94.13GiB
        devid    1 size 624.11GiB used 353.01GiB path /dev/sdb1

----

NAME   SCHED MOUNTPOINTS
sda    bfq
├─sda1 bfq   /d
└─sda2 bfq   /b
sdb    bfq
├─sdb1 bfq   /c
└─sdb2 bfq   /a
sdc    bfq
├─sdc1 bfq   /boot/efi
└─sdc2 bfq   /home
             /
sr0    bfq
zram0        [SWAP]

Following some testing of different IO schedulers, I can confirm the issue goes away when moving off bfq.

For others who find themselves here: none and kyber seem to be the best options for my SSDs. none gives me the best workload throughput, but kyber is only marginally worse, and Pop!_OS has chosen kyber as its default to reduce latency. I'm going to assume the Pop!_OS folks know what they're doing a bit better than I do and go with kyber, at least for now.
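For anyone wanting to run their own comparison, something like this fio invocation works as a rough yardstick (parameters are illustrative; adjust for your workload):

fio --name=sched-test --rw=randrw --bs=4k --size=1G --numjobs=4 --ioengine=libaio --direct=1 --runtime=60 --time_based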

I'll additionally note that I did notice music stuttering on kyber, noop, and mq-deadline. I don't remember experiencing that with bfq, before or after this regression.
Comment 10 Neal Gompa 2023-02-20 23:54:13 UTC
For what it's worth, Fedora defaults to mq-deadline for everything except drives that use sdX notation. There's currently a udev rule override that sets bfq for those by default[1].

There are two options here:

a. Request Fedora to drop the override. There's a bug report for it already[2].
b. Work with the maintainer of bfq to fix the problems we're seeing here.

I'd obviously prefer to see problems fixed, but barring that, it'd be worth opening a discussion downstream to drop the bfq default for sdX drives.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1738828
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1851783
Comment 11 Neal Gompa 2023-02-20 23:55:12 UTC
Err, here's the actual udev rule override patch in place: https://src.fedoraproject.org/rpms/systemd/blob/rawhide/f/use-bfq-scheduler.patch
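For anyone wanting to override this locally in the meantime, a drop-in udev rule of roughly this shape forces mq-deadline back on sdX devices (illustrative: the filename is arbitrary and the match pattern should be checked against the linked patch):

# /etc/udev/rules.d/64-ioscheduler-override.rules
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]*", ATTR{queue/scheduler}="mq-deadline"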
Comment 12 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-02-28 11:08:55 UTC
So what's the way forward here? Changing the IO scheduler is a nice workaround, but from the report it sounds like this didn't happen in earlier kernels -- and thus is a regression that should be fixed (who actually needs to fix it obviously depends on which change caused it; but nobody has bisected this yet, right?). Or am I missing something?
Comment 13 Neal Gompa 2023-02-28 11:10:36 UTC
The problem isn't in btrfs, it's in bfq. So from a perf/regression perspective, this bug belongs to bfq.

From the downstream Fedora perspective, if nothing gets fixed, it's worth having that conversation again about the bfq enablement patch.
Comment 14 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-02-28 11:23:14 UTC
(In reply to Neal Gompa from comment #13)
> The problem isn't in btrfs, it's in bfq. So from a perf/regression
> perspective, this bug belongs to bfq.

That's what I suspected when skimming this (sorry, I can't follow each regression report closely). So:

Did anyone tell the bfq developers about this? Or ideally bisect this?
Comment 15 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-03-13 14:51:35 UTC
Hmm, this looks stalled.

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #14)
> 
> Did anyone tell the bfq developers about this?

I did that a bit later through the mailing list:
https://lore.kernel.org/lkml/d7fe697d-09eb-3a59-6cb3-8eec480d1407@leemhuis.info/

No reply; from a quick lore search it seems Paolo is a bit MIA currently. Happens.

> Or ideally bisect this?

I guess I have to assume silence means "no".

Does anyone still care? If yes: please check whether the latest mainline is still affected; and ideally confirm that early 6.0 versions or 5.19 work fine, to ensure this is really a regression.
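For anyone willing to try, the usual mainline bisect flow would be (assuming the regression really landed between v5.19 and v6.0):

git bisect start
git bisect bad v6.0
git bisect good v5.19
# build and boot each commit git suggests, try to reproduce the stalls, then:
git bisect good    # or: git bisect bad
# repeat until git prints the first bad commit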
Comment 16 Wyatt Childers 2023-03-23 15:58:23 UTC
I just wanted to note that even with things switched over to kyber, I'm still seeing the occasional prolonged stall... I'm not entirely sure that a change made to btrfs didn't play a significant role in this.