Bug 218158

Summary: fdatasync to a block device seems to block writes on unrelated devices
Product: IO/Storage          Reporter: Matthew Stapleton (matthew4196)
Component: Block Layer       Assignee: Jens Axboe (axboe)
Status: NEW                  Resolution: ---
Severity: normal             CC: bagasdotme, sam, sirius
Priority: P3
Hardware: AMD
OS: Linux
Kernel Version:              Subsystem:
Regression: No               Bisected commit-id:
Attachments: config-6.1.53-capsicum

Description Matthew Stapleton 2023-11-18 23:08:29 UTC
I was running nwipe on a failing hard drive that was performing very slowly, and while nwipe was calling fdatasync it seemed to cause delays for the filesystems on the other drives.  The other drives are attached to the same onboard AHCI SATA adapter, if that is important.  After stopping nwipe, performance returned to normal.

The system is using ext4 filesystems on top of LVM on top of Linux RAID6 and the kernel is 6.1.53.

Is this a design problem with fdatasync or could it be something else?
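
For reference, this is roughly the write/fdatasync pattern I believe nwipe uses on the block device (a minimal sketch of my understanding only, not nwipe's actual code; the device path, chunk size and sync interval are made up):

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical example: wipe /dev/sdX in 4 KiB chunks through the page cache */
    int fd = open("/dev/sdX", O_WRONLY);
    if (fd < 0)
        return 1;

    char *buf = malloc(4096);
    if (!buf)
        return 1;
    memset(buf, 0, 4096);

    for (long i = 0; ; i++) {
        if (write(fd, buf, 4096) != 4096)
            break;
        /* Periodic flush (interval is made up) */
        if (i % 100000 == 0 && fdatasync(fd) < 0)
            break;
    }

    free(buf);
    close(fd);
    return 0;
}

The fdatasync() in that loop is the call that shows up blocked in state D in the trace below.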

Nov 18 08:10:27 server kernel: sysrq: Show Blocked State
Nov 18 08:10:27 server kernel: task:nwipe           state:D stack:0     pid:61181 ppid:42337  flags:0x00004000
Nov 18 08:10:27 server kernel: Call Trace:
Nov 18 08:10:27 server kernel:  <TASK>
Nov 18 08:10:27 server kernel:  __schedule+0x2f8/0x870
Nov 18 08:10:27 server kernel:  schedule+0x55/0xc0
Nov 18 08:10:27 server kernel:  io_schedule+0x3d/0x60
Nov 18 08:10:27 server kernel:  folio_wait_bit_common+0x12c/0x300
Nov 18 08:10:27 server kernel:  ? filemap_invalidate_unlock_two+0x30/0x30
Nov 18 08:10:27 server kernel:  write_cache_pages+0x1c6/0x460
Nov 18 08:10:27 server kernel:  ? dirty_background_bytes_handler+0x20/0x20
Nov 18 08:10:27 server kernel:  generic_writepages+0x76/0xa0
Nov 18 08:10:27 server kernel:  do_writepages+0xbb/0x1c0
Nov 18 08:10:27 server kernel:  filemap_fdatawrite_wbc+0x56/0x80
Nov 18 08:10:27 server kernel:  __filemap_fdatawrite_range+0x53/0x70
Nov 18 08:10:27 server kernel:  file_write_and_wait_range+0x3c/0x90
Nov 18 08:10:27 server kernel:  blkdev_fsync+0xe/0x30
Nov 18 08:10:27 server kernel:  __x64_sys_fdatasync+0x46/0x80
Nov 18 08:10:27 server kernel:  do_syscall_64+0x3a/0xb0
Nov 18 08:10:27 server kernel:  entry_SYSCALL_64_after_hwframe+0x5e/0xc8
Nov 18 08:10:27 server kernel: RIP: 0033:0x7f02a735f00b
Nov 18 08:10:27 server kernel: RSP: 002b:00007f02a6858c80 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Nov 18 08:10:27 server kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f02a735f00b
Nov 18 08:10:27 server kernel: RDX: 0000000000000002 RSI: 00007f02a6858d80 RDI: 0000000000000004
Nov 18 08:10:27 server kernel: RBP: 00000118badb6000 R08: 0000000000000000 R09: 00007f02a0000080
Nov 18 08:10:27 server kernel: R10: 0000000000001000 R11: 0000000000000293 R12: 00000000000186a0
Nov 18 08:10:27 server kernel: R13: 00000000000186a0 R14: 0000000000001000 R15: 000055b7a0775850
Nov 18 08:10:27 server kernel:  </TASK>
Nov 18 08:10:27 server kernel: task:kworker/u64:4   state:D stack:0     pid:7842  ppid:2      flags:0x00004000
Nov 18 08:10:27 server kernel: Workqueue: writeback wb_workfn (flush-8:0)
Nov 18 08:10:27 server kernel: Call Trace:
Nov 18 08:10:27 server kernel:  <TASK>
Nov 18 08:10:27 server kernel:  __schedule+0x2f8/0x870
Nov 18 08:10:27 server kernel:  schedule+0x55/0xc0
Nov 18 08:10:27 server kernel:  io_schedule+0x3d/0x60
Nov 18 08:10:27 server kernel:  blk_mq_get_tag+0x115/0x2a0
Nov 18 08:10:27 server kernel:  ? destroy_sched_domains_rcu+0x20/0x20
Nov 18 08:10:27 server kernel:  __blk_mq_alloc_requests+0x18c/0x2e0
Nov 18 08:10:27 server kernel:  blk_mq_submit_bio+0x3dc/0x590
Nov 18 08:10:27 server kernel:  __submit_bio+0xec/0x170
Nov 18 08:10:27 server kernel:  submit_bio_noacct_nocheck+0x2bd/0x2f0
Nov 18 08:10:27 server kernel:  ? submit_bio_noacct+0x68/0x440
Nov 18 08:10:27 server kernel:  __block_write_full_page+0x1ef/0x4c0
Nov 18 08:10:27 server kernel:  ? bh_uptodate_or_lock+0x70/0x70
Nov 18 08:10:27 server kernel:  ? blkdev_write_begin+0x20/0x20
Nov 18 08:10:27 server kernel:  __writepage+0x14/0x60
Nov 18 08:10:27 server kernel:  write_cache_pages+0x172/0x460
Nov 18 08:10:27 server kernel:  ? dirty_background_bytes_handler+0x20/0x20
Nov 18 08:10:27 server kernel:  generic_writepages+0x76/0xa0
Nov 18 08:10:27 server kernel:  do_writepages+0xbb/0x1c0
Nov 18 08:10:27 server kernel:  ? __wb_calc_thresh+0x46/0x130
Nov 18 08:10:27 server kernel:  __writeback_single_inode+0x30/0x1a0
Nov 18 08:10:27 server kernel:  writeback_sb_inodes+0x205/0x4a0
Nov 18 08:10:27 server kernel:  __writeback_inodes_wb+0x47/0xe0
Nov 18 08:10:27 server kernel:  wb_writeback.isra.0+0x189/0x1d0
Nov 18 08:10:27 server kernel:  wb_workfn+0x1d0/0x3a0
Nov 18 08:10:27 server kernel:  process_one_work+0x1e5/0x320
Nov 18 08:10:27 server kernel:  worker_thread+0x45/0x3a0
Nov 18 08:10:27 server kernel:  ? rescuer_thread+0x390/0x390
Nov 18 08:10:27 server kernel:  kthread+0xd5/0x100
Nov 18 08:10:27 server kernel:  ? kthread_complete_and_exit+0x20/0x20
Nov 18 08:10:27 server kernel:  ret_from_fork+0x22/0x30
Nov 18 08:10:27 server kernel:  </TASK>
Nov 18 08:10:27 server kernel: task:rm              state:D stack:0     pid:54615 ppid:54597  flags:0x00004000
Nov 18 08:10:27 server kernel: Call Trace:
Nov 18 08:10:27 server kernel:  <TASK>
Nov 18 08:10:27 server kernel:  __schedule+0x2f8/0x870
Nov 18 08:10:27 server kernel:  schedule+0x55/0xc0
Nov 18 08:10:27 server kernel:  io_schedule+0x3d/0x60
Nov 18 08:10:27 server kernel:  bit_wait_io+0x8/0x50
Nov 18 08:10:27 server kernel:  __wait_on_bit+0x46/0x100
Nov 18 08:10:27 server kernel:  ? bit_wait+0x50/0x50
Nov 18 08:10:27 server kernel:  out_of_line_wait_on_bit+0x8c/0xb0
Nov 18 08:10:27 server kernel:  ? sugov_start+0x140/0x140
Nov 18 08:10:27 server kernel:  ext4_read_bh+0x6e/0x80
Nov 18 08:10:27 server kernel:  ext4_bread+0x45/0x60
Nov 18 08:10:27 server kernel:  __ext4_read_dirblock+0x4d/0x330
Nov 18 08:10:27 server kernel:  htree_dirblock_to_tree+0xa7/0x370
Nov 18 08:10:27 server kernel:  ? path_lookupat+0x92/0x190
Nov 18 08:10:27 server kernel:  ? filename_lookup+0xdf/0x1e0
Nov 18 08:10:27 server kernel:  ext4_htree_fill_tree+0x108/0x3c0
Nov 18 08:10:27 server kernel:  ext4_readdir+0x725/0xb40
Nov 18 08:10:27 server kernel:  iterate_dir+0x16a/0x1b0
Nov 18 08:10:27 server kernel:  __x64_sys_getdents64+0x7f/0x120
Nov 18 08:10:27 server kernel:  ? compat_filldir+0x180/0x180
Nov 18 08:10:27 server kernel:  do_syscall_64+0x3a/0xb0
Nov 18 08:10:27 server kernel:  entry_SYSCALL_64_after_hwframe+0x5e/0xc8
Nov 18 08:10:27 server kernel: RIP: 0033:0x7f8e32834897
Nov 18 08:10:27 server kernel: RSP: 002b:00007fffa3fb78c8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Nov 18 08:10:27 server kernel: RAX: ffffffffffffffda RBX: 0000558d8c4f8a70 RCX: 00007f8e32834897
Nov 18 08:10:27 server kernel: RDX: 0000000000008000 RSI: 0000558d8c4f8aa0 RDI: 0000000000000004
Nov 18 08:10:27 server kernel: RBP: 0000558d8c4f8aa0 R08: 0000000000000030 R09: 00007f8e3292da60
Nov 18 08:10:27 server kernel: R10: 00007f8e3292e140 R11: 0000000000000293 R12: ffffffffffffff88
Nov 18 08:10:27 server kernel: R13: 0000558d8c4f8a74 R14: 0000000000000000 R15: 0000558d8c503c78
Nov 18 08:10:27 server kernel:  </TASK>
Comment 1 Matthew Stapleton 2023-11-18 23:11:04 UTC
Created attachment 305422 [details]
config-6.1.53-capsicum

Here is my kernel config
Comment 2 Matthew Stapleton 2023-11-18 23:22:24 UTC
Also, I was running badblocks -b 4096 -w -s -v on the failing hard drive for a few days before trying nwipe and it didn't seem to cause slowdowns on the server; the man page for badblocks says it uses Direct I/O by default.  I decided to try nwipe as it provides the option to disable read verification.

I could probably try removing fdatasync from nwipe or modifying it to use Direct I/O, but I haven't done that yet.
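
If I do try the Direct I/O route, I'd expect it to look roughly like this (an untested sketch under my assumptions, with /dev/sdX as a placeholder; O_DIRECT needs the buffer and transfer size aligned to the device's logical block size):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical sketch: write zeros to the device, bypassing the page cache */
    int fd = open("/dev/sdX", O_WRONLY | O_DIRECT);
    if (fd < 0)
        return 1;

    /* O_DIRECT requires an aligned buffer; 4096 covers most logical block sizes */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096))
        return 1;
    memset(buf, 0, 4096);

    while (write(fd, buf, 4096) == 4096)
        ;   /* no fdatasync() needed: writes go straight to the device */

    free(buf);
    close(fd);
    return 0;
}

That should keep the wipe writes out of the page cache entirely, so there would be nothing for fdatasync to flush.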
Comment 3 Bagas Sanjaya 2023-11-20 09:05:33 UTC
(In reply to Matthew Stapleton from comment #0)
> I was running nwipe on a failing hard drive that was running very slow and
> while nwipe was running fdatasync it seemed to cause delays for the
> filesystems on the other drives.  The other drives are attached to the same
> onboard ahci sata adapter if that is important.  After stopping nwipe,
> performance returned to normal
> 
> The system is using ext4 filesystems on top of LVM on top of Linux RAID6 and
> the kernel is 6.1.53.
> 

Can you check the latest mainline (currently v6.7-rc2)?
Comment 4 Matthew Stapleton 2023-11-20 22:46:54 UTC
I might be able to, but it could be a while before I can try that as I will need to set up a new config for the newer kernels.  I might set up a throwaway VM to see if I can easily replicate the problem there, as it looks like qemu has some options to throttle virtual drive speeds.
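
For the record, the qemu throttling I had in mind is something along these lines (values made up, and I'd double-check the option names against the qemu documentation for my version first):

  -drive file=slowdisk.img,format=raw,if=virtio,throttling.bps-write=1048576,throttling.iops-total=50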
Comment 5 Matthew Stapleton 2023-11-22 08:32:47 UTC
I've found this: https://lore.kernel.org/linux-btrfs/CAJCQCtSh2WT3fijK4sYEdfYpp09ehA+SA75rLyiJ6guUtyWjyw@mail.gmail.com/ which looks similar to the problem I'm having (see https://bugzilla.kernel.org/show_bug.cgi?id=218161 as well).  I am using the bfq io scheduler.

I have set up a VM with kernel 6.1.53, but so far I haven't been able to trigger the problem.
Comment 6 Matthew Stapleton 2023-11-26 21:55:21 UTC
I've set up kernel 6.6.2 on the host server now, as I wasn't able to get the VM to deadlock, and so far it has been running for nearly 3 days without problems.  I have also changed preempt from full to voluntary to see if that helps.

I should probably mention that my custom capsicum kernel is based on the gentoo-sources base and extra patches, plus a few additional custom patches such as the it87 driver from here: https://github.com/frankcrawford/it87 , enabling Intel user copy and PPro checksums for generic CPUs, and an old custom patch to enable HPET on some additional nForce chipsets, but most of those patches were also applied in my 5.10 kernel build.