Bug 206643 - fuse freeze and usb devices
Summary: fuse freeze and usb devices
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P1 normal
Assignee: io_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-23 12:50 UTC by taz.007
Modified: 2021-07-15 08:06 UTC
CC List: 1 user

See Also:
Kernel Version: 5.8.10
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description taz.007 2020-02-23 12:50:14 UTC
Hello kernel experts,

I'm experiencing freezes from a FUSE userspace daemon (mergerfs).

Setup:
10 hard drives (ext3 or ext4) mounted on the system.
7 of those are SATA drives in USB enclosures (USB 2 only).
2 of them are USB keys (USB 1 and USB 2).
1 of them is a regular SATA drive, directly connected.
I use mergerfs to gather all of them under a common mount point.
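To illustrate the pooling (a minimal sketch; the branch paths are placeholders, not my exact command line, and only a couple of representative options are shown):

```
# Hypothetical example: pool three branch mounts under one mergerfs mount point.
# /media/disk1../media/disk3 stand in for the real branch mounts.
mergerfs -o allow_other,category.create=mfs \
    /media/disk1:/media/disk2:/media/disk3 /media/WDALL
```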

Scenario:
The machine (2C/4T) is nearly fully CPU-loaded.
rsync is running in a loop (in order to reproduce the issue), copying several GB of files from the mergerfs mount point to another drive (a regular ext4 mount that is not part of the pool).
Some background processes are doing "light" (~50 KB/s) IO on the same mergerfs pool.
After a while, any access to the mergerfs mount point freezes.
This is because mergerfs itself is stuck in a syscall (if I understand correctly) that never returns.
Depending on whether splice is used or not, mergerfs is stuck inside either pread64() or splice().

However, I can still access the underlying mounted hard drives fine (by doing an "ls", for example)!
And in that case, accessing the underlying hard drives via "ls" somehow unfreezes the previously blocked syscall in the mergerfs daemon!
"ls" is not even needed; running hdparm -tT on the drives directly also unfreezes mergerfs.
Manually calling "sync" also unfreezes mergerfs.
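Concretely, any of the following is enough to unblock it (a sketch; /media/disk1 and /dev/sdX stand in for a branch mount point and one of the pooled drives):

```
ls /media/disk1       # listing an underlying branch mount wakes mergerfs up
hdparm -tT /dev/sdX   # timed reads directly on a pooled drive also work
sync                  # flushing dirty pages system-wide also works
```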

I thought it might be related to USB:
When I tweak the value of /sys/block/sdX/device/max_sectors, I can alter the behaviour.
With a value of 128 or 240, I'm unable to reproduce the issue.
With a value of 512, it reproduces the issue after around 4-5 hours.
With a value of 1024, it reproduces the issue after around 2 hours.
(Maybe those numbers are statistically insignificant and I'm just unlucky.)
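The tweak itself looks like this (assuming /dev/sdb is one of the USB-attached drives):

```
# Read the current limit (in 512-byte sectors) for one of the USB drives:
cat /sys/block/sdb/device/max_sectors
# Lower it; with 128 or 240 I could not reproduce the issue:
echo 240 > /sys/block/sdb/device/max_sectors
```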

There are no errors from the kernel, and the drives do in fact still seem to be working fine.
I'm using Linux 5.5.3, but I went back and tried 5.1.15, and the issue is already there.

It is not related to a disk sleeping: I hit a case where doing hdparm -tT on the SATA drive (which also holds a swap partition and was busy swapping a bit) was enough to wake up mergerfs.

For more detailed info on the mergerfs call stack, see the original bug report thread:
https://github.com/trapexit/mergerfs/issues/708
Comment 1 Christian Kujau 2020-02-25 05:58:58 UTC
Unable to reproduce this with mergerfs 2.29.0 and a vanilla kernel 5.5.0 here. Based on your description above, this is what my setup looks like:

```
$ # Create five 512 MB backing files and attach them as loop devices:
$ for i in {0..4}; do pv -Ss 512m < /dev/zero > /var/tmp/test_${i}.img; done
$ for f in /var/tmp/test_*; do losetup -f $f; done
$ # Format each loop device and mount it as a separate branch:
$ for b in /dev/loop[0-4]; do echo $b; mkfs.ext4 -qF $b; done
$ mkdir -p /mnt/test/{0..4} /mnt/disk
$ for i in {0..4}; do echo $i; mount -t ext4 /dev/loop${i} /mnt/test/${i}; done
$ # Pool the five branches under a single mergerfs mount point:
$ mergerfs -o allow_other,threads=2,moveonenospc=true,ignorepponrename=true,category.create=mfs,func.unlink=ff,minfreespace=0,dropcacheonclose=true /mnt/test/0:/mnt/test/1:/mnt/test/2:/mnt/test/3:/mnt/test/4 /mnt/disk/

$ mount | tail -1
0:1:2:3:4 on /mnt/disk type fuse.mergerfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
```


With that in place, rsync is writing, reading and removing data to/from the mergerfs mount, but no lockups so far:


```
$ while true; do date; rsync -ax /usr/ /mnt/disk; tar -cf - /mnt/disk/ | pv > /dev/null; rm -rf /mnt/disk/*; echo; done
```

Can you maybe share your exact commands, and perhaps try with a vanilla kernel? Enable more DEBUG options in the kernel config to see what's going on?
Comment 2 taz.007 2020-02-25 15:27:33 UTC
The Arch kernel is very close to vanilla, with some patches related to DRM and Wi-Fi, so I don't think it is the culprit (https://git.archlinux.org/linux.git/log/?h=v5.5.3-arch1).

The command line for rsync is:
rsync -av --delete /media/WDALL/$DIRWITHLOTSOFBIGFILES/ /media/WDBACK/$SAMEDIRNAME/
with /media/WDALL being the mergerfs mount point and /media/WDBACK a regular ext4 mount point.

As I can't reproduce it unless I tweak max_sectors, I think you would need to test with a real USB 2 drive.
Also, while rsync is running, there is some light read/write activity on the same mergerfs mount (it is shared via smbd). This probably increases the chances of hitting the race condition.

Do you have some specific CONFIG_*DEBUG* you want me to try ?
Comment 3 Christian Kujau 2020-02-25 20:34:11 UTC
Hm, yeah, maybe it's USB-related after all. Maybe set USB_STORAGE_DEBUG, and once CONFIG_DEBUG_KERNEL is set, enable some of the "Lock Debugging" and "Debug Oops, Lockups and Hangs" options, or even CONFIG_EXT4_DEBUG, or lower kernel.hung_task_timeout_secs: anything, really, that may trigger some messages instead of locking up quietly.
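For example, something along these lines (a sketch; the zgrep assumes CONFIG_IKCONFIG_PROC so that /proc/config.gz exists):

```
# Report blocked tasks sooner than the default 120 seconds:
sysctl -w kernel.hung_task_timeout_secs=30

# Check which of the suggested debug options the running kernel was built with:
zgrep -E 'CONFIG_(DEBUG_KERNEL|DETECT_HUNG_TASK|USB_STORAGE_DEBUG|EXT4_DEBUG|PROVE_LOCKING)=' /proc/config.gz
```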
Comment 4 taz.007 2020-03-01 13:45:06 UTC
I had to disable USB_STORAGE_DEBUG: it produces far too many logs, and the system was barely usable while rsync was running with that setting.
I could reproduce the freeze several times, but got no lock-debugging messages from the kernel until a few hours ago, when the system seemed "more" stuck than usual.
"sync" was not enough to unfreeze the system; I had to run hdparm -tT on each drive, which finally unfroze it, and while doing this I got a message from the kernel:
 
[102601.484027] smbd            D    0  1787      1 0x80004006
[102601.484039] Call Trace:
[102601.484064]  __schedule+0x305/0x8b0
[102601.484077]  ? prepare_to_wait_event+0x59/0xe0
[102601.484097]  schedule+0x43/0xf0
[102601.484124]  fuse_simple_request+0x37d/0x510 [fuse]
[102601.484140]  ? wait_woken+0x70/0x70
[102601.484156]  ? wait_woken+0x70/0x70
[102601.484180]  fuse_flush+0x144/0x170 [fuse]
[102601.484218]  filp_close+0x2a/0x60
[102601.484231]  put_files_struct+0x6a/0xb0
[102601.484245]  exit_files+0x3c/0x50
[102601.484257]  do_exit+0x317/0xa10
[102601.484278]  do_group_exit+0x36/0x90
[102601.484291]  get_signal+0x14c/0x950
[102601.484307]  ? mntput_no_expire+0x4c/0x280
[102601.484321]  do_signal+0x2d/0x520
[102601.484330]  ? mntput_no_expire+0x66/0x280
[102601.484362]  exit_to_usermode_loop+0x85/0x140
[102601.484374]  do_fast_syscall_32+0x1c7/0x270
[102601.484388]  entry_SYSENTER_32+0xa5/0xf8
[102601.484396] EIP: 0xb7fd48e9
[102601.484405] Code: 5d c3 8d b4 26 00 00 00 00 b8 80 96 98 00 eb b5 8b 04 24 c3 8b 14 24 c3 8b 1c 24 c3 8b 3c 24 c3 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
[102601.484412] EAX: fffffffc EBX: 0044bb60 ECX: bffff38c EDX: b7f4ab8c
[102601.484418] ESI: 00451e70 EDI: bffff47c EBP: b7f50194 ESP: bffff30c
[102601.484424] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000282
[102601.484443]  ? xas_create+0x370/0x3b0
[102601.484457]  ? nmi+0xc5/0x2ac
[102601.484487] 
                Showing all locks held in the system:
[102601.484512] 1 lock held by khungtaskd/43:
[102601.484516]  #0: c3bd1d9c (rcu_read_lock){....}, at: debug_show_all_locks+0x1f/0xe6
[102601.484638] 1 lock held by x264/930:
[102601.484642]  #0: e409043c (&f->f_pos_lock){....}, at: __fdget_pos+0x3d/0x50

[102601.484689] =============================================

smbd is sharing the mergerfs mount point.
x264 is one of the background tasks doing light IO on the same mergerfs mount point.
This was with kernel 5.5.6.
Comment 5 taz.007 2020-03-08 19:35:56 UTC
Got a freeze again recently. max_sectors was set to 240, and rsync was not even running.
No dump-related messages from the kernel.
Comment 6 taz.007 2020-05-12 02:40:12 UTC
It is still happening with a recent kernel.

Running sync, or sometimes hdparm -t on all drives, is still needed to unfreeze the system.
This bug is infuriating.
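The workaround amounts to keeping something like this running in the background (a sketch; the drive list is illustrative):

```
# Hypothetical watchdog loop: periodically flush and poke every pooled
# drive so that a stuck mergerfs request gets woken up.
while true; do
    sync
    for d in /dev/sd{a..j}; do
        hdparm -t "$d" > /dev/null 2>&1
    done
    sleep 300
done
```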
Comment 7 taz.007 2020-07-16 11:30:34 UTC
While running hdparm -t in parallel to work around the hang, this happened:

[1090302.387637] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090307.407113] __request_module: 361 callbacks suppressed
[1090307.407120] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090307.407687] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090307.413541] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090307.413558] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090307.421367] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090307.422975] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090307.430159] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090307.430243] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090307.431028] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090307.432136] request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), for module net-pf-16-proto-9, throttling...
[1090312.523610] INFO: task hdparm:30741 blocked for more than 122 seconds.
[1090312.523621]       Not tainted 5.7.5-arch1-1 #2
[1090312.523624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1090312.523630] hdparm          D    0 30741  30714 0x00000000
[1090312.523640] Call Trace:
[1090312.523664]  __schedule+0x25e/0x7c0
[1090312.523675]  schedule+0x5a/0xd0
[1090312.523688]  rwsem_down_write_slowpath+0x226/0x410
[1090312.523695]  down_write+0x2f/0x40
[1090312.523701]  sync_inodes_sb+0x77/0x210
[1090312.523709]  sync_inodes_one_sb+0x18/0x20
[1090312.523715]  iterate_supers+0x82/0xd0
[1090312.523720]  ? __ia32_sys_splice+0xf0/0xf0
[1090312.523724]  ksys_sync+0x2a/0x80
[1090312.523728]  __do_sys_sync+0x8/0x10
[1090312.523734]  do_fast_syscall_32+0x61/0x220
[1090312.523740]  entry_SYSENTER_32+0xa5/0xf8
[1090312.523744] EIP: 0xb7fd2c71
[1090312.523753] Code: 89 da 89 f3 e8 18 00 00 00 89 d3 5b 5e 5d c3 8b 04 24 c3 8b 1c 24 c3 8b 34 24 c3 8b 3c 24 c3 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
[1090312.523757] EAX: ffffffda EBX: 00425e98 ECX: 00200000 EDX: b7be0000
[1090312.523762] ESI: 00000003 EDI: 00000014 EBP: 00425e98 ESP: bffff61c
[1090312.523767] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000286
[1090312.523779]  ? x86_vector_free_irqs+0xb3/0xe0
Comment 8 taz.007 2021-07-15 08:06:50 UTC
FWIW, I can't reproduce the issue anymore. However, the setup has changed in several ways:

- the highly fragmented ext3 drive has been converted to ext4 (see the sketch at the end of this comment).
- lots of files have been removed from that drive, reducing the overall fragmentation.
That step "seemed" to bring the biggest improvement.

Then:
- switched to a 64-bit kernel
- currently running a 5.11.16 kernel
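For reference, the usual ext3-to-ext4 conversion procedure is along these lines (a sketch; /dev/sdX is a placeholder, and the filesystem must be unmounted):

```
# Enable the ext4 on-disk features on the old ext3 filesystem:
tune2fs -O extents,uninit_bg,dir_index /dev/sdX
# A forced fsck (with directory optimization) is required afterwards:
e2fsck -fD /dev/sdX
```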
