Created attachment 298247 [details] v1 mod-refcnt-race.sh

While testing with fstests generic/108 and trying to address the many low-hanging-fruit module removal races with scsi_debug through korg#212337 [0], I noticed what could be perceived as a generic kernel issue: userspace sees a refcnt of 0 and still cannot remove the module.

Since this issue is really difficult to reproduce, I ended up rewriting a test case to demonstrate it using any filesystem; the script picks one at random: xfs, ext4 or btrfs. The attached script mod-refcnt-race.sh can now be used to reproduce the issue easily. I'll follow up soon on this bug entry with my findings and resolutions.

Three use cases:

1) The refcnt is not 0 when you actually expect it to be, which calls for a patient module removal option:

  mod-refcnt-race.sh --modprobe-verbose --use-patience --max-loops 1

2) The refcnt is 0 as observed in userspace, yet a subsequent request to remove the module fails, after using pvremove:

  mod-refcnt-race.sh --modprobe-verbose --use-patience

A failure rate of about 1 in 16 tests can be observed.

3) The same race as 2), but without creating a filesystem or mounting it:

  mod-refcnt-race.sh --modprobe-verbose --use-patience --skip-mount

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
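To illustrate what "patient" module removal means in the use cases above, here is a minimal shell sketch. It is not the attached script and not the kmod patches; the function name patient_rmmod and the variables SYSFS_MODULE_DIR, RMMOD_CMD and RETRY_DELAY are hypothetical and exist only for this example. It assumes the kernel exposes /sys/module/<name>/refcnt and that sleep accepts fractional seconds.

```shell
#!/bin/sh
# Hypothetical sketch of patient module removal; not the kmod implementation.
# All variable and function names here are assumptions for illustration.
SYSFS_MODULE_DIR="${SYSFS_MODULE_DIR:-/sys/module}"
RMMOD_CMD="${RMMOD_CMD:-modprobe -r}"

# patient_rmmod MODULE [MAX_TRIES]
# Returns 0 once the module is gone or removed, 1 if we run out of tries.
patient_rmmod() {
    mod="$1"
    max_tries="${2:-50}"
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # No sysfs directory means the module is not loaded (anymore).
        [ -d "$SYSFS_MODULE_DIR/$mod" ] || return 0

        refcnt="$(cat "$SYSFS_MODULE_DIR/$mod/refcnt" 2>/dev/null)"
        if [ "$refcnt" = "0" ]; then
            # A refcnt of 0 seen from userspace does not guarantee that
            # removal succeeds, so a failure here is retried, not fatal.
            # RMMOD_CMD is left unquoted on purpose so it can carry options.
            $RMMOD_CMD "$mod" && return 0
        fi

        tries=$((tries + 1))
        sleep "${RETRY_DELAY:-0.1}"
    done
    return 1
}
```

Usage would look like `patient_rmmod scsi_debug 100`, run as root. The point of retrying after a failed removal, and not only while the refcnt is non-zero, is use case 2): the removal request can fail even though userspace just read a refcnt of 0.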
I have patches for kmod which address these issues. I'll post the patches soon, but I'm using this bug entry to track the issue and the script which helps to reproduce it.
Created attachment 298249 [details] busy-open-block-device-sleep.c

The attached program can be used with the script to induce the race: the race for the refcnt is forced, and a few module removal attempts should then fail after the refcnt is 0.
v2 Patches posted on: https://lkml.kernel.org/r/20210810051602.3067384-1-mcgrof@kernel.org