Bug 214015 - module refcnt being 0 after testing fstests generic/108 prevents module removal
Status: ASSIGNED
Alias: None
Product: Other
Classification: Unclassified
Component: Modules
Hardware: All Linux
Importance: P5 normal
Assignee: Luis Chamberlain
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-08-10 02:31 UTC by Luis Chamberlain
Modified: 2021-08-10 05:18 UTC

See Also:
Kernel Version:
Tree: Mainline
Regression: No


Attachments
v1 mod-refcnt-race.sh (7.72 KB, application/x-shellscript)
2021-08-10 02:31 UTC, Luis Chamberlain
busy-open-block-device-sleep.c (496 bytes, text/x-csrc)
2021-08-10 04:38 UTC, Luis Chamberlain

Description Luis Chamberlain 2021-08-10 02:31:10 UTC
Created attachment 298247
v1 mod-refcnt-race.sh

While testing with fstests generic/108 and trying to address many low-hanging module removal races in scsi_debug through korg#212337 [0], I noticed what could be perceived as a generic kernel issue: userspace sees a refcnt of 0 and still cannot remove the module.

Since this issue is really difficult to reproduce, I ended up rewriting the test case so that it demonstrates the issue with any filesystem; the script picks one at random from xfs, ext4 and btrfs. The attached script mod-refcnt-race.sh now reproduces the issue easily.

I'll follow up soon on this bug entry with my findings and resolutions.

Three use cases:

1) The refcnt is not 0 when you actually expect it to be, which calls for a patient module removal option.

mod-refcnt-race.sh --modprobe-verbose --use-patience --max-loops 1


2) The refcnt is 0 as observed from userspace, yet a subsequent request to remove the module fails, after using pvremove.

mod-refcnt-race.sh --modprobe-verbose --use-patience

A failure rate of about 1 in 16 tests can be observed.

3) Same race as 2), but without creating a filesystem or mounting it.

mod-refcnt-race.sh --modprobe-verbose --use-patience --skip-mount
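
To illustrate what "patience" could mean in use case 1, here is a minimal Python sketch that polls a module's refcnt until it reads 0 or a timeout expires. It assumes the standard sysfs file /sys/module/<name>/refcnt (present when the kernel supports module unloading). The function names, timeout and polling interval are hypothetical, and this is neither the logic of mod-refcnt-race.sh nor of the kmod patches.

```python
import time
from pathlib import Path


def read_refcnt(refcnt_path):
    """Return the integer stored in a sysfs-style refcnt file.

    For a loaded module this would typically be /sys/module/<name>/refcnt.
    Raises FileNotFoundError if the file does not exist (module not loaded).
    """
    return int(Path(refcnt_path).read_text().strip())


def wait_for_zero_refcnt(refcnt_path, timeout_s=10.0, interval_s=0.1):
    """Poll until the refcnt reads 0 or the timeout expires.

    Returns True if 0 was observed, False on timeout.
    The timeout and interval defaults are arbitrary (assumption).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_refcnt(refcnt_path) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A caller might run wait_for_zero_refcnt("/sys/module/scsi_debug/refcnt") before attempting removal. As use case 2 shows, reading 0 from userspace is only a hint: a removal request issued right afterwards can still fail.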

[0] https://bugzilla.kernel.org/show_bug.cgi?id=212337
Comment 1 Luis Chamberlain 2021-08-10 02:32:36 UTC
I have patches for kmod which address these issues. I'll post them soon; I'm using this entry to track the issue and the script that helps reproduce it.
Comment 2 Luis Chamberlain 2021-08-10 04:38:07 UTC
Created attachment 298249
busy-open-block-device-sleep.c

The attached program can be used with the script to force the refcnt race; a few module removal attempts should then fail after the refcnt reaches 0.
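
The attachment's contents are not reproduced in this report. Judging only from its file name, it presumably opens a block device and sleeps while holding it open, which would keep the owning driver's refcnt raised for that time. The Python sketch below illustrates that idea under that assumption. It is not the attached C program, and the function name and parameters are hypothetical.

```python
import os
import time


def hold_open(path, seconds):
    """Open `path` read-only, keep it open for `seconds`, then close it.

    With a block device path, the open descriptor is assumed to pin the
    driver module until it is closed. Returns the elapsed time in seconds.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.monotonic()
        time.sleep(seconds)
        return time.monotonic() - start
    finally:
        # Always release the descriptor, even if the sleep is interrupted.
        os.close(fd)
```

Running something like hold_open on the test device in the background while the script attempts removal would be one way to widen the window in which the refcnt is still held.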
Comment 3 Luis Chamberlain 2021-08-10 05:18:00 UTC
v2 patches posted at:

https://lkml.kernel.org/r/20210810051602.3067384-1-mcgrof@kernel.org
