Bug 219244 - "driver core: Fix uevent_show() vs driver detach race" introduces weird "frozen usb mouse pointer at boot" bug on my system
Summary: "driver core: Fix uevent_show() vs driver detach race" introduces weird "froz...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Input Devices (show other bugs)
Hardware: Intel Linux
: P3 normal
Assignee: drivers_input-devices
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-09-06 19:37 UTC by brmails+k
Modified: 2024-10-03 15:04 UTC (History)
2 users (show)

See Also:
Kernel Version: v6.6.46 - 6.6.52, v6.10.10, v6.11.0
Subsystem:
Regression: Yes
Bisected commit-id: 4d035c743c3e391728a6f81cbf0f7f9ca700cf62


Attachments

Description brmails+k 2024-09-06 19:37:59 UTC
The symptoms of this bug are as follows:

- After booting (to the graphical login screen) the mouse pointer would frozen and only after unplugging and plugging-in again the usb plug of the mouse would the mouse be working as expected.
- If one would log in without fixing the mouse issue, the mouse pointer would still be frozen after login.
- The usb keyboard usually is not affected even though plugged into the same usb-hub - thus logging in is possible.
- The mouse pointer is also frozen if the usb connector is plugged into a different usb-port (different from the usb-hub)
- The pointer is moveable via the inbuilt synaptics trackpad


The kernel log shows almost the same messages (not sure if the differences mean anything in regards to this bug) for the initial recognizing the mouse (frozen mouse pointer) and the re-plugged-in mouse (and subsequently moveable mouse pointer):

[kernel] [    8.763158] usb 1-2.2.1.2: new low-speed USB device number 10 using xhci_hcd
[kernel] [    8.956028] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04
[kernel] [    8.956036] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[kernel] [    8.956039] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0 
[kernel] [    8.956041] usb 1-2.2.1.2: Manufacturer: Microsoft 
[kernel] [    8.963554] input: Microsoft  Microsoft Basic Optical Mouse v2.0  as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0002/input/input18
[kernel] [    8.964417] hid-generic 0003:045E:00CB.0002: input,hidraw1: USB HID v1.11 Mouse [Microsoft  Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0

[kernel] [   31.258381] usb 1-2.2.1.2: USB disconnect, device number 10
[kernel] [   31.595051] usb 1-2.2.1.2: new low-speed USB device number 16 using xhci_hcd
[kernel] [   31.804002] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04
[kernel] [   31.804010] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[kernel] [   31.804013] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0 
[kernel] [   31.804016] usb 1-2.2.1.2: Manufacturer: Microsoft 
[kernel] [   31.812933] input: Microsoft  Microsoft Basic Optical Mouse v2.0  as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0004/input/input20
[kernel] [   31.814028] hid-generic 0003:045E:00CB.0004: input,hidraw1: USB HID v1.11 Mouse [Microsoft  Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0

Differences:

../0003:045E:00CB.0002/input/input18 vs ../0003:045E:00CB.0004/input/input20

and

hid-generic 0003:045E:00CB.0002 vs hid-generic 0003:045E:00CB.0004


The connector / usb-port was not changed in this case!


The symptoms of this bug have been present at one point in the recent past, but with kernel v6.6.45 (or maybe even some version before that) it was fine. But with 6.6.45 it seems to be definitely fine.

But with v6.6.46 the symptoms returned. That's the reason I suspected the kernel to be the cause of this issue. So I did some bisecting - which wasn't easy because that bug would often times not appear if the system was simply rebooted into the test kernel.
As the bug would definitely appear on the affected kernels (v6.6.46 ff) after shutting down the system for the night and booting the next day, I resorted to simulating the over-night powering-off by shutting the system down, unplugging the power and pressing the power button to get rid of residual voltage. But even then a few times the bug would only appear if I repeated this procedure before booting the system again with the respective kernel.

This is on a Thinkpad with Kaby Lake and integrated Intel graphics.
Even though it is a laptop, it is used as a desktop device, and the internal battery is disconnected, the removable battery is removed as the system is plugged-in via the power cord at all times (when in use)! 
Also, the system has no power (except for the bios battery, of course) over night as the power outlet is switched off if the device is not in use.

Not sure if this affects the issue - or how it does. But for successful bisecting I had to resort to the above procedure.

Bisecting the issue (between the release commits of v6.6.45 and v6.6.46) resulted in this commit [1] being the probable culprit.

I then tested kernel v6.6.49. It still produced the bug for me. So I reverted the changes of the assumed "bad commit" and re-compiled kernel v6.6.49. With this modified kernel the bug seems to be gone.

Now, I assume the commit has a reason for being introduced, but maybe needs some adjusting in order to avoid this bug I'm experiencing on my system.
Also, I can't say why the issue appeared in the past even without this commit being present, as I haven't bisected any kernel version before v6.6.45.


[1]:

4d035c743c3e391728a6f81cbf0f7f9ca700cf62 is the first bad commit
commit 4d035c743c3e391728a6f81cbf0f7f9ca700cf62
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Fri Jul 12 12:42:09 2024 -0700

    driver core: Fix uevent_show() vs driver detach race
    
    commit 15fffc6a5624b13b428bb1c6e9088e32a55eb82c upstream.
    
    uevent_show() wants to de-reference dev->driver->name. There is no clean
    way for a device attribute to de-reference dev->driver unless that
    attribute is defined via (struct device_driver).dev_groups. Instead, the
    anti-pattern of taking the device_lock() in the attribute handler risks
    deadlocks with code paths that remove device attributes while holding
    the lock.
    
    This deadlock is typically invisible to lockdep given the device_lock()
    is marked lockdep_set_novalidate_class(), but some subsystems allocate a
    local lockdep key for @dev->mutex to reveal reports of the form:
    
     ======================================================
     WARNING: possible circular locking dependency detected
     6.10.0-rc7+ #275 Tainted: G           OE    N
     ------------------------------------------------------
     modprobe/2374 is trying to acquire lock:
     ffff8c2270070de0 (kn->active#6){++++}-{0:0}, at: __kernfs_remove+0xde/0x220
    
     but task is already holding lock:
     ffff8c22016e88f8 (&cxl_root_key){+.+.}-{3:3}, at: device_release_driver_internal+0x39/0x210
    
     which lock already depends on the new lock.
    
     the existing dependency chain (in reverse order) is:
    
     -> #1 (&cxl_root_key){+.+.}-{3:3}:
            __mutex_lock+0x99/0xc30
            uevent_show+0xac/0x130
            dev_attr_show+0x18/0x40
            sysfs_kf_seq_show+0xac/0xf0
            seq_read_iter+0x110/0x450
            vfs_read+0x25b/0x340
            ksys_read+0x67/0xf0
            do_syscall_64+0x75/0x190
            entry_SYSCALL_64_after_hwframe+0x76/0x7e
    
     -> #0 (kn->active#6){++++}-{0:0}:
            __lock_acquire+0x121a/0x1fa0
            lock_acquire+0xd6/0x2e0
            kernfs_drain+0x1e9/0x200
            __kernfs_remove+0xde/0x220
            kernfs_remove_by_name_ns+0x5e/0xa0
            device_del+0x168/0x410
            device_unregister+0x13/0x60
            devres_release_all+0xb8/0x110
            device_unbind_cleanup+0xe/0x70
            device_release_driver_internal+0x1c7/0x210
            driver_detach+0x47/0x90
            bus_remove_driver+0x6c/0xf0
            cxl_acpi_exit+0xc/0x11 [cxl_acpi]
            __do_sys_delete_module.isra.0+0x181/0x260
            do_syscall_64+0x75/0x190
            entry_SYSCALL_64_after_hwframe+0x76/0x7e
    
    The observation though is that driver objects are typically much longer
    lived than device objects. It is reasonable to perform lockless
    de-reference of a @driver pointer even if it is racing detach from a
    device. Given the infrequency of driver unregistration, use
    synchronize_rcu() in module_remove_driver() to close any potential
    races.  It is potentially overkill to suffer synchronize_rcu() just to
    handle the rare module removal racing uevent_show() event.
    
    Thanks to Tetsuo Handa for the debug analysis of the syzbot report [1].
    
    Fixes: c0a40097f0bc ("drivers: core: synchronize really_probe() and dev_uevent()")
    Reported-by: syzbot+4762dd74e32532cda5ff@syzkaller.appspotmail.com
    Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Closes: http://lore.kernel.org/5aa5558f-90a4-4864-b1b1-5d6784c5607d@I-love.SAKURA.ne.jp [1]
    Link: http://lore.kernel.org/669073b8ea479_5fffa294c1@dwillia2-xfh.jf.intel.com.notmuch
    Cc: stable@vger.kernel.org
    Cc: Ashish Sangwan <a.sangwan@samsung.com>
    Cc: Namjae Jeon <namjae.jeon@samsung.com>
    Cc: Dirk Behme <dirk.behme@de.bosch.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Link: https://lore.kernel.org/r/172081332794.577428.9738802016494057132.stgit@dwillia2-xfh.jf.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/base/core.c   | 13 ++++++++-----
 drivers/base/module.c |  4 ++++
 2 files changed, 12 insertions(+), 5 deletions(-)




----------------------------- drivers/base/core.c -----------------------------
index aeb4644817d5..cb323700e952 100644
@@ -25,6 +25,7 @@
 #include <linux/mutex.h>
 #include <linux/pm_runtime.h>
 #include <linux/netdevice.h>
+#include <linux/rcupdate.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
 #include <linux/string_helpers.h>
@@ -2565,6 +2566,7 @@ static const char *dev_uevent_name(const struct kobject *kobj)
 static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
 {
 	const struct device *dev = kobj_to_dev(kobj);
+	struct device_driver *driver;
 	int retval = 0;
 
 	/* add device node properties if present */
@@ -2593,8 +2595,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
 	if (dev->type && dev->type->name)
 		add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
 
-	if (dev->driver)
-		add_uevent_var(env, "DRIVER=%s", dev->driver->name);
+	/* Synchronize with module_remove_driver() */
+	rcu_read_lock();
+	driver = READ_ONCE(dev->driver);
+	if (driver)
+		add_uevent_var(env, "DRIVER=%s", driver->name);
+	rcu_read_unlock();
 
 	/* Add common DT information about the device */
 	of_device_uevent(dev, env);
@@ -2664,11 +2670,8 @@ static ssize_t uevent_show(struct device *dev, struct device_attribute *attr,
 	if (!env)
 		return -ENOMEM;
 
-	/* Synchronize with really_probe() */
-	device_lock(dev);
 	/* let the kset specific function add its keys */
 	retval = kset->uevent_ops->uevent(&dev->kobj, env);
-	device_unlock(dev);
 	if (retval)
 		goto out;
 

---------------------------- drivers/base/module.c ----------------------------
index a1b55da07127..b0b79b9c189d 100644
@@ -7,6 +7,7 @@
 #include <linux/errno.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/rcupdate.h>
 #include "base.h"
 
 static char *make_driver_name(struct device_driver *drv)
@@ -97,6 +98,9 @@ void module_remove_driver(struct device_driver *drv)
 	if (!drv)
 		return;
 
+	/* Synchronize with dev_uevent() */
+	synchronize_rcu();
+
 	sysfs_remove_link(&drv->p->kobj, "module");
 
 	if (drv->owner)
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-09-09 03:53:59 UTC
The important question is: does the same problem happen in mainline? It's important, as different steps will be taken depending on the answer -- and it also influences which developers will care/are responsible.
Comment 2 brmails+k 2024-09-09 19:07:52 UTC
I just tested v6.6.50 in the off-chance the issue might have been solved coincidentally. Unfortunately it wasn't

I'd have to set up a separate system for mainline testing. That might take a while for me to find the necessary time.

I'll report back when I have done that and tested the mainline kernel for the bug.
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-09-10 06:34:49 UTC
brmails+k: can I CC you when forwarding this by mail? This would expose your email address to the world.
Comment 4 brmails+k 2024-09-10 15:45:48 UTC
Thanks for asking. Even though this address is separate from my personal one, I'd still like to keep it spam-free, if possible.

If I can read your forwarded messages on https://lore.kernel.org/ that should be ok and no CC would be necessary, I guess.
Comment 5 brmails+k 2024-09-10 15:50:42 UTC
I noticed that the commit that apparently leads to some issues on my system solves a CVE [1], so simply reverting won't be the solution - which I didn't expect anyway.


[1]: CVE-2024-44952: driver core: Fix uevent_show() vs driver detach race: https://lore.kernel.org/all/2024090411-CVE-2024-44952-6290@gregkh/
Comment 6 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-09-11 08:05:06 UTC
(In reply to brmails+k from comment #4)
> If I can read your forwarded messages on https://lore.kernel.org/ 

Then keep an eye on https://lore.kernel.org/regressions/3724e8e8-ab71-4f64-8ba1-c5c9a617632f@leemhuis.info/T/#u :-D
Comment 7 brmails+k 2024-09-12 18:09:44 UTC
> https://lore.kernel.org/regressions/3724e8e8-ab71-4f64-8ba1-
> c5c9a617632f@leemhuis.info/T/#u :-D


Thanks for "forwarding" the issue.

I'm still planning on setting up a test drive to test the mainline kernel. But given that this is a rather minor issue, the commit a rather important one (to keep) and my system (in the current setup) being the only one affected / being able to replicate the issue, there's no pressure (time-wise or other) - definitely not from my part.

O'll report back when I had the time to setup the test drive and do the mainline testing - and post the results, of course.
Comment 8 brmails+k 2024-09-14 15:51:20 UTC
Little update:

I set up a test install but haven't been able to re-create the issue, even with the v6.6 linux.

I transferred then some settings to the test install and installed all packages that the install with the problem has installed (as per what the package manager tells me). But the test install still does not show the issue.

The only difference between those two setups (apart from probably some config files I have to compare that might be different, of course) is that the main install is on a nvme and the test install is on a hdd. The two different drives are connected to the same connector to the mainboard, but need different connector cables (m.2-nvme vs sata).

So, for now, I don't have an answer what might cause the issue, but I'll keep testing. It might be the fresh install vs. the old - but I tried to minimize that by having all packages on the test install, too.

Maybe the hdd vs the nvme (speed), that leads to different order of executing commands, or the different connector cables, which interfere somehow with usb initialization. All of course depending on the changes of the commit.

But that's just theoretical.

I might have to sacrifice my backup install which is also on a (separate) nvme and put a fresh isntall on there to see if it's the fresh vs. old install or nvme vs. hdd setup.

I'll update as I get new data / information.
Comment 9 brmails+k 2024-09-16 18:05:53 UTC
Update after a few days of more testing.

I haven't been able to replicate the issue on a test install on a hdd. Not even once!

But I have been able to replicate the issue on a test install on another nvme of the same make and model (Samsung 980 Pro) as the main install. With kernel v6.6.51, v6.10.10 and 6.11.0.

I'm not sure what to make of it or what to test next / further at this point. Maybe I get some idea and the do some more testing.

Any command output needed I'll try to provide.
Comment 10 brmails+k 2024-09-24 11:27:21 UTC
I saw Dan Williams' post [1] with the potential patch.

I'll test it as soon as I get some time to do so and report back.

Thanks for the forward and thanks to Dan Williams for taking the time to come up with a potential fix!


[1]

https://lore.kernel.org/regressions/3724e8e8-ab71-4f64-8ba1-c5c9a617632f@leemhuis.info/T/#m00607dbb85fe611b056230a52d3032e13990b800

Dan Williams wrote:


[...]


Ok, the following boots and passes the CXL unit tests, would appreciate
if the reporter can give this a try:

-- >8 --
Subject: driver core: Fix userspace expectations of uevent_show() as a probe barrier

From: Dan Williams <dan.j.williams@intel.com>

Commit "driver core: Fix uevent_show() vs driver detach race" [1]
attempted to fix a lockdep report in uevent_show() by making the lookup
of driver information for a given device lockless. It turns out that
userspace likely depends on the side-effect of uevent_show() flushing
device probing.

Introduce a new "locked" type for sysfs attributes that arranges for the
attribute to be called under the device-lock, but without setting up a
reverse locking order dependency with the kernfs_get_active() lock.

This new facility holds a reference on a device while any locked-sysfs
attribute of that device is open. It then takes the lock around sysfs
read/write operations in the following order:

    of->mutex
    of->op_mutex <= device_lock()
    kernfs_get_active()
    <operation>

Compare that to problematic locking order of:

    of->mutex
    kernfs_get_active()
    <operation>
        device_lock()

...which causes potential deadlocks with kernfs_drain() that may be
called while the device_lock() is held.

Fixes: 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race") [1]
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219244
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/base/core.c    |    2 +-
 fs/kernfs/file.c       |   24 +++++++++++++++++++-----
 fs/sysfs/file.c        |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/sysfs/group.c       |    4 ++--
 include/linux/device.h |    3 +++
 include/linux/kernfs.h |    1 +
 include/linux/sysfs.h  |   10 ++++++++++
 7 files changed, 83 insertions(+), 8 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 8c0733d3aad8..1fd5a18cbb62 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2772,7 +2772,7 @@ static ssize_t uevent_store(struct device *dev, struct device_attribute *attr,
 
 	return count;
 }
-static DEVICE_ATTR_RW(uevent);
+static DEVICE_ATTR_LOCKED_RW(uevent);
 
 static ssize_t online_show(struct device *dev, struct device_attribute *attr,
 			   char *buf)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 8502ef68459b..eb5c2167beb9 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -142,6 +142,20 @@ static void kernfs_seq_stop_active(struct seq_file *sf, void *v)
 	kernfs_put_active(of->kn);
 }
 
+static void kernfs_open_file_lock(struct kernfs_open_file *of)
+{
+	mutex_lock(&of->mutex);
+	if (of->op_mutex)
+		mutex_lock(of->op_mutex);
+}
+
+static void kernfs_open_file_unlock(struct kernfs_open_file *of)
+{
+	if (of->op_mutex)
+		mutex_unlock(of->op_mutex);
+	mutex_unlock(&of->mutex);
+}
+
 static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
 {
 	struct kernfs_open_file *of = sf->private;
@@ -151,7 +165,7 @@ static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
 	 * @of->mutex nests outside active ref and is primarily to ensure that
 	 * the ops aren't called concurrently for the same open file.
 	 */
-	mutex_lock(&of->mutex);
+	kernfs_open_file_lock(of);
 	if (!kernfs_get_active(of->kn))
 		return ERR_PTR(-ENODEV);
 
@@ -193,7 +207,7 @@ static void kernfs_seq_stop(struct seq_file *sf, void *v)
 
 	if (v != ERR_PTR(-ENODEV))
 		kernfs_seq_stop_active(sf, v);
-	mutex_unlock(&of->mutex);
+	kernfs_open_file_unlock(of);
 }
 
 static int kernfs_seq_show(struct seq_file *sf, void *v)
@@ -322,9 +336,9 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	 * @of->mutex nests outside active ref and is used both to ensure that
 	 * the ops aren't called concurrently for the same open file.
 	 */
-	mutex_lock(&of->mutex);
+	kernfs_open_file_lock(of);
 	if (!kernfs_get_active(of->kn)) {
-		mutex_unlock(&of->mutex);
+		kernfs_open_file_unlock(of);
 		len = -ENODEV;
 		goto out_free;
 	}
@@ -336,7 +350,7 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 		len = -EINVAL;
 
 	kernfs_put_active(of->kn);
-	mutex_unlock(&of->mutex);
+	kernfs_open_file_unlock(of);
 
 	if (len > 0)
 		iocb->ki_pos += len;
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index d1995e2d6c94..1bb878efcf00 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -15,6 +15,7 @@
 #include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/seq_file.h>
+#include <linux/device.h>
 #include <linux/mm.h>
 
 #include "sysfs.h"
@@ -189,6 +190,26 @@ static int sysfs_kf_bin_open(struct kernfs_open_file *of)
 	return 0;
 }
 
+/* locked attributes are always device attributes */
+static int sysfs_kf_setup_lock(struct kernfs_open_file *of)
+{
+	struct kobject *kobj = of->kn->parent->priv;
+	struct device *dev = kobj_to_dev(kobj);
+
+	get_device(dev);
+	of->op_mutex = &dev->mutex;
+
+	return 0;
+}
+
+static void sysfs_kf_free_lock(struct kernfs_open_file *of)
+{
+	struct kobject *kobj = of->kn->parent->priv;
+	struct device *dev = kobj_to_dev(kobj);
+
+	put_device(dev);
+}
+
 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr)
 {
 	struct kernfs_node *kn = kobj->sd, *tmp;
@@ -227,6 +248,25 @@ static const struct kernfs_ops sysfs_file_kfops_rw = {
 	.write		= sysfs_kf_write,
 };
 
+static const struct kernfs_ops sysfs_locked_kfops_ro = {
+	.seq_show	= sysfs_kf_seq_show,
+	.open		= sysfs_kf_setup_lock,
+	.release	= sysfs_kf_free_lock,
+};
+
+static const struct kernfs_ops sysfs_locked_kfops_wo = {
+	.write		= sysfs_kf_write,
+	.open		= sysfs_kf_setup_lock,
+	.release	= sysfs_kf_free_lock,
+};
+
+static const struct kernfs_ops sysfs_locked_kfops_rw = {
+	.seq_show	= sysfs_kf_seq_show,
+	.write		= sysfs_kf_write,
+	.open		= sysfs_kf_setup_lock,
+	.release	= sysfs_kf_free_lock,
+};
+
 static const struct kernfs_ops sysfs_prealloc_kfops_ro = {
 	.read		= sysfs_kf_read,
 	.prealloc	= true,
@@ -287,6 +327,13 @@ int sysfs_add_file_mode_ns(struct kernfs_node *parent,
 			ops = &sysfs_prealloc_kfops_ro;
 		else if (sysfs_ops->store)
 			ops = &sysfs_prealloc_kfops_wo;
+	} else if (mode & SYSFS_LOCKED) {
+		if (sysfs_ops->show && sysfs_ops->store)
+			ops = &sysfs_locked_kfops_rw;
+		else if (sysfs_ops->show)
+			ops = &sysfs_locked_kfops_ro;
+		else if (sysfs_ops->store)
+			ops = &sysfs_locked_kfops_wo;
 	} else {
 		if (sysfs_ops->show && sysfs_ops->store)
 			ops = &sysfs_file_kfops_rw;
diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index d22ad67a0f32..0158367866be 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -68,11 +68,11 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj,
 					continue;
 			}
 
-			WARN(mode & ~(SYSFS_PREALLOC | 0664),
+			WARN(mode & ~(SYSFS_PREALLOC | SYSFS_LOCKED | 0664),
 			     "Attribute %s: Invalid permissions 0%o\n",
 			     (*attr)->name, mode);
 
-			mode &= SYSFS_PREALLOC | 0664;
+			mode &= SYSFS_PREALLOC | SYSFS_LOCKED | 0664;
 			error = sysfs_add_file_mode_ns(parent, *attr, mode, uid,
 						       gid, NULL);
 			if (unlikely(error))
diff --git a/include/linux/device.h b/include/linux/device.h
index 34eb20f5966f..c38c33bed333 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -180,6 +180,9 @@ ssize_t device_show_string(struct device *dev, struct device_attribute *attr,
 #define DEVICE_ATTR_RW(_name) \
 	struct device_attribute dev_attr_##_name = __ATTR_RW(_name)
 
+#define DEVICE_ATTR_LOCKED_RW(_name) \
+	struct device_attribute dev_attr_##_name = __ATTR_LOCKED_RW(_name)
+
 /**
  * DEVICE_ATTR_ADMIN_RW - Define an admin-only read-write device attribute.
  * @_name: Attribute name.
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 87c79d076d6d..df6828a7cd3e 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -257,6 +257,7 @@ struct kernfs_open_file {
 
 	/* private fields, do not use outside kernfs proper */
 	struct mutex		mutex;
+	struct mutex		*op_mutex;
 	struct mutex		prealloc_mutex;
 	int			event;
 	struct list_head	list;
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index c4e64dc11206..981588c9c673 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -103,6 +103,7 @@ struct attribute_group {
 
 #define SYSFS_PREALLOC		010000
 #define SYSFS_GROUP_INVISIBLE	020000
+#define SYSFS_LOCKED		040000
 
 /*
  * DEFINE_SYSFS_GROUP_VISIBLE(name):
@@ -230,6 +231,13 @@ struct attribute_group {
 	.store	= _store,						\
 }
 
+#define __ATTR_LOCKED(_name, _mode, _show, _store) {			\
+	.attr = {.name = __stringify(_name),				\
+		 .mode = SYSFS_LOCKED | VERIFY_OCTAL_PERMISSIONS(_mode) },\
+	.show	= _show,						\
+	.store	= _store,						\
+}
+
 #define __ATTR_RO(_name) {						\
 	.attr	= { .name = __stringify(_name), .mode = 0444 },		\
 	.show	= _name##_show,						\
@@ -255,6 +263,8 @@ struct attribute_group {
 
 #define __ATTR_RW(_name) __ATTR(_name, 0644, _name##_show, _name##_store)
 
+#define __ATTR_LOCKED_RW(_name) __ATTR_LOCKED(_name, 0644, _name##_show, _name##_store)
+
 #define __ATTR_NULL { .attr = { .name = NULL } }
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
Comment 11 brmails+k 2024-09-26 15:55:49 UTC
Good news!

I think the proposed patch by Dan Williams fixes the issue.

I have tested it with v6.6.52 and v6.10.11. I haven't been able to recreate the issue with those modified kernels even once.

The patch can be applied to v6.11.0 and v6.10.11 out of the box. For v6.6.52 I had to slightly modify it as the line

#define SYSFS_GROUP_INVISIBLE	020000

doesn't exist in /include/linux/sysfs.h in v6.6.52 hence the patch looking for that line fails on that file.
But after adjusting the patch accordingly, the patch works fine on v6.6.52 and the issue is gone with the patched version of v6.6.52, just like 6.10.11.

So, I assume that the fix / patch proposed by Dan Williams works as intended resolving the issue I had.

Thanks again for forwarding the bug report and for the quick fix!
Comment 12 Dan Williams 2024-09-27 20:57:56 UTC
(In reply to brmails+k from comment #11)
> Good news!
> 
> I think the proposed patch by Dan Williams fixes the issue.

Thanks for the report and the test!

Let me know if you want a different credit in the kernel besides the following:

Reported-by: brmails+k@disroot.org
Tested-by: brmails+k@disroot.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219244
Comment 13 brmails+k 2024-09-28 06:26:46 UTC
Thanks for the fix!

I'd prefer just "brmails+k" to avoid spam, but if an email address is necessary, what you proposed is fine, too.
Comment 14 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-09-30 11:43:26 UTC
(In reply to brmails+k from comment #13)

> I'd prefer just "brmails+k" to avoid spam,

Maybe it's time to allow a format like 

Reported-by: brmails+k 
Tested-by: brmails+k

Dan, what do you think, should I submit this as a documentation change (which tells people that that format is only allowed in combination with a Link:/Closes: tag) and see what people think about it? That format could work for subsystems that use gitlab instances or github, too.

> but if an email address is
> necessary, what you proposed is fine, too.

The email address in a public comment now anyway. :-/
Comment 15 Dan Williams 2024-09-30 19:36:25 UTC
I see there is already precedent for not including a full email in Reported-by and Tested-by lines: "git log | grep Reported-by: | grep -v @". It does break flows like mine, "stg mail --auto", and checkpatch complains, but looks like "git send-email" is ok to skip over those lines without complaint.

I am not sure we need to formalize that this is ok because the preference would be to include testers on future updates about the patch, but if individuals do not want their email in the kernel history for Reported-by and Tested-by: then that is their prerogative.

An update that might be worthwhile in the documentation is the suggestion to extend reporters and testers the courtesy of confirming how and whether they are credited in git history rather than just assuming the contact details used to make the report will find their way into the kernel history verbatim.
Comment 16 brmails+k 2024-09-30 20:29:58 UTC
I didn't want do cause any trouble / extra work...

But thank you for all your efforts!

Luckily, I seem to be the only one who ran into this issue, so maybe no one outside of the kernel team will ever read the comment / commit... :-)
Comment 17 brmails+k 2024-10-03 15:04:57 UTC
I just tested the new version of the patch [1].

Works fine, as expected (I tested 6.6.53 and 6.10.12).

Thanks!




[1]

https://lore.kernel.org/all/172790598832.1168608.4519484276671503678.stgit@dwillia2-xfh.jf.intel.com/


Commit "driver core: Fix uevent_show() vs driver detach race" [1]
attempted to fix a lockdep report in uevent_show() by making the lookup
of driver information for a given device lockless. It turns out that
userspace likely depends on the side-effect of uevent_show() flushing
device probing, as evidenced by the removal of the lock causing a
regression initializing USB devices [2].

Introduce a new "locked" type for sysfs attributes that arranges for the
attribute to be called under the device-lock, but without setting up a
reverse locking order dependency with the kernfs_get_active() lock.

This new facility holds a reference on a device while any locked-sysfs
attribute of that device is open. It then takes the lock around sysfs
read/write operations in the following order:

    of->mutex
    of->op_mutex <= device_lock()
    kernfs_get_active()
    <operation>

Compare that to problematic locking order of:

    of->mutex
    kernfs_get_active()
    <operation>
        device_lock()

...which causes potential deadlocks with kernfs_drain() that may be
called while the device_lock() is held.

Note that the synchronize_rcu() in module_remove_driver(), introduced in
the precedubg fix, likely needs to remain since dev_uevent() is not
always called with the device_lock held. Userspace can potentially
trigger use after free by racing module removal against kernel
invocations of kobject_uevent().

Note2, this change effectively allows userspace to test for
CONFIG_DEBUG_KOBJECT_RELEASE-style driver bugs as the lifetime of open
file descriptors on the @uevent attribute keeps the device reference
count elevated indefinitely.

Reported-by: brmails+k
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219244 [2]
Tested-by: brmails+k
Fixes: 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race") [1]
Cc: stable@vger.kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes since v1 [1]:
- Move the new "locked" infrastructure to private header files to make
  it clear it is not approved for general usage (Greg)

[1]: http://lore.kernel.org/172772671177.1003633.7355063154793624707.stgit@dwillia2-xfh.jf.intel.com

 drivers/base/base.h    |    9 +++++++++
 drivers/base/core.c    |    2 +-
 fs/kernfs/file.c       |   24 +++++++++++++++++++-----
 fs/sysfs/file.c        |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/sysfs/group.c       |    4 ++--
 fs/sysfs/sysfs.h       |   18 ++++++++++++++++++
 include/linux/kernfs.h |    1 +
 include/linux/sysfs.h  |    1 +
 8 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 0b53593372d7..0405e7667184 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -246,3 +246,12 @@ static inline int devtmpfs_delete_node(struct device *dev) { return 0; }
 
 void software_node_notify(struct device *dev);
 void software_node_notify_remove(struct device *dev);
+
+/*
+ * "locked" device attributes are an internal exception for the @uevent
+ * attribute.
+ */
+#include "../../fs/sysfs/sysfs.h"
+
+#define DEVICE_ATTR_LOCKED_RW(_name) \
+	struct device_attribute dev_attr_##_name = __ATTR_LOCKED_RW(_name)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 8c0733d3aad8..1fd5a18cbb62 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2772,7 +2772,7 @@ static ssize_t uevent_store(struct device *dev, struct device_attribute *attr,
 
 	return count;
 }
-static DEVICE_ATTR_RW(uevent);
+static DEVICE_ATTR_LOCKED_RW(uevent);
 
 static ssize_t online_show(struct device *dev, struct device_attribute *attr,
 			   char *buf)
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index 8502ef68459b..eb5c2167beb9 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -142,6 +142,20 @@ static void kernfs_seq_stop_active(struct seq_file *sf, void *v)
 	kernfs_put_active(of->kn);
 }
 
+static void kernfs_open_file_lock(struct kernfs_open_file *of)
+{
+	mutex_lock(&of->mutex);
+	if (of->op_mutex)
+		mutex_lock(of->op_mutex);
+}
+
+static void kernfs_open_file_unlock(struct kernfs_open_file *of)
+{
+	if (of->op_mutex)
+		mutex_unlock(of->op_mutex);
+	mutex_unlock(&of->mutex);
+}
+
 static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
 {
 	struct kernfs_open_file *of = sf->private;
@@ -151,7 +165,7 @@ static void *kernfs_seq_start(struct seq_file *sf, loff_t *ppos)
 	 * @of->mutex nests outside active ref and is primarily to ensure that
 	 * the ops aren't called concurrently for the same open file.
 	 */
-	mutex_lock(&of->mutex);
+	kernfs_open_file_lock(of);
 	if (!kernfs_get_active(of->kn))
 		return ERR_PTR(-ENODEV);
 
@@ -193,7 +207,7 @@ static void kernfs_seq_stop(struct seq_file *sf, void *v)
 
 	if (v != ERR_PTR(-ENODEV))
 		kernfs_seq_stop_active(sf, v);
-	mutex_unlock(&of->mutex);
+	kernfs_open_file_unlock(of);
 }
 
 static int kernfs_seq_show(struct seq_file *sf, void *v)
@@ -322,9 +336,9 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 	 * @of->mutex nests outside active ref and is used both to ensure that
 	 * the ops aren't called concurrently for the same open file.
 	 */
-	mutex_lock(&of->mutex);
+	kernfs_open_file_lock(of);
 	if (!kernfs_get_active(of->kn)) {
-		mutex_unlock(&of->mutex);
+		kernfs_open_file_unlock(of);
 		len = -ENODEV;
 		goto out_free;
 	}
@@ -336,7 +350,7 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 		len = -EINVAL;
 
 	kernfs_put_active(of->kn);
-	mutex_unlock(&of->mutex);
+	kernfs_open_file_unlock(of);
 
 	if (len > 0)
 		iocb->ki_pos += len;
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index d1995e2d6c94..1bb878efcf00 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -15,6 +15,7 @@
 #include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/seq_file.h>
+#include <linux/device.h>
 #include <linux/mm.h>
 
 #include "sysfs.h"
@@ -189,6 +190,26 @@ static int sysfs_kf_bin_open(struct kernfs_open_file *of)
 	return 0;
 }
 
+/* locked attributes are always device attributes */
+static int sysfs_kf_setup_lock(struct kernfs_open_file *of)
+{
+	struct kobject *kobj = of->kn->parent->priv;
+	struct device *dev = kobj_to_dev(kobj);
+
+	get_device(dev);
+	of->op_mutex = &dev->mutex;
+
+	return 0;
+}
+
+static void sysfs_kf_free_lock(struct kernfs_open_file *of)
+{
+	struct kobject *kobj = of->kn->parent->priv;
+	struct device *dev = kobj_to_dev(kobj);
+
+	put_device(dev);
+}
+
 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr)
 {
 	struct kernfs_node *kn = kobj->sd, *tmp;
@@ -227,6 +248,25 @@ static const struct kernfs_ops sysfs_file_kfops_rw = {
 	.write		= sysfs_kf_write,
 };
 
+static const struct kernfs_ops sysfs_locked_kfops_ro = {
+	.seq_show	= sysfs_kf_seq_show,
+	.open		= sysfs_kf_setup_lock,
+	.release	= sysfs_kf_free_lock,
+};
+
+static const struct kernfs_ops sysfs_locked_kfops_wo = {
+	.write		= sysfs_kf_write,
+	.open		= sysfs_kf_setup_lock,
+	.release	= sysfs_kf_free_lock,
+};
+
+static const struct kernfs_ops sysfs_locked_kfops_rw = {
+	.seq_show	= sysfs_kf_seq_show,
+	.write		= sysfs_kf_write,
+	.open		= sysfs_kf_setup_lock,
+	.release	= sysfs_kf_free_lock,
+};
+
 static const struct kernfs_ops sysfs_prealloc_kfops_ro = {
 	.read		= sysfs_kf_read,
 	.prealloc	= true,
@@ -287,6 +327,13 @@ int sysfs_add_file_mode_ns(struct kernfs_node *parent,
 			ops = &sysfs_prealloc_kfops_ro;
 		else if (sysfs_ops->store)
 			ops = &sysfs_prealloc_kfops_wo;
+	} else if (mode & SYSFS_LOCKED) {
+		if (sysfs_ops->show && sysfs_ops->store)
+			ops = &sysfs_locked_kfops_rw;
+		else if (sysfs_ops->show)
+			ops = &sysfs_locked_kfops_ro;
+		else if (sysfs_ops->store)
+			ops = &sysfs_locked_kfops_wo;
 	} else {
 		if (sysfs_ops->show && sysfs_ops->store)
 			ops = &sysfs_file_kfops_rw;
diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index d22ad67a0f32..0158367866be 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -68,11 +68,11 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj,
 					continue;
 			}
 
-			WARN(mode & ~(SYSFS_PREALLOC | 0664),
+			WARN(mode & ~(SYSFS_PREALLOC | SYSFS_LOCKED | 0664),
 			     "Attribute %s: Invalid permissions 0%o\n",
 			     (*attr)->name, mode);
 
-			mode &= SYSFS_PREALLOC | 0664;
+			mode &= SYSFS_PREALLOC | SYSFS_LOCKED | 0664;
 			error = sysfs_add_file_mode_ns(parent, *attr, mode, uid,
 						       gid, NULL);
 			if (unlikely(error))
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 3f28c9af5756..51d5b6af243f 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -40,4 +40,22 @@ int sysfs_add_bin_file_mode_ns(struct kernfs_node *parent,
 int sysfs_create_link_sd(struct kernfs_node *kn, struct kobject *target,
 			 const char *name);
 
+#define SYSFS_LOCKED		040000
+
+/*
+ * uevent_show() needs this due to userspace expectations of reading
+ * that attribute flushing device probing, it is not intended to be a
+ * general semantic for other device sysfs attributes. It is broken to
+ * use this with non-device / pure-kobject sysfs attributes, see
+ * sysfs_kf_setup_lock().
+ */
+#define __ATTR_LOCKED(_name, _mode, _show, _store) {			\
+	.attr = {.name = __stringify(_name),				\
+		 .mode = SYSFS_LOCKED | VERIFY_OCTAL_PERMISSIONS(_mode) },\
+	.show	= _show,						\
+	.store	= _store,						\
+}
+
+#define __ATTR_LOCKED_RW(_name) __ATTR_LOCKED(_name, 0644, _name##_show, _name##_store)
+
 #endif	/* __SYSFS_INTERNAL_H */
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 87c79d076d6d..df6828a7cd3e 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -257,6 +257,7 @@ struct kernfs_open_file {
 
 	/* private fields, do not use outside kernfs proper */
 	struct mutex		mutex;
+	struct mutex		*op_mutex;
 	struct mutex		prealloc_mutex;
 	int			event;
 	struct list_head	list;
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index c4e64dc11206..dd28c6238b64 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -103,6 +103,7 @@ struct attribute_group {
 
 #define SYSFS_PREALLOC		010000
 #define SYSFS_GROUP_INVISIBLE	020000
+/* see fs/sysfs/sysfs.h for private mode-flags */
 
 /*
  * DEFINE_SYSFS_GROUP_VISIBLE(name):

Note You need to log in before you can comment on or make changes to this bug.