Created attachment 305463 [details]
SDL Haptic Test Program

This is a follow-up to an SDL issue, reported here:
https://github.com/libsdl-org/SDL/issues/8071

In short: there is a way to freeze the device by using an external controller and sleep mode. I've attached a test program that uses SDL, which should be used with the following steps:

Step 0: Start the system with 0 USB devices connected, run the test, should exit immediately
Step 1: Connect an Xbox One S controller via Bluetooth, run the test again, should still exit immediately
Step 2: Using the Xbox controller, put the system to sleep, wait for the Xbox controller to turn off
Step 3: Wake up the system, do NOT reconnect the Xbox controller
Step 4: Run the test, should hang on ioctl

It seems the issue is that on wakeup, the uinput device still exists when it shouldn't, so when uploading a haptic effect it hangs while trying to send the effect to a nonexistent endpoint. Beyond that, I can't say for certain what is happening, since I don't have the Steam Input source code handy.
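For anyone who can't grab the attachment, here's a minimal sketch of what the test does. This is an untested reconstruction, not the attached program, and the exact effect parameters are assumptions:

---
/* Minimal SDL2 haptic probe: open each haptic device and upload one
 * rumble effect. On a healthy system this exits immediately; in the
 * broken state the EVIOCSFF ioctl inside SDL_HapticNewEffect() hangs. */
#include <SDL.h>

int main(int argc, char *argv[]) {
    if (SDL_Init(SDL_INIT_HAPTIC) != 0) {
        SDL_Log("SDL_Init failed: %s", SDL_GetError());
        return 1;
    }
    for (int i = 0; i < SDL_NumHaptics(); ++i) {
        SDL_Haptic *haptic = SDL_HapticOpen(i);
        if (!haptic) continue;

        SDL_HapticEffect effect;
        SDL_memset(&effect, 0, sizeof(effect));
        effect.type = SDL_HAPTIC_LEFTRIGHT;
        effect.leftright.length = 1000;            /* 1 second */
        effect.leftright.large_magnitude = 0x4000;
        effect.leftright.small_magnitude = 0x4000;

        int id = SDL_HapticNewEffect(haptic, &effect); /* hangs here when broken */
        if (id >= 0)
            SDL_HapticDestroyEffect(haptic, id);
        SDL_HapticClose(haptic);
    }
    SDL_Quit();
    return 0;
}
---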
I'm having a little trouble testing this as-is, but can you confirm whether it hangs indefinitely? I see a 30-second timeout that might be getting hit, and it would be good to confirm that's where it's blocking.
I can confirm that it does eventually time out and continue. I'd need to go back and measure it exactly, but 30 seconds sounds about right.
Perfect. I'm preparing two patches, one of which should fix this, though I'm not sure which one it will be.
Created attachment 305527 [details]
Preliminary patch 1

I have two preliminary patches. I don't know if you have the means to build the kernel yourself to test, but this first patch definitely fixes *a* bug that I observed, where uploading a pattern prevents a suspend until it times out.
Created attachment 305528 [details]
Preliminary patch 2

This patch fixes what I believe is a race condition between attempting to upload a pattern (or delete one, etc.) and the uinput device being destroyed due to the file handle being closed. I'm not sure this patch actually does anything useful, though.
Here is a build of the Steam Deck kernel with those two patches applied: https://www.libsdl.org/tmp/linux-neptune-61-6.1.52.valve9-1-x86_64.pkg.tar.zst
Tested the above kernel's uinput module and it didn't appear to fix the problem.

One more test procedure, to make this a bit easier to test on a completely fresh Steam Deck:

1. Install RetroArch (Flathub or Steam)
2. Connect any Bluetooth controller
3. Start and exit RetroArch using the external controller
4. Put the Deck to sleep, then wake it up after the controller turns off
5. Start RetroArch with the Deck controller or a keyboard; it should freeze for exactly 30 seconds

journalctl/dmesg didn't produce anything _too_ interesting, other than the leaking virtual gamepad mentioned in the SDL report.
Created attachment 305531 [details]
Replacement patch 2

Hmm, alright. I've got a third patch I was working on that is slightly more likely to help than the second patch. Since I'm not actually convinced that the second patch helps with anything, I've marked it as obsolete. I haven't written up a full commit message for it yet, but it should be safe for testing.

Patch 1 definitely fixes things, though apparently not this. I'll see if I can reproduce from your steps, but if I can't, can you get the "process state" from top or ps while RA is frozen? If it's stuck in the kernel, it should be "D" (uninterruptible sleep), as opposed to "R" (running) or "S" (interruptible sleep).
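If it's easier, the same letter can be read straight out of /proc. A quick sketch (a hypothetical helper, not part of the test program):

---
/* Print the kernel state letter (R/S/D/Z/...) for a PID, read from
 * /proc/<pid>/stat; the same value as the S column in ps/top. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    char path[64], buf[512];
    snprintf(path, sizeof(path), "/proc/%s/stat", argc > 1 ? argv[1] : "self");
    FILE *f = fopen(path, "r");
    if (!f || !fgets(buf, sizeof(buf), f)) { perror(path); return 1; }
    fclose(f);
    /* the state field comes right after the ')' that closes the comm name */
    char *p = strrchr(buf, ')');
    if (p && p[1] == ' ')
        printf("state: %c\n", p[2]);
    return 0;
}
---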
Also, do you have the exact message about the gamepad leaking? I'm unsure what a leak would look like, as once all references to /dev/uinput are closed it's supposed to clean up on its own.
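For context, the expected lifecycle is roughly the sketch below; closing the fd is what destroys the device, so a virtual pad that outlives its owner looks like a leak. This is a minimal sketch of the standard uinput API, where the device name and ff_effects_max value are made up; a real FF device owner would additionally have to service UI_FF_UPLOAD/UI_FF_ERASE requests, which is exactly the forwarding path in question here.

---
/* Create a virtual force-feedback device via /dev/uinput, hold it
 * open briefly, then close the fd, which implies UI_DEV_DESTROY. */
#include <linux/uinput.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/uinput", O_RDWR);
    if (fd < 0) { perror("/dev/uinput"); return 1; }

    ioctl(fd, UI_SET_EVBIT, EV_FF);
    ioctl(fd, UI_SET_FFBIT, FF_RUMBLE);

    struct uinput_setup setup;
    memset(&setup, 0, sizeof(setup));
    setup.id.bustype = BUS_VIRTUAL;
    setup.ff_effects_max = 16;                       /* made-up value */
    snprintf(setup.name, sizeof(setup.name), "Test Virtual Gamepad");

    ioctl(fd, UI_DEV_SETUP, &setup);
    ioctl(fd, UI_DEV_CREATE);    /* /dev/input/eventN appears */

    sleep(10);                   /* device exists while the fd is open */

    close(fd);                   /* implicit destroy; eventN should vanish */
    return 0;
}
---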
The best evidence I have of a leak is some printf spam from SDL's enumeration of devices, since dmesg/journalctl just print devices getting added/removed:
https://github.com/libsdl-org/SDL/issues/8071#issuecomment-1666305214

The process state is `Sl`, with this as the stack (at least with SDL games; Unity games using Rewired unfortunately give no useful stack, since they're presumably doing this mostly from managed code):

---
(gdb) bt
#0  0x00007fbb3e10d88d in ioctl () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fbb40791c3e in SDL_SYS_HapticNewEffect () from target:/usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
#2  0x00007fbb406cb92b in SDL_HapticNewEffect_REAL () from target:/usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
#3  0x000056275d3e5391 in ?? ()
#4  0x000056275d1a2d95 in ?? ()
#5  0x000056275d0dd837 in ?? ()
#6  0x000056275d3500ec in ?? ()
#7  0x000056275d350360 in ?? ()
#8  0x000056275d0e9db6 in ?? ()
#9  0x000056275d0df378 in ?? ()
#10 0x00007fbb3e02958a in __libc_start_call_main () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#11 0x00007fbb3e02964b in __libc_start_main () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#12 0x000056275d0d35e5 in ?? ()
---

Location in the SDL2 source:
https://github.com/libsdl-org/SDL/blob/release-2.28.5/src/haptic/linux/SDL_syshaptic.c#L937

(FWIW, it could also be that I updated the module incorrectly, though I would hope overwriting the old module in /var/lib/modules/ would just work...)
You'll need to reboot if you didn't. Anyway, I have what appears to be a reproducing case now. I think this is with my patched kernel; I see the following in strace:

---
ioctl(27, EVIOCSFF, {type=FF_RUMBLE, id=-1, direction=0, ...}) = 0
ioctl(27, EVIOCRMFF, 0) = 0
ioctl(27, EVIOCSFF, {type=FF_RUMBLE, id=-1, direction=0, ...}) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
ioctl(27, EVIOCSFF, {type=FF_RUMBLE, id=-1, direction=0, ...}) = -1 ENODEV (No such device)
ioctl(27, EVIOCSFF, {type=FF_RUMBLE, id=-1, direction=0, ...}
---

I'm not sure why it continues after the ENODEV, as that should be "fatal".
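For reference, that strace corresponds to the plain evdev force-feedback sequence, roughly like this (a sketch; the device node path is an assumption):

---
/* Upload and remove a rumble effect on an evdev node, the same
 * EVIOCSFF/EVIOCRMFF pair visible in the strace above. */
#include <linux/input.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/input/event27", O_RDWR);   /* assumed device node */
    if (fd < 0) { perror("open"); return 1; }

    struct ff_effect effect;
    memset(&effect, 0, sizeof(effect));
    effect.type = FF_RUMBLE;
    effect.id = -1;                        /* kernel assigns an id on upload */
    effect.u.rumble.strong_magnitude = 0x8000;
    effect.u.rumble.weak_magnitude = 0x8000;
    effect.replay.length = 1000;           /* ms */

    /* this is the call that blocks in the broken state */
    if (ioctl(fd, EVIOCSFF, &effect) < 0) { perror("EVIOCSFF"); return 1; }
    if (ioctl(fd, EVIOCRMFF, effect.id) < 0) perror("EVIOCRMFF");

    close(fd);
    return 0;
}
---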
I have a slightly wedged build that I've been using for testing this race condition, and got this after resuming. Seems evdev is unhappy doing the cleanup.

---
[ 368.653060] task:CSteamControlle state:D stack:0 pid:1484 ppid:1142 flags:0x20004006
[ 368.653069] Call Trace:
[ 368.653072]  <TASK>
[ 368.653081]  __schedule+0x354/0x1260
[ 368.653099]  ? rwsem_mark_wake+0x1c0/0x2e0
[ 368.653111]  schedule+0x5e/0xd0
[ 368.653118]  schedule_preempt_disabled+0x15/0x30
[ 368.653124]  __mutex_lock.constprop.0+0x381/0x690
[ 368.653135]  evdev_cleanup+0x22/0xc0
[ 368.653145]  evdev_disconnect+0x31/0x60
[ 368.653151]  __input_unregister_device+0xb2/0x190
[ 368.653160]  input_unregister_device+0x45/0x70
[ 368.653173]  uinput_destroy_device.cold+0x5f/0x64 [uinput 2d1e792fd6f612d1f13a711f0935024362625f8d]
[ 368.653189]  uinput_ioctl_handler.isra.0+0x1b2/0x930 [uinput 2d1e792fd6f612d1f13a711f0935024362625f8d]
[ 368.653207]  __ia32_compat_sys_ioctl+0xd6/0x1b0
[ 368.653218]  __do_fast_syscall_32+0x89/0xe0
[ 368.653229]  ? handle_mm_fault+0xf2/0x2e0
[ 368.653239]  ? do_user_addr_fault+0x225/0x570
[ 368.653247]  do_fast_syscall_32+0x33/0x80
[ 368.653252]  entry_SYSCALL_compat_after_hwframe+0x63/0x6b
[ 368.653259] RIP: 0023:0xf7f9c549
[ 368.653319] RSP: 002b:00000000d88fb944 EFLAGS: 00000292 ORIG_RAX: 0000000000000036
[ 368.653325] RAX: ffffffffffffffda RBX: 0000000000000072 RCX: 0000000000005502
[ 368.653328] RDX: 000000000000000b RSI: 00000000f7c22e34 RDI: 00000000df095370
[ 368.653331] RBP: 00000000df095370 R08: 00000000d88fb944 R09: 0000000000000000
[ 368.653333] R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000000000
[ 368.653336] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 368.653345]  </TASK>
---
I've determined there's a mutex interlock causing a hang, possibly this one. While the uinput owner is sending ioctls, the uinput mutex is held; then, when it destroys the uinput device, the evdev mutex is taken as well. Meanwhile, when a client uploads an ff pattern to the evdev device, it locks the evdev (and ff) mutex, then tries to take the uinput mutex. Depending on the ordering, this can definitely lead to a deadlock.
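In userspace terms, it's the classic ABBA pattern. Here's a standalone illustration of the ordering described above (placeholder names, not the actual kernel code; with the sleeps widening the window, this will usually hang):

---
/* Two threads taking the same pair of locks in opposite order. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t uinput_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t evdev_mutex  = PTHREAD_MUTEX_INITIALIZER;

/* uinput owner: takes the uinput lock (ioctl path), then the evdev
 * lock (destroying the device). */
static void *owner_destroys_device(void *arg) {
    pthread_mutex_lock(&uinput_mutex);
    usleep(1000);                       /* widen the race window */
    pthread_mutex_lock(&evdev_mutex);   /* blocks if the client holds it */
    pthread_mutex_unlock(&evdev_mutex);
    pthread_mutex_unlock(&uinput_mutex);
    return NULL;
}

/* client: takes the evdev/ff lock (EVIOCSFF path), then needs the
 * uinput lock to forward the upload to the owner. */
static void *client_uploads_effect(void *arg) {
    pthread_mutex_lock(&evdev_mutex);
    usleep(1000);
    pthread_mutex_lock(&uinput_mutex);  /* opposite order -> deadlock */
    pthread_mutex_unlock(&uinput_mutex);
    pthread_mutex_unlock(&evdev_mutex);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, owner_destroys_device, NULL);
    pthread_create(&b, NULL, client_uploads_effect, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    puts("no deadlock this run");
    return 0;
}
---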
Created attachment 305544 [details]
Patch 2 v3

Here's a new version of patch 2 that fixes a deadlock. I think this might fix the bug this time, unless there's another way to trigger the race condition (quite likely; this driver seems to have a lot of potential race conditions, based on staring at it for the past several days).
And here is a Steam Deck kernel package with Patch 2 v3:
https://www.libsdl.org/tmp/linux-neptune-uinput-attempt2.pkg.tar.zst
Tested Sam's latest kernel build and can confirm the issue persists.
After digging into the exact repro case you've provided, I think these are three separate bugs: two in the kernel (which I discovered and patched here) and one (yours) in Steam, not in SDL or the kernel, based on some testing I did. I've relayed this information to Sam off of this bug tracker, since I don't think this is actually a kernel bug at this point.
This issue ended up being a bug in Steam, and Vicki is upstreaming patches for the other issues she found.