Bug 218183

Summary: Steam Deck uinput devices freeze on haptic ioctl
Product: Drivers
Reporter: Ethan Lee (flibitijibibo)
Component: Input Devices
Assignee: drivers_input-devices
Status: NEW
Severity: normal
CC: slouken, vi
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  SDL Haptic Test Program
  Preliminary patch 1
  Preliminary patch 2
  Replacement patch 2
  Patch 2 v3

Description Ethan Lee 2023-11-23 18:45:02 UTC
Created attachment 305463 [details]
SDL Haptic Test Program

This is a follow-up to an SDL issue, reported here:

https://github.com/libsdl-org/SDL/issues/8071

In short: There is a way to freeze the device by using an external controller and sleep mode. I've attached a test program that uses SDL, which should be used with the following steps:

Step 0: Start the system with 0 USB devices connected, run the test, should exit immediately
Step 1: Connect Xbox One S controller via Bluetooth, run test again, should still exit immediately
Step 2: Using the Xbox controller, put the system to sleep, wait for the Xbox controller to turn off
Step 3: Wake up the system, do NOT reconnect the Xbox controller
Step 4: Run test, should hang on ioctl

It seems the issue is that on wakeup, the uinput device still exists when it shouldn't, so when uploading a haptic effect it hangs while trying to send the effect to a nonexistent endpoint. Beyond that, I can't say for certain what is happening, since I don't have the Steam Input source code handy.
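
For readers without the attachment, the call that hangs is the evdev force-feedback upload. The following is a minimal sketch of that upload, not the attached test program and not SDL's code; the helper names `make_rumble` and `upload_effect` are hypothetical, and the caller is assumed to have opened the `/dev/input/event*` node itself.

```c
#include <errno.h>
#include <linux/input.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical helper: fill in a rumble effect.
 * id = -1 asks the kernel to allocate a new effect slot. */
struct ff_effect make_rumble(unsigned short strong, unsigned short weak,
                             unsigned short length_ms)
{
    struct ff_effect effect;

    memset(&effect, 0, sizeof(effect));
    effect.type = FF_RUMBLE;
    effect.id = -1;
    effect.u.rumble.strong_magnitude = strong;
    effect.u.rumble.weak_magnitude = weak;
    effect.replay.length = length_ms;
    return effect;
}

/* Hypothetical helper: upload the effect to an open evdev file descriptor.
 * Returns 0 on success or a negative errno value. This is the ioctl that
 * blocks in the scenario described above. */
int upload_effect(int evdev_fd, struct ff_effect *effect)
{
    if (ioctl(evdev_fd, EVIOCSFF, effect) < 0)
        return -errno;
    return 0;
}
```

On success the kernel writes the allocated slot back into `effect->id`, which is the value later passed to `EVIOCRMFF` to remove the effect.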
Comment 1 Vicki Pfau 2023-12-01 00:15:37 UTC
I'm having a little trouble testing this as-is, but can you confirm whether it hangs indefinitely? I see a 30-second timeout that might get hit, and it'd be good to confirm if that's where it's blocking.
Comment 2 Ethan Lee 2023-12-01 00:43:46 UTC
I can confirm that it does eventually time out and continue. I'd need to go back and measure it exactly, but 30 seconds sounds about right.
Comment 3 Vicki Pfau 2023-12-01 00:44:46 UTC
Perfect, I'm preparing two patches, one of which should fix this, though I'm unsure which specific one will.
Comment 4 Vicki Pfau 2023-12-02 00:05:07 UTC
Created attachment 305527 [details]
Preliminary patch 1

I have two preliminary patches. I don't know if you have the means to build the kernel yourself to test, but this first patch definitely fixes *a* bug that I observed with uploading a pattern preventing a suspend until it times out.
Comment 5 Vicki Pfau 2023-12-02 00:06:20 UTC
Created attachment 305528 [details]
Preliminary patch 2

This patch fixes what I believe is a race condition between attempting to upload a pattern (or delete, etc) and the uinput device being destroyed due to the file handle being closed. I'm not sure this patch actually does anything useful though.
Comment 6 Sam Lantinga 2023-12-02 23:03:06 UTC
Here is a build of the Steam Deck kernel with those two patches applied:
https://www.libsdl.org/tmp/linux-neptune-61-6.1.52.valve9-1-x86_64.pkg.tar.zst
Comment 7 Ethan Lee 2023-12-02 23:40:56 UTC
Tested the above kernel's uinput module and it didn't appear to fix the problem.

One more test procedure to make this a bit easier to test on a completely fresh Steam Deck:

1. Install RetroArch (FlatHub or Steam)
2. Connect any Bluetooth controller
3. Start and exit RetroArch using the external controller
4. Put the Deck to sleep, then wake it up after the controller turns off
5. Start RetroArch with the Deck controller or a keyboard, should freeze for exactly 30 seconds

journalctl/dmesg didn't produce anything _too_ interesting, other than the leaking virtual gamepad mentioned in the SDL report.
Comment 8 Vicki Pfau 2023-12-03 00:01:33 UTC
Created attachment 305531 [details]
Replacement patch 2

Hmm, alright. I've got a third patch I was working on that is slightly more likely to help than the second patch. Since I'm not actually convinced that the second patch helps with anything, I've marked it as obsolete. I haven't written up a full commit message for it yet, but it should be safe for testing.

Patch 1 definitely fixes things, though apparently not this. I'll see if I can reproduce from your steps, but if I can't, can you get the "process state" from top or ps while RA is frozen? If it's stuck in the kernel, it should be "D", as opposed to "R" or "S".
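
If top or ps is awkward to use while the game is frozen, the same state letter can be read from procfs. This is a rough sketch; the function names are made up for illustration, and it assumes the usual `pid (comm) state ...` layout of `/proc/<pid>/stat`.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract the state letter from one /proc/<pid>/stat line.
 * The command name is parenthesised and may itself contain spaces or
 * parentheses, so search for the last ')' rather than the first space. */
char parse_proc_state(const char *stat_line)
{
    const char *close = strrchr(stat_line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}

/* Read the state of a live process; returns '?' if it cannot be read. */
char read_proc_state(pid_t pid)
{
    char path[64], line[512];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '?';
    if (fgets(line, sizeof(line), f) == NULL)
        line[0] = '\0';
    fclose(f);
    return parse_proc_state(line);
}
```

Note that this reports the main thread only. In a multi-threaded process a single thread can be blocked while the main thread is not, so the per-thread files under `/proc/<pid>/task/<tid>/stat` may be worth checking as well.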
Comment 9 Vicki Pfau 2023-12-03 00:03:23 UTC
Also, do you have the exact message about the gamepad leaking? I'm unsure what a leak would look like, as once all references to /dev/uinput are closed it's supposed to clean up on its own.
Comment 10 Ethan Lee 2023-12-03 00:19:28 UTC
The best evidence I have of a leak is some printf spam from SDL's enumeration of devices, since dmesg/journalctl just print devices getting added/removed:

https://github.com/libsdl-org/SDL/issues/8071#issuecomment-1666305214

The process state is `Sl`, with this as the stack (at least with SDL games, Unity games using Rewired unfortunately give no useful stack since it's presumably doing this mostly from managed code):

---

(gdb) bt
#0  0x00007fbb3e10d88d in ioctl () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fbb40791c3e in SDL_SYS_HapticNewEffect () from target:/usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
#2  0x00007fbb406cb92b in SDL_HapticNewEffect_REAL () from target:/usr/lib/x86_64-linux-gnu/libSDL2-2.0.so.0
#3  0x000056275d3e5391 in ?? ()
#4  0x000056275d1a2d95 in ?? ()
#5  0x000056275d0dd837 in ?? ()
#6  0x000056275d3500ec in ?? ()
#7  0x000056275d350360 in ?? ()
#8  0x000056275d0e9db6 in ?? ()
#9  0x000056275d0df378 in ?? ()
#10 0x00007fbb3e02958a in __libc_start_call_main () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#11 0x00007fbb3e02964b in __libc_start_main () from target:/usr/lib/x86_64-linux-gnu/libc.so.6
#12 0x000056275d0d35e5 in ?? ()

---

Location in the SDL2 source: https://github.com/libsdl-org/SDL/blob/release-2.28.5/src/haptic/linux/SDL_syshaptic.c#L937

(FWIW it could also be that I updated the module incorrectly, though I would hope overwriting the old module in /var/lib/modules/ would just work...)
Comment 11 Vicki Pfau 2023-12-03 00:32:21 UTC
You'll need to reboot if you didn't.

Anyway, I have what appears to be a reproducing case now. I think this is with my patched kernel; I see the following in strace:

ioctl(27, EVIOCSFF, {type=FF_RUMBLE, id=-1, direction=0, ...}) = 0
ioctl(27, EVIOCRMFF, 0)                 = 0
ioctl(27, EVIOCSFF, {type=FF_RUMBLE, id=-1, direction=0, ...}) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
ioctl(27, EVIOCSFF, {type=FF_RUMBLE, id=-1, direction=0, ...}) = -1 ENODEV (No such device)
ioctl(27, EVIOCSFF, {type=FF_RUMBLE, id=-1, direction=0, ...}

I'm not sure why it continues after the ENODEV, as that should be "fatal".
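
To spell out what "fatal" would mean on the caller's side, here is a sketch of a retry loop that retries only interrupted calls and gives up on anything else. It is not SDL's or Steam's actual code; `upload_fn`, `upload_with_retry`, and the `scripted` stand-in device are all hypothetical names. ERESTARTSYS itself is kernel-internal: strace shows it, but user space normally sees either a transparent restart or EINTR.

```c
#include <errno.h>

/* Hypothetical callback type: performs one upload attempt and returns 0 on
 * success or -1 with errno set, like ioctl(fd, EVIOCSFF, &effect). */
typedef int (*upload_fn)(void *ctx);

/* Retry only interruptions; report everything else (including ENODEV)
 * to the caller immediately. Returns 0 or a negative errno value. */
int upload_with_retry(upload_fn attempt, void *ctx, int max_attempts)
{
    int i;

    for (i = 0; i < max_attempts; i++) {
        if (attempt(ctx) == 0)
            return 0;
        if (errno != EINTR)
            return -errno; /* e.g. -ENODEV: the device is gone, stop */
    }
    return -EINTR;
}

/* Stand-in for a device, for demonstration only: each call fails with the
 * next errno from the script, and succeeds once the script is exhausted
 * or a 0 entry is reached. */
struct scripted {
    const int *errors;
    int count;
    int calls;
};

int scripted_attempt(void *ctx)
{
    struct scripted *s = ctx;
    int err = s->calls < s->count ? s->errors[s->calls] : 0;

    s->calls++;
    if (err == 0)
        return 0;
    errno = err;
    return -1;
}
```

With a script of EINTR followed by ENODEV, the loop makes two attempts and then returns -ENODEV instead of issuing a third ioctl.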
Comment 12 Vicki Pfau 2023-12-03 01:38:49 UTC
I have a slightly wedged build that I've been using for testing this race condition and got this after resuming. Seems evdev is unhappy doing the cleanup.

[  368.653060] task:CSteamControlle state:D stack:0     pid:1484  ppid:1142   flags:0x20004006
[  368.653069] Call Trace:
[  368.653072]  <TASK>
[  368.653081]  __schedule+0x354/0x1260
[  368.653099]  ? rwsem_mark_wake+0x1c0/0x2e0
[  368.653111]  schedule+0x5e/0xd0
[  368.653118]  schedule_preempt_disabled+0x15/0x30
[  368.653124]  __mutex_lock.constprop.0+0x381/0x690
[  368.653135]  evdev_cleanup+0x22/0xc0
[  368.653145]  evdev_disconnect+0x31/0x60
[  368.653151]  __input_unregister_device+0xb2/0x190
[  368.653160]  input_unregister_device+0x45/0x70
[  368.653173]  uinput_destroy_device.cold+0x5f/0x64 [uinput 2d1e792fd6f612d1f13a711f0935024362625f8d]
[  368.653189]  uinput_ioctl_handler.isra.0+0x1b2/0x930 [uinput 2d1e792fd6f612d1f13a711f0935024362625f8d]
[  368.653207]  __ia32_compat_sys_ioctl+0xd6/0x1b0
[  368.653218]  __do_fast_syscall_32+0x89/0xe0
[  368.653229]  ? handle_mm_fault+0xf2/0x2e0
[  368.653239]  ? do_user_addr_fault+0x225/0x570
[  368.653247]  do_fast_syscall_32+0x33/0x80
[  368.653252]  entry_SYSCALL_compat_after_hwframe+0x63/0x6b
[  368.653259] RIP: 0023:0xf7f9c549
[  368.653319] RSP: 002b:00000000d88fb944 EFLAGS: 00000292 ORIG_RAX: 0000000000000036
[  368.653325] RAX: ffffffffffffffda RBX: 0000000000000072 RCX: 0000000000005502
[  368.653328] RDX: 000000000000000b RSI: 00000000f7c22e34 RDI: 00000000df095370
[  368.653331] RBP: 00000000df095370 R08: 00000000d88fb944 R09: 0000000000000000
[  368.653333] R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000000000
[  368.653336] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  368.653345]  </TASK>
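
The register dump is consistent with the stack. Assuming the usual 32-bit compat convention, where the ioctl request number is passed in ECX, the RCX value 0x5502 decodes to the uinput destroy request. A small sketch of that decoding follows; the function names are illustrative.

```c
#include <linux/uinput.h>
#include <sys/ioctl.h>

/* Split an ioctl request number into its type character and
 * sequence number using the standard _IOC_* macros. */
void decode_ioctl(unsigned long request, char *type, unsigned int *nr)
{
    *type = (char)_IOC_TYPE(request);
    *nr = _IOC_NR(request);
}

/* True if the request is the one that tears down a uinput device. */
int is_uinput_destroy(unsigned long request)
{
    return request == UI_DEV_DESTROY;
}
```

For 0x5502 this gives type 'U' (the uinput ioctl base) and number 2, which matches `UI_DEV_DESTROY` and the `uinput_destroy_device` frame in the trace.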
Comment 13 Vicki Pfau 2023-12-03 01:56:13 UTC
I've determined there's a mutex interlock causing a hang, possibly this one. While the uinput owner is sending ioctls, the uinput mutex is locked. Then, when it destroys the uinput device, the evdev mutex is locked. However, while uploading an ff pattern to the evdev device from the client, it locks the evdev (and ff) mutex, then tries to lock the uinput mutex. Depending on the ordering, this can definitely lead to a deadlock.
Comment 14 Vicki Pfau 2023-12-06 02:16:59 UTC
Created attachment 305544 [details]
Patch 2 v3

Here's a new version of patch 2 that fixes a deadlock. I think this might fix the bug this time, unless there's another way to trigger the race condition (quite likely, based on what I've seen--this driver seems to have a lot of potential race conditions, based on staring at it for the past several days).
Comment 15 Sam Lantinga 2023-12-06 06:22:40 UTC
And here is a Steam Deck kernel package with Patch 2 v3
https://www.libsdl.org/tmp/linux-neptune-uinput-attempt2.pkg.tar.zst
Comment 16 Ethan Lee 2023-12-06 16:41:07 UTC
Tested Sam's latest kernel build and can confirm the issue persists.
Comment 17 Vicki Pfau 2023-12-07 05:22:41 UTC
After digging into the exact repro case you've provided, I think these are three separate bugs: two in the kernel (which I discovered and patched here) and one (yours) in Steam, which, based on some testing I did, is in neither SDL nor the kernel. I've relayed this information to Sam off of this bug tracker, since I don't think this is actually a kernel bug at this point.
Comment 18 Sam Lantinga 2023-12-10 16:45:54 UTC
This issue ended up being a bug in Steam, and Vicki is upstreaming patches for the other issues she found.