Bug 193741 - uinput crashes when closed with uploaded FF effects
Summary: uinput crashes when closed with uploaded FF effects
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Input Devices (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_input-devices
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-31 19:37 UTC by Rodrigo Rivas Costa
Modified: 2019-02-07 01:35 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.9.6
Tree: Mainline
Regression: No


Attachments
dmesg output (7.14 KB, text/plain)
2017-01-31 19:37 UTC, Rodrigo Rivas Costa
Details
patch that avoids the crash (1.14 KB, patch)
2017-02-08 20:06 UTC, Rodrigo Rivas Costa
Details | Diff

Description Rodrigo Rivas Costa 2017-01-31 19:37:52 UTC
Created attachment 253731 [details]
dmesg output

The following procedure crashes the system:

1. Compile and run this test program:

    #include <fcntl.h>
    #include <unistd.h>
    #include <string.h>
    #include <assert.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/uinput.h>

    int main()
    {
        int ui = open("/dev/uinput", O_RDWR | O_CLOEXEC);

        struct uinput_setup us = {0};
        us.id.bustype = BUS_VIRTUAL;
        us.id.version = 1;
        us.ff_effects_max = 1;
        strcpy(us.name, "Test");
        ioctl(ui, UI_DEV_SETUP, &us);
        ioctl(ui, UI_SET_PHYS, "Test");

        ioctl(ui, UI_SET_EVBIT, EV_FF);
        ioctl(ui, UI_SET_FFBIT, FF_RUMBLE);
        ioctl(ui, UI_DEV_CREATE, 0);

        struct input_event e = {0};
        read(ui, &e, sizeof(e));
        assert(e.type == EV_UINPUT && e.code == UI_FF_UPLOAD);
        printf("UI_FF_UPLOAD\n");

        struct uinput_ff_upload ff = {0};
        ioctl(ui, UI_BEGIN_FF_UPLOAD, &ff);
        ff.retval = 0;
        ioctl(ui, UI_END_FF_UPLOAD, &ff);

        printf("Done!\n");
        
        pause();    
    }

    $ gcc test.c && ./a.out

2. In another console, identify the newly created input device and run `fftest` on it (from package linuxconsole):

    # fftest /dev/input/event22

3. Ctrl+C the a.out program, the system crashes, see the attached dmesg output.

PS. If you Ctrl+C the fftest instead, then it will try to erase the effects. If the a.out program processed the UI_FF_ERASE, all would go well. But if it does  not, fftest will block indefinitely... and you are into point 3 again.
Comment 1 Rodrigo Rivas Costa 2017-02-08 20:06:14 UTC
Created attachment 254601 [details]
patch that avoids the crash

It looks like this bug is caused because when the uinput file is closed, the input device is destroyed, and eventually, uinput_dev_erase_effect() is called of each loaded effect. But since this function enqueues data to be read from the file descriptor, but the file descriptor is being closed, so some nested locking error happens.

I finally debugged the issue and it looks like the attached bug solves the issue: basically if `state != UIST_CREATED` then there is no point on doing any effect operations, because the device does not exist. Since when closing the device, the first operation is to set state to UIST_NEW_DEVICE, all works nicely.

I had to add this check to several functions, or else, the system would crash on different places, such as when closing the fftest instead of when closing the a.out.
Comment 2 Marcos Souza 2019-02-03 04:54:55 UTC
> PS. If you Ctrl+C the fftest instead, then it will try to erase the effects.
> If the a.out program processed the UI_FF_ERASE, all would go well. But if it
> does  not, fftest will block indefinitely... and you are into point 3 again.

Hi Rodrigo,

can you try to reproduce this issue in a newer kernel? Everything is normal in 4.12.14, with your test program returning (both fftest and your test programs need to be executed as root):

UI_FF_UPLOAD
Done!

Thanks.
Comment 3 Rodrigo Rivas Costa 2019-02-03 16:48:22 UTC
(In reply to Marcos Souza from comment #2)
> > PS. If you Ctrl+C the fftest instead, then it will try to erase the
> effects.
> > If the a.out program processed the UI_FF_ERASE, all would go well. But if
> it
> > does  not, fftest will block indefinitely... and you are into point 3
> again.
> 
> Hi Rodrigo,
>
> can you try to reproduce this issue in a newer kernel? Everything is normal
> in 4.12.14, with your test program returning (both fftest and your test
> programs need to be executed as root):

Hi Marcos
I'm currently using 4.20 and the `fftest` is unkillable without killing `a.out` first. Not even a `kill -9` will terminate it.

However, if I wait 30 seconds after Ctrl+C, then it does terminate. It looks like some time-out kicks in...

Killing `a.out` will immediately terminate both of the processes.

Regards.
Comment 4 Dmitry Torokhov 2019-02-06 00:47:44 UTC
commit 8e009118a45af30451ff4bbae2b6efd9575d6694
Author: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Date:   Wed Sep 6 16:31:29 2017 -0700

    Input: uinput - allow FF requests to time out
    
    Previously uinput force feedback requests waited for the userspace
    indefinitely, which caused users to block when uinput server process
    become unresponsive. Let's establish a 30 seconds deadline for servicing
    upload and erase force feedback effect actions, so that users have a
    chance to abort stuck requests.
    
    Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Comment 5 Dmitry Torokhov 2019-02-06 00:51:23 UTC
The original crash was fixed by:

commit e8b95728f724797f958912fd9b765a695595d3a6
Author: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Date:   Fri Sep 1 17:13:43 2017 -0700

    Input: uinput - avoid FF flush when destroying device

    Normally, when input device supporting force feedback effects is being
    destroyed, we try to "flush" currently playing effects, so that the
    physical device does not continue vibrating (or executing other effects).
    Unfortunately this does not work well for uinput as flushing of the effects
    deadlocks with the destroy action:
    
    - if device is being destroyed because the file descriptor is being closed,
      then there is noone to even service FF requests;
    
    - if device is being destroyed because userspace sent UI_DEV_DESTROY,
      while theoretically it could be possible to service FF requests,
      userspace is unlikely to do so (they'd need to make sure FF handling
      happens on a separate thread) even if kernel solves the issue with FF
      ioctls deadlocking with UI_DEV_DESTROY ioctl on udev->mutex.
    
    To avoid lockups like the one below, let's install a custom input device
    flush handler, and avoid trying to flush force feedback effects when we
    destroying the device, and instead rely on uinput to shut off the device
    properly.
    
    NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
    ...
     <<EOE>>  [<ffffffff817a0307>] _raw_spin_lock_irqsave+0x37/0x40
     [<ffffffff810e633d>] complete+0x1d/0x50
     [<ffffffffa00ba08c>] uinput_request_done+0x3c/0x40 [uinput]
     [<ffffffffa00ba587>] uinput_request_submit.part.7+0x47/0xb0 [uinput]
     [<ffffffffa00bb62b>] uinput_dev_erase_effect+0x5b/0x76 [uinput]
     [<ffffffff815d91ad>] erase_effect+0xad/0xf0
     [<ffffffff815d929d>] flush_effects+0x4d/0x90
     [<ffffffff815d4cc0>] input_flush_device+0x40/0x60
     [<ffffffff815daf1c>] evdev_cleanup+0xac/0xc0
     [<ffffffff815daf5b>] evdev_disconnect+0x2b/0x60
     [<ffffffff815d74ac>] __input_unregister_device+0xac/0x150
     [<ffffffff815d75f7>] input_unregister_device+0x47/0x70
     [<ffffffffa00bac45>] uinput_destroy_device+0xb5/0xc0 [uinput]
     [<ffffffffa00bb2de>] uinput_ioctl_handler.isra.9+0x65e/0x740 [uinput]
     [<ffffffff811231ab>] ? do_futex+0x12b/0xad0
     [<ffffffffa00bb3f8>] uinput_ioctl+0x18/0x20 [uinput]
     [<ffffffff81241248>] do_vfs_ioctl+0x298/0x480
     [<ffffffff81337553>] ? security_file_ioctl+0x43/0x60
     [<ffffffff812414a9>] SyS_ioctl+0x79/0x90
     [<ffffffff817a04ee>] entry_SYSCALL_64_fastpath+0x12/0x71
    
    Reported-by: Rodrigo Rivas Costa <rodrigorivascosta@gmail.com>
    Reported-by: Clément VUCHENER <clement.vuchener@gmail.com>
    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=193741
    Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Comment 6 Marcos Souza 2019-02-06 01:00:00 UTC
Hi Dmitry,

but about the unkillable fftest, do you think it needs a new bug to be open?

In my tests, if we run the code provided by Rodrigo, fftest can't be killed, and if we type -1, fftest just prints "Stopping effects" and can't be killed either.

Thanks.
Comment 7 Dmitry Torokhov 2019-02-06 05:24:39 UTC
It does terminate after 30 seconds timeout, no?
Comment 8 Dmitry Torokhov 2019-02-06 05:37:50 UTC
I guess we could try changing wait_for_completion_timeout() to wait_for_completion_killable_timeout() in uinput_request_submit().
Comment 9 Rodrigo Rivas Costa 2019-02-06 21:49:48 UTC
(In reply to Dmitry Torokhov from comment #8)
> I guess we could try changing wait_for_completion_timeout() to
> wait_for_completion_killable_timeout() in uinput_request_submit().

I've just tested it and it does exactly the same, unkillable for 30 seconds. 
However, I tried wait_for_completion_interruptible_timeout() instead and it does finish instantly when I Ctrl+C, I don't know why the difference.
Comment 10 Marcos Souza 2019-02-07 01:35:40 UTC
(In reply to Dmitry Torokhov from comment #7)
> It does terminate after 30 seconds timeout, no?

I was testing kernel 4.14, so it just stucks... but testing 5.0-rc4, indeed, it exits in 30 seconds, because of this guy here: wait_for_completion_timeout(&request->done, 30 * HZ)

In 4.14, there wasn't a timeout for this wait call:
wait_for_completion_interruptible(&request->done);

(In reply to Rodrigo Rivas Costa from comment #9)
> (In reply to Dmitry Torokhov from comment #8)
> > I guess we could try changing wait_for_completion_timeout() to
> > wait_for_completion_killable_timeout() in uinput_request_submit().
> 
> I've just tested it and it does exactly the same, unkillable for 30 seconds. 
> However, I tried wait_for_completion_interruptible_timeout() instead and it
> does finish instantly when I Ctrl+C, I don't know why the difference.

It happens because killable_timeout sets the state to TASK_KILLABLE:

#define TASK_KILLABLE       (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)

In the function wait_for_completion_killable_timeout, this is stated:

...
 * This waits for either a completion of a specific task to be
 * signaled or for a specified timeout to expire. It can be
 * interrupted by a kill signal. The timeout is in jiffies.
...

So, in my POV, only SIGKILL would be able to "interrupt" this wait.

OTOH, wait_for_completion_interruptible_timeout documentation is much less restrict:

...
 * This waits for either a completion of a specific task to be signaled or for a
 * specified timeout to expire. It is interruptible. The timeout is in jiffies.
...

So, it seems that _any_ signal would interrupt the wait, and not only SIGKILL. Take a look in the call stack bellow:
 * wait_for_completion_interruptible_timeout
   * wait_for_common (completion, timeout, TASK_INTERRUPTIBLE)
     * __wait_for_common (completion, schedule_timeout, timeout, TASK_INTERRUPTIBLE);

In __wait_for_common, the schedule_timeout is an *action*, and in this case, just sets a timeout in the current process and calls schedule.

As the call stack goes down, it goes to do_wait_for_common, which sets the current task to TASK_INTERRUPTIBLE (in this example), and then calls the action (schedule_timeout in this example), waiting for a signal (because the current task is INTERRUPTIBLE) or to reach the timeout.

In my tests, the current code is already dies if a SIGKILL is issued to fftest, but maybe this is an expected behavior (maybe implied by TASK_WAKEKILL) ?

@Dmitry, is this assumption right? If yes, what about changing the wait_for_completion_timeout to wait_for_completion_interruptible_timeout?

What about create a test in kernel selftests to ensure this bug never comes back to life? (I can happily send a patch once we agree in this matter).

Thanks!

Note You need to log in before you can comment on or make changes to this bug.