Bug 213179 - vdpa_device_unregister() caller hangs if the QEMU (VM) using the vhost_vdpa device is running
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 high
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-22 20:04 UTC by Gautam Dawar
Modified: 2021-07-04 18:04 UTC
CC List: 0 users

See Also:
Kernel Version: 5.11 and above
Subsystem:
Regression: No
Bisected commit-id:



Description Gautam Dawar 2021-05-22 20:04:29 UTC
In the vhost_vdpa use case, if QEMU (VM) is running, calling vdpa_device_unregister() in the hypervisor driver causes the caller to hang indefinitely. With hung_task_timeout_secs set to zero the hang goes unreported; with a non-zero value (120 seconds by default), the hung-task detector reports the blocked task, as in the trace below, and panics the kernel if kernel.hung_task_panic is enabled.

Here is the call trace from testing with the sfc driver:

May 11 21:21:37 ndr730w kernel: INFO: task rmmod:21098 blocked for more than 122 seconds.
May 11 21:21:37 ndr730w kernel:      Tainted: G           OE     5.11.15 #3
May 11 21:21:37 ndr730w kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 11 21:21:37 ndr730w kernel: task:rmmod           state:D stack:    0 pid:21098 ppid: 20957 flags:0x00004000
May 11 21:21:37 ndr730w kernel: Call Trace:
May 11 21:21:37 ndr730w kernel: __schedule+0x391/0x830
May 11 21:21:37 ndr730w kernel: ? kernfs_put+0xec/0x190
May 11 21:21:37 ndr730w kernel: schedule+0x3c/0xa0
May 11 21:21:37 ndr730w kernel: schedule_timeout+0x215/0x2b0
May 11 21:21:37 ndr730w kernel: ? kobj_kset_leave+0x23/0x60
May 11 21:21:37 ndr730w kernel: ? __kobject_del+0x42/0x80
May 11 21:21:37 ndr730w kernel: wait_for_completion+0x98/0xf0
May 11 21:21:37 ndr730w kernel: vhost_vdpa_remove+0x58/0x70 [vhost_vdpa]
May 11 21:21:37 ndr730w kernel: vdpa_dev_remove+0x1f/0x30 [vdpa]
May 11 21:21:37 ndr730w kernel: device_release_driver_internal+0xf7/0x1d0
May 11 21:21:37 ndr730w kernel: bus_remove_device+0xdb/0x140
May 11 21:21:37 ndr730w kernel: device_del+0x18b/0x3e0
May 11 21:21:37 ndr730w kernel: device_unregister+0x16/0x50
May 11 21:21:37 ndr730w kernel: ef100_vdpa_delete+0x61/0x120 [sfc]
May 11 21:21:37 ndr730w kernel: ? kobject_release+0x58/0x150
May 11 21:21:37 ndr730w kernel: ef100_vdpa_fini+0x2e/0xa0 [sfc]
May 11 21:21:37 ndr730w kernel: ef100_remove+0x9b/0xa0 [sfc]
May 11 21:21:37 ndr730w kernel: ef100_pci_remove+0x34/0x90 [sfc]
May 11 21:21:37 ndr730w kernel: pci_device_remove+0x3b/0xc0
May 11 21:21:37 ndr730w kernel: device_release_driver_internal+0xf7/0x1d0
May 11 21:21:37 ndr730w kernel: driver_detach+0x46/0x90
May 11 21:21:37 ndr730w kernel: bus_remove_driver+0x77/0xd0
May 11 21:21:37 ndr730w kernel: pci_unregister_driver+0x2a/0xa0
May 11 21:21:37 ndr730w kernel: efx_exit_module+0x18/0xfd [sfc]
May 11 21:21:37 ndr730w kernel: __x64_sys_delete_module+0x13d/0x250
May 11 21:21:37 ndr730w kernel: ? syscall_trace_enter.isra.19+0x13c/0x1b0
May 11 21:21:37 ndr730w kernel: do_syscall_64+0x33/0x40
May 11 21:21:37 ndr730w kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Comment 1 Gautam Dawar 2021-05-22 20:07:31 UTC
Here is the sequence of events that causes the "rmmod sfc" command to hang:

1. QEMU opens the vhost-vdpa character device to configure the underlying vDPA device. At this point, the vhost_vdpa_open() callback of the vhost-vdpa kernel module is invoked; it atomically changes the opened variable from 0 to 1 and fails with -EBUSY if the device is already open:

opened = atomic_cmpxchg(&v->opened, 0, 1);

2. The vhost-vdpa character device is normally closed when the VM shuts down. On that close() of the vhost-vdpa character device fd (both the first and the last, since only one open is allowed), the module's release callback vhost_vdpa_release() is called; it decrements opened and signals the completion event:

atomic_dec(&v->opened);
complete(&v->completion);

3. When sfc is unloaded, vdpa_device_unregister() is invoked for each vDPA device, which in turn invokes vhost_vdpa_remove() in the vhost-vdpa module. vhost_vdpa_remove() waits (wait_for_completion) until the atomic variable opened, which tracks whether the vDPA character device is open, drops back to 0.
Since the VM was still running when the sfc unload was attempted, the character device remains open and vhost_vdpa_release() is never called. As a result, vhost_vdpa_remove() blocks on the completion event indefinitely; once it has been blocked for hung_task_timeout_secs (120 seconds by default), the hung-task detector reports it, as in the trace in the bug description.
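The three steps above can be modeled in user space to show why the remove path blocks. This is a minimal C sketch, not kernel code: struct completion is emulated with a pthread mutex and condition variable, and the names vdpa_model, model_open(), model_release(), model_remove() and completion_wait_ms() are hypothetical. It assumes vhost_vdpa_remove() retries the 0-to-1 compare-and-exchange on opened after every wakeup. Unlike the kernel's wait_for_completion(), the model's wait takes a timeout so the hang can be observed without blocking forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* User-space stand-in for the kernel's struct completion (hypothetical model). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;
};

static void completion_init(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

static void completion_complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Like wait_for_completion(), but gives up after timeout_ms; true if completed. */
static bool completion_wait_ms(struct completion *c, long timeout_ms)
{
	struct timespec deadline;
	bool ok;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	pthread_mutex_lock(&c->lock);
	while (c->done == 0) {
		/* Loop on spurious wakeups; stop only when the deadline passes. */
		if (pthread_cond_timedwait(&c->cond, &c->lock, &deadline) == ETIMEDOUT)
			break;
	}
	ok = c->done > 0;
	if (ok)
		c->done--;	/* consume one completion */
	pthread_mutex_unlock(&c->lock);
	return ok;
}

/* Only the two fields of struct vhost_vdpa that matter here. */
struct vdpa_model {
	atomic_int opened;
	struct completion completion;
};

static void model_init(struct vdpa_model *v)
{
	atomic_init(&v->opened, 0);
	completion_init(&v->completion);
}

/* Step 1: open succeeds only if nobody holds the device (0 -> 1). */
static int model_open(struct vdpa_model *v)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&v->opened, &expected, 1) ? 0 : -EBUSY;
}

/* Step 2: close drops opened and wakes a waiter in remove. */
static void model_release(struct vdpa_model *v)
{
	atomic_fetch_sub(&v->opened, 1);
	completion_complete(&v->completion);
}

/* Step 3: claim the device (0 -> 1) or wait for a release; false = still blocked. */
static bool model_remove(struct vdpa_model *v, long timeout_ms)
{
	for (;;) {
		int expected = 0;

		if (atomic_compare_exchange_strong(&v->opened, &expected, 1))
			return true;
		if (!completion_wait_ms(&v->completion, timeout_ms))
			return false;
	}
}
```

For example, after model_open() succeeds, model_remove(&v, 50) returns false, meaning the real remove would still be waiting. Once model_release() runs, the same call returns true and leaves opened at 1, so any later open fails with -EBUSY.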
 

To fix this problem, the vhost-vdpa module must notify QEMU to close the fd corresponding to the vhost-vdpa character device while it is waiting for the completion event in vhost_vdpa_remove().
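The proposed notify-then-wait approach can be sketched with a similar user-space model. This is a hypothetical illustration only; the report does not specify how the notification would reach QEMU. Here the close_requested flag and a condition-variable wakeup stand in for that notification, qemu_thread() plays a cooperative QEMU that closes its fd when asked, and model_remove_notify() raises the request before each wait, then blocks as in step 3. All names are invented for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct vdpa_model {
	atomic_int opened;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool close_requested;	/* hypothetical "please close the fd" notification */
	unsigned int released;	/* completion count, like struct completion's done */
};

static void model_init(struct vdpa_model *v)
{
	atomic_init(&v->opened, 0);
	pthread_mutex_init(&v->lock, NULL);
	pthread_cond_init(&v->cond, NULL);
	v->close_requested = false;
	v->released = 0;
}

static int model_open(struct vdpa_model *v)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&v->opened, &expected, 1) ? 0 : -EBUSY;
}

/* Close path: drop opened and signal the "completion". */
static void model_release(struct vdpa_model *v)
{
	atomic_fetch_sub(&v->opened, 1);
	pthread_mutex_lock(&v->lock);
	v->released++;
	pthread_cond_broadcast(&v->cond);
	pthread_mutex_unlock(&v->lock);
}

/* Simulated cooperative QEMU: keeps the fd open until asked to close it. */
static void *qemu_thread(void *arg)
{
	struct vdpa_model *v = arg;

	pthread_mutex_lock(&v->lock);
	while (!v->close_requested)
		pthread_cond_wait(&v->cond, &v->lock);
	pthread_mutex_unlock(&v->lock);
	model_release(v);
	return NULL;
}

/* Sketch of the proposed remove: notify user space, then wait as before. */
static void model_remove_notify(struct vdpa_model *v)
{
	for (;;) {
		int expected = 0;

		if (atomic_compare_exchange_strong(&v->opened, &expected, 1))
			return;
		pthread_mutex_lock(&v->lock);
		v->close_requested = true;
		pthread_cond_broadcast(&v->cond);	/* wake the simulated QEMU */
		while (v->released == 0)
			pthread_cond_wait(&v->cond, &v->lock);
		v->released--;
		pthread_mutex_unlock(&v->lock);
	}
}
```

This only helps if user space honors the request: a process that ignores close_requested still leaves the remove path blocked, which is the concern raised in Comment 2.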
Comment 2 Gautam Dawar 2021-07-04 18:04:43 UTC
Following up on the findings in the bug description and the previous comment, I found that any malicious user-space application that keeps the vhost-vdpa character device open can make a module that registers a vDPA device hang in its de-initialization sequence. This typically surfaces when vdpa_device_unregister() is called from the function responsible for module unload, so the rmmod command never returns.

This is being discussed at https://lists.linuxfoundation.org/pipermail/virtualization/2021-June/054775.html
