In the vhost_vdpa use case, if QEMU (a VM) is running, calling vdpa_device_unregister() from the hypervisor driver causes the caller to hang indefinitely (when hung_task_timeout_secs is zero) or results in a kernel panic (with hung_task_timeout_secs at its non-zero default). Here is the call trace when testing with the sfc driver (the repeated syslog prefix has been trimmed for readability):

May 11 21:21:37 ndr730w kernel: INFO: task rmmod:21098 blocked for more than 122 seconds.
 Tainted: G OE 5.11.15 #3
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:rmmod state:D stack: 0 pid:21098 ppid: 20957 flags:0x00004000
 Call Trace:
  __schedule+0x391/0x830
  ? kernfs_put+0xec/0x190
  schedule+0x3c/0xa0
  schedule_timeout+0x215/0x2b0
  ? kobj_kset_leave+0x23/0x60
  ? __kobject_del+0x42/0x80
  wait_for_completion+0x98/0xf0
  vhost_vdpa_remove+0x58/0x70 [vhost_vdpa]
  vdpa_dev_remove+0x1f/0x30 [vdpa]
  device_release_driver_internal+0xf7/0x1d0
  bus_remove_device+0xdb/0x140
  device_del+0x18b/0x3e0
  device_unregister+0x16/0x50
  ef100_vdpa_delete+0x61/0x120 [sfc]
  ? kobject_release+0x58/0x150
  ef100_vdpa_fini+0x2e/0xa0 [sfc]
  ef100_remove+0x9b/0xa0 [sfc]
  ef100_pci_remove+0x34/0x90 [sfc]
  pci_device_remove+0x3b/0xc0
  device_release_driver_internal+0xf7/0x1d0
  driver_detach+0x46/0x90
  bus_remove_driver+0x77/0xd0
  pci_unregister_driver+0x2a/0xa0
  efx_exit_module+0x18/0xfd [sfc]
  __x64_sys_delete_module+0x13d/0x250
  ? syscall_trace_enter.isra.19+0x13c/0x1b0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Here is the sequence of events that causes the rmmod sfc command to hang:

1. QEMU opens the vhost-vdpa character device to configure the underlying vDPA device. At this point the vhost_vdpa_open callback of the vhost-vdpa kernel module is invoked, which sets the 'opened' atomic variable to 1:

       opened = atomic_cmpxchg(&v->opened, 0, 1);

2. The vdpa character device is normally closed when the VM is shut down. At that time (on the first and last close() of the vhost-vdpa char device fd), the vhost-vdpa module's release callback, vhost_vdpa_release, is called; it decrements the 'opened' variable and signals the completion:

       atomic_dec(&v->opened);
       complete(&v->completion);

3. When sfc is unloaded, vdpa_device_unregister is invoked for each vDPA device, which in turn invokes vhost_vdpa_remove in the vhost-vdpa module. There, vhost_vdpa_remove calls wait_for_completion to wait until the character device has been released (i.e. until 'opened' has dropped back to 0). Since the VM is still running when the sfc unload is attempted, the character device remains open and vhost_vdpa_release is never called. As a result, vhost_vdpa_remove blocks on the completion indefinitely, and after hung_task_timeout_secs (120 seconds by default) the hung-task detector fires.

To fix this problem, the vhost-vdpa module must notify QEMU to close the fd corresponding to the vhost-vdpa character device while it is waiting for the completion event in vhost_vdpa_remove().
Continuing from my findings in the bug description and the previous comment, I found that any malicious user-space application can cause a module that registers a vDPA device to hang in its de-initialization sequence. This typically surfaces when vdpa_device_unregister() is called from the module-unload path, causing the rmmod command to never return. This is being discussed at https://lists.linuxfoundation.org/pipermail/virtualization/2021-June/054775.html