Bug 215787

Summary: mt7921: panic on unbind - drivers/net/wireless/mediatek/mt76/mt7921/pci.c:mt7921_pci_remove
Product: Drivers
Reporter: Thiner Logoer (logoerthiner1)
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: high
CC: iakuninaa, mario.limonciello
Priority: P1
Hardware: AMD
OS: Linux
Kernel Version: latest as in 2022-04-01 and 5.17 5.16 5.15 ...
Subsystem:
Regression: No
Bisected commit-id:

Description Thiner Logoer 2022-04-01 03:49:22 UTC
The issue is reported at: https://github.com/QubesOS/qubes-issues/issues/7294

`mt7921_pci_remove` seems to always crash whenever it is called.

This can be reproduced by echoing the PCI address of the Wi-Fi device (for example `0000:00:07.0`) to `/sys/bus/pci/drivers/mt7921e/unbind`.
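
For anyone who prefers to trigger the unbind from a program rather than the shell, the same write can be sketched in C. This is a minimal sketch: `write_sysfs_attr` is a hypothetical helper name, not an existing API, and it simply writes a string to an existing file in one `write(2)` call.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `value` to the already-existing attribute
 * file at `path` using a single write(2) call.
 * Returns 0 on success, -1 if the file cannot be opened or the write
 * is short or fails.
 */
int write_sysfs_attr(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t written;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	written = write(fd, value, len);
	close(fd);

	return (written == (ssize_t)len) ? 0 : -1;
}
```

Calling `write_sysfs_attr("/sys/bus/pci/drivers/mt7921e/unbind", "0000:00:07.0")` as root would correspond to the reproduction step above; the address is the example one from this report and will differ per machine. On an affected kernel this is expected to trigger the crash, so try it only on a test system.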

I have read the kernel source code and have a guess.

```
static void mt7921_pci_remove(struct pci_dev *pdev)
{
	struct mt76_dev *mdev = pci_get_drvdata(pdev);
	struct mt7921_dev *dev = container_of(mdev, struct mt7921_dev, mt76);

	mt7921e_unregister_device(dev);
	devm_free_irq(&pdev->dev, pdev->irq, dev);
	pci_free_irq_vectors(pdev);
}
```

From my limited kernel knowledge, I suspect that `mt7921_pci_remove` should call `devm_free_irq` first and `mt7921e_unregister_device` second. `devm_free_irq` calls `free_irq`, which "does not return until any executing interrupts for this IRQ have completed" according to the comment there. While an IRQ for mt7921 is being handled, the handler certainly uses some fields in `dev`, so `dev` must not be unregistered until the IRQ has been freed.
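
The ordering concern can be illustrated with a small user-space model in C. This is only a sketch of the hypothesis, not kernel code and not the upstream fix: every name below (`fake_dev`, `fake_irq`, `fake_irq_handler`, and so on) is hypothetical, and a flag stands in for the freed driver state so the model itself stays well defined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; NOT the real struct mt7921_dev. */
struct fake_dev {
	bool registered;     /* cleared by fake_unregister_device() */
	int stale_accesses;  /* handler runs that saw torn-down state */
};

/* Hypothetical stand-in for one IRQ line and its installed handler. */
struct fake_irq {
	void (*handler)(struct fake_dev *dev);
	struct fake_dev *dev;
};

/* The handler touches device state, as the report assumes the real one does. */
void fake_irq_handler(struct fake_dev *dev)
{
	if (!dev->registered)
		dev->stale_accesses++; /* in a real kernel: possible use-after-free */
}

/* Deliver one interrupt if a handler is still installed. */
void fake_fire_irq(struct fake_irq *irq)
{
	if (irq->handler)
		irq->handler(irq->dev);
}

void fake_unregister_device(struct fake_dev *dev) { dev->registered = false; }
void fake_free_irq(struct fake_irq *irq) { irq->handler = NULL; }

/* Order in the quoted code: unregister first, free the IRQ second. */
int remove_unregister_first(void)
{
	struct fake_dev dev = { .registered = true, .stale_accesses = 0 };
	struct fake_irq irq = { .handler = fake_irq_handler, .dev = &dev };

	fake_unregister_device(&dev);
	fake_fire_irq(&irq);           /* late interrupt hits torn-down state */
	fake_free_irq(&irq);
	return dev.stale_accesses;
}

/* Order suggested in this report: free the IRQ first, unregister second. */
int remove_free_irq_first(void)
{
	struct fake_dev dev = { .registered = true, .stale_accesses = 0 };
	struct fake_irq irq = { .handler = fake_irq_handler, .dev = &dev };

	fake_free_irq(&irq);
	fake_fire_irq(&irq);           /* late interrupt is dropped */
	fake_unregister_device(&dev);
	return dev.stale_accesses;
}
```

In this model, `remove_unregister_first()` reports one stale access while `remove_free_irq_first()` reports none. It only shows why the ordering could matter if an interrupt arrives during teardown; it says nothing about whether this is the actual cause of the crash.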

My original email is at https://lore.kernel.org/linux-wireless/153f1a0c.36a0.17fba8be75c.Coremail.logoerthiner1@163.com/T/ but it seems that the mailing list is not the correct place for it.

Comment 1 Iakunin Andrei 2022-05-13 15:35:05 UTC
I have the same issue on my ThinkPad E14 Gen 3 with an AMD Ryzen 3 5300U.
The laptop did not wake up after suspend.

Device-2: MEDIATEK MT7921 802.11ax PCI Express Wireless Network Adapter driver: mt7921e
 

The proposed patch is good for the 5.18+ kernel branch, but it can easily be changed for use with kernels 5.17 and earlier.
I patched my 5.15.25 kernel with it and it fixes the problem.

Comment 2 Mario Limonciello (AMD) 2023-03-08 19:49:52 UTC
I came across this issue and wanted to share that it was fixed last year, in mainline kernel 5.19 and later, by this commit:

ad483ed9dd51 ("mt76: mt7921: fix kernel crash at mt7921_pci_remove")