Created attachment 149991 Fix crash after rmmod radeon on PX systems. Calling rmmod radeon on a PX system causes a kernel crash. The reason is the function vga_switcheroo_init_domain_pm_ops(), which sets dev->pm_domain of the PCI device. After the radeon module is unloaded, dev->pm_domain still points to vga_switcheroo functions, which try to call radeon functions (which no longer exist in memory after rmmod radeon). I bet that nouveau has the same problem. I'm attaching a simple patch which sets dev->pm_domain of the PCI device back to NULL when the radeon device is removed, so vga_switcheroo will not be called. But I think the proper way to fix this bug, which is in vga_switcheroo, would be to add a function like "vga_switcheroo_exit_domain_pm_ops()" which sets pm_domain back to its original value (NULL in my case). With my patch on a PX system I can call rmmod radeon, modprobe radeon, rmmod radeon, ... many times without any crash.
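To make the failure mode concrete, here is a minimal userspace sketch of the situation described above. It is not kernel code and not the attached patch: every type and function name in it (`device_model`, `model_driver_probe`, `model_driver_remove`, `model_pm_would_crash`) is hypothetical and only mimics the shape of `dev->pm_domain`. It assumes the crash comes from a PM callback being reached through a domain pointer that outlives the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: these types mimic the kernel's shape, not its API. */
struct pm_domain_model {
    int (*runtime_suspend)(void *dev);
};

struct device_model {
    struct pm_domain_model *pm_domain; /* stands in for dev->pm_domain */
    int driver_loaded;                 /* 1 while the "module" is in memory */
};

/* Lives in the "driver module"; must not be called after unload. */
static int driver_runtime_suspend(void *dev)
{
    (void)dev;
    return 0;
}

/* Stands in for the domain that vga_switcheroo hooks into the device. */
static struct pm_domain_model switcheroo_domain = {
    .runtime_suspend = driver_runtime_suspend,
};

/* Model of driver load: the PM domain is installed on the device. */
static void model_driver_probe(struct device_model *dev)
{
    assert(dev != NULL);
    dev->driver_loaded = 1;
    dev->pm_domain = &switcheroo_domain;
}

/* Model of driver removal; reset_domain != 0 selects the patched behaviour. */
static void model_driver_remove(struct device_model *dev, int reset_domain)
{
    assert(dev != NULL);
    if (reset_domain)
        dev->pm_domain = NULL; /* unhook before the driver code goes away */
    dev->driver_loaded = 0;
}

/* Returns 1 if a later PM callback would jump into unloaded code. */
static int model_pm_would_crash(const struct device_model *dev)
{
    return dev->pm_domain != NULL && !dev->driver_loaded;
}
```

In this model, removing the driver without resetting the pointer leaves a stale domain behind, while resetting it to NULL on removal makes repeated load/unload cycles safe.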
Care to generate a git patch and sign-off on it?
I can, but I do not know if this is the proper way to fix it. I still think that the root of the bug is in the function vga_switcheroo_init_domain_pm_ops(), which overwrites dev->pm_domain but does not restore it when the driver/device unregisters.
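The save-and-restore idea could look roughly like the following userspace sketch. It is only an illustration of pairing an init call with an exit call; the names `pci_dev_model`, `switcheroo_state_model`, `model_init_domain_pm_ops` and `model_exit_domain_pm_ops` are hypothetical and do not correspond to any real vga_switcheroo function or signature.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; not the kernel's struct dev_pm_domain. */
struct pmdom_model {
    int id;
};

struct pci_dev_model {
    struct pmdom_model *pm_domain; /* stands in for dev->pm_domain */
};

struct switcheroo_state_model {
    struct pmdom_model domain;  /* domain installed by the init call */
    struct pmdom_model *saved;  /* value found in the device before init */
};

/* Hypothetical init: remember the old pointer, then install our domain. */
static void model_init_domain_pm_ops(struct pci_dev_model *dev,
                                     struct switcheroo_state_model *st)
{
    assert(dev != NULL && st != NULL);
    st->saved = dev->pm_domain;
    dev->pm_domain = &st->domain;
}

/* Hypothetical exit: put back whatever was there before init. */
static void model_exit_domain_pm_ops(struct pci_dev_model *dev,
                                     struct switcheroo_state_model *st)
{
    assert(dev != NULL && st != NULL);
    dev->pm_domain = st->saved;
    st->saved = NULL;
}
```

Restoring the saved pointer rather than hard-coding NULL keeps the exit call correct even if the device already had a PM domain before init ran, at the cost of carrying one extra pointer of state.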
Created attachment 150001 patch 1/3 How about this patch set?
Created attachment 150011 patch 2/3
Created attachment 150021 patch 3/3
I tested 1/3 and 2/3 on a 3.13 kernel. As expected (because the patches do the same thing), the result is the same as with my patch: no more kernel crash. You can add my Signed-off-by. I do not have an NVIDIA Optimus card, so I cannot test the last patch. Also, vga_switcheroo.c exports the function vga_switcheroo_init_domain_pm_optimus_hdmi_audio(), which changes dev->pm_domain too. But I do not see any driver which uses it.
The function vga_switcheroo_init_domain_pm_optimus_hdmi_audio() is used in sound/pci/hda/hda_intel.c. So that driver has the same problem and causes a kernel panic on driver unload.
Alex, that patch set indeed got rid of that bug, but for some reason it introduced another one: https://bugzilla.kernel.org/show_bug.cgi?id=86011 Commit 97d30fa3524ff60b43d450012abe8f961d280478 from the stable kernel tree breaks nouveau power management through vga-switcheroo.
(In reply to Pali Rohár from comment #7)
> Function vga_switcheroo_init_domain_pm_optimus_hdmi_audio() is used in
> sound/pci/hda/hda_intel.c. So that driver has same problem and cause kernel
> panic on driver unload.
A patch for this issue is queued at http://mailman.alsa-project.org/pipermail/alsa-devel/2016-July/110125.html
Joaquín, how does 97d30fa35 break nouveau vga-switcheroo? If you load nouveau with runpm=0, then you can write OFF to debugfs' vga_switcheroo. However, runpm=1 (or -1 for Optimus systems) is recommended. I think that the original bug is fixed, so can this be marked as resolved?
> Joaquín, how does 97d30fa35 break nouveau vga-switcheroo? If you load
> nouveau with runpm=0, then you can write OFF to debugfs' vga_switcheroo.
> However runpm=1 (or -1 for Optimus systems) is recommended.
I just tested removing the nouveau module on Ubuntu 16.04 with mainline kernel v4.6.5 and it worked correctly. I also modprobed it afterwards and that worked correctly too. This bug should be marked as resolved.