[ 548.324604] ACPI: PM: Preparing to enter system sleep state S3
[ 548.326137] ACPI: EC: event blocked
[ 548.326138] ACPI: EC: EC stopped
[ 548.326139] ACPI: PM: Saving platform NVS memory
[ 548.326184] Disabling non-boot CPUs ...
[ 548.328170] IRQ146: set affinity failed(-22).

$ grep . /sys/kernel/irq/146/*
/sys/kernel/irq/146/actions:PCIe PME,aerdrv
/sys/kernel/irq/146/chip_name:VMD-MSI
/sys/kernel/irq/146/hwirq:125
/sys/kernel/irq/146/per_cpu_count:0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
/sys/kernel/irq/146/type:edge
/sys/kernel/irq/146/wakeup:disabled
Created attachment 303458: dmesg
Created attachment 303459: lspci
I tried patch 3/3 from [1], but it renders the system unbootable.

[1] https://lore.kernel.org/linux-pci/1573040408-3831-1-git-send-email-jonathan.derrick@intel.com/
I assume this is a post-6.1 regression? If so, a bisection will likely be needed to find the culprit, unless a maintainer of one of the involved subsystems (Interrupts? Maybe PCI and PM?) has an idea what's causing this. (But as you, being a kernel developer, might know, you are unlikely to reach them here in bugzilla: https://lwn.net/Articles/910740/)
This is not a regression. vmd_irq_set_affinity() has not had a proper implementation since the VMD driver was introduced. And this bug is not for reaching out to maintainers; a BKO bug is required for Intel's internal escalation.
(In reply to Kai-Heng Feng from comment #5)
> This is not a regression.

Sorry for bothering you then; it looked a bit like one (many people forget to set "Regression" to "yes"… :-/)
Keith, is there any plan to implement vmd_irq_set_affinity()?
If there's any interest there, it won't come from me.
(In reply to Keith Busch from comment #8)
> If there's any interest there, it won't come from me.

If I were to implement it, do you know what's missing in [1] alone?

[1] https://lore.kernel.org/linux-pci/1573040408-3831-4-git-send-email-jonathan.derrick@intel.com/
It's been a while, but I think the CPU offline migration was the main problem. Unless something changed since I last looked, you'd need to register a hotcpu notifier that will rebalance the affinity and rewrite all the child device MSI table entries.
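
To make the rebalancing step concrete, here is a rough userspace model in C of what a CPU-offline callback might have to do. It is only a sketch and not driver code: the struct, the function names, and the sizes (4 CPUs, 4 VMD vectors, 6 child IRQs) are all made up for illustration, and the "least-loaded online vector" policy is an assumption rather than anything taken from the VMD driver or the linked patch. In current kernels the hook itself would presumably be registered through the CPU hotplug state machine (cpuhp_setup_state()) rather than the old notifier interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes, chosen only to keep the model small. */
#define NR_CPUS        4
#define NR_VMD_VECTORS 4
#define NR_CHILD_IRQS  6

/* Hypothetical model of the state a rebalance has to touch. */
struct vmd_model {
	bool cpu_online[NR_CPUS];
	int  vector_cpu[NR_VMD_VECTORS];  /* CPU each VMD vector targets */
	int  child_vector[NR_CHILD_IRQS]; /* VMD vector in each child MSI entry */
};

/* All CPUs online, vector N on CPU N, child IRQs spread round-robin. */
static void vmd_model_init(struct vmd_model *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		m->cpu_online[cpu] = true;
	for (int vec = 0; vec < NR_VMD_VECTORS; vec++)
		m->vector_cpu[vec] = vec % NR_CPUS;
	for (int irq = 0; irq < NR_CHILD_IRQS; irq++)
		m->child_vector[irq] = irq % NR_VMD_VECTORS;
}

/* Number of child IRQs currently routed through a vector. */
static int vector_load(const struct vmd_model *m, int vec)
{
	int load = 0;

	for (int irq = 0; irq < NR_CHILD_IRQS; irq++)
		if (m->child_vector[irq] == vec)
			load++;
	return load;
}

/* Least-loaded vector whose target CPU is online, or -1 if none. */
static int pick_online_vector(const struct vmd_model *m)
{
	int best = -1, best_load = 0;

	for (int vec = 0; vec < NR_VMD_VECTORS; vec++) {
		if (!m->cpu_online[m->vector_cpu[vec]])
			continue;
		int load = vector_load(m, vec);
		if (best < 0 || load < best_load) {
			best = vec;
			best_load = load;
		}
	}
	return best;
}

/*
 * Model of the CPU-offline callback: every child entry routed through a
 * vector on the departing CPU is moved to an online vector. The
 * assignment to child_vector[] stands in for rewriting the child
 * device's MSI table entry. Returns the number of entries rewritten,
 * or -1 if no online vector is left.
 */
static int vmd_model_cpu_offline(struct vmd_model *m, int cpu)
{
	int moved = 0;

	assert(cpu >= 0 && cpu < NR_CPUS);
	m->cpu_online[cpu] = false;

	for (int irq = 0; irq < NR_CHILD_IRQS; irq++) {
		if (m->vector_cpu[m->child_vector[irq]] != cpu)
			continue;
		int vec = pick_online_vector(m);
		if (vec < 0)
			return -1;
		m->child_vector[irq] = vec;
		moved++;
	}
	return moved;
}
```

With the layout from vmd_model_init(), taking CPU 1 offline moves the two child IRQs that were on vector 1 (IRQs 1 and 5) to vectors 2 and 3. The real work would be doing the equivalent rewrite safely while the child devices can still raise interrupts, which this model does not attempt to cover.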