Bug 216835 - S3/CPU hotplug doesn't work because of stubbed vmd_irq_set_affinity()
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_other
Reported: 2022-12-23 02:15 UTC by Kai-Heng Feng
Modified: 2023-01-18 08:06 UTC
Kernel Version: mainline
Regression: No

Attachments
dmesg (41.84 KB, text/plain)
2022-12-23 02:16 UTC, Kai-Heng Feng
lspci (39.43 KB, text/plain)
2022-12-23 02:16 UTC, Kai-Heng Feng

Description Kai-Heng Feng 2022-12-23 02:15:43 UTC
[  548.324604] ACPI: PM: Preparing to enter system sleep state S3
[  548.326137] ACPI: EC: event blocked
[  548.326138] ACPI: EC: EC stopped
[  548.326139] ACPI: PM: Saving platform NVS memory
[  548.326184] Disabling non-boot CPUs ...
[  548.328170] IRQ146: set affinity failed(-22).

$ grep . /sys/kernel/irq/146/*
/sys/kernel/irq/146/actions:PCIe PME,aerdrv
/sys/kernel/irq/146/chip_name:VMD-MSI
/sys/kernel/irq/146/hwirq:125
/sys/kernel/irq/146/per_cpu_count:0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
/sys/kernel/irq/146/type:edge
/sys/kernel/irq/146/wakeup:disabled
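
The -22 in the "set affinity failed" line above is a negated errno value. On Linux, errno 22 is EINVAL, which would be consistent with the affinity callback simply rejecting the request. A minimal userspace sketch for decoding such codes follows; the helper name kernel_err_str is hypothetical and not part of any kernel or libc API.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn a kernel-style negative return code,
 * such as the -22 from the log, into its errno description.
 * Positive values are passed through unchanged. */
const char *kernel_err_str(int ret)
{
    return strerror(ret < 0 ? -ret : ret);
}
```

Calling kernel_err_str(-22) returns the same text as strerror(EINVAL).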
Comment 1 Kai-Heng Feng 2022-12-23 02:16:01 UTC
Created attachment 303458
dmesg
Comment 2 Kai-Heng Feng 2022-12-23 02:16:26 UTC
Created attachment 303459
lspci
Comment 3 Kai-Heng Feng 2022-12-23 02:18:34 UTC
Tried patch 3/3 from [1] but it renders the system unbootable.

[1] https://lore.kernel.org/linux-pci/1573040408-3831-1-git-send-email-jonathan.derrick@intel.com/
Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-12-23 07:43:16 UTC
I assume this is a post-6.1 regression? Then finding the culprit with a bisection will likely be needed, unless the maintainers of the involved subsystems (Interrupts? Maybe PCI and PM?) have an idea what's causing this (but as you, as a kernel developer, might know, you are unlikely to reach them here in Bugzilla: https://lwn.net/Articles/910740/).
Comment 5 Kai-Heng Feng 2022-12-23 07:47:07 UTC
This is not a regression. vmd_irq_set_affinity() has not had a proper implementation since the VMD driver was introduced.

Also, this bug is not meant for reaching out to maintainers; a BKO bug is required for Intel's internal escalation.
Comment 6 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-12-23 07:55:24 UTC
(In reply to Kai-Heng Feng from comment #5)
> This is not a regression.

Sorry for bothering you then; it looked a bit like one (many people forget to set "Regression" to "yes"… :-/)
Comment 7 Kai-Heng Feng 2023-01-18 06:28:04 UTC
Keith, is there any plan to implement vmd_irq_set_affinity()?
Comment 8 Keith Busch 2023-01-18 06:38:19 UTC
If there's any interest there, it won't come from me.
Comment 9 Kai-Heng Feng 2023-01-18 07:41:28 UTC
(In reply to Keith Busch from comment #8)
> If there's any interest there, it won't come from me.

If I were to implement it, do you know what's missing in [1] alone?

[1] https://lore.kernel.org/linux-pci/1573040408-3831-4-git-send-email-jonathan.derrick@intel.com/
Comment 10 Keith Busch 2023-01-18 08:06:12 UTC
It's been a while, but I think CPU offline migration was the main problem. Unless something has changed since I last looked, you'd need to register a hotcpu notifier that rebalances the affinity and rewrites all the child devices' MSI table entries.
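
The rebalancing step described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not driver code: the types vec_model and child_model, the constant NR_VECS, and both functions are hypothetical, and "rewriting the MSI table entry" is reduced to updating one field. In current kernels the CPU-offline hook would likely be registered through the CPU hotplug state machine (cpuhp_setup_state()) rather than the older hotcpu notifier interface.

```c
#include <stddef.h>

#define NR_VECS 4 /* hypothetical number of VMD MSI-X vectors */

/* Toy model of one VMD vector: the CPU servicing it and how many
 * child interrupts are multiplexed onto it. */
struct vec_model {
    int cpu;
    int nr_children;
};

/* Toy model of one child device MSI table entry. */
struct child_model {
    int vec;               /* VMD vector this child currently targets */
    unsigned int msi_dest; /* stand-in for the destination in the MSI message */
};

/* Pick the least-loaded vector not bound to the CPU going offline.
 * Returns -1 if every vector is bound to that CPU. */
static int pick_vector(const struct vec_model *vecs, int offline_cpu)
{
    int best = -1;

    for (int i = 0; i < NR_VECS; i++) {
        if (vecs[i].cpu == offline_cpu)
            continue;
        if (best < 0 || vecs[i].nr_children < vecs[best].nr_children)
            best = i;
    }
    return best;
}

/* Assumed to run from a CPU-offline callback: move every child off the
 * vectors bound to offline_cpu and rewrite its (modelled) MSI entry.
 * Returns the number of children moved, or -1 if no vector can take them. */
int rebalance_on_offline(struct vec_model *vecs, struct child_model *children,
                         size_t nr_children, int offline_cpu)
{
    int moved = 0;

    for (size_t i = 0; i < nr_children; i++) {
        struct child_model *c = &children[i];
        int target;

        if (vecs[c->vec].cpu != offline_cpu)
            continue;
        target = pick_vector(vecs, offline_cpu);
        if (target < 0)
            return -1;
        vecs[c->vec].nr_children--;
        vecs[target].nr_children++;
        c->vec = target;
        c->msi_dest = (unsigned int)target; /* models the MSI entry rewrite */
        moved++;
    }
    return moved;
}
```

Choosing the least-loaded remaining vector keeps the children spread out after the move. A real implementation would also have to synchronize with in-flight interrupts, which this model ignores.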