216835 – S3/CPU hotplug doesn't work because of stubbed vmd_irq_set_affinity()

Summary: S3/CPU hotplug doesn't work because of stubbed vmd_irq_set_affinity()

Status:	NEW

Alias:	None

Product:	Drivers
Classification:	Unclassified
Component:	Other
Hardware:	All Linux

Importance:	P1 normal
Assignee:	drivers_other

URL:
Keywords:

Depends on:
Blocks:

Reported:	2022-12-23 02:15 UTC by Kai-Heng Feng
Modified:	2023-01-18 08:06 UTC
CC List:	2 users

See Also:
Kernel Version:	mainline
Subsystem:
Regression:	No
Bisected commit-id:

Attachments
dmesg (41.84 KB, text/plain) 2022-12-23 02:16 UTC, Kai-Heng Feng
lspci (39.43 KB, text/plain) 2022-12-23 02:16 UTC, Kai-Heng Feng

Description Kai-Heng Feng 2022-12-23 02:15:43 UTC

[  548.324604] ACPI: PM: Preparing to enter system sleep state S3
[  548.326137] ACPI: EC: event blocked
[  548.326138] ACPI: EC: EC stopped
[  548.326139] ACPI: PM: Saving platform NVS memory
[  548.326184] Disabling non-boot CPUs ...
[  548.328170] IRQ146: set affinity failed(-22).

$ grep . /sys/kernel/irq/146/*
/sys/kernel/irq/146/actions:PCIe PME,aerdrv
/sys/kernel/irq/146/chip_name:VMD-MSI
/sys/kernel/irq/146/hwirq:125
/sys/kernel/irq/146/per_cpu_count:0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
/sys/kernel/irq/146/type:edge
/sys/kernel/irq/146/wakeup:disabled

Comment 1 Kai-Heng Feng 2022-12-23 02:16:01 UTC

Created attachment 303458
dmesg

Comment 2 Kai-Heng Feng 2022-12-23 02:16:26 UTC

Created attachment 303459
lspci

Comment 3 Kai-Heng Feng 2022-12-23 02:18:34 UTC

Tried patch 3/3 from [1] but it renders the system unbootable.

[1] https://lore.kernel.org/linux-pci/1573040408-3831-1-git-send-email-jonathan.derrick@intel.com/

Comment 4 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-12-23 07:43:16 UTC

I assume this is a post-6.1 regression? If so, a bisection will likely be needed to find the culprit, unless the maintainers of the involved subsystems (Interrupts? Maybe PCI and PM?) have an idea what's causing this. But as you, as a kernel developer, might know, you are unlikely to reach them here in bugzilla: https://lwn.net/Articles/910740/

Comment 5 Kai-Heng Feng 2022-12-23 07:47:07 UTC

This is not a regression. vmd_irq_set_affinity() has not had a proper implementation since the VMD driver was introduced.

Also, this bug is not meant for reaching out to maintainers; a BKO bug is required for Intel's internal escalation.
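
For readers following along, here is a minimal userspace sketch of why a stubbed set-affinity callback produces the "set affinity failed(-22)" line from the description (-22 is -EINVAL). It is only a model: `stub_set_affinity()`, `migrate_irq()` and the `set_affinity_fn` type are hypothetical names invented for illustration, not the kernel's actual code, and the message format is simply copied from the log above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for an irq chip's set-affinity callback. */
typedef int (*set_affinity_fn)(unsigned int irq, unsigned long cpumask);

/* Models a stubbed callback: every request is rejected with -EINVAL (-22). */
static int stub_set_affinity(unsigned int irq, unsigned long cpumask)
{
    (void)irq;
    (void)cpumask;
    return -EINVAL;
}

/*
 * Models the CPU-offline path for one interrupt: ask the chip to move the
 * interrupt to the remaining CPUs and format a warning if that fails.
 * Returns the callback's result; buf holds the warning, or "" on success.
 */
static int migrate_irq(unsigned int irq, unsigned long online_mask,
                       set_affinity_fn set_affinity, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    memset(buf, 0, len);

    int err = set_affinity(irq, online_mask);
    if (err)
        snprintf(buf, len, "IRQ%u: set affinity failed(%d).", irq, err);
    return err;
}
```

Calling `migrate_irq(146, 0x1UL, stub_set_affinity, buf, sizeof buf)` in this model returns -EINVAL and leaves the warning text in `buf`, which mirrors the symptom: the interrupt cannot be moved off a CPU that is about to go offline.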

Comment 6 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-12-23 07:55:24 UTC

(In reply to Kai-Heng Feng from comment #5)
> This is not a regression.

Sorry for bothering then, looked a bit like one (many people forget to set "Regression" to "yes"… :-/)

Comment 7 Kai-Heng Feng 2023-01-18 06:28:04 UTC

Keith, is there any plan to implement vmd_irq_set_affinity()?

Comment 8 Keith Busch 2023-01-18 06:38:19 UTC

If there's any interest there, it won't come from me.

Comment 9 Kai-Heng Feng 2023-01-18 07:41:28 UTC

(In reply to Keith Busch from comment #8)
> If there's any interest there, it won't come from me.

If I were to implement it, do you know what's missing in [1] alone?

[1] https://lore.kernel.org/linux-pci/1573040408-3831-4-git-send-email-jonathan.derrick@intel.com/

Comment 10 Keith Busch 2023-01-18 08:06:12 UTC

It's been a while, but I think the CPU offline migration was the main problem. Unless something changed since I last looked, you'd need to register a hotcpu notifier that rebalances the affinity and rewrites all the child device MSI table entries.
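
The rebalancing step described in comment 10 can be sketched roughly as follows. This is a userspace model under stated assumptions, not driver code: the structs, `pick_vector()` and `rebalance_on_offline()` are hypothetical, the "least-loaded vector" policy is just one plausible choice, and assigning `child[i].vector` merely stands in for rewriting a child device's MSI table entry. In current kernels the hook itself would presumably be registered through the CPU hotplug state machine (`cpuhp_setup_state()`) rather than the legacy hotcpu notifier interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one VMD vector. */
struct vmd_vector_model {
    int cpu;    /* CPU this vector is delivered to */
    int users;  /* child entries currently routed through it */
};

/* Hypothetical model of one child device MSI table entry. */
struct child_entry_model {
    int vector; /* index of the VMD vector this entry targets */
};

/* Pick the least-loaded vector that does not target offline_cpu; -1 if none. */
static int pick_vector(const struct vmd_vector_model *vecs, size_t n,
                       int offline_cpu)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (vecs[i].cpu == offline_cpu)
            continue;
        if (best < 0 || vecs[i].users < vecs[best].users)
            best = (int)i;
    }
    return best;
}

/*
 * Model of a CPU-offline callback: every child entry routed through a vector
 * on offline_cpu is re-pointed at a vector on another CPU.
 * Returns the number of rewritten entries, or -1 if no usable vector is left.
 */
static int rebalance_on_offline(struct vmd_vector_model *vecs, size_t nvecs,
                                struct child_entry_model *child, size_t nchild,
                                int offline_cpu)
{
    int moved = 0;
    for (size_t i = 0; i < nchild; i++) {
        int old = child[i].vector;
        assert(old >= 0 && (size_t)old < nvecs);
        if (vecs[old].cpu != offline_cpu)
            continue;
        int target = pick_vector(vecs, nvecs, offline_cpu);
        if (target < 0)
            return -1;
        vecs[old].users--;
        vecs[target].users++;
        child[i].vector = target; /* stands in for the MSI table rewrite */
        moved++;
    }
    return moved;
}
```

The model leaves out everything that makes the real problem hard, such as masking the entry during the rewrite and handling interrupts already in flight; it only illustrates the bookkeeping of moving child entries away from vectors bound to the departing CPU.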