Bug 215379 - pcie hotplug problem with pasid enabled devices
Summary: pcie hotplug problem with pasid enabled devices
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: IOMMU
Hardware: ARM Linux
Importance: P1 normal
Assignee: drivers_iommu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-21 02:37 UTC by yaohongbo
Modified: 2021-12-21 02:43 UTC
CC List: 0 users

See Also:
Kernel Version: 5.16-rc5
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description yaohongbo 2021-12-21 02:37:50 UTC
# Step 1: boot the OS normally.

PCI topology:
 +-[0000:98]-+-00.0-[99]----00.0  Samsung Electronics Co Ltd Device a824
 |           +-01.0-[9a]----00.0  Intel Corporation Device 0b60
 |           +-02.0-[9b]----00.0  Samsung Electronics Co Ltd Device a824
 |           \-03.0-[9c]----00.0  Intel Corporation Device 0b60


Everything is OK. Each device is added to its own IOMMU group.

[   28.230583] pcieport 0000:98:00.0: Adding to iommu group 8
[   28.238956] pcieport 0000:98:00.0: PME: Signaling with IRQ 89
[   28.247042] pcieport 0000:98:00.0: pciehp: Slot #41 AttnBtn- PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock+ NoCompl- IbPresDis- LLActRep+
[   28.265505] pcieport 0000:98:01.0: Adding to iommu group 9
[   28.273967] pcieport 0000:98:01.0: PME: Signaling with IRQ 90
[   28.282110] pcieport 0000:98:01.0: pciehp: Slot #42 AttnBtn- PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock+ NoCompl- IbPresDis- LLActRep+
[   28.300701] pcieport 0000:98:02.0: Adding to iommu group 10
[   28.309284] pcieport 0000:98:02.0: PME: Signaling with IRQ 91
[   28.317513] pcieport 0000:98:02.0: pciehp: Slot #43 AttnBtn- PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock+ NoCompl- IbPresDis- LLActRep+
[   28.336332] pcieport 0000:98:03.0: Adding to iommu group 11
[   28.345102] pcieport 0000:98:03.0: PME: Signaling with IRQ 92
[   28.353504] pcieport 0000:98:03.0: pciehp: Slot #44 AttnBtn- PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock+ NoCompl- IbPresDis- LLActRep+
....
[   29.886411] nvme 0000:99:00.0: Adding to iommu group 16
[   29.887116] nvme nvme0: pci function 0000:99:00.0
[   29.887367] nvme 0000:9c:00.0: Adding to iommu group 17
[   29.887961] nvme nvme1: pci function 0000:9c:00.0
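
The per-device group assignments in the boot log can also be read back from sysfs, which is handy for comparing the state before and after a hotplug. A minimal sketch, assuming bash and the usual /sys/bus/pci/devices/<BDF>/iommu_group symlink; the function name is hypothetical and the optional root argument exists only so it can be exercised against a fake tree:

```shell
#!/usr/bin/env bash
# Print "<BDF> group <N>" for every PCI device that has an IOMMU group.
# root defaults to /sys; overriding it is only meant for testing.
list_iommu_groups() {
    local root=${1:-/sys} link dev group
    for link in "$root"/bus/pci/devices/*/iommu_group; do
        # -L: the entry must be a symlink (an unmatched glob stays literal)
        [ -L "$link" ] || continue
        dev=$(basename "$(dirname "$link")")
        # the link points at .../kernel/iommu_groups/<N>; keep only <N>
        group=$(basename "$(readlink "$link")")
        printf '%s group %s\n' "$dev" "$group"
    done
}
```

On the system above, the line for 0000:9c:00.0 would be expected to read "group 17" after boot.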

98:03.0 is the upstream bridge of 9c:00.0, and its slot is #44.
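
One way to confirm which bus a hotplug slot controls is the slot's sysfs "address" file; for pciehp slots it is expected to hold domain:bus:device (e.g. 0000:9c:00), though the exact format is an assumption here. A minimal bash sketch with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the name of every hotplug slot whose "address" is on the given bus.
# want is "domain:bus", e.g. 0000:9c; root defaults to /sys (test override).
slot_for_bus() {
    local want=$1 root=${2:-/sys} dir addr
    for dir in "$root"/bus/pci/slots/*/; do
        [ -r "$dir/address" ] || continue
        read -r addr < "$dir/address"
        case $addr in
            "$want":*|"$want")
                basename "$dir"   # basename drops the trailing slash
                ;;
        esac
    done
}
```

For this report, `slot_for_bus 0000:9c` would be expected to print 44.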

# Step 2: hot-plug the PASID-enabled device in slot 44.

echo 0 > /sys/bus/pci/slots/44/power
echo 1 > /sys/bus/pci/slots/44/power
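
The two commands above can be wrapped so that the IOMMU group of the endpoint is reported right after the slot comes back. A minimal sketch, assuming bash, root privileges and the standard sysfs paths; the function name, the fixed delay, and the root override (for testing only) are illustrative choices, not part of the original reproduction:

```shell
#!/usr/bin/env bash
# Power-cycle a hotplug slot and report the endpoint's IOMMU group afterwards.
# usage: cycle_slot SLOT BDF [ROOT] [DELAY_SECONDS]
cycle_slot() {
    local slot=$1 bdf=$2 root=${3:-/sys} delay=${4:-5}
    local power="$root/bus/pci/slots/$slot/power"
    local link="$root/bus/pci/devices/$bdf/iommu_group"

    echo 0 > "$power" || return 1
    sleep "$delay"
    echo 1 > "$power" || return 1
    sleep "$delay"    # give pciehp time to re-enumerate the device

    if [ -L "$link" ]; then
        echo "$bdf: iommu group $(basename "$(readlink "$link")")"
    else
        echo "$bdf: no iommu group"
    fi
}
```

Here it would be invoked as `cycle_slot 44 0000:9c:00.0`.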

The log is as follows:
[  455.717886] pci 0000:9c:00.0: Removing from iommu group 17
[  503.215235] pcieport 0000:98:03.0: pciehp: Slot(44): Card present
[  503.217039] systemd-journald[1251]: Sent WATCHDOG=1 notification.
[  503.492810] pci 0000:9c:00.0: config space:
[  503.492840] 00000000: 86 80 60 0b 00 00 10 00 00 02 08 01 00 00 00 00
[  503.492842] 00000010: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  503.492842] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 08 80
[  503.492843] 00000030: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00
[  503.492844] 00000040: 01 50 03 00 08 00 00 00 00 00 00 00 00 00 00 00
[  503.492845] 00000050: 11 60 87 00 00 20 00 00 00 30 00 00 00 00 00 00
[  503.492846] 00000060: 10 a0 02 00 22 89 00 10 10 28 00 00 44 3c 46 00
[  503.492847] 00000070: 00 00 44 00 00 00 00 00 00 00 00 00 00 00 00 00
[  503.492847] 00000080: 00 00 00 00 1f 08 31 00 00 00 00 00 1e 00 80 01
[  503.492848] 00000090: 04 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00
[  503.492849] 000000a0: 05 00 8a 01 00 00 00 00 00 00 00 00 00 00 00 00
[  503.492850] 000000b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  503.492850] 000000c0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  503.492851] 000000d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  503.492853] 000000e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  503.492854] 000000f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  503.492863] pci 0000:9c:00.0: [8086:0b60] type 00 class 0x010802
[  503.492873] pci 0000:9c:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[  503.492886] pci 0000:9c:00.0: reg 0x30: [mem 0x00000000-0x0000ffff pref]
[  503.492890] pci 0000:9c:00.0: Max Payload Size set to 512 (was 128, max 512)
[  503.492892] pci 0000:9c:00.0: enabling Extended Tags
[  503.493064] pci 0000:9c:00.0: BAR 6: assigned [mem 0x75600000-0x7560ffff pref]
[  503.493066] pci 0000:9c:00.0: BAR 0: assigned [mem 0x75610000-0x75613fff 64bit]
[  503.493071] pcieport 0000:98:03.0: PCI bridge to [bus 9c]
[  503.493072] pcieport 0000:98:03.0:   bridge window [io  0x53000-0x53fff]
[  503.493074] pcieport 0000:98:03.0:   bridge window [mem 0x75600000-0x757fffff]
[  503.493075] pcieport 0000:98:03.0:   bridge window [mem 0x6a000600000-0x6a0007fffff 64bit pref]
[  503.493078] pcieport 0000:98:03.0: enabling device (0106 -> 0107)
[  503.493130] nvme 0000:9c:00.0: cannot attach to incompatible domain (0 SSID bits != 20)
[  503.501268] nvme 0000:9c:00.0: Failed to add to iommu group 11: -22
[  503.507848] nvme nvme1: pci function 0000:9c:00.0
[  503.507883] nvme 0000:9c:00.0: enabling device (0000 -> 0002)
Comment 1 yaohongbo 2021-12-21 02:41:42 UTC
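The device is removed from iommu group 17, and the kernel then tries to add it to iommu group 11 (the group of its upstream bridge 98:03.0), which fails with -EINVAL (-22).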
Comment 2 yaohongbo 2021-12-21 02:43:00 UTC
#lspci -s 9c:00.0 -vvvv
9c:00.0 Non-Volatile memory controller: Intel Corporation Device 0b60 (prog-if 02 [NVM Express])
	Subsystem: Intel Corporation Device 8008
	Physical Slot: 44
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 92
	Region 0: Memory at 75610000 (64-bit, non-prefetchable) [size=16K]
	[virtual] Expansion ROM at 75600000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI-X: Enable- Count=136 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <16us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 512 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed unknown, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
			ClockPM+ Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed unknown, Width x4, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: Unknown, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable- Count=1/32 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [150 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [180 v1] Power Budgeting <?>
	Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 0
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [270 v1] Device Serial Number 55-cd-2e-41-52-f7-92-fa
	Capabilities: [2a0 v1] #19
	Capabilities: [2d0 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [320 v1] #25
	Capabilities: [330 v1] #26
	Capabilities: [360 v1] #27
	Capabilities: [450 v1] #1b
	Capabilities: [460 v1] #23
	Capabilities: [700 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=1us PortTPowerOnTime=3100us
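
In the output above, this lspci version does not decode several extended capabilities. The entry "[450 v1] #1b" is extended capability ID 0x1b, the PASID Extended Capability. Its Max PASID Width field (bits 12:8 of the 16-bit PASID Capability register at offset +4) can be read straight from config space. A minimal bash sketch that walks the extended capability list starting at 0x100; reading beyond the first 64 bytes of the sysfs "config" file requires root, and the function name is hypothetical. The kernel may cap the width by what the SMMU supports, so treating this field as the source of the "20" in the log is an assumption:

```shell
#!/usr/bin/env bash
# Print the Max PASID Width of a device from a dump of its config space,
# e.g. /sys/bus/pci/devices/0000:9c:00.0/config. Returns 1 if no PASID cap.
pasid_width() {
    local cfg=$1 off=256 guard=0   # extended capabilities start at 0x100
    local b0 b1 b2 b3 id lo hi
    while [ "$off" -ne 0 ] && [ "$guard" -lt 64 ]; do
        # header (little-endian): bits 15:0 ID, 19:16 version, 31:20 next
        read -r b0 b1 b2 b3 < <(od -An -tu1 -j "$off" -N4 "$cfg")
        [ -n "$b3" ] || return 1   # short read: dump too small
        id=$(( b0 | (b1 << 8) ))
        if [ "$id" -eq 27 ]; then  # 0x1b = PASID
            # PASID Capability register at +4; bits 12:8 are the low
            # 5 bits of its second (high) byte
            read -r lo hi < <(od -An -tu1 -j $(( off + 4 )) -N2 "$cfg")
            echo $(( hi & 0x1f ))
            return 0
        fi
        off=$(( (b3 << 4) | (b2 >> 4) ))
        guard=$(( guard + 1 ))     # stop on a looping or corrupt list
    done
    return 1
}
```

For 9c:00.0 it would be run as `pasid_width /sys/bus/pci/devices/0000:9c:00.0/config`.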
