Bug 216455

Summary: PCI AER error caused by LTR enablement on amdgpu with LTR disabled on video card PCIe bridge
Product: Drivers
Reporter: Gustaw Smolarczyk (wielkiegie)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.19.6
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  dmesg with pci=earlydump (v5.19.5 + ltr debug patch)
  lspci -vvnn on vega10 system
  LTR Fix

Description Gustaw Smolarczyk 2022-09-06 16:43:06 UTC
Created attachment 301753 [details]
dmesg with pci=earlydump (v5.19.5 + ltr debug patch)

Split off from bug 216373, as this is a different issue from the one that bug was originally about.

To quote Bjorn:

> The issue you're seeing is an Unsupported Request error logged by a Switch
> Downstream Port when it received an LTR message sent by 44:00.0 when the
> Switch has LTR disabled:
>
>  pcieport 0000:43:00.0:   device [1022:1471] error status/mask=00100000/00000000
>  pcieport 0000:43:00.0:    [20] UnsupReq               (First)
>  pcieport 0000:43:00.0: AER:   TLP Header: 34000000 44000010 00000000 84288428
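
For anyone who wants to check the TLP header from the log by hand, here is a minimal Python sketch that decodes the four dwords as a PCIe Message TLP. The function and field names (decode_msg_tlp, decode_ltr_latency) are made up for illustration; the bit layout follows my reading of the PCIe Message header and LTR latency field format. The last dword carries the snoop and no-snoop latency fields; both are identical in this log, so the sketch just returns them in order without labelling which is which.

```python
def decode_ltr_latency(field):
    """Decode one 16-bit LTR latency field into (requirement, nanoseconds)."""
    requirement = bool(field & 0x8000)   # bit 15: requirement bit
    scale = (field >> 10) & 0x7          # bits 12:10: each step multiplies by 32
    value = field & 0x3FF                # bits 9:0: latency value
    return requirement, value * (32 ** scale)


def decode_msg_tlp(dwords):
    """Decode the four header dwords printed by AER as a Message TLP."""
    dw0, dw1, _dw2, dw3 = dwords
    fmt = (dw0 >> 29) & 0x7              # 0b001: 4DW header, no data
    tlp_type = (dw0 >> 24) & 0x1F        # 0b10rrr: Message, rrr = routing
    requester = (dw1 >> 16) & 0xFFFF     # bus[15:8], device[7:3], function[2:0]
    bus = requester >> 8
    dev = (requester >> 3) & 0x1F
    fn = requester & 0x7
    return {
        "fmt": fmt,
        "is_message": (tlp_type >> 3) == 0b10,
        "routing": tlp_type & 0x7,       # 0b100: local, terminate at receiver
        "requester": f"{bus:02x}:{dev:02x}.{fn}",
        "msg_code": dw1 & 0xFF,          # 0x10 is the LTR message code
        "latencies": [decode_ltr_latency(dw3 >> 16),
                      decode_ltr_latency(dw3 & 0xFFFF)],
    }


header = [0x34000000, 0x44000010, 0x00000000, 0x84288428]
print(decode_msg_tlp(header))
```

With the header above this gives a Message TLP with message code 0x10 (LTR) from requester 44:00.0, which matches Bjorn's reading.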

The errors themselves can be masked by providing pci=noaer to the kernel.

The amdgpu.aspm=0 kernel option makes this issue disappear.
Comment 1 Gustaw Smolarczyk 2022-09-06 16:43:33 UTC
Created attachment 301754 [details]
lspci -vvnn on vega10 system
Comment 2 Gustaw Smolarczyk 2022-09-06 16:44:04 UTC
Hardware:
CPU: Ryzen Threadripper 1950X
MB: Asrock X399 Taichi
GPU: Radeon Vega 64 [1002:687f]
Comment 3 Lijo Lazar 2022-09-07 12:45:31 UTC
Created attachment 301760 [details]
LTR Fix

Does the attached patch help?
Comment 4 Gustaw Smolarczyk 2022-09-07 16:09:56 UTC
Yes, it does. LTR+ is no longer being enabled with this patch.
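
For reference, the LTR+/LTR- flag that lspci prints under DevCtl2 can also be checked directly from a config space dump. The following is only a rough sketch: ltr_enabled is a hypothetical helper, and it assumes the standard layout (capability list pointer at 0x34, PCI Express capability ID 0x10, Device Control 2 at offset 0x28 inside that capability, LTR Mechanism Enable in bit 10).

```python
import struct

PCI_STATUS_CAP_LIST = 0x10       # Status register (low byte at 0x06), bit 4
PCI_CAPABILITY_LIST = 0x34       # offset of the first capability pointer
PCI_CAP_ID_EXP = 0x10            # PCI Express capability ID
PCI_EXP_DEVCTL2 = 0x28           # Device Control 2, relative to the capability
PCI_EXP_DEVCTL2_LTR_EN = 0x0400  # LTR Mechanism Enable


def ltr_enabled(cfg):
    """Return True/False for LTR Mechanism Enable, or None if it cannot be read."""
    if len(cfg) < 0x40 or not (cfg[0x06] & PCI_STATUS_CAP_LIST):
        return None
    ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC
    for _ in range(48):          # bound the walk in case the list loops
        if ptr == 0 or ptr + 2 > len(cfg):
            return None
        if cfg[ptr] == PCI_CAP_ID_EXP:
            if ptr + PCI_EXP_DEVCTL2 + 2 > len(cfg):
                return None      # dump too short (e.g. read without root)
            (devctl2,) = struct.unpack_from("<H", cfg, ptr + PCI_EXP_DEVCTL2)
            return bool(devctl2 & PCI_EXP_DEVCTL2_LTR_EN)
        ptr = cfg[ptr + 1] & 0xFC
    return None
```

On the affected system this could be fed the contents of /sys/bus/pci/devices/0000:43:00.0/config (the bridge) and /sys/bus/pci/devices/0000:44:00.0/config (the GPU), opened in binary mode as root, since unprivileged reads only return the first 64 bytes. The problem case in this bug is the GPU reporting True while the bridge reports False.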
Comment 5 Alex Deucher 2022-09-07 16:13:29 UTC
(In reply to Lijo Lazar from comment #3)
> Created attachment 301760 [details]
> LTR Fix
> 
> Does the attached patch help?

The ASPM code in vi.c and cik.c and si.c should be similarly protected.
Comment 6 Lijo Lazar 2022-09-08 03:22:13 UTC
(In reply to Alex Deucher from comment #5)
> (In reply to Lijo Lazar from comment #3)
> > Created attachment 301760 [details]
> > LTR Fix
> > 
> > Does the attached patch help?
> 
> The ASPM code in vi.c and cik.c and si.c should be similarly protected.

Just checked. Actually, LTR settings are not changed in them.
Comment 7 Alex Deucher 2022-09-08 03:26:29 UTC
(In reply to Lijo Lazar from comment #6)
> (In reply to Alex Deucher from comment #5)
> > (In reply to Lijo Lazar from comment #3)
> > > Created attachment 301760 [details]
> > > LTR Fix
> > > 
> > > Does the attached patch help?
> > 
> > The ASPM code in vi.c and cik.c and si.c should be similarly protected.
> 
> Just checked. Actually, LTR settings are not changed in them.

Yes, but they do set up ASPM, which should be protected with CONFIG_PCIEASPM?
Comment 8 Alex Deucher 2022-09-13 15:00:11 UTC
Patch:
https://patchwork.freedesktop.org/patch/501912/