Bug 216455 - PCI AER error caused by LTR enablement on amdgpu with LTR disabled on video card PCIe bridge
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
Reported: 2022-09-06 16:43 UTC by Gustaw Smolarczyk
Modified: 2022-09-13 15:00 UTC

See Also:
Kernel Version: 5.19.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg with pci=earlydump (v5.19.5 + ltr debug patch) (167.39 KB, text/plain)
2022-09-06 16:43 UTC, Gustaw Smolarczyk
lspci -vvnn on vega10 system (141.39 KB, text/plain)
2022-09-06 16:43 UTC, Gustaw Smolarczyk
LTR Fix (4.37 KB, patch)
2022-09-07 12:45 UTC, Lijo Lazar

Description Gustaw Smolarczyk 2022-09-06 16:43:06 UTC
Created attachment 301753
dmesg with pci=earlydump (v5.19.5 + ltr debug patch)

Split off from bug 216373, as this is a different issue from the one that bug was originally about.

To quote Bjorn:

> The issue you're seeing is an Unsupported Request error logged by a Switch
> Downstream Port when it received an LTR message sent by 44:00.0 when the
> Switch has LTR disabled:
>
>  pcieport 0000:43:00.0:   device [1022:1471] error status/mask=00100000/00000000
>  pcieport 0000:43:00.0:    [20] UnsupReq               (First)
>  pcieport 0000:43:00.0: AER:   TLP Header: 34000000 44000010 00000000 84288428
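
For reference, the TLP Header quoted above can be decoded by hand to confirm that it is an LTR message from 44:00.0. The following is only a rough sketch: the helper names are made up for illustration, and the field layout is my reading of the PCIe Base Specification (Fmt/Type in the first DW, Requester ID and Message Code in the second, two 16-bit latency fields in the fourth).

```python
# Rough decoder for the 4-DW TLP header printed by the AER driver.
# Helper names are illustrative only; they are not kernel or library APIs.

LTR_MESSAGE_CODE = 0x10  # Message Code used by Latency Tolerance Reporting


def decode_ltr_latency(field):
    """Split a 16-bit LTR latency field into (requirement, nanoseconds)."""
    requirement = bool(field & 0x8000)  # bit 15: requirement bit
    scale = (field >> 10) & 0x7         # bits 12:10: latency scale
    value = field & 0x3FF               # bits 9:0: latency value
    # The latency is expressed in units of 2^(5 * scale) nanoseconds.
    return requirement, value * (1 << (5 * scale))


def decode_tlp_header(text):
    """Decode the four hex words that follow 'TLP Header:' in the AER log."""
    dw0, dw1, _dw2, dw3 = (int(word, 16) for word in text.split())
    requester = dw1 >> 16
    tlp_type = (dw0 >> 24) & 0x1F
    return {
        "fmt": dw0 >> 29,
        "is_message": (tlp_type >> 3) == 0b10,  # Type 10rrr means Message
        "routing": tlp_type & 0x7,              # rrr routing subfield
        "requester": "%02x:%02x.%x" % (
            requester >> 8, (requester >> 3) & 0x1F, requester & 0x7),
        "message_code": dw1 & 0xFF,
        "latency_fields": (
            decode_ltr_latency(dw3 >> 16),
            decode_ltr_latency(dw3 & 0xFFFF),
        ),
    }


header = decode_tlp_header("34000000 44000010 00000000 84288428")
print(header)
```

With the header from the log, the requester decodes to 44:00.0 and the message code to 0x10 (LTR), which matches Bjorn's description. Both latency fields carry the same value here, so it does not matter which half is snoop and which is no-snoop.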

The errors themselves can be masked by providing pci=noaer to the kernel.

The amdgpu.aspm=0 kernel option makes this issue disappear.
Comment 1 Gustaw Smolarczyk 2022-09-06 16:43:33 UTC
Created attachment 301754
lspci -vvnn on vega10 system
Comment 2 Gustaw Smolarczyk 2022-09-06 16:44:04 UTC
Hardware:
CPU: Ryzen Threadripper 1950X
MB: Asrock X399 Taichi
GPU: Radeon Vega 64 [1002:687f]
Comment 3 Lijo Lazar 2022-09-07 12:45:31 UTC
Created attachment 301760
LTR Fix

Does the attached patch help?
Comment 4 Gustaw Smolarczyk 2022-09-07 16:09:56 UTC
Yes, it does. LTR+ is no longer being enabled with this patch.
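
In case it helps others verify the same thing without reading lspci output by eye, the LTR enable state can also be checked from raw config space. This is a minimal sketch, assuming the standard layout: capability list pointer at 0x34, PCI Express capability ID 0x10, Device Control 2 at offset 0x28 within that capability, and LTR Mechanism Enable in bit 10. The make_config() helper only builds synthetic data for the example and is not part of any real tool.

```python
# Check the LTR Mechanism Enable bit in a PCI config space dump.
# Offsets mirror the values in the kernel's pci_regs.h; the helper
# functions themselves are hypothetical.

PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10
PCI_CAPABILITY_LIST = 0x34
PCI_CAP_ID_EXP = 0x10
PCI_EXP_DEVCTL2 = 0x28
PCI_EXP_DEVCTL2_LTR_EN = 0x0400


def read16(cfg, offset):
    """Read a little-endian 16-bit register from the config space bytes."""
    return int.from_bytes(cfg[offset:offset + 2], "little")


def find_pcie_cap(cfg):
    """Walk the capability list and return the PCIe capability offset."""
    if not read16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()
    while pos >= 0x40 and pos + 1 < len(cfg) and pos not in seen:
        seen.add(pos)  # guard against a looping capability list
        if cfg[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return None


def ltr_enabled(cfg):
    """Return True/False for the LTR enable bit, or None if not readable."""
    cap = find_pcie_cap(cfg)
    if cap is None or cap + PCI_EXP_DEVCTL2 + 2 > len(cfg):
        return None
    return bool(read16(cfg, cap + PCI_EXP_DEVCTL2) & PCI_EXP_DEVCTL2_LTR_EN)


def make_config(ltr_on):
    """Build a synthetic 256-byte config space (example data only)."""
    cfg = bytearray(256)
    cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST
    cfg[PCI_CAPABILITY_LIST] = 0x40
    cfg[0x40] = PCI_CAP_ID_EXP
    if ltr_on:
        ctl2 = 0x40 + PCI_EXP_DEVCTL2
        cfg[ctl2:ctl2 + 2] = PCI_EXP_DEVCTL2_LTR_EN.to_bytes(2, "little")
    return bytes(cfg)


bridge = make_config(ltr_on=False)  # stands in for the port at 43:00.0
gpu = make_config(ltr_on=True)      # stands in for the GPU at 44:00.0
mismatch = ltr_enabled(gpu) and not ltr_enabled(bridge)
print("LTR enabled on GPU but not on bridge:", mismatch)
```

On a real system the bytes could instead be read from /sys/bus/pci/devices/<address>/config. Reading past the first 64 bytes of that file generally requires root.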
Comment 5 Alex Deucher 2022-09-07 16:13:29 UTC
(In reply to Lijo Lazar from comment #3)
> Created attachment 301760
> LTR Fix
> 
> Does the attached patch help?

The ASPM code in vi.c and cik.c and si.c should be similarly protected.
Comment 6 Lijo Lazar 2022-09-08 03:22:13 UTC
(In reply to Alex Deucher from comment #5)
> (In reply to Lijo Lazar from comment #3)
> > Created attachment 301760
> > LTR Fix
> > 
> > Does the attached patch help?
> 
> The ASPM code in vi.c and cik.c and si.c should be similarly protected.

Just checked. Actually, LTR settings are not changed in them.
Comment 7 Alex Deucher 2022-09-08 03:26:29 UTC
(In reply to Lijo Lazar from comment #6)
> (In reply to Alex Deucher from comment #5)
> > (In reply to Lijo Lazar from comment #3)
> > > Created attachment 301760
> > > LTR Fix
> > > 
> > > Does the attached patch help?
> > 
> > The ASPM code in vi.c and cik.c and si.c should be similarly protected.
> 
> Just checked. Actually, LTR settings are not changed in them.

Yes, but they do set up ASPM, which should be protected with CONFIG_PCIEASPM?
Comment 8 Alex Deucher 2022-09-13 15:00:11 UTC
Patch:
https://patchwork.freedesktop.org/patch/501912/
