Bug 215494 - [radeon, rv370] Running piglit shaders@glsl-vs-raytrace-bug26691 test causes hard lockup & reboot
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-14 19:33 UTC by Erhard F.
Modified: 2022-01-14 20:52 UTC

See Also:
Kernel Version: 5.16.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
kernel dmesg (kernel 5.16.0, Ryzen 9 5950X) (79.67 KB, text/plain)
2022-01-14 19:33 UTC, Erhard F.
kernel .config (kernel 5.16.0, Ryzen 9 5950X) (114.89 KB, text/plain)
2022-01-14 19:34 UTC, Erhard F.
piglit results & summary up to glsl-vs-raytrace-bug26691 (206.73 KB, application/x-bzip)
2022-01-14 19:35 UTC, Erhard F.

Description Erhard F. 2022-01-14 19:33:26 UTC
Created attachment 300268
kernel dmesg (kernel 5.16.0, Ryzen 9 5950X)

Running the piglit testsuite (git-11ee10ba04) for https://gitlab.freedesktop.org/mesa/mesa/-/issues/3152 via './piglit run -1 quick -l verbose -s --dmesg' on a Radeon X600 causes a hard lockup and reboot. On my system this happens with kernels 5.15.11 and 5.16.0, and with Mesa 21.3.4 and Mesa 22 (git-8b3d947267).

I had a closer look and found that shaders@glsl-vs-raytrace-bug26691 causes the lockup. Running "./piglit/bin/glsl-vs-raytrace-bug26691 -auto -fbo" as a single test sometimes works the first time, but re-running it a second or third time always causes the lockup:

[...]
[  518.794824] radeon: wait for empty RBBM fifo failed! Bad things might happen.
[  519.110152] Failed to wait GUI idle while programming pipes. Bad things might happen.
[  519.111220] radeon 0000:07:00.0: Saved 59 dwords of commands on ring 0.
[  519.111247] radeon 0000:07:00.0: (r300_asic_reset:426) RBBM_STATUS=0x8411C100
[  519.616733] radeon 0000:07:00.0: (r300_asic_reset:445) RBBM_STATUS=0x8401C100
[  520.118160] radeon 0000:07:00.0: (r300_asic_reset:457) RBBM_STATUS=0x8400C100
[  520.118231] radeon 0000:07:00.0: failed to reset GPU
[  520.319694] pcieport 0000:00:03.1: AER: Corrected error received: 0000:00:03.1
[  520.319723] pcieport 0000:00:03.1: PCIe Bus Error: severity=Corrected, type=Transaction Layer, (Receiver ID)
[  520.319729] pcieport 0000:00:03.1:   device [1022:1483] error status/mask=00002000/00004000
[  520.319735] pcieport 0000:00:03.1:    [13] NonFatalErr           
[  520.722345] pcieport 0000:00:03.1: AER: Corrected error received: 0000:00:03.1
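
To make the register values in the excerpt above easier to compare, here is a minimal Python sketch that extracts the three RBBM_STATUS reads and the AER status/mask words and lists which bit positions are set or cleared. The helper names (set_bits, rbbm_transitions, aer_status_bits) are made up for illustration and are not part of the radeon driver or piglit. The sketch only reports bit positions; mapping them to hardware blocks would need the register definitions from the driver headers, which are not reproduced here.

```python
import re

# Lines copied verbatim from the dmesg excerpt in this report.
DMESG = """\
[  519.111247] radeon 0000:07:00.0: (r300_asic_reset:426) RBBM_STATUS=0x8411C100
[  519.616733] radeon 0000:07:00.0: (r300_asic_reset:445) RBBM_STATUS=0x8401C100
[  520.118160] radeon 0000:07:00.0: (r300_asic_reset:457) RBBM_STATUS=0x8400C100
[  520.319729] pcieport 0000:00:03.1:   device [1022:1483] error status/mask=00002000/00004000
"""


def set_bits(value):
    """Return the positions of all set bits in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def rbbm_transitions(log):
    """Compare consecutive RBBM_STATUS reads.

    Returns a list of (before, after, cleared_bits, newly_set_bits) tuples.
    """
    values = [int(v, 16) for v in re.findall(r"RBBM_STATUS=0x([0-9A-Fa-f]+)", log)]
    return [
        (a, b, set_bits(a & ~b), set_bits(b & ~a))
        for a, b in zip(values, values[1:])
    ]


def aer_status_bits(log):
    """Return (status_bits, mask_bits) from the first AER 'error status/mask' line."""
    m = re.search(r"error status/mask=([0-9A-Fa-f]{8})/([0-9A-Fa-f]{8})", log)
    if m is None:
        return [], []
    return set_bits(int(m.group(1), 16)), set_bits(int(m.group(2), 16))


if __name__ == "__main__":
    for before, after, cleared, newly_set in rbbm_transitions(DMESG):
        print(f"{before:#010x} -> {after:#010x}: cleared {cleared}, set {newly_set}")
    status, mask = aer_status_bits(DMESG)
    print(f"AER status bits {status}, mask bits {mask}")
```

For the values in this log, each reset stage clears exactly one bit (bit 20, then bit 16), while bits 8, 14, 15, 26 and 31 remain set in the final read that precedes "failed to reset GPU". The AER status word 00002000 has only bit 13 set, which matches the "[13] NonFatalErr" line.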


For regular desktop usage the X600 seems ok so far. Some data about the system:

 $ inxi -b
System:
  Host: prototype Kernel: 5.16.0-Zen3 x86_64 bits: 64 Desktop: Openbox 3.6.1 
  Distro: Gentoo Base System release 2.7 
Machine:
  Type: Desktop Mobo: ASRock model: B450M Steel Legend 
  serial: <superuser/root required> UEFI: American Megatrends v: P4.20 
  date: 08/03/2021 
CPU:
  Info: 16-Core AMD Ryzen 9 5950X [MT MCP] speed: 3685 MHz 
  min/max: 2200/3400 MHz 
Graphics:
  Device-1: AMD RV370 [Radeon X600/X600 SE] driver: radeon v: kernel 
  Display: x11 server: X.Org 1.20.14 driver: ati,radeon 
  unloaded: fbdev,modesetting resolution: 1920x1080~60Hz 
  OpenGL: renderer: ATI RV370 v: 2.1 Mesa 22.0.0-devel (git-8b3d947267) 
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet 
  driver: r8169 

 # lspci 
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data Fabric: Device 18h; Function 7
01:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01)
02:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller (rev 01)
02:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller (rev 01)
02:00.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge (rev 01)
03:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01)
03:01.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01)
03:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port (rev 01)
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV370 [Radeon X600/X600 SE]
07:00.1 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] RV380 [Radeon X300/X550/X1050 Series] (Secondary)
08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
09:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
09:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller

 # lspci -s 07:00.0 -vv
07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV370 [Radeon X600/X600 SE] (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] RV370 [Radeon X600/X600 SE]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 59
	IOMMU group: 2
	Region 0: Memory at e8000000 (64-bit, prefetchable) [size=128M]
	Region 2: Memory at fce30000 (64-bit, non-prefetchable) [size=64K]
	Region 4: I/O ports at e000 [size=256]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <256ns, L1 <4us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE- FLReset- SlotPowerLimit 75.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <256ns, L1 <2us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x16 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee01000  Data: 0022
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 04000001 0000200f 07070000 b8cdf5fd
	Kernel driver in use: radeon
	Kernel modules: radeon
Comment 1 Erhard F. 2022-01-14 19:34:03 UTC
Created attachment 300269
kernel .config (kernel 5.16.0, Ryzen 9 5950X)
Comment 2 Erhard F. 2022-01-14 19:35:25 UTC
Created attachment 300270
piglit results & summary up to glsl-vs-raytrace-bug26691
Comment 3 Alex Deucher 2022-01-14 20:29:46 UTC
This is more likely a mesa bug.  I'd suggest filing a bug there:
https://gitlab.freedesktop.org/groups/mesa/-/issues
and closing this one.
Comment 4 Erhard F. 2022-01-14 20:52:05 UTC
Done. It's https://gitlab.freedesktop.org/mesa/mesa/-/issues/5870 now.
