Bug 46231 - Radeon NI: evergreen_resume fails after GPU lockup
Summary: Radeon NI: evergreen_resume fails after GPU lockup
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-08-20 00:55 UTC by Benjamin Lee
Modified: 2013-11-19 23:30 UTC
3 users

See Also:
Kernel Version: 3.6-rc2
Subsystem:
Regression: No
Bisected commit-id:


Description Benjamin Lee 2012-08-20 00:55:59 UTC
I've always had GPU lockups on my Radeon HD 6750M (MacBookPro8,2), but I wasn't able to capture useful kernel messages until upgrading to 3.6-rc2:

[88296.175333] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec 
[88296.175348] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000004f368c last fence id 0x00000000004f3681)
[88296.176433] radeon 0000:01:00.0: Saved 439 dwords of commands on ring 0. 
[88296.176439] radeon 0000:01:00.0: GPU softreset 
[88296.176445] radeon 0000:01:00.0:   GRBM_STATUS=0xA00038A0
[88296.176450] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[88296.176455] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[88296.176460] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[88296.176466] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x01000000
[88296.176471] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00011002
[88296.176476] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00028506
[88296.176482] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80838647
[88296.176493] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[88296.176600] radeon 0000:01:00.0:   GRBM_STATUS=0x800038A0
[88296.176605] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[88296.176610] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[88296.176615] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[88296.176620] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[88296.176625] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[88296.176630] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[88296.176636] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[88296.177642] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[88296.177647] radeon 0000:01:00.0: GPU softreset 
[88296.177652] radeon 0000:01:00.0:   GRBM_STATUS=0x800038A0
[88296.177657] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[88296.177662] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[88296.177667] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[88296.177672] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[88296.177677] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[88296.177682] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[88296.177687] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[88296.177698] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[88296.177805] radeon 0000:01:00.0:   GRBM_STATUS=0x800038A0
[88296.177810] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[88296.177814] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[88296.177819] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[88296.177824] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[88296.177829] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[88296.177835] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[88296.177840] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[88296.183915] [drm] probing gen 2 caps for device 8086:101 = 2/0
[88296.183918] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[88296.187771] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[88296.187948] radeon 0000:01:00.0: WB enabled
[88296.187957] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880265260c00
[88296.417147] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
[88296.417150] [drm:evergreen_resume] *ERROR* evergreen startup failed on resume

blee@supra ~ $ lspci -nn | grep VGA
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices [AMD] nee ATI Whistler [AMD Radeon HD 6600M Series] [1002:6741]

Comment 1 MikeQ 2012-09-19 08:49:00 UTC
Same problem on a Radeon HD 6320M running the 3.5.4 kernel. I didn't have this problem with 3.4 kernels. It happens just after boot when starting X, around 40-50% of the time.

[   63.387751] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   63.387861] radeon 0000:00:01.0: WB enabled
[   63.387867] radeon 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000018000c00 and cpu addr 0xffff8800d9ad6c00
[   63.566880] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8500)=0xCAFEDEAD)
[   63.566886] [drm:evergreen_resume] *ERROR* evergreen startup failed on resume

Comment 3 Benjamin Lee 2012-09-21 22:46:37 UTC
I got a different crash with 3.6-rc6 (which includes the patch mentioned in comment #2). The system was not functional after these messages:

[143239.080420] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[143239.080431] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000a80f65 last fence id 0x0000000000a80f5e)
[143239.579983] radeon 0000:01:00.0: GPU lockup CP stall for more than 10500msec
[143239.579994] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000a80f5f)
[143239.580002] radeon 0000:01:00.0: failed to get a new IB (-35)
[143239.580007] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib !
[143239.581196] radeon 0000:01:00.0: Saved 887 dwords of commands on ring 0.
[143239.581219] radeon 0000:01:00.0: GPU softreset 
[143239.581226] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003828
[143239.581232] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[143239.581238] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[143239.581244] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[143239.581249] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[143239.581256] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
[143239.581261] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020186
[143239.581267] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038647
[143239.581280] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[143239.581388] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[143239.581393] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[143239.581399] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[143239.581404] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[143239.581410] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[143239.581416] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[143239.581422] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[143239.581428] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[143239.582434] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[143239.587527] [drm] probing gen 2 caps for device 8086:101 = 2/0
[143239.587530] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[143239.590542] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[143239.590743] radeon 0000:01:00.0: WB enabled
[143239.590779] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880265312c00
[143239.607117] [drm] ring test on 0 succeeded in 3 usecs
[143239.913692] [drm] ib test on ring 0 succeeded in 0 usecs
[143239.914882] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[143239.920049] [drm] probing gen 2 caps for device 8086:101 = 2/0
[143239.920050] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[143239.923057] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[143239.923222] radeon 0000:01:00.0: WB enabled
[143239.923231] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880265312c00
[143239.939918] [drm] ring test on 0 succeeded in 2 usecs
[143240.178647] [drm] ib test on ring 0 succeeded in 0 usecs
[143251.248812] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[143251.248827] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000a80f90 last fence id 0x0000000000a80f7c)
[143251.748424] radeon 0000:01:00.0: GPU lockup CP stall for more than 10500msec
[143251.748434] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000a80f7d)
[143251.748443] radeon 0000:01:00.0: failed to get a new IB (-35)
[143251.748447] [drm:radeon_cs_ib_chunk] *ERROR* Failed to get ib !
[143251.749646] radeon 0000:01:00.0: Saved 1495 dwords of commands on ring 0.
[143251.749656] radeon 0000:01:00.0: GPU softreset 
[143251.749662] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003828
[143251.749668] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[143251.749674] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[143251.749680] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[143251.749686] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[143251.749692] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00004100
[143251.749698] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020182
[143251.749704] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80028243
[143251.749716] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[143251.749824] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[143251.749830] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[143251.749835] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[143251.749841] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[143251.749847] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[143251.749853] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[143251.749858] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[143251.749864] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[143251.750873] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[143251.756190] [drm] probing gen 2 caps for device 8086:101 = 2/0
[143251.756194] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[143251.759244] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[143251.759425] radeon 0000:01:00.0: WB enabled
[143251.759432] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880265312c00
[143251.776151] [drm] ring test on 0 succeeded in 2 usecs
[143262.538173] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[143262.538185] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000a80fac last fence id 0x0000000000a80f92)
[143262.538191] [drm:r600_ib_test] *ERROR* radeon: fence wait failed (-35).
[143262.538196] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on GFX ring (-35).
[143262.538201] radeon 0000:01:00.0: ib ring test failed (-35).
[143262.539292] radeon 0000:01:00.0: GPU softreset 
[143262.539299] radeon 0000:01:00.0:   GRBM_STATUS=0xA0003828
[143262.539305] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[143262.539310] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[143262.539316] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[143262.539322] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[143262.539328] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
[143262.539334] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020186
[143262.539340] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038647
[143262.539352] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[143262.539460] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[143262.539466] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[143262.539472] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[143262.539477] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[143262.539483] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[143262.539488] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[143262.539494] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[143262.539500] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[143262.540509] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[143262.545844] [drm] probing gen 2 caps for device 8086:101 = 2/0
[143262.545848] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[143262.548898] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[143262.549080] radeon 0000:01:00.0: WB enabled
[143262.549090] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880265312c00
[143262.565807] [drm] ring test on 0 succeeded in 2 usecs
[143262.794823] [drm] ib test on ring 0 succeeded in 0 usecs
[143262.795952] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[143262.801217] [drm] probing gen 2 caps for device 8086:101 = 2/0
[143262.801222] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[143262.804151] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[143262.804248] radeon 0000:01:00.0: WB enabled
[143262.804258] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880265312c00
[143262.820921] [drm] ring test on 0 succeeded in 2 usecs
[143263.039787] [drm] ib test on ring 0 succeeded in 0 usecs

Note You need to log in before you can comment on or make changes to this bug.