Bug 77001
Summary: | Radeon R9 270X GPU lockup and resume failure after all night inactivity [drm,radeonsi] | ||
---|---|---|---|
Product: | Drivers | Reporter: | Grzegorz Kowal (custos.mentis) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED UNREPRODUCIBLE | ||
Severity: | normal | CC: | alexdeucher |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 3.14.4 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
kernel log with the lockup and following boot messages
Boot log when GPU cannot be initialized. |
Is this easily reproducible? The GPU hang is probably caused by a particular application. Were you running anything in particular when it hung? Maybe a GL screensaver? If so, can you narrow down what app caused this? This type of hang happened for the first time here. I was logged in the KDE with KWin compositing based on OpenGL 3.1. I don't use any screensaver. I switched compositing type to OpenGL 2.0 now, and recompiled the kernel with suspend to RAM and standby turned off. My software stack is: Gentoo default/linux/amd64/13.0/desktop/kde xorg-server-1.15.0 mesa git -> aa78c4586d42856dcc86b143c2e72c05a1422016 libdrm git -> 8fc62ca8ac010659023bb63c4759eb683de4f9af glamor git -> a4fbc7732a1e56de385f50b778aafbdd186e015c xf86-video-ati git -> 851b2cf8714618843725f6d067915375485ade9d I'll try to do some tests to see if the problem can be reproduced with this configuration. This morning at 6:27am it happened again. The computer was idle during the whole night, even during the hang. When I touched the keyboard around 8am, the monitor turned on, but I saw dark flickering screen only. To check if kwin was responsible for the hang, I logged out from the KDE session yesterday, so only KDM was running. This time I could log in through ssh, copy the kernel log and reboot the machine though. The kernel log contained: radeon 0000:01:00.0: GPU lockup CP stall for more than 15500msec radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000026efd3 last fence id 0x000000000026efd2 on ring 0) radeon 0000:01:00.0: failed to get a new IB (-35) radeon 0000:01:00.0: failed to get a new IB (-35) radeon 0000:01:00.0: failed to get a new IB (-35) radeon 0000:01:00.0: sa_manager is not empty, clearing anyway radeon 0000:01:00.0: Saved 2365 dwords of commands on ring 0. radeon 0000:01:00.0: GPU softreset: 0x0000004D radeon 0000:01:00.0: GRBM_STATUS = 0xB3507028 radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x28000006 radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x2F000006 radeon 0000:01:00.0: SRBM_STATUS = 0x20000EC0 radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000 radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x40000000 radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00408002 radeon 0000:01:00.0: R_008680_CP_STAT = 0x84228647 radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44483146 radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000 radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000 radeon 0000:01:00.0: Wait for MC idle timedout ! radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100100 radeon 0000:01:00.0: GRBM_STATUS = 0x00003028 radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006 radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006 radeon 0000:01:00.0: SRBM_STATUS = 0x20000EC0 radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000 radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000 radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000 radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 radeon 0000:01:00.0: GPU reset succeeded, trying to resume [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting [drm:atom_execute_table_locked] *ERROR* atombios stuck executing C008 (len 254, WS 0, PS 4) @ 0xC032 [drm:atom_execute_table_locked] *ERROR* atombios stuck executing B67A (len 94, WS 12, PS 8) @ 0xB6C3 [drm] probing gen 2 caps for device 1002:5a16 = 33ed01/0 radeon 0000:01:00.0: Wait for MC idle timedout ! radeon 0000:01:00.0: Wait for MC idle timedout ! [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000). radeon 0000:01:00.0: WB enabled radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8800ba363c00 radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8800ba363c04 radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8800ba363c08 radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8800ba363c0c radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8800ba363c10 radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90010335a18 [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD) [drm:si_resume] *ERROR* si startup failed on resume [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed I've got this again. This time with radeon.dpm=0 and power profile set to "low". Sometimes when I switch the computer on, it seems that system does not initiate the GPU properly and monitor stays off. The system loads but with the following error: BUG: unable to handle kernel NULL pointer dereference at 0000000000000078 IP: [<ffffffff81337bc3>] radeon_driver_open_kms+0x83/0x200 The full log is attached. Created attachment 137941 [details]
Boot log when GPU cannot be initialized.
With the recent kernels 3.14.8 to 3.14.10 and mesa, xf86-video-ati, and libdrm from git I didn't get the resume failures anymore. My usual uptime is of several days (reboot only after the minor kernel version update). However, when the machine is booted after being shut down completely, sometimes I am getting problems with the video initialization at the BIOS POST stage. But it looks like a different problem, maybe BIOS settings or hardware issues, so I am closing this bug report as resolved. |
Created attachment 137581 [details] kernel log with the lockup and following boot messages After leaving the computer on during the night it hung up in the morning, while I tried to use it again, with the following message: ------ May 28 06:13:06 [kernel] [153149.666146] radeon 0000:01:00.0: GPU lockup CP stall for more than 10033msec May 28 06:13:06 [kernel] [153149.666150] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000071debe last fence id 0x000000000071debd on ring 0) May 28 06:13:06 [kernel] [153150.122657] radeon 0000:01:00.0: GPU lockup CP stall for more than 10490msec May 28 06:13:06 [kernel] [153150.122661] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000071debe last fence id 0x000000000071debd on ring 0) May 28 06:13:06 [kernel] [153150.122664] radeon 0000:01:00.0: failed to get a new IB (-35) May 28 06:13:06 [kernel] [153150.124014] radeon 0000:01:00.0: sa_manager is not empty, clearing anyway May 28 06:13:07 [kernel] [153150.927575] radeon 0000:01:00.0: Saved 3296 dwords of commands on ring 0. May 28 06:13:07 [kernel] [153150.927713] radeon 0000:01:00.0: GPU softreset: 0x0000004D May 28 06:13:07 [kernel] [153150.927721] radeon 0000:01:00.0: GRBM_STATUS = 0xA3503028 May 28 06:13:07 [kernel] [153150.927726] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x28000006 May 28 06:13:07 [kernel] [153150.927732] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x2D000006 May 28 06:13:07 [kernel] [153150.927736] radeon 0000:01:00.0: SRBM_STATUS = 0x20000EC0 May 28 06:13:07 [kernel] [153150.927850] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000 May 28 06:13:07 [kernel] [153150.927854] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 May 28 06:13:07 [kernel] [153150.927859] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00004100 May 28 06:13:07 [kernel] [153150.927863] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00028986 May 28 06:13:07 [kernel] [153150.927867] radeon 0000:01:00.0: R_008680_CP_STAT = 0x800282E7 May 28 06:13:07 [kernel] [153150.927874] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44483146 May 28 06:13:07 [kernel] [153150.927878] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 May 28 06:13:07 [kernel] [153150.927883] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000 May 28 06:13:07 [kernel] [153150.927887] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000 May 28 06:13:08 [kernel] [153152.248694] radeon 0000:01:00.0: Wait for MC idle timedout ! May 28 06:13:08 [kernel] [153152.248703] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF May 28 06:13:08 [kernel] [153152.248760] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00100100 May 28 06:13:08 [kernel] [153152.249931] radeon 0000:01:00.0: GRBM_STATUS = 0x00003028 May 28 06:13:08 [kernel] [153152.249937] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006 May 28 06:13:08 [kernel] [153152.249941] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006 May 28 06:13:08 [kernel] [153152.249945] radeon 0000:01:00.0: SRBM_STATUS = 0x20000EC0 May 28 06:13:08 [kernel] [153152.250059] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000 May 28 06:13:08 [kernel] [153152.250088] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 May 28 06:13:08 [kernel] [153152.250092] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 May 28 06:13:08 [kernel] [153152.250099] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000 May 28 06:13:08 [kernel] [153152.250109] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000 May 28 06:13:08 [kernel] [153152.250114] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 May 28 06:13:08 [kernel] [153152.250118] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 May 28 06:13:08 [kernel] [153152.250372] radeon 0000:01:00.0: GPU reset succeeded, trying to resume May 28 06:13:13 [kernel] [153157.261944] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting May 28 06:13:13 [kernel] [153157.261952] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing C008 (len 254, WS 0, PS 4) @ 0xC032 May 28 06:13:13 [kernel] [153157.261957] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing B67A (len 94, WS 12, PS 8) @ 0xB6C3 May 28 06:13:13 [kernel] [153157.276909] [drm] probing gen 2 caps for device 1002:5a16 = 33ed02/0 May 28 06:13:13 [kernel] [153157.276918] [drm] PCIE gen 2 link speeds already enabled May 28 06:13:14 [kernel] [153157.761994] radeon 0000:01:00.0: Wait for MC idle timedout ! May 28 06:13:14 [kernel] [153157.982771] radeon 0000:01:00.0: Wait for MC idle timedout ! May 28 06:13:14 [kernel] [153157.988649] [drm] PCIE GART of 1024M enabled (table at 0x0000000000276000). May 28 06:13:14 [kernel] [153157.988769] radeon 0000:01:00.0: WB enabled May 28 06:13:14 [kernel] [153157.988775] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8800ba368c00 May 28 06:13:14 [kernel] [153157.988780] radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8800ba368c04 May 28 06:13:14 [kernel] [153157.988785] radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8800ba368c08 May 28 06:13:14 [kernel] [153157.988792] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8800ba368c0c May 28 06:13:14 [kernel] [153157.988798] radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8800ba368c10 May 28 06:13:14 [kernel] [153157.989912] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xffffc90010335a18 May 28 06:13:15 [kernel] [153158.516551] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD) May 28 06:13:15 [kernel] [153158.516557] [drm:si_resume] *ERROR* si startup failed on resume May 28 06:13:15 [kernel] [153158.516629] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed May 28 06:13:16 [kernel] [153159.392111] radeon 0000:01:00.0: Saved 9824 dwords of commands on ring 0. May 28 06:13:16 [kernel] [153159.392244] radeon 0000:01:00.0: GPU softreset: 0x00000048 May 28 06:13:16 [kernel] [153159.392246] radeon 0000:01:00.0: GRBM_STATUS = 0xA0003028 May 28 06:13:16 [kernel] [153159.392249] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006 May 28 06:13:16 [kernel] [153159.392251] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006 May 28 06:13:16 [kernel] [153159.392253] radeon 0000:01:00.0: SRBM_STATUS = 0x20000EC0 May 28 06:13:16 [kernel] [153159.392363] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000 May 28 06:13:16 [kernel] [153159.392365] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 May 28 06:13:16 [kernel] [153159.392367] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00010100 May 28 06:13:16 [kernel] [153159.392369] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00420182 May 28 06:13:16 [kernel] [153159.392371] radeon 0000:01:00.0: R_008680_CP_STAT = 0x84038243 May 28 06:13:16 [kernel] [153159.392375] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 May 28 06:13:16 [kernel] [153159.392377] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 May 28 06:13:16 [kernel] [153159.392380] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000 May 28 06:13:16 [kernel] [153159.392383] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000 May 28 06:13:17 [kernel] [153160.661846] radeon 0000:01:00.0: Wait for MC idle timedout ! May 28 06:13:17 [kernel] [153160.661850] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF May 28 06:13:17 [kernel] [153160.661904] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100 May 28 06:13:17 [kernel] [153160.663060] radeon 0000:01:00.0: GRBM_STATUS = 0x00003028 May 28 06:13:17 [kernel] [153160.663062] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000006 May 28 06:13:17 [kernel] [153160.663067] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000006 May 28 06:13:17 [kernel] [153160.663075] radeon 0000:01:00.0: SRBM_STATUS = 0x20000EC0 May 28 06:13:17 [kernel] [153160.663186] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000 May 28 06:13:17 [kernel] [153160.663193] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000 May 28 06:13:17 [kernel] [153160.663197] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000 May 28 06:13:17 [kernel] [153160.663199] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000 May 28 06:13:17 [kernel] [153160.663200] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000 May 28 06:13:17 [kernel] [153160.663207] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57 May 28 06:13:17 [kernel] [153160.663209] radeon 0000:01:00.0: R_00D834_DMA_STATUS_REG = 0x44C83D57 May 28 06:13:17 [kernel] [153160.663478] radeon 0000:01:00.0: GPU reset succeeded, trying to resume ------ After that it didn't respond anymore, not even through ssh, so hard reset was required. I've noticed that simply pressing the reset button is not enough, and hard reset with powering off the computer is necessary.