Hello,

This bug was originally reported at https://bugzilla.kernel.org/show_bug.cgi?id=196197. Because it appears to be caused by the PCI subsystem, it has been reassigned here.

The bug could be bisected exactly:

git rev-list v4.11-rc1
...
60e8d3e11645a1b9c4197d9786df3894332c1685 (BAD)
190c3ee06a0f0660839785b7ad8a830e832d9481 (GOOD)
...

Description:
Every time the system boots, it hangs while trying to start Xorg, retrying every 5 seconds in an endless loop. The keyboard is locked as well; the only way to get a console is via ssh from another machine. The bug is always reproducible and can always be avoided by downgrading to kernel 4.10 while leaving the rest of the system unchanged, so it was introduced in the step from kernel 4.10 to kernel 4.11.

Hardware:
AMD Opteron board with the graphics card in a PCI-E x8 slot.
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV630 XT [Radeon HD 2600 XT]

Relevant sections of kern.log:
Jun 4 09:19:41 a1 kernel: [ 46.719247] radeon 0000:06:00.0: ring 0 stalled for more than 10093msec
Jun 4 09:19:41 a1 kernel: [ 46.719253] radeon 0000:06:00.0: GPU lockup (current fence id 0x000000000000000e last fence id 0x0000000000000012 on ring 0)
Jun 4 09:19:41 a1 kernel: [ 46.729651] radeon 0000:06:00.0: Saved 121 dwords of commands on ring 0.
Jun 4 09:19:41 a1 kernel: [ 46.729666] radeon 0000:06:00.0: GPU softreset: 0x00000019
Jun 4 09:19:41 a1 kernel: [ 46.729669] radeon 0000:06:00.0: R_008010_GRBM_STATUS = 0xE57C24E0
Jun 4 09:19:41 a1 kernel: [ 46.729672] radeon 0000:06:00.0: R_008014_GRBM_STATUS2 = 0x00113303
Jun 4 09:19:41 a1 kernel: [ 46.729674] radeon 0000:06:00.0: R_000E50_SRBM_STATUS = 0x200030C0
Jun 4 09:19:41 a1 kernel: [ 46.729676] radeon 0000:06:00.0: R_008674_CP_STALLED_STAT1 = 0x01000000
Jun 4 09:19:41 a1 kernel: [ 46.729678] radeon 0000:06:00.0: R_008678_CP_STALLED_STAT2 = 0x00001002
Jun 4 09:19:41 a1 kernel: [ 46.729680] radeon 0000:06:00.0: R_00867C_CP_BUSY_STAT = 0x00028C86
Jun 4 09:19:41 a1 kernel: [ 46.729683] radeon 0000:06:00.0: R_008680_CP_STAT = 0x808386C5
Jun 4 09:19:41 a1 kernel: [ 46.729685] radeon 0000:06:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Jun 4 09:19:41 a1 kernel: [ 46.965016] radeon 0000:06:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
Jun 4 09:19:41 a1 kernel: [ 46.965070] radeon 0000:06:00.0: SRBM_SOFT_RESET=0x00000100
Jun 4 09:19:41 a1 kernel: [ 46.967176] radeon 0000:06:00.0: R_008010_GRBM_STATUS = 0xA0003030
Jun 4 09:19:41 a1 kernel: [ 46.967178] radeon 0000:06:00.0: R_008014_GRBM_STATUS2 = 0x00000003
Jun 4 09:19:41 a1 kernel: [ 46.967181] radeon 0000:06:00.0: R_000E50_SRBM_STATUS = 0x2000B0C0
Jun 4 09:19:41 a1 kernel: [ 46.967183] radeon 0000:06:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Jun 4 09:19:41 a1 kernel: [ 46.967185] radeon 0000:06:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Jun 4 09:19:41 a1 kernel: [ 46.967187] radeon 0000:06:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Jun 4 09:19:41 a1 kernel: [ 46.967189] radeon 0000:06:00.0: R_008680_CP_STAT = 0x80100000
Jun 4 09:19:41 a1 kernel: [ 46.967191] radeon 0000:06:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Jun 4 09:19:41 a1 kernel: [ 46.967200] radeon 0000:06:00.0: GPU reset succeeded, trying to resume
Jun 4 09:19:41 a1 kernel: [ 47.139539] [drm] PCIE GART of 512M enabled (table at 0x0000000000142000).
Jun 4 09:19:41 a1 kernel: [ 47.139556] radeon 0000:06:00.0: WB enabled
Jun 4 09:19:41 a1 kernel: [ 47.139559] radeon 0000:06:00.0: fence driver on ring 0 use gpu addr 0x0000000010000c00 and cpu addr 0xffff88033fc2cc00
Jun 4 09:19:41 a1 kernel: [ 47.140372] radeon 0000:06:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0xffffc90004a121d0
Jun 4 09:19:42 a1 kernel: [ 47.343830] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
Jun 4 09:19:42 a1 kernel: [ 47.343855] [drm:r600_resume [radeon]] *ERROR* r600 startup failed on resume
Jun 4 09:19:52 a1 kernel: [ 57.358140] radeon 0000:06:00.0: ring 0 stalled for more than 10186msec
Jun 4 09:19:52 a1 kernel: [ 57.358148] radeon 0000:06:00.0: GPU lockup (current fence id 0x000000000000000e last fence id 0x0000000000000012 on ring 0)
Jun 4 09:19:52 a1 kernel: [ 57.367850] radeon 0000:06:00.0: Saved 261817 dwords of commands on ring 0.
Jun 4 09:19:52 a1 kernel: [ 57.367866] radeon 0000:06:00.0: GPU softreset: 0x00000008
Jun 4 09:19:52 a1 kernel: [ 57.367869] radeon 0000:06:00.0: R_008010_GRBM_STATUS = 0xA0003030
Jun 4 09:19:52 a1 kernel: [ 57.367871] radeon 0000:06:00.0: R_008014_GRBM_STATUS2 = 0x00000003

Greetings,
Andreas
I've been experiencing the same problem with more or less the same setup. I bought a new mainboard because the frequent reboots exposed some hardware problems that otherwise showed up less often. Anyway, the new mainboard has an [AMD/ATI] RS780L [Radeon 3000], and it suffered from this bug as well.

Then I realized that the reboots (or just the kernel errors) happened while the framebuffer splash screen was loading. Uninstalling splashutils solved the issue for me. To be absolutely sure it never comes back, I also disabled fb_con_decor in my kernel config, and everything has worked fine since then.

I don't know whether the others experiencing this issue are using splashutils or something similar, but what happened to me might be of some use to someone else.
Marking as duplicate of 196197 (the original report). We can continue this there. *** This bug has been marked as a duplicate of bug 196197 ***