Bug 196117
Summary: | amdgpu - RX 480 (polaris) - freeze during boot | ||
---|---|---|---|
Product: | Drivers | Reporter: | Paul K. Gerke (paulkgerke) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | ||
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 4.11.6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | screen capture of the last kernel messages before system freeze; kernel config file for a working for the RX480 system | ||
I just realized that, while trying to be concise, I deprived this bug report of a lot of information.

The very short version: the amdgpu driver somehow causes the system to hang after the initramfs and the kernel have been loaded, but VERY early in the boot sequence. Within the first 100 milliseconds or so of a regular boot, at the point where the system normally switches to its graphics driver, my monitor goes black and the system just hangs. No communication is possible. As stated in the original report, the monitor stays on, so there is some sort of graphics signal.

With the "nomodeset" option, this issue is gone and the system comes up normally.

This happens on all modern kernels I tried, across different Linux distros. So far I have tried Xubuntu (16.04, 17.04), Ubuntu (17.04), and Arch Linux. I forgot exactly which kernel versions I tested (I can list them if relevant), but the result seemed to be the same every time.

Sounds like maybe some files under /lib/firmware/amdgpu/ are missing in the initrd.

In order to get more information, try booting with modprobe.blacklist=amdgpu on the kernel command line, then run sudo modprobe amdgpu manually and check dmesg.

Ah, thank you! Really quick @Michel Dänzer: "Sounds like maybe some files under /lib/firmware/amdgpu/ are missing in the initrd": I thought the same, so I baked the important firmware blobs right into the kernel, just in case there would be some issues. Tonight, when I have some more time, I will dig up the settings that I used.

Anyway, the other tip allowed me to extract the (crash) logs using my serial console. The machine freezes just as usual after modprobing. I did the following:
- Boot with: linux /vmlinuz... root=UUID=xxxx ro debug ignore_loglevel modprobe.blacklist=amdgpu console=ttyUSB0,115200
- (I had disabled all X server functionality beforehand so that it would not interfere with any graphics settings.)
- I logged in and issued "sudo modprobe amdgpu", as suggested.
- I recorded the following logs on my second system.
[ 279.264565] [drm] amdgpu kernel modesetting enabled. [ 279.271228] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de> [ 279.271248] AMD IOMMUv2 functionality not available on this system [ 279.279814] CRAT table not found [ 279.279837] Finished initializing topology ret=0 [ 279.279877] kfd kfd: Initialized module [ 279.280153] checking generic (c0000000 760000) vs hw (c0000000 10000000) [ 279.280163] fb: switching to amdgpudrmfb from VESA VGA [ 279.280383] Console: switching to colour dummy device 80x25 [ 279.280941] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1043:0x0505 0xC7). [ 279.280960] [drm] register mmio base: 0xDFFC0000 [ 279.280962] [drm] register mmio size: 262144 [ 279.280968] [drm] doorbell mmio base: 0xDEA00000 [ 279.280971] [drm] doorbell mmio size: 2097152 [ 279.280983] [drm] probing gen 2 caps for device 10de:778 = 313d02/0 [ 279.280989] [drm] probing mlw for device 10de:778 = 313d02 [ 279.280997] [drm] UVD is enabled in VM mode [ 279.280999] [drm] VCE enabled in VM mode [ 279.303168] [drm] BIOS signature incorrect 1 1 [ 279.303174] amdgpu 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000 [ 279.303781] ATOM BIOS: 67DFHB.15.50.0.0.AS18 [ 279.303792] [drm] GPU post is not needed [ 279.304306] amdgpu 0000:02:00.0: VRAM: 8192M 0x0000000000000000 - 0x00000001FFFFFFFF (8192M used) [ 279.304310] amdgpu 0000:02:00.0: GTT: 8192M 0x0000000200000000 - 0x00000003FFFFFFFF [ 279.304314] [drm] Detected VRAM RAM=8192M, BAR=256M [ 279.304316] [drm] RAM width 256bits GDDR5 [ 279.304414] [TTM] Zone kernel: Available graphics memory: 8214392 kiB [ 279.304416] [TTM] Zone dma32: Available graphics memory: 2097152 kiB [ 279.304418] [TTM] Initializing pool allocator [ 279.304423] [TTM] Initializing DMA pool allocator [ 279.304457] [drm] amdgpu: 8192M of VRAM memory ready [ 279.304459] [drm] amdgpu: 8192M of GTT memory ready. 
[ 279.304468] [drm] GART: num cpu pages 2097152, num gpu pages 2097152 [ 279.305642] [drm] PCIE GART of 8192M enabled (table at 0x0000000000040000). [ 279.305653] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 279.305654] [drm] Driver supports precise vblank timestamp query. [ 279.305687] amdgpu 0000:02:00.0: amdgpu: using MSI. [ 279.305705] [drm] amdgpu: irq initialized. [ 279.305728] amdgpu: [powerplay] amdgpu: powerplay sw initialized [ 279.306759] [drm] AMDGPU Display Connectors [ 279.306762] [drm] Connector 0: [ 279.306765] [drm] DP-1 [ 279.306766] [drm] HPD1 [ 279.306770] [drm] DDC: 0x486c 0x486c 0x486d 0x486d 0x486e 0x486e 0x486f 0x486f [ 279.306772] [drm] Encoders: [ 279.306774] [drm] DFP1: INTERNAL_UNIPHY1 [ 279.306776] [drm] Connector 1: [ 279.306777] [drm] DP-2 [ 279.306780] [drm] HPD5 [ 279.306782] [drm] DDC: 0x4874 0x4874 0x4875 0x4875 0x4876 0x4876 0x4877 0x4877 [ 279.306783] [drm] Encoders: [ 279.306785] [drm] DFP2: INTERNAL_UNIPHY1 [ 279.306786] [drm] Connector 2: [ 279.306788] [drm] HDMI-A-1 [ 279.306791] [drm] HPD6 [ 279.306793] [drm] DDC: 0x4868 0x4868 0x4869 0x4869 0x486a 0x486a 0x486b 0x486b [ 279.306795] [drm] Encoders: [ 279.306797] [drm] DFP3: INTERNAL_UNIPHY2 [ 279.306798] [drm] Connector 3: [ 279.306800] [drm] HDMI-A-2 [ 279.306802] [drm] HPD4 [ 279.306803] [drm] DDC: 0x4870 0x4870 0x4871 0x4871 0x4872 0x4872 0x4873 0x4873 [ 279.306804] [drm] Encoders: [ 279.306806] [drm] DFP4: INTERNAL_UNIPHY2 [ 279.306809] [drm] Connector 4: [ 279.306812] [drm] DVI-D-1 [ 279.306813] [drm] HPD3 [ 279.306816] [drm] DDC: 0x487c 0x487c 0x487d 0x487d 0x487e 0x487e 0x487f 0x487f [ 279.306818] [drm] Encoders: [ 279.306820] [drm] DFP5: INTERNAL_UNIPHY [ 279.306870] amdgpu 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000200000008, cpu addr 0xffff8a3867e70008 [ 279.307375] amdgpu 0000:02:00.0: fence driver on ring 1 use gpu addr 0x0000000200000018, cpu addr 0xffff8a3867e70018 [ 279.307435] amdgpu 0000:02:00.0: fence driver on ring 2 use 
gpu addr 0x0000000200000028, cpu addr 0xffff8a3867e70028 [ 279.307478] amdgpu 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000200000038, cpu addr 0xffff8a3867e70038 [ 279.307513] amdgpu 0000:02:00.0: fence driver on ring 4 use gpu addr 0x0000000200000048, cpu addr 0xffff8a3867e70048 [ 279.307541] amdgpu 0000:02:00.0: fence driver on ring 5 use gpu addr 0x0000000200000058, cpu addr 0xffff8a3867e70058 [ 279.307727] amdgpu 0000:02:00.0: fence driver on ring 6 use gpu addr 0x0000000200000068, cpu addr 0xffff8a3867e70068 [ 279.308100] amdgpu 0000:02:00.0: fence driver on ring 7 use gpu addr 0x0000000200000078, cpu addr 0xffff8a3867e70078 [ 279.308152] amdgpu 0000:02:00.0: fence driver on ring 8 use gpu addr 0x0000000200000088, cpu addr 0xffff8a3867e70088 [ 279.308961] amdgpu 0000:02:00.0: fence driver on ring 9 use gpu addr 0x0000000200000098, cpu addr 0xffff8a3867e70098 [ [ 279.348966] amdgpu: [powerplay] [AVFS] Something is broken. See log! [ 279.711657] amdgpu: [powerplay] [ 279.711657] failed to send message 254 ret is 0 [ 279.711681] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table! [ 279.854919] amdgpu: [powerplay] DPM is already running [ 280.134978] clocksource: timekeeping watchdog on CPU2: Marking clocksource 'tsc' as unstable because the skew is too large: [ 280.135000] clocksource: 'hpet' wd_now: a19f4644 wd_last: a0b4da18 mask: ffffffff [ 280.135005] clocksource: 'tsc' cs_now: 1133a962472 cs_last: 112d67312e4 mask: ffffffffffffffff [ 280.135011] sched_clock: Marking unstable (279910536262, 224436834)<-(280245936976, -110963880) [ 280.135028] tsc: Marking TSC unstable due to clocksource watchdog [ 283.495299] clocksource: Switched to clocksource hpet [ 283.495410] amdgpu: [powerplay] SMC address must be 4 byte aligned. [ 283.495417] amdgpu: [powerplay] Failed to initialize Graphics Level! [ 283.495423] amdgpu: [powerplay] Failed to initialize SMC table! 
[ 283.635476] amdgpu: [powerplay] Failed to enable VR hot GPIO interrupt! [ 284.195615] amdgpu: [powerplay] Failed to enable ULV! [ 284.335700] amdgpu: [powerplay] Attempt to enable Master Deep Sleep switch failed! [ 284.335719] amdgpu: [powerplay] Failed to enable deep sleep master switch!
- I spot a "Something is broken", so something seems fishy.
- Note: I do not have time right now to process the log since I have to go to work.

Hmm... I quickly checked the kernel codebase but do not really have a clue what is going on. Cross-correlating my problem with other reports, I guess it all starts to go wrong at the line "Something is broken"... which, as I found, is just the result of a state check. I think I will follow this lead next: https://bugzilla.kernel.org/show_bug.cgi?id=193651#c8

It seems plausible to me that my issues are of a similar nature, in the sense of "Stock distribution kernels do not have stable amdgpu code".

I finally found some time to test some other kernel versions. Instead of the 4.10.5 kernel suggested in the link posted above, I tried kernel 4.10.0-rc5+. The errors are *different*:
[ 129.738315] [drm] amdgpu kernel modesetting enabled. [ 129.742693] AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de> [ 129.742700] AMD IOMMUv2 functionality not available on this system [ 129.747127] CRAT table not found [ 129.747134] Finished initializing topology ret=0 [ 129.747154] kfd kfd: Initialized module [ 129.747288] checking generic (c0000000 760000) vs hw (c0000000 10000000) [ 129.747289] fb: switching to amdgpudrmfb from VESA VGA [ 129.747339] Console: switching to colour dummy device 80x25 [ 129.747619] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1043:0x0505 0xC7).
[ 129.747629] [drm] register mmio base: 0xDFFC0000 [ 129.747630] [drm] register mmio size: 262144 [ 129.747635] [drm] doorbell mmio base: 0xDEA00000 [ 129.747636] [drm] doorbell mmio size: 2097152 [ 129.747647] [drm] probing gen 2 caps for device 10de:778 = 313d02/0 [ 129.747652] [drm] probing mlw for device 10de:778 = 313d02 [ 129.747660] [drm] UVD is enabled in VM mode [ 129.747662] [drm] VCE enabled in VM mode [ 129.769609] [drm] BIOS signature incorrect 1 1 [ 129.769615] amdgpu 0000:02:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x0000 [ 129.770226] ATOM BIOS: 67DFHB.15.50.0.0.AS18 [ 129.770236] [drm] GPU post is not needed [ 129.771291] amdgpu 0000:02:00.0: VRAM: 8192M 0x0000000000000000 - 0x00000001FFFFFFFF (8192M used) [ 129.771295] amdgpu 0000:02:00.0: GTT: 8192M 0x0000000200000000 - 0x00000003FFFFFFFF [ 129.771299] [drm] Detected VRAM RAM=8192M, BAR=256M [ 129.771301] [drm] RAM width 256bits GDDR5 [ 129.772340] [TTM] Zone kernel: Available graphics memory: 8214568 kiB [ 129.772343] [TTM] Zone dma32: Available graphics memory: 2097152 kiB [ 129.772346] [TTM] Initializing pool allocator [ 129.772370] [TTM] Initializing DMA pool allocator [ 129.772424] [drm] amdgpu: 8192M of VRAM memory ready [ 129.772427] [drm] amdgpu: 8192M of GTT memory ready. [ 129.772444] [drm] GART: num cpu pages 2097152, num gpu pages 2097152 [ 129.773652] [drm] PCIE GART of 8192M enabled (table at 0x0000000000040000). [ 129.773667] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 129.773669] [drm] Driver supports precise vblank timestamp query. [ 129.773721] amdgpu 0000:02:00.0: amdgpu: using MSI. [ 129.773750] [drm] amdgpu: irq initialized. 
[ 129.773783] amdgpu: [powerplay] amdgpu: powerplay sw initialized [ 129.775082] [drm] AMDGPU Display Connectors [ 129.775087] [drm] Connector 0: [ 129.775090] [drm] DP-1 [ 129.775092] [drm] HPD1 [ 129.775095] [drm] DDC: 0x486c 0x486c 0x486d 0x486d 0x486e 0x486e 0x486f 0x486f [ 129.775097] [drm] Encoders: [ 129.775099] [drm] DFP1: INTERNAL_UNIPHY1 [ 129.775101] [drm] Connector 1: [ 129.775103] [drm] DP-2 [ 129.775104] [drm] HPD5 [ 129.775107] [drm] DDC: 0x4874 0x4874 0x4875 0x4875 0x4876 0x4876 0x4877 0x4877 [ 129.775108] [drm] Encoders: [ 129.775110] [drm] DFP2: INTERNAL_UNIPHY1 [ 129.775112] [drm] Connector 2: [ 129.775113] [drm] HDMI-A-1 [ 129.775115] [drm] HPD6 [ 129.775117] [drm] DDC: 0x4868 0x4868 0x4869 0x4869 0x486a 0x486a 0x486b 0x486b [ 129.775122] [drm] Encoders: [ 129.775127] [drm] DFP3: INTERNAL_UNIPHY2 [ 129.775133] [drm] Connector 3: [ 129.775138] [drm] HDMI-A-2 [ 129.775143] [drm] HPD4 [ 129.775146] [drm] DDC: 0x4870 0x4870 0x4871 0x4871 0x4872 0x4872 0x4873 0x4873 [ 129.775148] [drm] Encoders: [ 129.775149] [drm] DFP4: INTERNAL_UNIPHY2 [ 129.775151] [drm] Connector 4: [ 129.775152] [drm] DVI-D-1 [ 129.775154] [drm] HPD3 [ 129.775156] [drm] DDC: 0x487c 0x487c 0x487d 0x487d 0x487e 0x487e 0x487f 0x487f [ 129.775157] [drm] Encoders: [ 129.775159] [drm] DFP5: INTERNAL_UNIPHY [ 129.775297] amdgpu 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000200000008, cpu addr 0xffff9e78266d4008 [ 129.775339] amdgpu 0000:02:00.0: fence driver on ring 1 use gpu addr 0x0000000200000018, cpu addr 0xffff9e78266d4018 [ 129.775404] amdgpu 0000:02:00.0: fence driver on ring 2 use gpu addr 0x0000000200000028, cpu addr 0xffff9e78266d4028 [ 129.775453] amdgpu 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000200000038, cpu addr 0xffff9e78266d4038 [ 129.775498] amdgpu 0000:02:00.0: fence driver on ring 4 use gpu addr 0x0000000200000048, cpu addr 0xffff9e78266d4048 [ 129.775542] amdgpu 0000:02:00.0: fence driver on ring 5 use gpu addr 0x0000000200000058, cpu 
addr 0xffff9e78266d4058 [ 129.775588] amdgpu 0000:02:00.0: fence driver on ring 6 use gpu addr 0x0000000200000068, cpu addr 0xffff9e78266d4068 [ 129.775634] amdgpu 0000:02:00.0: fence driver on ring 7 use gpu addr 0x0000000200000078, cpu addr 0xffff9e78266d4078 [ 129.775681] amdgpu 0000:02:00.0: fence driver on ring 8 use gpu addr 0x0000000200000088, cpu addr 0xffff9e78266d4088 [ 129.775756] amdgpu 0000:02:00.0: fence driver on ring 9 use gpu addr 0x0000000200000098, cpu addr 0xffff9e78266d4098 [ 129.775806] amdgpu 0000:02:00.0: fence driver on ring 10 use gpu addr 0x0000000200[ 129.815268] amdgpu: [powerplay] [AVFS] Something is broken. See log! [ 130.177622] amdgpu: [powerplay] [ 130.177622] failed to send message 254 ret is 0 [ 130.177657] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table! [ 130.320889] amdgpu: [powerplay] DPM is already running [ 130.600945] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large: [ 130.600953] clocksource: 'hpet' wd_now: c2a2cabc wd_last: c1aae530 mask: ffffffff [ 130.600955] clocksource: 'tsc' cs_now: 92d701459c cs_last: 926c30c1b7 mask: ffffffffffffffff [ 133.961920] amdgpu: [powerplay] SMC address must be 4 byte aligned. [ 133.961938] clocksource: Switched to clocksource hpet [ 133.961947] amdgpu: [powerplay] Failed to initialize Graphics Level! [ 133.961953] amdgpu: [powerplay] Failed to initialize SMC table! [ 134.102010] amdgpu: [powerplay] Failed to enable VR hot GPIO interrupt! [ 134.662150] amdgpu: [powerplay] Failed to enable ULV! [ 134.802179] amdgpu: [powerplay] Attempt to enable Master Deep Sleep switch failed! [ 134.802184] amdgpu: [powerplay] Failed to enable deep sleep master switch!

The main thing the two logs have in common seems to be this part:
[ 279.711657] amdgpu: [powerplay] [ 279.711657] failed to send message 254 ret is 0

I will try some more debugging and correlate logs from other kernels...
maybe I can shed some more light on this.

Success! Everything seems to be fixed... "everything": I tried the amd-staging kernel for 4.11.0+ and it works now! Jippieh!

This is the log for the working driver (I cut away a bit of the beginning, which is the same as in the other kernel logs):
[ 102.328213] amdgpu 0000:02:00.0: fence driver on ring 0 use gpu addr 0x0000000000000008, cpu addr 0xffff8efe66e4e008 [ 102.328265] amdgpu 0000:02:00.0: fence driver on ring 1 use gpu addr 0x0000000000000018, cpu addr 0xffff8efe66e4e018 [ 102.328331] amdgpu 0000:02:00.0: fence driver on ring 2 use gpu addr 0x0000000000000028, cpu addr 0xffff8efe66e4e028 [ 102.328377] amdgpu 0000:02:00.0: fence driver on ring 3 use gpu addr 0x0000000000000038, cpu addr 0xffff8efe66e4e038 [ 102.328517] amdgpu 0000:02:00.0: fence driver on ring 4 use gpu addr 0x0000000000000048, cpu addr 0xffff8efe66e4e048 [ 102.328568] amdgpu 0000:02:00.0: fence driver on ring 5 use gpu addr 0x0000000000000058, cpu addr 0xffff8efe66e4e058 [ 102.328610] amdgpu 0000:02:00.0: fence driver on ring 6 use gpu addr 0x0000000000000068, cpu addr 0xffff8efe66e4e068 [ 102.328646] amdgpu 0000:02:00.0: fence driver on ring 7 use gpu addr 0x0000000000000078, cpu addr 0xffff8efe66e4e078 [ 102.328681] amdgpu 0000:02:00.0: fence driver on ring 8 use gpu addr 0x0000000000000088, cpu addr 0xffff8efe66e4e088 [ 102.328706] amdgpu 0000:02:00.0: fence driver on ring 9 use gpu addr 0x000000000000009c, cpu addr 0xffff8efe66e4e09c [ 102.328795] amdgpu 0000:02:00.0: fence driver on ring 10 use gpu addr 0x00000000000000ac, cpu addr 0xffff8efe66e4e0ac [ 102.328833] amdgpu 0000:02:00.0: fence driver on ring 11 use gpu addr 0x00000000000000bc, cpu addr 0xffff8efe66e4e0bc [ 102.328847] [drm] Found UVD firmware Version: 1.79 Family ID: 16 [ 102.329116] amdgpu 0000:02:00.0: fence driver on ring 12 use gpu addr 0x000000f40122d420, cpu addr 0xffffa54243c5a420 [ 102.329128] [drm] Found VCE firmware Version: 52.4 Binary ID: 3 [ 102.329209] amdgpu
0000:02:00.0: fence driver on ring 13 use gpu addr 0x00000000000000dc, cpu addr 0xffff8efe66e4e0dc [ 102.329254] amdgpu 0000:02:00.0: fence driver on ring 14 use gpu addr 0x00000000000000ec, cpu addr 0xffff8efe66e4e0ec [ 102.368327] amdgpu: [powerplay] [AVFS] Something is broken. See log! [ 102.370210] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table! [ 102.382613] [drm] ring test on 0 succeeded in 14 usecs [ 102.383115] [drm] ring test on 9 succeeded in 9 usecs [ 102.383134] [drm] ring test on 1 succeeded in 8 usecs [ 102.383144] [drm] ring test on 2 succeeded in 3 usecs [ 102.383153] [drm] ring test on 3 succeeded in 3 usecs [ 102.383163] [drm] ring test on 4 succeeded in 3 usecs [ 102.383176] [drm] ring test on 5 succeeded in 5 usecs [ 102.383185] [drm] ring test on 6 succeeded in 3 usecs [ 102.383195] [drm] ring test on 7 succeeded in 3 usecs [ 102.383204] [drm] ring test on 8 succeeded in 3 usecs [ 102.383248] [drm] ring test on 10 succeeded in 5 usecs [ 102.383256] [drm] ring test on 11 succeeded in 6 usecs [ 102.429390] [drm] ring test on 12 succeeded in 1 usecs [ 102.429399] [drm] UVD initialized successfully. [ 102.539389] [drm] ring test on 13 succeeded in 7 usecs [ 102.539400] [drm] ring test on 14 succeeded in 3 usecs [ 102.539402] [drm] VCE initialized successfully. 
[ 102.539658] [drm] ib test on ring 0 succeeded [ 102.539821] [drm] ib test on ring 1 succeeded [ 102.539870] [drm] ib test on ring 2 succeeded [ 102.539913] [drm] ib test on ring 3 succeeded [ 102.539955] [drm] ib test on ring 4 succeeded [ 102.539994] [drm] ib test on ring 5 succeeded [ 102.540031] [drm] ib test on ring 6 succeeded [ 102.540072] [drm] ib test on ring 7 succeeded [ 102.540112] [drm] ib test on ring 8 succeeded [ 103.041685] [drm] ib test on ring 9 succeeded [ 103.041722] [drm] ib test on ring 10 succeeded [ 103.041751] [drm] ib test on ring 11 succeeded [ 103.043080] [drm] ib test on ring 12 succeeded [ 103.043301] [drm] ib test on ring 13 succeeded [ 103.141511] [drm] fb mappable at 0xC1437000 [ 103.141519] [drm] vram apper at 0xC0000000 [ 103.141521] [drm] size 7680000 [ 103.141522] [drm] fb depth is 24 [ 103.141523] [drm] pitch is 6400 [ 103.141588] fbcon: amdgpudrmfb (fb0) is primary device [ 103.263499] Console: switching to colour frame buffer device 200x75 [ 103.319197] systemd-journald[303]: Sent WATCHDOG=1 notification. [ 103.365845] amdgpu 0000:02:00.0: fb0: amdgpudrmfb frame buffer device [ 103.388258] [drm] Initialized amdgpu 3.17.0 20150101 for 0000:02:00.0 on minor 0
-----------------
For anybody who wants to reproduce what I did, here are the instructions, which work for Ubuntu 16.04:
- git-clone the kernel at git://people.freedesktop.org/~agd5f/linux
- Check out the branch amd-staging-4.11
- I specifically used commit 3e3a7c55b8de38e0557fe954f236ca8e8e925d85
- Use the config file attached below for building the kernel
- The config file includes rules to bake the Polaris firmware files into the kernel itself. I do not know if this is good or bad, but it works for me.
- Build the kernel
- Install the kernel (dpkg -i)
- Follow the non-kernel-related instructions on https://linuxconfig.org/getting-the-rx-480-running-with-amdgpu-on-linux

Reboot, and enjoy.
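The kernel-related steps above can be written down as a small dry-run sketch that only prints the commands instead of executing them. The repository URL, branch and commit come from the instructions; everything else is an assumption made for illustration: the config file name "config-rx480" is hypothetical, the checkout directory name "linux" is arbitrary, and "make olddefconfig" followed by "make bindeb-pkg" is just one common way to get .deb packages that can then be installed with dpkg -i (the instructions do not say which build target was actually used).

```python
import shlex

# Values taken from the instructions above.
REPO_URL = "git://people.freedesktop.org/~agd5f/linux"
BRANCH = "amd-staging-4.11"
COMMIT = "3e3a7c55b8de38e0557fe954f236ca8e8e925d85"


def build_commands(config_path, jobs=4):
    """Return the build steps as argument lists, without running anything.

    config_path is a hypothetical local path to the attached kernel config.
    "olddefconfig" and "bindeb-pkg" are assumptions: the instructions only
    say that the kernel was built and then installed with dpkg -i.
    """
    return [
        ["git", "clone", REPO_URL, "linux"],
        ["git", "-C", "linux", "checkout", BRANCH],
        ["git", "-C", "linux", "checkout", COMMIT],
        ["cp", config_path, "linux/.config"],
        ["make", "-C", "linux", "olddefconfig"],
        ["make", "-C", "linux", f"-j{jobs}", "bindeb-pkg"],
    ]


def render(commands):
    """Format the commands as copy-pastable shell lines."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    # Dry run: print the commands for review instead of executing them.
    print(render(build_commands("config-rx480")))
```

The resulting .deb file names depend on the kernel version string, so the dpkg -i step is left out of the sketch.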
---------
I am lacking good synthetic tests for the OpenGL capabilities at the moment, so I just tested it by running some games. glxinfo reports that amdgpu is working, so all seems fine. I hope that this will not break again!

Thanks again for the debugging suggestions, @Michel. I finally chewed through it...

Created attachment 257145 [details]
kernel config file for a working for the RX480 system
Found the branch where this was already fixed...
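The log comparison done by hand in the comments above (which powerplay messages show up only on the failing kernels, and which also appear on the working one) can be sketched in a few lines. This is a minimal illustration, not a tool that was used for this report: the function names are made up, and the sketch assumes the logs are pasted as plain text with dmesg-style "[ seconds.micros]" timestamps. Timestamps are stripped because they differ on every boot, and a message that the kernel split right after the "[powerplay]" tag is joined back together.

```python
import re

# Matches a dmesg timestamp prefix such as "[  279.711657] ".
TIMESTAMP = re.compile(r"\[\s*\d+\.\d+\]\s*")


def messages(log_text):
    """Split a dmesg dump into messages with the timestamps removed.

    Splits on the timestamp markers rather than on newlines, so it also
    works on logs whose line breaks were lost when pasting.
    """
    merged = []
    for part in (p.strip() for p in TIMESTAMP.split(log_text)):
        if not part:
            continue
        # A message cut off right after the tag continues in the next entry.
        if merged and merged[-1].endswith("[powerplay]"):
            merged[-1] += " " + part
        else:
            merged.append(part)
    return merged


def powerplay_messages(log_text):
    """Return the set of messages that carry the [powerplay] tag."""
    return {m for m in messages(log_text) if "[powerplay]" in m}


def compare(failing_log, working_log):
    """Return (only_in_failing, in_both) powerplay messages, each sorted."""
    failing = powerplay_messages(failing_log)
    working = powerplay_messages(working_log)
    return sorted(failing - working), sorted(failing & working)
```

Applied to the logs in this report, a comparison like this would be expected to list the "[AVFS] Something is broken" line as common to both kernels, since it also appears in the working log, and "failed to send message 254" as specific to the failing ones.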
Created attachment 257069 [details]
screen capture of the last kernel messages before system freeze

Hey all! I recently swapped out my old Radeon HD7850 for a Radeon RX 480, about which I had only heard good things on the Internet because of its good kernel support. However, I ran into an issue: all kernels I tried hang after initramfs, but before the kernel actually boots.

Here are the symptoms. I have tried several kernels by now: the stock kernels that come with Xubuntu 16.04, and several upstream ones (e.g. 4.12.0). This report is about 4.11.6, which I have now compiled myself.
- If I use "nomodeset" to boot the kernel, everything works fine, except for the graphics driver, which fails to initialize (amdgpu fails because of a VGACON error).
- If I boot with the options "linux /vmlinuz... root=UUID=xxxx ro debug ignore_loglevel boot_delay=100" (PS: I also set up a serial link for debugging, but the crash seems to occur before the serial driver can start communicating, so no luck with that!):
- I see "loading initial ramdisk..." ... with that delay, booting takes 3 minutes until the crash...
- I filmed the whole thing to at least capture some sort of error. The attached screenshot is the best I have got at the moment. It says something about the IOMMUv2 driver and... about it being not "available on the system" (?) Is that my error? This is the last (half) frame before my screen turns black. After this, my keyboard LEDs turn off, my mouse also turns off (it has fancy LEDs which light up once it is initialized) and my screen stays **on**, curiously. The machine does not respond after this.

At the moment I am stuck, lacking good debugging tools. If somebody has an idea, I would be grateful. So far, the only thing I can think of is digging into a fresh codebase - I bet graphics drivers are not really accessible though :-(

One last PS: the graphics card I have works perfectly fine on Windows on the same system. So it is clearly a software issue, I assume!