Bug 203779 - booting with kernel version 5.1.6 on RX 580 hangs
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-06-02 04:27 UTC by Gobinda Joy
Modified: 2019-06-04 08:51 UTC

See Also:
Kernel Version: 5.1.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Linux version 5.1.6-350.vanilla.knurd.1.fc30.x86_64 (91.54 KB, text/plain)
2019-06-02 08:30 UTC, Gobinda Joy

Description Gobinda Joy 2019-06-02 04:27:03 UTC
My hardware is as follows:
CPU: i7 3770 at stock clock
Motherboard: Gigabyte G1.Sniper 3 latest BIOS available
RAM: 24 GB DDR3 at 1600 MHz
GPU: RX 580 8GB (Sapphire) latest VBIOS

The problem occurs with kernel 5.1.0 or higher (currently 5.1.6): the display hangs when the amdgpu driver loads. I'm unable to determine whether booting continues or hangs as well. Disk activity stops after a couple of seconds, and it is not possible to switch TTYs.
Ctrl+Alt+Del is unresponsive as well.

This problem goes away when amdgpu.dpm=0 is used, but in that case dynamic power scaling is not available and the GPU is stuck at a low clock, so graphics performance is abysmal. Also, the GPU temperature/fan speed utilities don't work.
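
To confirm which setting a given boot actually used, the kernel command line can be inspected. The following is a minimal Python sketch, not part of the original report. The helper name amdgpu_dpm_setting is hypothetical, and it only looks at command-line text such as the contents of /proc/cmdline, so a value set through modprobe configuration would not be seen.

```python
def amdgpu_dpm_setting(cmdline: str):
    """Return the value given for amdgpu.dpm on a kernel command line.

    Returns None when the parameter is absent. If it appears more than
    once, the last occurrence is returned (an assumption of this sketch).
    """
    value = None
    for token in cmdline.split():
        if token.startswith("amdgpu.dpm="):
            value = token.split("=", 1)[1]
    return value


# Example with a made-up command line; on a running system the string
# could instead be read from /proc/cmdline.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda3 ro amdgpu.dpm=0 quiet"
print(amdgpu_dpm_setting(example))
```

A result of "0" would indicate the workaround was active for that boot, and None would indicate the parameter was not passed on the command line.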

Here is an excerpt of the problematic log lines:

Jun 02 09:54:05 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:06 kernel: amdgpu: [powerplay] 
                         failed to send message 15b ret is 65535 
Jun 02 09:54:06 kernel: hrtimer: interrupt took 287743313 ns
Jun 02 09:54:06 kernel: clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
Jun 02 09:54:06 kernel: clocksource:                       'hpet' wd_now: 628dd7b wd_last: 5fef431 mask: ffffffff
Jun 02 09:54:06 kernel: clocksource:                       'tsc' cs_now: 254aa24747 cs_last: 25104a5bfd mask: ffffffffffffffff
Jun 02 09:54:06 kernel: tsc: Marking TSC unstable due to clocksource watchdog
Jun 02 09:54:07 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:07 kernel: amdgpu: [powerplay] 
                         failed to send message 148 ret is 65535 
Jun 02 09:54:07 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:07 kernel: amdgpu: [powerplay] 
                         failed to send message 145 ret is 65535 
Jun 02 09:54:08 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:08 kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
Jun 02 09:54:08 kernel: sched_clock: Marking unstable (8791691311, 362291)<-(8817904668, -25851212)
Jun 02 09:54:08 kernel: amdgpu: [powerplay] 
                         failed to send message 146 ret is 65535 
Jun 02 09:54:08 kernel: hid-generic 0003:09DA:FC7C.0003: input,hidraw2: USB HID v1.11 Mouse [COMPANY USB Device] on usb-0000:00:1a.0-1.5.3/input0
Jun 02 09:54:09 kernel: hid-generic 0003:09DA:FC7C.0004: hiddev97,hidraw3: USB HID v1.11 Device [COMPANY USB Device] on usb-0000:00:1a.0-1.5.3/input1
Jun 02 09:54:11 kernel: clocksource: Switched to clocksource hpet
Jun 02 09:54:13 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:13 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:14 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:15 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:15 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:15 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:15 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         last message was failed ret is 65535
Jun 02 09:54:16 kernel: amdgpu: [powerplay] 
                         failed to send message 260 ret is 65535 
Jun 02 09:54:17 kernel: [drm] Initialized amdgpu 3.30.0 20150101 for 0000:04:00.0 on minor 0
Jun 02 09:54:17 kernel: EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
Jun 02 09:54:20 kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
Jun 02 09:54:21 kernel: [drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring test failed (-110).
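
The excerpt above is repetitive, so tallying the failures per message ID can make it easier to compare one boot against another. A minimal Python sketch follows. The function name and the regular expression are illustrative assumptions, and the message IDs are treated as opaque strings because the report does not say what they mean.

```python
import re
from collections import Counter

# Matches lines such as: "failed to send message 15b ret is 65535"
FAILED_MSG = re.compile(r"failed to send message\s+([0-9a-fA-F]+)\s+ret is\s+\d+")


def count_failed_messages(log_text: str) -> Counter:
    """Count powerplay 'failed to send message' lines per message ID."""
    return Counter(m.group(1) for m in FAILED_MSG.finditer(log_text))


# A few lines copied from the excerpt above.
sample = """\
Jun 02 09:54:06 kernel: amdgpu: [powerplay]
                         failed to send message 15b ret is 65535
Jun 02 09:54:13 kernel: amdgpu: [powerplay]
                         failed to send message 260 ret is 65535
Jun 02 09:54:14 kernel: amdgpu: [powerplay]
                         failed to send message 260 ret is 65535
"""

counts = count_failed_messages(sample)
for msg_id, n in counts.most_common():
    print(msg_id, n)
```

On a systemd-based system, the input text could come from the output of journalctl -k -b saved to a file. The sketch only summarizes what was logged and does not diagnose the cause.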

Any help is appreciated. Also let me know if I can help in any way.
Comment 1 Gobinda Joy 2019-06-02 08:30:52 UTC
Created attachment 283031 [details]
Linux version 5.1.6-350.vanilla.knurd.1.fc30.x86_64
Comment 2 Alex Deucher 2019-06-04 05:21:25 UTC
Can you bisect?
Comment 3 Gobinda Joy 2019-06-04 06:38:59 UTC
(In reply to Alex Deucher from comment #2)
> Can you bisect?

By helping, I meant something more along the lines of providing additional logs or debug logs. I'm not sure I can bisect this bug, though. Sorry for getting your hopes up.
Comment 4 Gobinda Joy 2019-06-04 08:51:25 UTC
(In reply to Alex Deucher from comment #2)
> Can you bisect?

Same bug report: https://bugs.freedesktop.org/show_bug.cgi?id=110822
