Bug 216277 - X11 doesn't wait for amdgpu driver to be up
Summary: X11 doesn't wait for amdgpu driver to be up
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-24 14:55 UTC by dark_sylinc
Modified: 2022-08-10 00:17 UTC

See Also:
Kernel Version: 5.18.11+
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Xorg log when it fails (23.34 KB, text/plain)
2022-07-24 14:56 UTC, dark_sylinc
Xorg log when it succeeds (39.28 KB, text/plain)
2022-07-24 14:56 UTC, dark_sylinc

Description dark_sylinc 2022-07-24 14:55:50 UTC
# Context

I'm using Xubuntu 20.04
I compiled Kernel 5.18.11+ myself (shows bug)
I compiled Kernel 5.13.7+ myself (does not show bug)
My GPU is an AMD Radeon RX 6800 XT 16GB; I don't have an iGPU (the CPU is a Ryzen 5900X)

Mesa is:

OpenGL renderer string: AMD Radeon RX 6800 XT (sienna_cichlid, LLVM 14.0.1, DRM 3.46, 5.18.11+)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 22.0.5 - kisak-mesa PPA
OpenGL core profile shading language version string: 4.60


# Steps to reproduce

1. Turn on the PC
2. On *some* boots X11 crashes and takes the keyboard down with it, leaving the computer seemingly frozen on the tty with the last boot messages still displayed
3. As a workaround, I can log in via ssh and run `sudo service lightdm restart`; the X11 server then starts and everything works perfectly fine

# Diagnostic

It seems X11 doesn't wait for amdgpu to be up. This can be seen by checking /var/log/Xorg.0.log (attached):

[     7.718] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[     7.718] (II) FBDEV: driver for framebuffer: fbdev
[     7.718] (II) VESA: driver for VESA chipsets: vesa
[     7.718] (WW) xf86OpenConsole: setpgid failed: Operation not permitted
[     7.718] (WW) xf86OpenConsole: setsid failed: Operation not permitted
[     7.719] (EE) open /dev/dri/card0: No such file or directory
[     7.719] (WW) Falling back to old probe method for modesetting
[     7.719] (EE) open /dev/dri/card0: No such file or directory

Visually speaking, I *think* that X11 tries to init while tty is still in VESA mode before/during switching to 1920x1080

AFAIK, systemd is responsible for waiting until the GPU drivers are up. Does anybody know where I should look? Does systemd need an update? Could this be a libdrm issue? I currently have libdrm 2.4.110 installed in /usr/lib and libdrm 2.4.111 compiled from source in /usr/local/lib
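
The failing log shows Xorg opening /dev/dri/card0 before the node exists. A minimal sketch of a polling wait that could be run before the display manager starts, assuming a POSIX shell; `wait_for_path` is a hypothetical helper name and the 10 second default is arbitrary. It only masks the race, much like a fixed sleep would:

```shell
#!/bin/sh
# wait_for_path PATH [TIMEOUT_SECONDS]
# Hypothetical helper: polls once per second until PATH exists.
# Returns 0 as soon as PATH exists, 1 if TIMEOUT_SECONDS elapse first.
wait_for_path() {
    path=$1
    timeout=${2:-10}   # arbitrary default, not taken from the report
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example (not executed here):
#   wait_for_path /dev/dri/card0 10 || echo "card0 did not appear" >&2
```

The node appearing does not by itself guarantee that the driver has finished initializing, so this is a diagnostic aid rather than a fix.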

I could try bisecting, but unfortunately the bug does not reproduce on every boot, which makes it hard to debug.

All of this has been working fine with Kernel 5.13.7+

Cheers
Comment 1 dark_sylinc 2022-07-24 14:56:15 UTC
Created attachment 301479 [details]
Xorg log when it fails
Comment 2 dark_sylinc 2022-07-24 14:56:34 UTC
Created attachment 301480 [details]
Xorg log when it succeeds
Comment 3 Alex Deucher 2022-07-25 15:46:24 UTC
Maybe the driver or firmware is not available in your initrd so the driver can't be loaded during boot?
Comment 4 dark_sylinc 2022-07-25 16:07:47 UTC
Thanks for the hint!

Neither the amdgpu driver nor the firmware is in /boot/initrd.img-5.18.11+; they are definitely NOT there.

I will have to look up how to include them in the initrd.

Though it may be worth mentioning that they are not included in 5.13 (my custom build) or in Ubuntu's official kernels either.

I do wonder whether it previously worked by mere luck (i.e. the race condition was just much harder to trigger), or whether something changed so that whatever Ubuntu does to wait on amdgpu no longer waits.
Comment 5 dark_sylinc 2022-07-25 16:27:05 UTC
Your hint is very good.

It tells me upstream kernel devs expect the amdgpu driver & firmware to be in the initrd, while Ubuntu does not put them there.

This is starting to look more and more like an Ubuntu bug.

I looked further into the matter and found out that
/lib/udev/rules.d/78-graphics-card.rules

has entries for

1. "drm": i915, radeon, nouveau, vmwgfx
2. "graphics": amdgpu, i915, radeon, nouveau, efifb, efi-framebuffer, vesa-framebuffer

I just edited the rules file to include amdgpu in both sections to see what happens. So far I have rebooted only once and Xorg didn't crash.

I'll monitor how it goes and if the crashes stop I'll close this ticket and report it to Ubuntu.
Comment 6 dark_sylinc 2022-07-26 15:35:13 UTC
OK, today it happened again, so changing 78-graphics-card.rules did not fix it.

I just found this:

https://bbs.archlinux.org/viewtopic.php?id=260525

Which leads me to this:

https://github.com/sddm/sddm/issues/1316

Apparently SDDM was having the same issue and the "fix" was to add QThread::sleep(1);

Does the Kernel have an interface to know if a GPU driver will be or is being loaded and get notified when it's done?

I assumed there was, but looking at those threads it appears there is not and graphical initialization is basically just YOLO?
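
For checking the current state (not a notification mechanism), sysfs exposes which kernel driver is bound to each DRM card. A minimal sketch, assuming the usual /sys/class/drm layout where `cardN/device/driver` is a symlink to the bound driver; `list_drm_drivers` is a hypothetical helper name:

```shell
#!/bin/sh
# list_drm_drivers [SYSFS_DRM_DIR]
# Hypothetical helper: prints "cardN driver" for every DRM card that has a
# driver bound. Connector entries such as card0-DP-1 have no driver symlink
# and are skipped.
list_drm_drivers() {
    root=${1:-/sys/class/drm}
    for card in "$root"/card[0-9]*; do
        # The driver entry is a symlink; skip entries without one.
        [ -L "$card/device/driver" ] || continue
        printf '%s %s\n' \
            "$(basename "$card")" \
            "$(basename "$(readlink "$card/device/driver")")"
    done
}

# Example (not executed here):
#   list_drm_drivers
```

If this prints nothing early in boot and a card line later, the display manager was started before the driver had bound.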
Comment 7 dark_sylinc 2022-08-10 00:17:15 UTC
Adding amdgpu to the initramfs seems to have worked around the problem.

I have not experienced this problem since. I can also see that the boot process is visibly different (the splash switches to 1920x1080 a bit sooner).

If anyone is having the same issue, the workaround is (Ubuntu):

echo "amdgpu" | sudo tee --append /etc/initramfs-tools/modules
sudo update-initramfs -c -k $(uname -r)

If done properly then running:

lsinitramfs /boot/initrd.img-$(uname -r) | grep amdgpu

Should return multiple hits

Then reboot.
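
Note that the `tee --append` step adds another "amdgpu" line each time it is run. A minimal sketch of an idempotent variant, assuming a POSIX shell; `add_module_once` is a hypothetical helper name, not part of initramfs-tools:

```shell
#!/bin/sh
# add_module_once FILE MODULE
# Hypothetical helper: appends MODULE to FILE only if FILE does not already
# contain a line that is exactly MODULE.
add_module_once() {
    file=$1
    module=$2
    # -x: match whole lines only, -F: fixed string, -q: no output
    if grep -qxF "$module" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$module" >> "$file"
}

# Example (needs root, not executed here):
#   add_module_once /etc/initramfs-tools/modules amdgpu
```

The initramfs still has to be regenerated afterwards, as in the steps above.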

This ticket can be closed, but a new one should probably be created to track an interface that notifies userspace when the kernel is done loading all video interfaces.
