Bug 199633 - Stable kernel regression after 4.16.3: startxfce4 no longer works: X startup seems to hang
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_video-dri
 
Reported: 2018-05-06 10:37 UTC by Klaus Kusche
Modified: 2018-05-22 11:09 UTC
CC List: 2 users

See Also:
Kernel Version: 4.16.4
Subsystem:
Regression: No
Bisected commit-id:



Description Klaus Kusche 2018-05-06 10:37:42 UTC
On my system, X is not started during boot:
My system boots with DRM framebuffer text consoles only.
After login, an "exec startxfce4" in .bash_profile brings up X and the desktop
for GUI users (not for root etc.) on the login vt (not an unused vt).

This worked fine up to and including 4.16.3
and fails with any later 4.16 kernel:
After login, the screen becomes and stays completely black:
No text, no desktop or background, not even a mouse cursor.

After switching to another virtual terminal, typing something,
and switching back, X and the desktop startup complete normally.
So far I have been unable to find any error message or other trace
of the problem afterwards; everything works fine.

Details:

* Just switching VTs back and forth does not suffice
(X still hangs after switching back).
I actually have to type something on the second vt,
but there is no need to log in on the second vt to get things going.

* When logging in on the second vt and checking the process list,
the Xorg server is up, as are the first few processes of the session startup
(but far from all of them, not even a third).

* The bug component drivers/video was just a guess; it could be anything from video/fb
to mouse or evdev, and could even be unrelated to drivers.
The only thing I know for sure is that it depends on the kernel
and only the kernel: the problem disappears when booting 4.16.3 or earlier
and reappears when booting a later kernel, with everything else unchanged.

* It is not related to a specific DRM and X driver:
radeon kernel + modesetting X
radeon kernel + radeon X
amdgpu kernel + modesetting X
amdgpu kernel + amdgpu X
all show exactly the same problem.
(radeon + modesetting is my default, on AMD Cape Verde)
Comment 1 Michel Dänzer 2018-05-07 10:12:06 UTC
Can you bisect between 4.16.3 and .4?
Comment 2 Klaus Kusche 2018-05-07 10:28:38 UTC
(In reply to Michel Dänzer from comment #1)
> Can you bisect between 4.16.3 and .4?

I have never done that.
Is there a how-to for bisecting?
At least my system should boot with vanilla kernels.
Comment 3 Alex Deucher 2018-05-09 05:01:33 UTC
Google for "kernel git bisect howto".
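
For readers who, like the reporter, have not bisected before: `git bisect` is a binary search over the commits between a known-good and a known-bad version. In a clone of the stable kernel tree one would typically run `git bisect start`, `git bisect bad v4.16.4` and `git bisect good v4.16.3`, then build and boot each kernel that git checks out and mark it with `git bisect good` or `git bisect bad` until git names the first bad commit (`git bisect reset` returns to the original checkout). The Python sketch below only illustrates the search logic; the commit labels and the `is_bad` callback are hypothetical stand-ins for the manual build-boot-test step.

```python
def first_bad(commits, is_bad):
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is known good and commits[-1] is known bad,
    and that every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1       # lo: newest known good, hi: oldest known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):       # stands in for: build, boot, try startxfce4
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history: the labels "a".."d" are placeholders, not real commits.
history = ["v4.16.3", "a", "b", "c", "d", "v4.16.4"]
culprit = first_bad(history, lambda c: history.index(c) >= 3)
print(culprit)
```

The search only gives a reliable answer when each tested kernel is clearly good or clearly bad; a problem that shows up intermittently makes individual steps misleading.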
Comment 4 Klaus Kusche 2018-05-20 10:52:08 UTC
I found time to bisect, and the result was a surprise:
the culprit is the random patchset by Theodore Ts'o:
[1/5] random: fix crng_ready() test
[2/5] random: use a different mixing algorithm for add_device_randomness()
[3/5] random: set up the NUMA crng instances after the CRNG is fully initialized
[4/5] random: crng_reseed() should lock the crng instance that it is modifying
[5/5] random: add new ioctl RNDRESEEDCRNG

I don't know exactly which one of those five, because the results
with only part of the patchset applied are inconsistent.

To double-check the result, I installed 4.16.9.
X session startup failed as described above.
Then I replaced drivers/char/random.c in 4.16.9 with the 4.16.3 version,
and this fixed X startup.
Comment 5 Linux_Chemist 2018-05-20 23:52:15 UTC
I recently ran into exactly the same problem, caused by the very patch you mentioned
(random: fix crng_ready() test):

https://bugzilla.kernel.org/show_bug.cgi?id=199567

I installed rng-tools and re-enabled options in my custom kernel that lead to more entropy generation (for example CONFIG_ATH9K_HWRNG), and I can once again use kernels past 4.17-rc1 (just before the patch was applied) without any hang or lasting black screen.
Comment 6 Linux_Chemist 2018-05-20 23:55:18 UTC
(Note that this patch was backported to 4.16 in 4.16.4, which is why 4.16.3 doesn't have the issue. Reverting the patch is ill-advised, though, as it addresses an important security vulnerability.)
Comment 7 Klaus Kusche 2018-05-21 08:44:41 UTC
Installing rng-tools fixed the problem for me, too.
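
A quick way to check whether a given boot is short on entropy is sketched below in Python. It rests on an assumption not confirmed in this report: that something in the X session startup blocks until the kernel's random number generator is initialized, which would fit the observation that typing on another vt (keyboard input feeds the entropy pool) unblocks it. The helper names `read_entropy_avail` and `crng_ready` are made up for this sketch; `/proc/sys/kernel/random/entropy_avail` and `os.getrandom()` with `os.GRND_NONBLOCK` are the standard Linux and Python interfaces.

```python
import os

ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def crng_ready():
    """True if getrandom() would return immediately, False if it would block,
    None if this cannot be determined on the running platform."""
    if not hasattr(os, "getrandom"):       # only available on Linux, Python 3.6+
        return None
    try:
        os.getrandom(1, os.GRND_NONBLOCK)  # raises instead of blocking
        return True
    except BlockingIOError:                # kernel RNG not yet initialized
        return False
    except OSError:
        return None


if __name__ == "__main__":
    print("entropy_avail:", read_entropy_avail())
    print("getrandom() ready:", crng_ready())
```

Run from a text console right after login, before starting X: if `crng_ready()` reports False on an affected kernel and True once rng-tools is running, that would support the entropy explanation.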
Comment 8 Michel Dänzer 2018-05-22 08:26:31 UTC
Can you update the status of this report accordingly?
