Bug 211179 - xorg freezes, kill -9 fails
Summary: xorg freezes, kill -9 fails
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Console/Framebuffers
Hardware: All Linux
Importance: P1 normal
Assignee: James Simmons
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-01-13 16:17 UTC by Don Allen
Modified: 2021-01-22 20:28 UTC
CC List: 1 user

See Also:
Kernel Version: 5.10.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Don Allen 2021-01-13 16:17:50 UTC
I am running up-to-date Arch Linux on a system with an Intel i5-9400 processor, 16 GB of RAM, and an Asus mini-ITX motherboard. I do not use a desktop environment, just a window manager (bspwm), and I start X with startx from the login shell. I have tried a different window manager and seen the same problem described below.

About once a day, X freezes completely. The system does not respond to mouse or keyboard input, not even to attempts to switch consoles. I can ssh into the system from another machine, but attempts to kill the Xorg process with kill -9 fail, which suggests a driver problem to me. When I try to reboot, systemd reports that it is waiting for the Xorg process to die. Perhaps 5 or 10 minutes later, the system finally reboots.
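A process that ignores kill -9 is often blocked in uninterruptible sleep inside the kernel (state D in ps), for example while waiting on a driver; the SIGKILL stays pending until that kernel call returns. The minimal Python sketch below, run over ssh, reports the scheduler state of every process with a given name from /proc/<pid>/stat, plus the kernel wait channel from /proc/<pid>/wchan when readable. The helper names and the default process name "Xorg" are illustrative assumptions, and this is a diagnostic aid only, not a confirmed explanation of this bug.

```python
import os
import sys


def parse_stat_state(stat_line: str) -> str:
    """Return the one-letter state field (field 3) of a /proc/<pid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so split after the LAST ')' before reading field 3.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def read_text(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def find_pids(name: str) -> list:
    """Return PIDs whose /proc/<pid>/comm equals `name` (empty list if no /proc)."""
    try:
        entries = os.listdir("/proc")
    except FileNotFoundError:
        return []
    pids = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            if read_text(f"/proc/{entry}/comm") == name:
                pids.append(int(entry))
        except OSError:
            # The process exited or is not readable; skip it.
            continue
    return pids


def report(name: str = "Xorg") -> None:
    for pid in find_pids(name):
        try:
            state = parse_stat_state(read_text(f"/proc/{pid}/stat"))
        except (OSError, ValueError):
            continue
        try:
            # Kernel function the task is sleeping in ("0" if not sleeping).
            wchan = read_text(f"/proc/{pid}/wchan")
        except OSError:
            wchan = "?"
        note = "(uninterruptible sleep: SIGKILL stays pending)" if state == "D" else ""
        print(f"pid={pid} state={state} wchan={wchan} {note}".rstrip())


if __name__ == "__main__":
    report(sys.argv[1] if len(sys.argv) > 1 else "Xorg")
```

If the frozen X server shows state D, the wchan value hints at which part of the kernel it is stuck in, which would be useful to attach to the report.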
Comment 1 rds1944 2021-01-16 17:01:42 UTC
This behavior has been observable for the last few weeks, since Slackware64-current adopted kernel version 5.10.x (1<x<8). It has never been seen with any prior version in a similar configuration.

Random permanent lockups (onset in minutes - hours - days). Keyboard + all mouse buttons dead, mouse pointer moves. Restore by hard reset only (Power button).

Identical issue on two machines. Lenovo ThinkPads: P15v (i7 8c/16t integrated UHD graphics, new) & X390 (i7 4c/8t integrated UHD graphics, 1 year in service)

General configuration: X/Xorg default-configured, libinput, window managers (FVWM|Openbox|i3), browsers (Google-Chrome-Stable|Firefox); no display manager, nor any KDE or Xfce. (The char '|' indicates alternatives that do not matter.)

General remarks: Each of these machines also carries recent Fedora 33 (KV 5.9.16) & Ubuntu 20.04.1 (KV 5.8) GNOME installations; no lockups occur. [Fedora 33 just added 5.10.6, too soon to say its fate.]

Using either machine to ssh as root into the other, frozen machine shows that everything except the X server is actually running normally. Running kill -9 (or killall -9) on X, startx, xinit, or .xinitrc has no effect. The logs in /var/log do not show any suspicious entries.

Some old postings suggest that adding intel_idle.max_cstate=1 to the kernel command line has ameliorated the problem. It is being tested, but really should not be needed.
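When testing a workaround like this, it helps to confirm that the running kernel actually received the parameter. The sketch below parses /proc/cmdline into key/value pairs and reads the value the intel_idle driver reports from /sys/module/intel_idle/parameters/max_cstate. That sysfs path is an assumption that holds when intel_idle is the active idle driver; the function names are hypothetical helpers, and quoted values containing spaces are not handled.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into {key: value}.

    Bare flags such as `rw` map to an empty string. Quoted values
    containing spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def effective_max_cstate() -> Optional[str]:
    """Value reported by the intel_idle driver, or None if unavailable."""
    path = Path("/sys/module/intel_idle/parameters/max_cstate")
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""
    print("cmdline intel_idle.max_cstate =",
          parse_cmdline(cmdline).get("intel_idle.max_cstate"))
    print("intel_idle reports max_cstate =", effective_max_cstate())
```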
Comment 2 Don Allen 2021-01-16 17:22:30 UTC
Created attachment 294675
attachment-3296-0.html

I am testing intel_idle.max_cstate=1 now. The system has been up without incident since I made that change, but not long enough for me to say that it has worked around this problem.

This is a really serious issue. Without a fix or a workaround, my system is just not usable (and I upgraded this machine only a few weeks ago with a new motherboard, an Intel i5-9400, 16 GB of memory, and an NVMe SSD). It takes about 10 minutes to recover from one of these freezes.

Comment 3 rds1944 2021-01-17 18:10:35 UTC
Mainstream Linux distributions use only the modesetting driver. A few of us still use KMS with the Intel video driver under X, and the latter has received little attention over the last few years. That leaves a couple of options: either the kernel KMS and Intel video driver developers collaborate to bring everything up to date, or the Intel video driver project is abandoned and the community is told explicitly that this route is dead.

Arch Linux has good online documentation on enabling or disabling each scheme.
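Before switching schemes, it is useful to know which X video driver is actually in use. The sketch below scans the X server log for module load and unload lines and reports which video drivers remained loaded. It assumes log lines of the form `[    25.041] (II) LoadModule: "intel"` and `(II) UnloadModule: "modesetting"`; the tracked driver set and the two log locations (the rootless startx location under ~/.local/share/xorg and the classic /var/log) are assumptions to adjust for your setup.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Assumed set of X video driver (DDX) module names worth tracking;
# other LoadModule lines (glx, libinput, ...) are ignored.
VIDEO_DRIVERS = {"intel", "modesetting", "fbdev", "vesa"}

# Assumed log format, e.g.  [    25.041] (II) LoadModule: "intel"
MODULE_RE = re.compile(r'\(II\) (LoadModule|UnloadModule): "([^"]+)"')


def active_video_drivers(lines: Iterable[str]) -> List[str]:
    """Return video drivers that were loaded and not later unloaded, in load order."""
    loaded: List[str] = []
    for line in lines:
        m = MODULE_RE.search(line)
        if not m or m.group(2) not in VIDEO_DRIVERS:
            continue
        action, name = m.groups()
        if action == "LoadModule" and name not in loaded:
            loaded.append(name)
        elif action == "UnloadModule" and name in loaded:
            loaded.remove(name)
    return loaded


if __name__ == "__main__":
    candidates = (
        Path.home() / ".local/share/xorg/Xorg.0.log",  # rootless Xorg via startx
        Path("/var/log/Xorg.0.log"),                   # Xorg running as root
    )
    for log in candidates:
        if log.exists():
            text = log.read_text(errors="replace")
            print(log, "->", active_video_drivers(text.splitlines()))
            break
```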
Comment 4 Don Allen 2021-01-22 20:28:14 UTC
Since changing syslinux.cfg to include the two trailing options

APPEND root=/dev/nvme0n1p3 rw intel_idle.max_cstate=1 consoleblank=0

I have had no Xorg crashes. This is over about 5 days of operation (shutdown at night, reboot in the morning). Without these options, the crash occurred approximately daily. So far, so good, as far as reliability goes. But what is the cost of these options? Increased energy use?
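On the cost question: with intel_idle.max_cstate=1 the intel_idle driver should expose only shallow idle states, so idle cores cannot reach deeper C-states, which generally raises idle power draw; how much on this machine is not known from this report. (consoleblank=0 only disables blanking of the text console.) One way to compare boots with and without the parameter is to read the cpuidle sysfs files for a CPU, as sketched below. The helper names are hypothetical; the per-state name, usage, and time files (time in microseconds) are the kernel's cpuidle sysfs interface, assumed present.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_idle_states(cpu: int = 0) -> List[Tuple[str, int, int]]:
    """Return (name, entry count, total time in microseconds) per idle state of one CPU."""
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpuidle")
    states: List[Tuple[str, int, int]] = []
    if not base.is_dir():
        return states
    # Sort numerically so state10 comes after state9.
    for d in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (d / "name").read_text().strip()
        usage = int((d / "usage").read_text())
        time_us = int((d / "time").read_text())
        states.append((name, usage, time_us))
    return states


def residency_shares(states: List[Tuple[str, int, int]]) -> Dict[str, float]:
    """Fraction of total recorded idle time spent in each state."""
    total = sum(t for _, _, t in states)
    if total == 0:
        return {}
    return {name: t / total for name, _, t in states}


if __name__ == "__main__":
    states = read_idle_states(0)
    for name, usage, time_us in states:
        print(f"{name:>8}: entered {usage} times, {time_us} us total")
    for name, share in residency_shares(states).items():
        print(f"{name:>8}: {share:.1%} of idle time")
```

With the parameter in place, the listing should stop at the shallow states; a power meter or the package energy counters would still be needed to put a number on the extra energy use.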
