Bug 206125 - Freezing on boot since kernel 4.15.0-72-generic release
Status: RESOLVED CODE_FIX
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: platform_x86_64@kernel-bugs.osdl.org
Reported: 2020-01-08 09:00 UTC by Tony
Modified: 2020-02-09 15:32 UTC

Kernel Version: 5.5-rc5
Regression: No


Description Tony 2020-01-08 09:00:31 UTC
After the update that installed kernel 4.15.0-72-generic, my computer no longer boots. On boot, all I see is the purple screen with:
  Loading Linux 4.15.0-72-generic ...
  Loading initial ramdisk ...
and then nothing happens. I have waited 5-10 minutes on several occasions, to no avail.
I have checked a number of logs in /var/log but found nothing.

If I go into the advanced options and select kernel 4.15.0-70-generic, the computer boots normally.

This was previously reported on Launchpad [Bug 1856387], but I was advised to also file a bug on Bugzilla.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-72-generic 4.15.0-72.81
ProcVersionSignature: Ubuntu 4.15.0-70.79-generic 4.15.18
Uname: Linux 4.15.0-70-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: tony 1977 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Sat Dec 14 21:53:14 2019
HibernationDevice: RESUME=UUID=5475ce25-e091-45e2-9811-9b5cddc08dd1
InstallationDate: Installed on 2018-09-16 (454 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 04f2:b59e Chicony Electronics Co., Ltd
 Bus 001 Device 002: ID 046d:c063 Logitech, Inc. DELL Laser Mouse
 Bus 001 Device 004: ID 8087:0aaa Intel Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: GIGABYTE Sabre 17WV8
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-70-generic root=UUID=9455257c-d3b7-4d61-853d-ab0b0ee40013 ro acpi=off
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-70-generic N/A
 linux-backports-modules-4.15.0-70-generic N/A
 linux-firmware 1.173.13
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/22/2018
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F05
dmi.board.asset.tag: Tag 12345
dmi.board.name: Sabre 17WV8
dmi.board.vendor: GIGABYTE
dmi.board.version: Not Applicable
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: GIGABYTE
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF05:bd05/22/2018:svnGIGABYTE:pnSabre17WV8:pvrNotApplicable:rvnGIGABYTE:rnSabre17WV8:rvrNotApplicable:cvnGIGABYTE:ct10:cvrN/A:
dmi.product.family: Sabre
dmi.product.name: Sabre 17WV8
dmi.product.version: Not Applicable
dmi.sys.vendor: GIGABYTE
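
The ProcKernelCmdLine field above shows that the working 4.15.0-70 kernel was booted with acpi=off. When comparing a working boot against a failing one, it can help to list the boot parameters explicitly. The following is a minimal Python sketch, not part of the original report; the function name parse_cmdline is hypothetical, and the simple whitespace split assumes no quoted values containing spaces.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Bare flags such as "ro" map to None. Only the first "=" separates
    name from value, so "root=UUID=..." keeps "UUID=..." as the value.
    Assumption: no quoted values with embedded spaces.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Command line copied from the ProcKernelCmdLine field of this report.
reported = (
    "BOOT_IMAGE=/boot/vmlinuz-4.15.0-70-generic "
    "root=UUID=9455257c-d3b7-4d61-853d-ab0b0ee40013 ro acpi=off"
)

params = parse_cmdline(reported)
print(params.get("acpi"))  # off
```

On a running system the same function could be fed the contents of /proc/cmdline instead of the copied string.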



By bisecting the Ubuntu kernel tree, I have established that the problem was introduced by commit f723dd269d0740e09af47bb5590ffc4f61766153. The bisect output follows:

git bisect good
f723dd269d0740e09af47bb5590ffc4f61766153 is the first bad commit
commit f723dd269d0740e09af47bb5590ffc4f61766153
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Thu Nov 7 09:05:00 2019 +0100

    x86/timer: Skip PIT initialization on modern chipsets

    BugLink: https://bugs.launchpad.net/bugs/1851216

    Recent Intel chipsets including Skylake and ApolloLake have a special
    ITSSPRC register which allows the 8254 PIT to be gated.  When gated, the
    8254 registers can still be programmed as normal, but there are no IRQ0
    timer interrupts.

    Some products such as the Connex L1430 and exone go Rugged E11 use this
    register to ship with the PIT gated by default. This causes Linux to fail
    to boot:

      Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
      apic=debug and send a report.

    The panic happens before the framebuffer is initialized, so to the user, it
    appears as an early boot hang on a black screen.

    Affected products typically have a BIOS option that can be used to enable
    the 8254 and make Linux work (Chipset -> South Cluster Configuration ->
    Miscellaneous Configuration -> 8254 Clock Gating), however it would be best
    to make Linux support the no-8254 case.

    Modern sytems allow to discover the TSC and local APIC timer frequencies,
    so the calibration against the PIT is not required. These systems have
    always running timers and the local APIC timer works also in deep power
    states.

    So the setup of the PIT including the IO-APIC timer interrupt delivery
    checks are a pointless exercise.

    Skip the PIT setup and the IO-APIC timer interrupt checks on these systems,
    which avoids the panic caused by non ticking PITs and also speeds up the
    boot process.

    Thanks to Daniel for providing the changelog, initial analysis of the
    problem and testing against a variety of machines.

    Reported-by: Daniel Drake <drake@endlessm.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Daniel Drake <drake@endlessm.com>
    Cc: bp@alien8.de
    Cc: hpa@zytor.com
    Cc: linux@endlessm.com
    Cc: rafael.j.wysocki@intel.com
    Cc: hdegoede@redhat.com
    Link: https://lkml.kernel.org/r/20190628072307.24678-1-drake@endlessm.com

    (backported from commit c8c4076723daca08bf35ccd68f22ea1c6219e207)
    Signed-off-by: You-Sheng Yang <vicamo.yang@canonical.com>
    Acked-by: Stefan Bader <stefan.bader@canonical.com>
    Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
    Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

:040000 040000 9c51f067713006f928684555c3254e89bdc10361 ad4d7a34eed39a733c78e630f4d9125f67e001bb M      arch
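
For readers unfamiliar with bisection, the idea behind the result above can be sketched in a few lines of Python. This is an illustrative model only, not how the reporter ran the bisect: it assumes a linear history ordered oldest to newest, whereas git bisect works on the real commit graph. The names first_bad, history, and is_bad are hypothetical, and in practice the is_bad check means building and booting each candidate kernel.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit by binary search.

    Assumptions: commits are ordered oldest to newest, commits[0] is
    known good, commits[-1] is known bad, and every commit after the
    first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


# Hypothetical ten-commit history in which "c6" introduced the hang.
history = [f"c{i}" for i in range(10)]
culprit = first_bad(history, lambda c: history.index(c) >= 6)
print(culprit)  # c6
```

Each test halves the remaining range, so the number of kernels that must be built and booted grows only logarithmically with the number of commits between the good and bad versions.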


Regards
Comment 1 Tony 2020-01-09 07:29:41 UTC
Sorry, I should have added this as well. I have tried a number of upstream kernels and the problem is still reproducible with v5.5-rc5.
Comment 2 rcpa0 2020-01-10 19:48:38 UTC
MSI X470 Gaming M7 AC + AMD Ryzen 7 2700 8-Core
NX Mode [Disabled] will cause kernel 4.15.0-72-generic and 4.15.0-74-generic to hang after "Loading initial ramdisk ...". Changing NX Mode to [Enabled] allows it to boot.

This comment may or may not be relevant to this issue.  I came here from https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1856387
Comment 3 Tony 2020-01-11 06:27:33 UTC
Hello, thanks for responding. Sadly, I couldn't find any option for that in the BIOS. I searched for both NX (AMD) and XD (Intel).
Comment 4 Thomas Gleixner 2020-01-23 10:02:26 UTC
(In reply to rcpa0 from comment #2)
> MSI X470 Gaming M7 AC + AMD Ryzen 7 2700 8-Core
> NX Mode [Disabled] will cause kernel 4.15.0-72-generic and 4.15.0-74-generic
> to hang after "  Loading initial ramdisk ...".  Changing NX Mode to
> [Enabled] allows it to boot.
> 
> This comment may or may not be relevant to this issue.  I came here from
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1856387

That's a different problem.

This setting enables or disables the no-execute page protection function.

Please open a separate bug for this.
Comment 5 Tony 2020-01-29 08:51:26 UTC
Thomas has proposed a patch to resolve this problem. I applied it both on top of the identified commit and on top of the latest version, and in both cases my computer booted successfully.
I have also tagged and commented on the Launchpad bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1856387
Comment 6 Tom Ivar Johansen 2020-01-29 18:53:19 UTC
I can confirm that I have applied the same patch to 4.15.0-76.86-generic. Without the patch my system failed as described above. With the patch it seems to work. 

This is my first post and my first attempt at building a kernel, so I hope this is not considered to be noise.
