Bug 93171 - System boot hang unless add acpi=off or pci=noacpi with coreboot
Summary: System boot hang unless add acpi=off or pci=noacpi with coreboot
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - Intel)
Hardware: i386 Linux
Importance: P1 blocking
Assignee: Daniel Vetter
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-13 08:25 UTC by info
Modified: 2015-04-14 16:52 UTC
CC List: 6 users

See Also:
Kernel Version: 3.19
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
copy of /proc/acpi from 3.13 (975 bytes, application/x-bzip)
2015-02-13 10:32 UTC, info
output of `dmesg` on Lenovo X60t with Debian Jessie/testing and Linux 3.19 from the experimental repository (55.93 KB, text/plain)
2015-02-17 11:17 UTC, Paul Menzel
dmesg on X60 with coreboot booting v3.19.1 with aaecdf61 reverted (72.61 KB, text/plain)
2015-03-15 08:17 UTC, Mono
X60 with coreboot booting v3.19.2 loglevel 8 (42.55 KB, text/plain)
2015-03-29 20:34 UTC, Mono
X60 with coreboot booting v3.19.2 loglevel 8 commit aaecdf61 reverted (58.09 KB, text/plain)
2015-03-29 20:37 UTC, Mono
dont even enable CS error interrupts (2.41 KB, patch)
2015-04-01 11:41 UTC, Daniel Vetter
X60 with coreboot booting v3.19.2 with patch from attachment 172941 (58.69 KB, text/plain)
2015-04-01 20:44 UTC, Mono

Description info 2015-02-13 08:25:19 UTC
I get this message when booting on 3.19, and it freezes at that point, forcing me to power off at the wall.

This did not happen on earlier kernels on the same system. Booting with acpi=off makes the system boot without this error.

The system in question is a ThinkPad X60 with coreboot.
Comment 1 info 2015-02-13 08:32:50 UTC
Earlier kernels tested: versions from 3.2 all the way to 3.18. None of these exhibited the error. I'm now collecting debug output following https://www.kernel.org/doc/Documentation/acpi/debug.txt and https://wiki.ubuntu.com/DebuggingACPI
Comment 2 Paul Menzel 2015-02-13 10:09:00 UTC
Thank you for the report. Please paste the output of both the latest working Linux kernel (3.18.? – did you test the latest stable version too?) and Linux 3.19.

The Linux ACPI developers will probably know what to ask for specifically.
Comment 3 info 2015-02-13 10:32:07 UTC
Created attachment 166631 [details]
copy of /proc/acpi from 3.13
Comment 4 info 2015-02-13 10:32:19 UTC
Taken on 3.13:
dmidecode: http://paste.debian.net/plainh/6cf9e338/
acpidump: http://paste.debian.net/plainh/ac2bd14b/
lspci -vvnn: http://paste.debian.net/plainh/1763f05a/
dmesg: http://paste.debian.net/plainh/6eb7b5fa/

Attached acpi.tar.bz is a copy of /proc/acpi from 3.13 (I could not boot 3.19).

By the way, I found a report of the same error here: https://bugzilla.kernel.org/show_bug.cgi?id=92551

3.19 (kernel parameters in GRUB):
No options: system won't boot. ACPI PCC probe fail.
acpi=off -> system boots!
acpi=ht -> system won't boot. same error.
pci=noacpi -> system boots!
acpi=noirq -> system won't boot. same error.
pnpacpi=off -> system won't boot. same error.
noapic -> system won't boot. same error, plus "ACPI: SCI (ACPI GSI 9) not registered"
nolapic -> won't boot. same errors as with noapic, plus "e1000e 0000:01:00.0 0000:01:00.0 (uninitialized): Failed to initialize MSI interrupts. Falling back to legacy interrupts."
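
To confirm which of these workarounds a given boot actually used, the kernel command line can be checked. The following is a minimal Python sketch and not part of the original report: the function names are made up for illustration, only exact `name=value` matches are recognised, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# The two parameters that let 3.19 boot in the tests listed above.
WORKAROUNDS = {("acpi", "off"), ("pci", "noacpi")}


def active_workarounds(cmdline: str) -> list:
    """Return the workaround parameters present in the command line, sorted."""
    params = parse_cmdline(cmdline)
    return sorted(f"{n}={v}" for n, v in WORKAROUNDS if params.get(n) == v)


# Example with a literal string; on a running system the text would
# typically come from /proc/cmdline.
print(active_workarounds("root=/dev/sda1 ro pci=noacpi quiet"))
```

On a booted machine, passing the contents of /proc/cmdline to `active_workarounds` shows whether the boot relied on one of the two workarounds.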
Comment 5 info 2015-02-14 10:12:15 UTC
This might be related: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7e7e8fe69820c6fa31395dbbd8e348e3c69cd2a9

I need to perform a git bisect to find out exactly what commit caused my issue.
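
For readers unfamiliar with bisecting: it is a binary search over the history between a known-good and a known-bad kernel. The sketch below illustrates the idea in Python under simplifying assumptions: the history is linear (real git histories are graphs), the commit names are hypothetical, and `is_bad` stands in for "build, boot, and see whether it hangs".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is bad too.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # regression was introduced at mid or earlier
        else:
            good = mid  # regression was introduced after mid
    return commits[bad]


# Hypothetical history c0..c8 in which c5 introduces the hang.
history = [f"c{i}" for i in range(9)]
print(first_bad(history, lambda c: int(c[1:]) >= 5))  # prints c5
```

Each test halves the remaining range, so even the several thousand commits between two kernel releases need only a dozen or so boots.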
Comment 6 Aaron Lu 2015-02-16 08:23:00 UTC
The debug print "ACPI PCC probe failed" shouldn't cause any system problem. It was just a misuse of the pr_warn function and has been changed to pr_debug in the latest kernel. The fact that pci=noacpi works around the problem suggests it is a PCI/ACPI issue. Please do the bisect, thanks.
Comment 7 info 2015-02-16 08:54:48 UTC
I will bisect linux later on to find out what commit caused the issue. It is a mistake to immediately blame coreboot until I have done that.
Comment 8 Aaron Lu 2015-02-17 01:48:57 UTC
I'm not blaming coreboot; it's just unusual enough to be worth pointing out in the subject line.
Comment 9 info 2015-02-17 11:02:47 UTC
Here is a boot log of 3.19 from another person, on the same machine, but with the vendor bios (not coreboot): http://paste.debian.net/plain/150469

It boots fine there. I'll bisect linux later as promised.
Comment 10 Paul Menzel 2015-02-17 11:17:50 UTC
Created attachment 167331 [details]
output of `dmesg` on Lenovo X60t with Debian Jessie/testing and Linux 3.19 from the experimental repository

(In reply to info from comment #9)
> Here is a boot log of 3.19 from another person, on the same machine, but
> with the vendor bios (not coreboot): http://paste.debian.net/plain/150469

As that paste expires in three days, I’ll attach it here.

I can boot normally, but there seem to be new ACPI messages unrelated to this bug report.

    […]
    [    0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20141107/tbfadt-618)
    [    0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe1Block: 0/32 (20141107/tbfadt-618)
    [    0.000000] ACPI BIOS Warning (bug): Optional FADT field Gpe1Block has zero address or length: 0x000000000000102C/0x0 (20141107/tbfadt-649)
    […]
    [    0.155940] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20141107/hwxface-580)
    [    0.155947] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20141107/hwxface-580)
    […]
Comment 11 Aaron Lu 2015-02-25 02:20:13 UTC
The last two lines don't matter; they mean your laptop (like most laptops today) does not support ACPI sleep states S1 and S2. The first three lines may cause problems for S3 (suspend-to-mem). If you do not see any problem, then you can safely ignore those messages.
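
To pull just these ACPI warnings and exceptions out of a long dmesg, a small filter helps. This is a minimal Python sketch, assuming the `[timestamp] ACPI <kind>: <message>` layout seen in the excerpt above; the function name and the list of kinds are illustrative, not exhaustive.

```python
import re

# Matches lines such as "[    0.155940] ACPI Exception: AE_NOT_FOUND, ...".
ACPI_LINE = re.compile(
    r"^\s*\[\s*(?P<time>\d+\.\d+)\]\s+ACPI "
    r"(?P<kind>(?:BIOS )?(?:Warning|Error)(?: \(bug\))?|Exception): "
    r"(?P<msg>.*)$"
)


def acpi_messages(dmesg_text: str) -> list:
    """Return (timestamp, kind, message) for each ACPI warning/exception line."""
    found = []
    for line in dmesg_text.splitlines():
        m = ACPI_LINE.match(line)
        if m:
            found.append((float(m.group("time")), m.group("kind"), m.group("msg")))
    return found
```

Feeding it the text of an attached dmesg returns one tuple per matching line, which makes it easier to compare the coreboot and vendor-firmware logs.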
Comment 12 Aaron Lu 2015-03-13 03:08:36 UTC
info@gluglug.org.uk,

Any news on the bisect?
Comment 13 Mono 2015-03-15 08:12:53 UTC
aaecdf61 (drm/i915: Stop gathering error states for CS error interrupts) might be the first bad commit. With this commit reverted, v3.19.1 boots on an X60 with coreboot. It throws some errors though (I'll try to add the dmesg as an attachment).
Comment 14 Mono 2015-03-15 08:17:20 UTC
Created attachment 170671 [details]
dmesg on X60 with coreboot booting v3.19.1 with aaecdf61 reverted
Comment 15 info 2015-03-15 10:30:35 UTC
Wow, thanks for the bisect Mono.

Why was this commit even accepted into linux? Breaking old hardware just isn't acceptable. This should be fixed in the next release of Linux.
Comment 16 Paul Menzel 2015-03-15 18:46:49 UTC
By the way, Linux 3.19 from Debian experimental booted fine with the *vendor* firmware.

Francis, your second paragraph in comment #15 was not helpful. The reason is simple, you didn’t test it. The intention was certainly not to break old hardware. Commits just cannot be tested on every device.

Mono, big thanks to you!
Comment 17 Aaron Lu 2015-03-16 02:23:41 UTC
Daniel,

commit aaecdf611a05cac26a94713bad25297e60225c29
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Nov 4 15:52:22 2014 +0100

    drm/i915: Stop gathering error states for CS error interrupts

caused a boot hang on a ThinkPad X60 with coreboot as its firmware. Can you please take a look? Thanks.

BTW, with vendor firmware, there is no such problem.
Comment 18 Paul Menzel 2015-03-18 12:10:42 UTC
Francis, Mono, could you please provide the log files as requested on the page »How to report bugs« [1]?

[1] https://01.org/linuxgraphics/documentation/how-report-bugs
Comment 19 info 2015-03-18 16:15:25 UTC
I won't be around for about 7 days. If you want it done sooner, Mono will have to do it.
Comment 20 Mono 2015-03-18 18:08:23 UTC
Ahoi Paul. I'd like to upload the log files, but can't figure out which ones you are referring to. Could you please specify which log files you think would be helpful?

Just to make sure, you mean log files from running the X60 with coreboot and v3.19.1 with the reverted commit aaecdf61, right?
Comment 21 Paul Menzel 2015-03-26 11:28:27 UTC
(In reply to Mono from comment #20)
> Ahoi Paul.

Hi Mono. I am sorry for the late reply!

> I'd want to upload the log files, but can't figure out which ones
> you refer to. Could you please specify which log files you think would be
> helpful? 
> 
> Just to make sure, you mean log files from running the X60 with coreboot and
> v3.19.1 with the reverted commit aaecdf61, right?

Actually I meant the one *without* the commit reverted. If you have an ultra bay/docking station you should easily get these over serial. Otherwise, Linux’ module netconsole [1] might also help.

Logs from both runs would be helpful though.

Lastly, have you tried 4.0-rc5 too?

[1] https://www.kernel.org/doc/Documentation/networking/netconsole.txt
Comment 22 Mono 2015-03-29 20:34:03 UTC
Created attachment 172571 [details]
X60 with coreboot booting v3.19.2 loglevel 8

Kernel log buffer from booting Linux v3.19.2 on a ThinkPad X60 with coreboot, captured via USB console.
Comment 23 Mono 2015-03-29 20:37:12 UTC
Created attachment 172581 [details]
X60 with coreboot booting v3.19.2 loglevel 8 commit aaecdf61 reverted

Kernel log buffer from booting Linux v3.19.2 on a ThinkPad X60 with coreboot, captured via USB console.

commit aaecdf61 reverted
Comment 24 Mono 2015-03-29 20:47:44 UTC
Hallo Paul,
thanks for your reply. Unfortunately I do not have an ultra bay or docking station, and netconsole somehow didn't work for me. I captured the kernel buffer via USB though. Comment 22 has the kernel buffer from unchanged v3.19.2, which hangs, and comment 23 has the one with commit aaecdf61 reverted, which does not hang.
I hope it helps. best regards
Mono
Comment 25 Daniel Vetter 2015-04-01 11:41:47 UTC
Created attachment 172941 [details]
dont even enable CS error interrupts

Can you please test the attached patch, without the revert?
Comment 26 info 2015-04-01 13:43:46 UTC
Mono ^
Comment 27 Mono 2015-04-01 20:44:31 UTC
Created attachment 172961 [details]
X60 with coreboot booting v3.19.2 with patch from attachment 172941 [details]

Hallo Daniel,

with your patch applied to v3.19.2 the machine boots. There is still a warning in the kernel buffer though; I'm not sure what it means. Thanks!

(see the attachment for the complete dmesg output)

best regards
Mono

[    0.353981] [drm] Initialized i915 1.6.0 20141121 for 0000:00:02.0 on minor 0
[    0.376881] ------------[ cut here ]------------
[    0.376900] WARNING: CPU: 1 PID: 44 at drivers/gpu/drm/drm_irq.c:1121 drm_wait_one_vblank+0x190/0x1a0 [drm]()
[    0.376902] vblank not available on crtc 0, ret=-22
[    0.376904] Modules linked in: i915 button intel_gtt i2c_algo_bit video drm_kms_helper drm i2c_core
[    0.376916] CPU: 1 PID: 44 Comm: kworker/u4:1 Not tainted 3.19.2-1-ARCH #1
[    0.376918] Hardware name: LENOVO 1709H6U/1709H6U, BIOS CBET4000 CBET4000 (2.17 ) 03/29/2015
[    0.376925] Workqueue: events_unbound async_run_entry_fn
[    0.376928]  0000000000000000 000000001ba54600 ffff8800b8edb768 ffffffff8155d97f
[    0.376931]  0000000000000000 ffff8800b8edb7c0 ffff8800b8edb7a8 ffffffff81073a4a
[    0.376935]  ffff8800b8edb7c8 ffff880035d30000 ffff880035cc4800 0000000000000000
[    0.376939] Call Trace:
[    0.376946]  [<ffffffff8155d97f>] dump_stack+0x4c/0x6e
[    0.376951]  [<ffffffff81073a4a>] warn_slowpath_common+0x8a/0xc0
[    0.376954]  [<ffffffff81073ad5>] warn_slowpath_fmt+0x55/0x70
[    0.376961]  [<ffffffffa001f0e0>] drm_wait_one_vblank+0x190/0x1a0 [drm]
[    0.377004]  [<ffffffffa011403f>] ? gen4_read32+0x4f/0xd0 [i915]
[    0.377037]  [<ffffffffa01686b5>] intel_enable_tv+0x25/0x60 [i915]
[    0.377066]  [<ffffffffa0131eab>] i9xx_crtc_enable+0x3fb/0x4c0 [i915]
[    0.377095]  [<ffffffffa0130422>] __intel_set_mode+0x8a2/0xca0 [i915]
[    0.377124]  [<ffffffffa013607c>] intel_set_mode+0x7c/0xc0 [i915]
[    0.377152]  [<ffffffffa0121b3c>] ? intel_framebuffer_init+0x31c/0x440 [i915]
[    0.377181]  [<ffffffffa0136346>] intel_get_load_detect_pipe+0x286/0x620 [i915]
[    0.377213]  [<ffffffffa01692b4>] intel_tv_detect+0x134/0x5c0 [i915]
[    0.377218]  [<ffffffff810db360>] ? migrate_timer_list+0xd0/0xd0
[    0.377225]  [<ffffffffa007cbe3>] drm_helper_probe_single_connector_modes_merge_bits+0x303/0x460 [drm_kms_helper]
[    0.377229]  [<ffffffff81014635>] ? __switch_to+0x175/0x5f0
[    0.377234]  [<ffffffffa007cd53>] drm_helper_probe_single_connector_modes+0x13/0x20 [drm_kms_helper]
[    0.377238]  [<ffffffffa0086eb0>] drm_fb_helper_probe_connector_modes.isra.3+0x50/0x70 [drm_kms_helper]
[    0.377243]  [<ffffffffa008810c>] drm_fb_helper_initial_config+0x5c/0xf50 [drm_kms_helper]
[    0.377272]  [<ffffffffa0145ccb>] intel_fbdev_initial_config+0x1b/0x20 [i915]
[    0.377275]  [<ffffffff810942bc>] async_run_entry_fn+0x4c/0x170
[    0.377279]  [<ffffffff8108c0a4>] process_one_work+0x144/0x3f0
[    0.377283]  [<ffffffff8108c6cb>] worker_thread+0x4b/0x460
[    0.377286]  [<ffffffff8108c680>] ? init_pwq.part.23+0x10/0x10
[    0.377289]  [<ffffffff81091748>] kthread+0xd8/0xf0
[    0.377293]  [<ffffffff81091670>] ? kthread_create_on_node+0x1c0/0x1c0
[    0.377296]  [<ffffffff81563118>] ret_from_fork+0x58/0x90
[    0.377300]  [<ffffffff81091670>] ? kthread_create_on_node+0x1c0/0x1c0
[    0.377302] ---[ end trace 7131527686ecda8a ]---
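
To summarise a trace like the one above, the frames can be reduced to symbol and module names. This is a minimal Python sketch, assuming the `[<address>] symbol+0xoffset/0xsize [module]` frame layout shown in this log; the function name is made up for illustration. Frames marked with `?` are skipped by default, since the kernel uses that marker for entries it could not verify as part of the call chain.

```python
import re

# One frame: "[<address>] [? ]symbol+0xoff/0xsize [module]"
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def trace_frames(log_text: str, reliable_only: bool = True) -> list:
    """Return (symbol, module) per stack frame; module is None for built-in code."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME.search(line)
        if m is None:
            continue
        if reliable_only and m.group("unreliable"):
            continue  # "?" marks frames the unwinder could not verify
        frames.append((m.group("symbol"), m.group("module")))
    return frames
```

Applied to the trace above, the first reliable frames in `i915` are `intel_enable_tv` and `i9xx_crtc_enable`, which points at the TV load-detect path rather than at the CS error interrupt change.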
Comment 28 info 2015-04-01 23:53:03 UTC
Confirmed. Daniel, I booted 3.19.3 with your patch applied, and it now boots. I see the same messages as Mono, but it works.
Comment 29 Jani Nikula 2015-04-02 12:14:21 UTC
The "vblank not available on crtc 0, ret=-22" message is probably https://bugs.freedesktop.org/show_bug.cgi?id=89108, queued to stable, see thread starting at http://mid.gmane.org/874mq3oif5.fsf@intel.com.
Comment 30 Mono 2015-04-03 13:13:48 UTC
Hallo Jani

you are right. f9b61ff6bce9a44555324b29e593fdffc9a115bc removes the warning.
Does "queued to stable" mean that f9b61ff6bce9a44555324b29e593fdffc9a115bc is about to be committed to the next 3.19 version?

thanks and best regards
Mono

Comment 31 Paul Menzel 2015-04-06 16:54:18 UTC
(In reply to Mono from comment #30)

> you are right. f9b61ff6bce9a44555324b29e593fdffc9a115bc removes the warning.

Awesome. Thank you for testing. Could you please attach the Linux kernel output with the two patches applied?

And just to be sure, with these two patches applied the regression is solved, right?

> does "queued to stable" mean that f9b61ff6bce9a44555324b29e593fdffc9a115bc
> is about to be committed to next 3.19 version?

That’s what should happen. But it hasn’t been picked up yet [1][2] – no file with *vblank* in the name in there. So we need to stay patient.

[1] http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/queue-3.19
[2] http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/releases/3.19.3?id=v3.19.3
Comment 32 Jani Nikula 2015-04-14 13:04:35 UTC
commit 37ef01ab5d24d1d520dc79f6a98099d451c2a901
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Apr 1 13:43:46 2015 +0200

    drm/i915: Dont enable CS_PARSER_ERROR interrupts at all

in drm-intel-next-fixes, to be merged for 4.1 and backported to stable. Thanks for the report. Please reopen if the problem persists with that commit.
Comment 33 info 2015-04-14 16:52:38 UTC
Thanks!
