Bug 93171
Summary: | System boot hang unless add acpi=off or pci=noacpi with coreboot | ||
---|---|---|---|
Product: | Drivers | Reporter: | info |
Component: | Video(DRI - Intel) | Assignee: | Daniel Vetter (daniel) |
Status: | RESOLVED CODE_FIX | ||
Severity: | blocking | CC: | aaron.lu, daniel, intel-gfx-bugs, kai.huuhko, mono-for-kernel-org, paulepanter |
Priority: | P1 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 3.19 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
copy of /proc/acpi from 3.13
output of `dmesg` on Lenovo X60t with Debian Jessie/testing and Linux 3.19 from the experimental repository
dmesg on X60 with coreboot booting v3.19.1 with aaecdf61 reverted
X60 with coreboot booting v3.19.2 loglevel 8
X60 with coreboot booting v3.19.2 loglevel 8 commit aaecdf61 reverted
dont even enable CS error interrupts
X60 with coreboot booting v3.19.2 with patch from attachment 172941 |
Description
info
2015-02-13 08:25:19 UTC
Earlier kernels tested: versions 3.2 all the way through 3.18. None of these exhibited the error.

I'm getting some output based on https://www.kernel.org/doc/Documentation/acpi/debug.txt and https://wiki.ubuntu.com/DebuggingACPI now.

Thank you for the report. Please paste the output of both the latest working Linux kernel (3.18.?; did you test the latest stable version too?) and Linux 3.19. The Linux ACPI developers will probably know what to ask for specifically.

Created attachment 166631 [details]
copy of /proc/acpi from 3.13
Taken on 3.13:
dmidecode: http://paste.debian.net/plainh/6cf9e338/
acpidump: http://paste.debian.net/plainh/ac2bd14b/
lspci -vvnn: http://paste.debian.net/plainh/1763f05a/
dmesg: http://paste.debian.net/plainh/6eb7b5fa/
Attached acpi.tar.bz is a copy of /proc/acpi from 3.13 (I could not boot 3.19).
By the way, I found a report of the same error here: https://bugzilla.kernel.org/show_bug.cgi?id=92551

3.19 (kernel parameters in GRUB):
No options: system won't boot. ACPI PCC probe fail.
acpi=off -> system boots!
acpi=ht -> system won't boot. same error.
pci=noacpi -> system boots!
acpi=noirq -> system won't boot. same error.
pnpacpi=off -> system won't boot. same error.
noapic -> system won't boot. same error, plus "ACPI: SCI (ACPI GSI 9) not registered"
nolapic -> won't boot. same errors as with noapic, plus "e1000e 0000:01:00.0 0000:01:00.0 (uninitialized): Failed to initialize MSI interrupts. Falling back to legacy interrupts."

This might be related: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7e7e8fe69820c6fa31395dbbd8e348e3c69cd2a9
I need to perform a git bisect to find out exactly what commit caused my issue.

The debug print "ACPI PCC probe failed" shouldn't cause any system problem; it's just a misuse of the pr_warn function and has now been changed to pr_debug in the latest kernel. The fact that pci=noacpi works around the problem suggests it is a PCI/ACPI issue. Please do the bisect, thanks.

I will bisect linux later on to find out what commit caused the issue. It is a mistake to immediately blame coreboot until I have done that.

I'm not blaming coreboot; it's just unusual enough to be worth pointing out in the subject line.

Here is a boot log of 3.19 from another person, on the same machine, but with the vendor bios (not coreboot): http://paste.debian.net/plain/150469 It boots fine there. I'll bisect linux later as promised.
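For readers unfamiliar with the bisect mentioned above: `git bisect` is, in essence, a binary search between a known-good and a known-bad revision (here 3.18 boots and 3.19 hangs). The following is only a minimal sketch of that idea, not the tool itself. The commit names `c00`..`c15`, the `culprit` value and the `hangs_on_boot` check are hypothetical placeholders; in practice each check is a manual test boot of a freshly built kernel.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the newest one is bad,
    and every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid      # the first bad commit is at mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo]


# Hypothetical linear history; these are placeholders, not real kernel commits.
history = [f"c{i:02d}" for i in range(16)]
culprit = "c11"  # assumed for the demo; unknown in a real bisect
tested = []


def hangs_on_boot(commit: str) -> bool:
    """Stand-in for building the kernel at `commit` and test-booting it."""
    tested.append(commit)
    return history.index(commit) >= history.index(culprit)


found = first_bad_commit(history, hangs_on_boot)
print(f"first bad commit: {found} ({len(tested)} test boots)")
```

The point of the halving is that the number of test boots grows with the logarithm of the number of commits between the two releases, which is what makes bisecting a full kernel release practical.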
Created attachment 167331 [details]
output of `dmesg` on Lenovo X60t with Debian Jessie/testing and Linux 3.19 from the experimental repository
(In reply to info from comment #9)
> Here is a boot log of 3.19 from another person, on the same machine, but
> with the vendor bios (not coreboot): http://paste.debian.net/plain/150469
As that paste expires in three days, I’ll attach it here. I can boot normally, but there seem to be new ACPI messages unrelated to this bug report.
[…]
[ 0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 64/32 (20141107/tbfadt-618)
[ 0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe1Block: 0/32 (20141107/tbfadt-618)
[ 0.000000] ACPI BIOS Warning (bug): Optional FADT field Gpe1Block has zero address or length: 0x000000000000102C/0x0 (20141107/tbfadt-649)
[…]
[ 0.155940] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S1_] (20141107/hwxface-580)
[ 0.155947] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [\_S2_] (20141107/hwxface-580)
[…]

The last two lines don't matter; they mean your laptop (like most laptops today) does not support ACPI sleep states S1 and S2. The first three lines may cause problems for S3, suspend-to-mem. If you do not see any problem, then you can safely ignore those messages.

info@gluglug.org.uk, any news on the bisect?

aaecdf61 (drm/i915: Stop gathering error states for CS error interrupts) might be the first bad commit. With this commit reverted, what remains of v3.19.1 boots on an X60 with coreboot. It throws some errors though (I'll try to add the dmesg as an attachment).

Created attachment 170671 [details]
dmesg on X60 with coreboot booting v3.19.1 with aaecdf61 reverted
Wow, thanks for the bisect Mono. Why was this commit even accepted into linux? Breaking old hardware just isn't acceptable. This should be fixed in the next release of Linux. By the way, Linux 3.19 from Debian experimental booted fine with the *vendor* firmware.

Francis, your second paragraph in comment #15 was not helpful. The reason is simple, you didn’t test it. The intention was certainly not to break old hardware. Commits just cannot be tested on every device. Mono, big thanks to you!

Daniel,
commit aaecdf611a05cac26a94713bad25297e60225c29
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Nov 4 15:52:22 2014 +0100
    drm/i915: Stop gathering error states for CS error interrupts
causes a boot hang on a ThinkPad X60 with coreboot as its firmware. Can you please take a look? Thanks.
BTW, with vendor firmware, there is no such problem.

Francis, Mono, could you please provide the log files as requested on the page »How to report bugs« [1]?
[1] https://01.org/linuxgraphics/documentation/how-report-bugs

I won't be around for about 7 days. If you want it done sooner, Mono will have to do it.

Ahoi Paul. I'd want to upload the log files, but can't figure out which ones you refer to. Could you please specify which log files you think would be helpful? Just to make sure, you mean log files from running the X60 with coreboot and v3.19.1 with the reverted commit aaecdf61, right?

(In reply to Mono from comment #20)
> Ahoi Paul.
Hi Mono. I am sorry for the late reply!
> I'd want to upload the log files, but can't figure out which ones
> you refer to. Could you please specify which log files you think would be
> helpful?
>
> Just to make sure, you mean log files from running the X60 with coreboot and
> v3.19.1 with the reverted commit aaecdf61, right?
Actually I meant the one *without* the commit reverted. If you have an ultra bay/docking station you should easily get these over serial. Otherwise, Linux’ module netconsole [1] might also help.
Logs from both runs would be helpful though.
Lastly, have you tried 4.0-rc5 too?
[1] https://www.kernel.org/doc/Documentation/networking/netconsole.txt

Created attachment 172571 [details]
X60 with coreboot booting v3.19.2 loglevel 8
kernel buffer booting linux v3.19.2 on a Thinkpad X60 with coreboot caught via usb console
Created attachment 172581 [details]
X60 with coreboot booting v3.19.2 loglevel 8 commit aaecdf61 reverted
kernel buffer booting linux v3.19.2 on a Thinkpad X60 with coreboot caught via usb console
commit aaecdf61 reverted
Hallo Paul,
thanks for your reply. Unfortunately I do not have an ultra bay or docking station, and netconsole somehow didn't work for me. I caught the kernel buffer via USB though. Comment 22 has the kernel buffer from unchanged v3.19.2, which hangs, and comment 23 has the one with commit aaecdf61 reverted, which does not hang. I hope it helps.
best regards
Mono
(In reply to Paul Menzel from comment #21)
> (In reply to Mono from comment #20)
> > Ahoi Paul.
>
> Hi Mono. I am sorry for the late reply!
>
> > I'd want to upload the log files, but can't figure out which ones
> > you refer to. Could you please specify which log files you think would be
> > helpful?
> >
> > Just to make sure, you mean log files from running the X60 with coreboot
> > and v3.19.1 with the reverted commit aaecdf61, right?
>
> Actually I meant the one *without* the commit reverted. If you have an ultra
> bay/docking station you should easily get these over serial. Otherwise,
> Linux’ module netconsole [1] might also help.
>
> Logs from both runs would be helpful though.
>
> Lastly, have you tried 4.0-rc5 too?
>
> [1] https://www.kernel.org/doc/Documentation/networking/netconsole.txt

Created attachment 172941 [details]
dont even enable CS error interrupts
Can you please test the attached patch, without the revert?
Mono ^

Created attachment 172961 [details]
X60 with coreboot booting v3.19.2 with patch from attachment 172941 [details]
Hallo Daniel,
with your patch applied to v3.19.2 the machine boots. There is still a warning in the kernel buffer though; I'm not sure what it means.
thanks! (see the attachment for the complete dmesg output)
best regards
Mono
[ 0.353981] [drm] Initialized i915 1.6.0 20141121 for 0000:00:02.0 on minor 0
[ 0.376881] ------------[ cut here ]------------
[ 0.376900] WARNING: CPU: 1 PID: 44 at drivers/gpu/drm/drm_irq.c:1121 drm_wait_one_vblank+0x190/0x1a0 [drm]()
[ 0.376902] vblank not available on crtc 0, ret=-22
[ 0.376904] Modules linked in: i915 button intel_gtt i2c_algo_bit video drm_kms_helper drm i2c_core
[ 0.376916] CPU: 1 PID: 44 Comm: kworker/u4:1 Not tainted 3.19.2-1-ARCH #1
[ 0.376918] Hardware name: LENOVO 1709H6U/1709H6U, BIOS CBET4000 CBET4000 (2.17 ) 03/29/2015
[ 0.376925] Workqueue: events_unbound async_run_entry_fn
[ 0.376928] 0000000000000000 000000001ba54600 ffff8800b8edb768 ffffffff8155d97f
[ 0.376931] 0000000000000000 ffff8800b8edb7c0 ffff8800b8edb7a8 ffffffff81073a4a
[ 0.376935] ffff8800b8edb7c8 ffff880035d30000 ffff880035cc4800 0000000000000000
[ 0.376939] Call Trace:
[ 0.376946] [<ffffffff8155d97f>] dump_stack+0x4c/0x6e
[ 0.376951] [<ffffffff81073a4a>] warn_slowpath_common+0x8a/0xc0
[ 0.376954] [<ffffffff81073ad5>] warn_slowpath_fmt+0x55/0x70
[ 0.376961] [<ffffffffa001f0e0>] drm_wait_one_vblank+0x190/0x1a0 [drm]
[ 0.377004] [<ffffffffa011403f>] ? gen4_read32+0x4f/0xd0 [i915]
[ 0.377037] [<ffffffffa01686b5>] intel_enable_tv+0x25/0x60 [i915]
[ 0.377066] [<ffffffffa0131eab>] i9xx_crtc_enable+0x3fb/0x4c0 [i915]
[ 0.377095] [<ffffffffa0130422>] __intel_set_mode+0x8a2/0xca0 [i915]
[ 0.377124] [<ffffffffa013607c>] intel_set_mode+0x7c/0xc0 [i915]
[ 0.377152] [<ffffffffa0121b3c>] ? intel_framebuffer_init+0x31c/0x440 [i915]
[ 0.377181] [<ffffffffa0136346>] intel_get_load_detect_pipe+0x286/0x620 [i915]
[ 0.377213] [<ffffffffa01692b4>] intel_tv_detect+0x134/0x5c0 [i915]
[ 0.377218] [<ffffffff810db360>] ? migrate_timer_list+0xd0/0xd0
[ 0.377225] [<ffffffffa007cbe3>] drm_helper_probe_single_connector_modes_merge_bits+0x303/0x460 [drm_kms_helper]
[ 0.377229] [<ffffffff81014635>] ? __switch_to+0x175/0x5f0
[ 0.377234] [<ffffffffa007cd53>] drm_helper_probe_single_connector_modes+0x13/0x20 [drm_kms_helper]
[ 0.377238] [<ffffffffa0086eb0>] drm_fb_helper_probe_connector_modes.isra.3+0x50/0x70 [drm_kms_helper]
[ 0.377243] [<ffffffffa008810c>] drm_fb_helper_initial_config+0x5c/0xf50 [drm_kms_helper]
[ 0.377272] [<ffffffffa0145ccb>] intel_fbdev_initial_config+0x1b/0x20 [i915]
[ 0.377275] [<ffffffff810942bc>] async_run_entry_fn+0x4c/0x170
[ 0.377279] [<ffffffff8108c0a4>] process_one_work+0x144/0x3f0
[ 0.377283] [<ffffffff8108c6cb>] worker_thread+0x4b/0x460
[ 0.377286] [<ffffffff8108c680>] ? init_pwq.part.23+0x10/0x10
[ 0.377289] [<ffffffff81091748>] kthread+0xd8/0xf0
[ 0.377293] [<ffffffff81091670>] ? kthread_create_on_node+0x1c0/0x1c0
[ 0.377296] [<ffffffff81563118>] ret_from_fork+0x58/0x90
[ 0.377300] [<ffffffff81091670>] ? kthread_create_on_node+0x1c0/0x1c0
[ 0.377302] ---[ end trace 7131527686ecda8a ]---

Confirmed. Daniel, I booted 3.19.3 with your patch applied, and it now boots. Same messages as Mono saw, but it works.

The "vblank not available on crtc 0, ret=-22" message is probably https://bugs.freedesktop.org/show_bug.cgi?id=89108, queued to stable, see thread starting at http://mid.gmane.org/874mq3oif5.fsf@intel.com.

Hallo Jani,
you are right. f9b61ff6bce9a44555324b29e593fdffc9a115bc removes the warning.
Does "queued to stable" mean that f9b61ff6bce9a44555324b29e593fdffc9a115bc is about to be committed to the next 3.19 version?
thanks and best regards
Mono

(In reply to Jani Nikula from comment #29)
> The "vblank not available on crtc 0, ret=-22" message is probably
> https://bugs.freedesktop.org/show_bug.cgi?id=89108, queued to stable, see
> thread starting at http://mid.gmane.org/874mq3oif5.fsf@intel.com.
(In reply to Mono from comment #30)
> you are right. f9b61ff6bce9a44555324b29e593fdffc9a115bc removes the warning.
Awesome. Thank you for testing. Could you please attach the Linux kernel output with the two patches applied? And just to be sure, with these two patches applied the regression is solved, right?
> Does "queued to stable" mean that f9b61ff6bce9a44555324b29e593fdffc9a115bc
> is about to be committed to the next 3.19 version?
That’s what should happen. But it hasn’t been picked up yet [1][2]: there is no file with *vblank* in the name in there. So we need to stay patient.
[1] http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/queue-3.19
[2] http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/releases/3.19.3?id=v3.19.3

commit 37ef01ab5d24d1d520dc79f6a98099d451c2a901
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Apr 1 13:43:46 2015 +0200
    drm/i915: Dont enable CS_PARSER_ERROR interrupts at all
in drm-intel-next-fixes, to be merged for 4.1 and backported to stable. Thanks for the report. Please reopen if the problem persists with that commit. Thanks! |