Bug 7839
Summary: | boot hang unless "nmi_watchdog=0" - 2.6.19 regression - Asus M2400N (Centrino) | ||
---|---|---|---|
Product: | ACPI | Reporter: | Andrej Podzimek (andrej) |
Component: | Config-Interrupts | Assignee: | Len Brown (lenb) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | acpi-bugzilla, andi-bz, andreas.thalhammer, barneyman, dgollub, gentuu, mingo |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.19.2 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
2.6.18.6: dmesg -s64000
2.6.18.6: /proc/cpuinfo
2.6.18.6: /proc/interrupts
2.6.20: dmesg -s64000
2.6.20: /proc/cpuinfo
2.6.20: /proc/interrupts
dmesg output with nmi_watchdog=0
dmesg output without nmi_watchdog=0
keep-watchdog-disabled-by-default fix
keep-watchdog-disabled-by-default fix v2 |
Description
Andrej Podzimek
2007-01-17 04:37:46 UTC
Does the 2.6.20-rc kernel boot?

Nope... rc5 hangs as well. What could be wrong? Are there any config or boot options I should try?

What is your kernel config? I tried -rc5 on an Asus A6B and it boots just fine. Before booting, please remove any unnecessarily attached devices.

Does the boot pass through request_standard_resources()? Please also add a printk in topology_init to verify.

Done. With -rc6, the message I added to topology_init() appears after "Setting up standard PCI resources", becoming the last line before disaster. ;-) As for request_standard_resources(), I couldn't see its message anywhere. Either its end is not reached, or it gets scrolled away too quickly.

One more comment: I don't have any special devices attached and no cards inserted in my PCMCIA slots. I haven't replaced any hardware parts since I bought this laptop.

Presumably, -rc6 does work with acpi=off in the kernel command line. However, this is not a viable solution, as the soundcard and modem don't work when ACPI is disabled.

2.6.20 freezes the same way. This is pretty strange.

I had the same issue here with Linux 2.6.20 and an ASUS M2400N notebook. The solution for me was to deactivate APIC support (Processor type and features ---> Local APIC support on uniprocessors). After that the kernel runs quite well.
Here is a diff of my changed .config:
< CONFIG_X86_UP_APIC=y
< CONFIG_X86_UP_IOAPIC=y
< CONFIG_X86_LOCAL_APIC=y
< CONFIG_X86_IO_APIC=y
---
> # CONFIG_X86_UP_APIC is not set
164d160
< # CONFIG_X86_MCE_P4THERMAL is not set
306,307d301
< # CONFIG_PCI_MSI is not set
< CONFIG_HT_IRQ=y
1787,1788d1780
< CONFIG_X86_FIND_SMP_CONFIG=y
< CONFIG_X86_MPPARSE=y
Bingo! Many thanks for the hint. The kernel command line token "nolapic" does the thing. What was wrong? Do I need local APIC on a uniprocessor? Does it bring any advantage?

> The kernel command line token "nolapic" does the thing.
Please attach the output from dmesg -s64000 after this boot,
as well as past /proc/interrupts and /proc/cpuinfo.
There are a lot of laptops where the LAPIC is not configured
and is supposed to be disabled by the BIOS. A few years back
Linux got too clever and re-enabled the LAPIC on these boxes so
it could use the Local APIC timer -- and a whole bunch of
laptops stopped booting. I thought we fixed that, but maybe
we've had some sort of regression...
dmesg and /proc/interrupts from working 2.6.18 would help us find out
if you can attach them.
If you are doing profiling with NMI, you need the LAPIC.
If you have an IOAPIC, you need the LAPIC to talk to it.
If none of the above, there isn't a benefit
to the LAPIC vs no LAPIC on a uni-processor laptop.
Created attachment 10380 [details]
2.6.18.6: dmesg -s64000
Kernel command line:
devfs=nomount acpi=on acpi_irq_balance acpi_irq_isa=3,7,12
resume2=swap:/dev/hda7
Created attachment 10381 [details]
2.6.18.6: /proc/cpuinfo
Created attachment 10382 [details]
2.6.18.6: /proc/interrupts
Created attachment 10383 [details]
2.6.20: dmesg -s64000
Kernel command line:
devfs=nomount acpi=on nolapic acpi_irq_balance acpi_irq_isa=3,7,12
resume2=swap:/dev/hda7
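The 2.6.18.6 and 2.6.20 command lines above differ only in the "nolapic" token. A minimal Python sketch for checking which workaround tokens are present on a command line follows. The helper names parse_cmdline and active_workarounds are hypothetical, not part of any kernel tool, and on a running system the string would typically be read from /proc/cmdline. The tokens checked are the ones named in this report (nolapic, acpi=off, and nmi_watchdog=0 from the summary).

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {token: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def active_workarounds(cmdline: str) -> list:
    """List the workaround tokens from this report found on the command line."""
    params = parse_cmdline(cmdline)
    found = []
    if "nolapic" in params:
        found.append("nolapic")
    if params.get("nmi_watchdog") == "0":
        found.append("nmi_watchdog=0")
    if params.get("acpi") == "off":
        found.append("acpi=off")
    return found


if __name__ == "__main__":
    # Command line copied from the 2.6.20 attachment above.
    line = ("devfs=nomount acpi=on nolapic acpi_irq_balance "
            "acpi_irq_isa=3,7,12 resume2=swap:/dev/hda7")
    print(active_workarounds(line))
```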
Created attachment 10384 [details]
2.6.20: /proc/cpuinfo
Created attachment 10385 [details]
2.6.20: /proc/interrupts
Attachments ready. Thank you for your interest. Should you need any further information, just let me know.

Same thing affects the HP nx6110 (Intel Celeron M 1.3GHz, intel915, latest Gentoo).
Newest working kernel: 2.6.18.6
Affected: 2.6.19-rc1 and up (including 2.6.20)
Closing the lid during boot freezes the system instantly. During normal work (Xorg running), closing and opening the lid has a 1/2 chance of freezing the system. It also happens when X blanks the monitor and tries to turn it back on, which is pretty annoying. The nolapic option solves the problem. Do you need any more info?

I have the same problems with an IBM Thinkpad R51 1830-DG4. You can find some valuable information about this laptop here:
http://www-307.ibm.com/pc/support/site.wss/product.do?subcategoryind=0&familyind=168386&brandind=10&doccategoryind=0&modelind=171165&doctypeind=9&validate=true&partnumberind=0&sitestyle=lenovo&template=%2Fproductpage%2Flandingpages%2FproductPageLandingPage.vm&operatingsystemind=49979&machineind=168593
My kernel configuration and dmesg output can be found here:
http://forums.gentoo.org/viewtopic-t-540369-start-0-postdays-0-postorder-asc-highlight-.html
My system: recent Gentoo Linux
Last usable kernel: gentoo-sources-2.6.18-r6 (kernel 2.6.18.6 with a few patches)
I will check if nolapic works here, but I think it will.

OK, I tried "nolapic" and it works now. Note that I only had system freezes during the kernel's initial boot phase; it would boot roughly 3 out of 10 times. When it did boot with the local APIC enabled, it was stable: I had no other limitations and no freezes after the boot process. In addition, I don't use any acpi_irq_*something statements. I use an IBM Thinkpad R51 as stated in comment #19. Hope we find what the problem is here with > 2.6.18 kernels...

Is booting with "nmi_watchdog=0" instead of "nolapic" a sufficient workaround?

Yes, kernel 2.6.20 boots with kernel option "nmi_watchdog=0" instead of "nolapic" and seems to run stable.

Yes, that works here, too.
Here, too. Tried 2.6.19-gentoo-r6 with "nmi_watchdog=0" and the boot freeze is gone. I had one freeze at a very different point though, during the init process (when Gentoo is starting all the services in /etc/init.d, just before the runlevel 3 message). Don't know if this has anything to do with it. Otherwise it works here too. I'll continue testing...

I've booted Linux with "nmi_watchdog=0" about 6 times since and worked for more than 10 hours (sometimes with a lot of applications performing their calculations) with no hangs or errors whatsoever. Thank you for the workaround.

Ingo re-sent the patch to disable NMI watchdog by default:
http://lkml.org/lkml/2007/3/5/111
And Linus applied it today:
http://lkml.org/lkml/2007/3/5/303
So this should be fixed as of the release that follows 2.6.21-rc2-git4

Closed.

The regression seems to persist with kernel 2.6.21. With vanilla kernel 2.6.21.1 the boot hangs unless I append nmi_watchdog=0 or deactivate APIC.

2.6.21 doesn't enable the NMI watchdog by default so that would be hard to believe.

Hmm, what can I say. The following boot parameter does work:
kernel (hd0,6)/linux-2.6.21.1 root=/dev/hda3 nmi_watchdog=0
And this one does not:
kernel (hd0,6)/linux-2.6.21.1 root=/dev/hda3
$ uname -a
Linux topeka 2.6.21.1 #1 Wed May 2 21:06:56 CEST 2007 i686 Intel(R) Pentium(R) M processor 1400MHz GenuineIntel GNU/Linux
How can I verify that the NMI watchdog is disabled by default here?

As an experiment, I have deactivated all kernel options except APIC in my kernel config. After that the kernel does not freeze at "Setting up PCI resources". But I have discovered some interesting kernel output:
Testing NMI watchdog ... <6>Time: tsc clocksource has been installed.
OK.
Using IPI Shortcut mode
The line "Testing NMI watchdog ... <6>" does not occur when booting the kernel with nmi_watchdog=0. Maybe a module enables the NMI watchdog or something like that? I can attach my .config and my stripped .config if that helps.
Well, what you did is a placebo. Maybe it just fails randomly?
dmesg | grep -i watchdog
or cat /proc/interrupts and check if NMIs are increasing.

No, it's not random behaviour. With "nmi_watchdog=0" the kernel runs stable; without it, the kernel always fails to boot (tested approx. 10 times). Furthermore, I have done some tests with the stripped kernel (APIC, PCI, PATA and ext3 enabled). These are the results when booting without "nmi_watchdog=0":
$ cat /proc/interrupts
           CPU0
  0:       5868    XT-PIC-XT        timer
  1:        245    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
 12:         17    XT-PIC-XT        i8042
 14:        314    XT-PIC-XT        ide0
 15:         12    XT-PIC-XT        ide1
NMI:         53
LOC:       5936
ERR:          0
MIS:          0
$ dmesg | grep "watchdog"
Testing NMI watchdog ... OK.
When I append "nmi_watchdog=0" I get the following output:
$ cat /proc/interrupts
           CPU0
  0:       7534    XT-PIC-XT        timer
  1:        240    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
 12:         17    XT-PIC-XT        i8042
 14:        312    XT-PIC-XT        ide0
 15:         12    XT-PIC-XT        ide1
NMI:          0
LOC:       7602
ERR:          0
MIS:          0
$ dmesg | grep "watchdog"
Kernel command line: root=/dev/hda3 nmi_watchdog=0

Can you add full boot.msg with and without nmi_watchdog=0?

Created attachment 11382 [details]
dmesg output with nmi_watchdog=0
dmesg output with kernel option nmi_watchdog=0 and stripped kernel.
Note: the boot hangs for several seconds at this line:
0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
But this does not occur with my 'normal' kernel configuration.
Created attachment 11383 [details]
dmesg output without nmi_watchdog=0
dmesg output without kernel option nmi_watchdog=0 and stripped kernel.
Note: without the parameter the boot also hangs for several seconds at this
line:
0000:00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 0101000
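The "check if NMIs are increasing" test suggested above can be sketched in Python as follows. This is a minimal illustration, not an existing tool: the helper names nmi_count and watchdog_ticking are hypothetical, and it assumes the NMI line has the form shown in the /proc/interrupts output in this report (a label followed by one counter per CPU). Two snapshots taken a few seconds apart are compared.

```python
def nmi_count(interrupts_text: str) -> int:
    """Sum the per-CPU counters on the NMI line of /proc/interrupts output."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Ignore any trailing description text; keep only the counters.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    raise ValueError("no NMI line found")


def watchdog_ticking(before: str, after: str) -> bool:
    """True if the NMI counter grew between two snapshots."""
    return nmi_count(after) > nmi_count(before)


if __name__ == "__main__":
    import time

    with open("/proc/interrupts") as f:
        first = f.read()
    time.sleep(5)
    with open("/proc/interrupts") as f:
        second = f.read()
    print("NMI watchdog ticking:", watchdog_ticking(first, second))
```

A counter that stays at 0, as in the nmi_watchdog=0 output above, means no NMIs are being delivered; a growing counter indicates the watchdog (or another NMI source) is active.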
Just some observations:
Christian's dmesg (without forcing nmi_watchdog=0) says
Found and enabled local APIC!
which is emitted 2 lines after setting nmi_watchdog to NMI_LOCAL_APIC, if it was not NMI_NONE (0) before (in apic.c:1063).
nmi_watchdog gets initialized to NMI_DEFAULT in nmi.c, so his NMI watchdog is enabled.

Created attachment 12347 [details]
keep-watchdog-disabled-by-default fix
I hit the same problem with the same machine and the latest kernel (2.6.23-rc2). My patch should solve it ... at least it solves it for my ASUS M2400N. For more details please have a look at:
https://bugzilla.novell.com/show_bug.cgi?id=298084#c9
I guess this could also solve some other issues that can be worked around with nmi_watchdog=0.

(In reply to comment #36)
> which is emitted 2 lines after setting the nmi_watchdog to NMI_LOCAL_APIC, if it
> was not NMI_NONE(0) before(in apic.c:1063).
>
> nmi_watchdog gets initialized to NMI_DEFAULT in nmi.c, so his nmi watchdog is
> enabled.
Exactly.

Created attachment 12356 [details]
keep-watchdog-disabled-by-default fix v2
Fixed little typo in x86_64 changes.
x86_64 changes compiled but untested.
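The behaviour described in the observations above can be modelled in a few lines. This is a Python sketch of the logic as the comments describe it, not the kernel source and not the attached patch. Only NMI_NONE = 0 is stated in this report; the other constant values and the names detect_local_apic and boot are assumptions for illustration. The last case simply models a default of NMI_NONE, which is what the attachment title (keep watchdog disabled by default) suggests.

```python
NMI_NONE = 0          # value stated in this report
NMI_LOCAL_APIC = 2    # assumed value, for illustration only
NMI_DEFAULT = -1      # assumed value, for illustration only


def detect_local_apic(nmi_watchdog: int) -> int:
    """Model of the step described above: when a local APIC is found, the
    watchdog mode becomes NMI_LOCAL_APIC unless it was NMI_NONE before."""
    if nmi_watchdog != NMI_NONE:
        nmi_watchdog = NMI_LOCAL_APIC
    return nmi_watchdog


def boot(cmdline_value=None, default=NMI_DEFAULT) -> int:
    """Return the watchdog mode after local APIC detection.

    cmdline_value models nmi_watchdog=<n> on the command line (None if absent).
    """
    mode = default if cmdline_value is None else cmdline_value
    return detect_local_apic(mode)


if __name__ == "__main__":
    # Initialized to NMI_DEFAULT, no option given: watchdog ends up enabled.
    print(boot() == NMI_LOCAL_APIC)
    # nmi_watchdog=0 on the command line: watchdog stays off.
    print(boot(cmdline_value=0) == NMI_NONE)
    # A default of NMI_NONE: watchdog stays off without any option.
    print(boot(default=NMI_NONE) == NMI_NONE)
```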
Patch is in 2.6.23 mainline. Len, you can close it.

*** Bug 9787 has been marked as a duplicate of this bug. *** |