Bug 14746 - Kernel fails to boot if KVM switch doesn't point to it
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 high
Assignee: acpi_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-12-06 04:56 UTC by av1474
Modified: 2012-10-30 02:12 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.32
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Boot log in failure case (11.68 KB, text/plain)
2009-12-06 04:57 UTC, av1474
Boot log in case of success (12.18 KB, text/plain)
2009-12-06 04:57 UTC, av1474
Failed boot log (with initcall_debug and printk.time=1) (47.24 KB, text/plain)
2009-12-07 08:51 UTC, av1474
Successful boot log (with initcall_debug and printk.time=1) (47.24 KB, text/plain)
2009-12-07 08:52 UTC, av1474
Failed boot log (with initcall_debug and printk.time=1) [for real this time] (49.33 KB, text/plain)
2009-12-08 10:17 UTC, av1474
Failed log (46.57 KB, text/plain)
2009-12-09 09:38 UTC, av1474
Successful log (139.95 KB, text/plain)
2009-12-09 09:39 UTC, av1474
Failure log for kernel v2.6.32 (29.07 KB, text/plain)
2010-02-02 12:17 UTC, av1474

Description av1474 2009-12-06 04:56:17 UTC
Linux fails to boot first time after a cold boot and after
/sbin/reboot.  I was able to capture the output via serial
console. Both successful and failed logs are included. This happens
with stock Slackware64 kernel as well as recently released 2.6.32.
Booting with 'nolapic' parameter works every time.

http://lkml.org/lkml/2009/12/4/309
Comment 1 av1474 2009-12-06 04:57:08 UTC
Created attachment 24046
Boot log in failure case
Comment 2 av1474 2009-12-06 04:57:43 UTC
Created attachment 24047
Boot log in case of success
Comment 3 Zhang Rui 2009-12-07 06:33:47 UTC
Please add the boot options "initcall_debug printk.time=1" and recapture the boot log. :)
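
A quick way to confirm that a captured log really comes from a boot with these options is to look at its "Kernel command line:" entry. The following is only a minimal sketch in Python, assuming the log is available as plain text; `missing_boot_options` and `REQUIRED_OPTIONS` are hypothetical names chosen for illustration and are not part of any kernel tooling.

```python
import re

# Options requested in this report; adjust the tuple for other debugging sessions.
REQUIRED_OPTIONS = ("initcall_debug", "printk.time=1")

# Optional "[   12.345678] " prefix that printk.time=1 adds to each line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def missing_boot_options(log_text, required=REQUIRED_OPTIONS):
    """Return the required options absent from the log's kernel command line.

    If the log has no "Kernel command line:" entry at all, every required
    option is reported as missing.
    """
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw)
        if line.startswith("Kernel command line:"):
            tokens = line.split(":", 1)[1].split()
            return [opt for opt in required if opt not in tokens]
    return list(required)
```

An empty result means the log was captured with both options; anything else lists what still needs to be added at the boot loader prompt.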
Comment 4 av1474 2009-12-07 08:51:22 UTC
Created attachment 24068
Failed boot log (with initcall_debug and printk.time=1)
Comment 5 av1474 2009-12-07 08:52:01 UTC
Created attachment 24069
Successful boot log (with initcall_debug and printk.time=1)
Comment 6 Zhang Rui 2009-12-08 08:33:09 UTC
Hmm, it seems that you attached the failure boot log twice.
Comment 7 av1474 2009-12-08 10:17:31 UTC
Created attachment 24090
Failed boot log (with initcall_debug and printk.time=1) [for real this time]
Comment 8 Zhang Rui 2009-12-09 01:37:16 UTC
(In reply to comment #7)
> Created an attachment (id=24090)
> Failed boot log (with initcall_debug and printk.time=1) [for real this time]

I think you mean the successful boot log, right?
But it's truncated; please attach the FULL dmesg output.
Comment 9 av1474 2009-12-09 07:56:00 UTC
Please help me do this properly this time, since capturing those logs
is not as easy as i would have liked: it involves physically moving a box
from one room to the next and fooling around with network settings and
the KVM switch.

a. http://bugzilla.kernel.org/attachment.cgi?id=24090 is indeed the log
   of successful boot

b. http://bugzilla.kernel.org/attachment.cgi?id=24090 is intentionally
   truncated by me, since nothing particularly stellar goes on there

c. http://bugzilla.kernel.org/attachment.cgi?id=24068 (the failed one)
   is truncated because the box just resets, without an oops or any
   indication of the reset cause

Regarding item (b), one of the reasons why i've truncated it is the
fact that the box has a video acquisition board with 4 saa7134 chips
and v4l is printing a lot of useless "your board is missing EEPROM"
messages with long and tedious enumeration of all saa7134 derived
grabbers it supports.

So, if _really_ needed, i can remove the board (since the problem is
always there regardless of the board's presence) and try to recapture
the logs, this time attaching the whole, regardless of how big,
successful log. Is that what i should do?

Thanks.
Comment 10 Zhang Rui 2009-12-09 08:23:08 UTC
> So, if _really_ needed, i can remove the board (since the problem is
> always there regardless of the board's presence) and try to recapture
> the logs, this time attaching the whole, regardless of how big,
> successful log. Is that what i should do?
> 
yes, this is what i want. thanks!
Comment 11 av1474 2009-12-09 09:38:05 UTC
Created attachment 24113
Failed log
Comment 12 av1474 2009-12-09 09:39:30 UTC
Created attachment 24114
Successful log

Probably worth noting that it took 3 resets for the kernel to finally boot successfully this time.
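
Since the failed boot just resets without an oops, one way to narrow things down is to find which initcall was entered but never returned before the log stops. With initcall_debug, the kernel prints a "calling  <fn> @ <pid>" line on entry and an "initcall <fn> returned <ret> after <n> usecs" line on exit. The following is a minimal sketch in Python, assuming that line format and a plain-text log; `unfinished_initcalls` is a hypothetical helper name, not an existing tool.

```python
import re

# Line formats printed when booting with initcall_debug (assumed as described above).
CALLING = re.compile(r"calling\s+(\S+)\s+@\s+\d+")
RETURNED = re.compile(r"initcall\s+(\S+)\s+returned\s+(-?\d+)\s+after\s+\d+\s+usecs")


def unfinished_initcalls(log_text):
    """Return initcalls that were entered but never reported as returned.

    The list is in the order the calls were started, so the last element
    is the most recent initcall still running when the log ends.
    """
    pending = []
    for line in log_text.splitlines():
        returned = RETURNED.search(line)
        if returned:
            name = returned.group(1)
            if name in pending:
                pending.remove(name)
            continue
        calling = CALLING.search(line)
        if calling:
            pending.append(calling.group(1))
    return pending
```

Running it on the failed log and on the successful log should give an empty list for the latter; a non-empty result for the failed log only points at where the reset happened, not necessarily at the cause.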
Comment 13 av1474 2009-12-23 06:50:13 UTC
Another data point.

I have a KVM switch and in all cases when boot was unsuccessful the machine
was powered up when KVM was active for the other box. Just tried to perform
a full cycle with KVM pointing to this box before powering the box up and
everything was fine. Some more observations:

a. Also, Linux on that box keeps trying to boot and failing as long as the KVM
   is not pointing to it.

b. Switching KVM after POST but before boot loader (GRUB) even tries to start
   the kernel also leads to an unsuccessful boot attempt

c. Windows has no problems.
Comment 14 Zhang Rui 2009-12-23 07:04:27 UTC
Does the problem happen on this machine only?
i.e., if you connect another Linux machine to the KVM, does Linux boot successfully when the KVM is not pointing to that machine?
Comment 15 av1474 2009-12-23 07:31:01 UTC
Only on this machine.
Comment 16 Zhang Rui 2010-01-26 07:30:40 UTC
Would you please try the latest upstream kernel, say 2.6.32 or 2.6.32-rc4, and attach the failure boot log?
Comment 17 av1474 2010-02-02 12:17:25 UTC
Created attachment 24869
Failure log for kernel v2.6.32
Comment 18 Zhang Rui 2010-06-09 05:49:47 UTC
Now it seems that the kernel failure is related to the KVM switch,
and the "nolapic" boot option no longer works, right?
Comment 19 av1474 2010-06-09 11:48:00 UTC
No, the nolapic boot option does work.
Comment 20 Len Brown 2011-01-18 05:15:12 UTC
What does this have to do with ACPI?

Does the machine boot with "acpi=off"?
If yes, can the failure be reproduced with "acpi=off"?
Comment 21 Zhang Rui 2011-04-19 07:53:02 UTC
Bug closed as there has been no response from the bug reporter.
Please re-open it if the problem still exists in the latest upstream kernel.
