Bug 7640
Summary: | [2.6.18] Significant delays when booting in vmware | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Frans Pop (elendil) |
Component: | i386 | Assignee: | john stultz (john.stultz) |
Status: | REJECTED INVALID | ||
Severity: | normal | CC: | john.stultz, jth, mann, zach |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.18.3 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | Patch to verify calibrate_delay is being slow. |
Description
Frans Pop
2006-12-06 00:58:39 UTC
Does booting w/ clocksource=acpi_pm or clocksource=pit change the behavior?

> Does booting w/ clocksource=acpi_pm or clocksource=pit change the behavior?
No, neither helps.
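When testing clocksource= overrides, it helps to confirm which clocksource the kernel actually picked. The sketch below reads it from sysfs. The path is an assumption based on the generic clocksource code introduced around 2.6.18, and `read_first_line()` is a hypothetical helper, not a kernel or libc API.

```c
#include <stdio.h>
#include <string.h>

/* Assumed sysfs location of the active clocksource (2.6.18-era and later). */
#define CLOCKSOURCE_PATH \
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Hypothetical helper: read the first line of a small text file into buf,
   without the trailing newline. Returns 0 on success, -1 on failure. */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* drop the newline, if any */
    return 0;
}
```

Calling `read_first_line(CLOCKSOURCE_PATH, buf, sizeof buf)` after booting with clocksource=acpi_pm should leave "acpi_pm" in `buf` if the override took effect.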
Huh. That stumps me a bit. It looks like the delay calibration is going slowly, but the lpj numbers and the cpu_khz values in the two dmesgs (2.6.18 vs. 2.6.17) are pretty similar. It's odd.

Created attachment 9759
Patch to verify calibrate_delay is being slow.
This is just a simple printk patch that should verify that calibrate_delay is
the cause of the initial stall. You should see the "starting calibrate" message
then the stall, then the "finishing calibrate" message.
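To see why a stall would land between those two messages, it helps to recall what calibrate_delay() measures. The following is a simplified user-space analog of its classic loop, not the kernel code: a monotonic clock divided into assumed 4 ms ticks (HZ=250) stands in for jiffies, a volatile busy loop stands in for __delay(), and all function names are hypothetical. It doubles the loop count until one delay spans a tick, then refines the lower bits. Every step first waits for a tick boundary, so in this model, ticks that arrive late or irregularly slow the whole routine down.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

#define HZ 250                        /* assumed tick rate for this sketch */
#define TICK_NS (1000000000UL / HZ)   /* length of one simulated jiffy */

/* Simulated jiffies: number of whole ticks on the monotonic clock. */
static unsigned long sim_jiffies(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long)(((unsigned long long)ts.tv_sec * 1000000000ULL
                            + (unsigned long long)ts.tv_nsec) / TICK_NS);
}

/* Busy loop standing in for __delay(); volatile keeps it from being
   optimised away. */
static void sim_delay(unsigned long loops)
{
    for (volatile unsigned long i = 0; i < loops; i++)
        ;
}

/* Spin until the next simulated tick starts and return its value. */
static unsigned long wait_for_tick(void)
{
    unsigned long t = sim_jiffies();
    while (sim_jiffies() == t)
        ;
    return sim_jiffies();
}

/* Estimate loops per jiffy: double until a delay spans a tick, then
   refine `precision_bits` lower bits one at a time. */
static unsigned long sim_calibrate_lpj(int precision_bits)
{
    unsigned long lpj = 1UL << 12;
    unsigned long ticks, loopbit;

    while ((lpj <<= 1) != 0) {
        ticks = wait_for_tick();
        sim_delay(lpj);
        if (sim_jiffies() != ticks)
            break;                    /* this many loops took >= 1 tick */
    }
    lpj >>= 1;                        /* last count that fit in a tick */

    loopbit = lpj;
    while (precision_bits-- > 0 && (loopbit >>= 1) != 0) {
        lpj |= loopbit;               /* tentatively set the next bit */
        ticks = wait_for_tick();
        sim_delay(lpj);
        if (sim_jiffies() != ticks)
            lpj &= ~loopbit;          /* overshot the tick: clear it */
    }
    return lpj;
}
```

Calling `sim_calibrate_lpj(8)` returns a rough loops-per-tick figure for the host CPU; 8 refinement bits mirrors the precision the kernel's loop uses.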
I'll check whether we've seen this internally at VMware. I didn't notice it when I was testing a slightly earlier version of John's patches.

> You should see the "starting calibrate" message then the stall, then
> the "finishing calibrate" message.
Yes, confirmed. Only for the first delay though, it shows nothing for the
second one (but from your message that was expected).
Huh. Still stumped on this one. I suspect it has something to do w/ changes to read_current_timer(), but it appears both cases are using the TSC, so it shouldn't really be different. Another potential cause of slowness would be the timer tick being off, which could keep jiffies from being updated at the right frequency. But neither of these really seems right. Does booting w/ "noapic" change anything?

A couple of people have indeed seen this internally at VMware, and we'll look into it from our end too.

No results yet. Also see this LKML thread:
http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/ede1e4e63b08b916/11e5725a86524552?lnk=st&q=2.6.18+delay+boot+vmware&rnum=1&hl=en#11e5725a86524552
The max_cstate part of that thread looks like a red herring to me, though.

> Does booting w/ "noapic" change anything?
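The timer-tick hypothesis matters because loops_per_jiffy is later used to turn microseconds into busy-loop counts. A worked sketch, assuming the usual relationship in which a delay of n microseconds spins roughly n × lpj × HZ / 1,000,000 loops (the kernel's real udelay() uses fixed-point arithmetic; both helpers below are hypothetical): if calibration mis-measures lpj because jiffies advance at the wrong rate, every calibrated busy-wait is off by the same ratio.

```c
/* Hypothetical helper: loops needed for a busy-wait of `usecs` microseconds.
   lpj (loops per jiffy) * hz (jiffies per second) = loops per second. */
static unsigned long long usecs_to_loops(unsigned long usecs,
                                         unsigned long lpj, unsigned long hz)
{
    return (unsigned long long)usecs * lpj * hz / 1000000ULL;
}

/* Hypothetical helper: how long a requested delay really lasts when the
   calibrated lpj differs from the CPU's true loops per jiffy. */
static double actual_delay_usecs(unsigned long requested_usecs,
                                 unsigned long calibrated_lpj,
                                 unsigned long true_lpj)
{
    return (double)requested_usecs * (double)calibrated_lpj
           / (double)true_lpj;
}
```

With HZ=250 and a true lpj of 2,000,000, `usecs_to_loops(1000, 2000000, 250)` gives 500000 loops for a 1 ms delay; had calibration doubled lpj to 4,000,000, `actual_delay_usecs(1000, 4000000, 2000000)` shows that delay stretching to 2000 µs.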
Not for the initial delay.
It does get rid of the second delay, but seems to me only by skipping a whole
codepath as the message preceding that delay is gone too.
It also seems to make the boot as a whole more sluggish.
Any news on this one?

I've just discovered that this issue seems to have another effect as well: a 30-second timeout (for swapping floppies in a floppy-based Debian install) takes ~60 seconds in VMware. On real hardware it is exactly 30 seconds. The code used for the timeout is in:
http://svn.debian.org/wsvn/d-i/trunk/packages/rootskel/src-bootfloppy/bin/timeout_read.c?op=file&rev=0&sc=0

John, your code looks great. This turns out to be a VMware bug. There are two workarounds. Either put this in the .vmx configuration file for the VM:
timer.calibrationUsec = 90000
or boot Linux with lpj=XXXX as a boot parameter, where XXXX is the host's lpj value.

Closing report as it turns out to be a VMware issue, not a bug in the kernel.

Setting timer.calibrationUsec works for me. Thanks to all for their efforts.
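Passing lpj= makes calibrate_delay() skip its measurement loop and use that value directly, which is why it sidesteps the stall. The host's value can be taken from its boot log: the calibration message ends in something like "BogoMIPS (lpj=NNN)", and the BogoMIPS figure is derived from lpj as lpj / (500000 / HZ). The sketch below (hypothetical helper names; the exact log-line format is an assumption) extracts lpj from such a line and mirrors the kernel's integer BogoMIPS formatting as a sanity check.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the number after "lpj=" in a boot-log line,
   or 0 if the line has no such field. */
static unsigned long parse_lpj(const char *line)
{
    const char *p = strstr(line, "lpj=");
    if (p == NULL)
        return 0;
    return strtoul(p + 4, NULL, 10);
}

/* Hypothetical helper: format lpj as BogoMIPS with two decimals, using
   integer arithmetic in the style of the kernel's calibration printk. */
static void format_bogomips(unsigned long lpj, unsigned long hz,
                            char *buf, size_t len)
{
    snprintf(buf, len, "%lu.%02lu",
             lpj / (500000 / hz),           /* whole BogoMIPS */
             (lpj / (5000 / hz)) % 100);    /* hundredths */
}
```

For a host line ending in "1396.53 BogoMIPS (lpj=2793060)", `parse_lpj()` returns 2793060 and `format_bogomips(2793060, 250, ...)` reproduces "1396.53", so the guest would be booted with lpj=2793060. HZ=250 is an assumption here; since lpj counts loops per jiffy, the value only carries over if host and guest kernels use the same HZ.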