Bug 14244
| Summary: | 2.6.31-1 hangs early during boot | | |
|---|---|---|---|
| Product: | ACPI | Reporter: | Stefan Krause (Stefan.Krause) |
| Component: | Other | Assignee: | ykzhao (yakui.zhao) |
| Status: | CLOSED DUPLICATE | | |
| Severity: | normal | CC: | acpi-bugzilla, lenb, rui.zhang, tj, yakui.zhao, yggdrasil8722 |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Attachments: | Screenshot of acpi=verbose; Dmesg of sucessful boot; dmidecode; lscpi -vnvn; output of acpidump; output of cpuinfo; maybe related error message | | |
Description
Stefan Krause
2009-09-28 18:16:40 UTC
Created attachment 23197 [details]
Screenshot of acpi=verbose
Created attachment 23198 [details]
Dmesg of sucessful boot
Created attachment 23199 [details]
dmidecode
Created attachment 23200 [details]
lscpi -vnvn
I just tried with 2.6.32-rc2 and it has the same problem. I needed to start my laptop three times to boot successfully.

Will you please try the following boot options and see whether the issue still exists?
a. nolapic_timer
b. idle=poll
c. processor.max_cstate=1
Will you please also attach the output of acpidump? Thanks.

Will you please also attach the output of "cat /proc/cpuinfo"? Thanks.

Thanks for your reply. I got the following results:
nolapic_timer: First attempt failed
idle=poll: Failed on the fifth boot attempt
processor.max_cstate=1: Booted 6 times without problems!
I've attached what you requested (I booted without quiet, acpi or any of the aforementioned options).

Created attachment 23202 [details]
output of acpidump
Created attachment 23203 [details]
output of cpuinfo
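Because the results for the three boot options are compared directly, it helps to confirm that an option really reached the kernel on a given boot. The following is a minimal sketch, not part of the original report: it parses a kernel command line (on Linux, the running kernel's command line is readable from /proc/cmdline) and reports which of the options under test are present. The function names are hypothetical, and the whitespace split is a simplification that ignores quoted values containing spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {option: value}; bare flags map to None."""
    options = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        options[name] = value if sep else None
    return options


def active_options(cmdline: str, wanted: list) -> dict:
    """For each wanted option name, report whether it appears on the command line."""
    parsed = parse_cmdline(cmdline)
    return {name: name in parsed for name in wanted}


def check_running_kernel(wanted: list) -> dict:
    """Same check against the running kernel (assumes Linux with /proc mounted)."""
    return active_options(Path("/proc/cmdline").read_text(), wanted)
```

For example, `check_running_kernel(["nolapic_timer", "idle", "processor.max_cstate"])` run after a boot shows which of the three options were actually in effect for that attempt.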
Will you please double check the boot option "idle=poll"? I can't believe that the box can be booted with "processor.max_cstate=1" but fails with the boot option "idle=poll". When the boot option "idle=poll" is added, the C-states are totally disabled. Thanks.

I'm afraid to say you're right with your assumption. I ran the tests again with "idle=poll" and it failed on the eighth attempt. Then I tried processor.max_cstate=1 again and this time it failed on the second and third attempts. The last message was something like "sata_sil24 0000:20:00.0 PCI INT A -> GSI 19 (level,low) -> IRQ 19".
I checked a few more things:
* So far I could boot 25 times without any problems with acpi=off (but of course this is not a valid workaround)
* I also tried acpi=noirq, but it doesn't help. The last message was 'ACPI Error: No handler or method for GPE[19] disabling event (20090903/evgpe-706)' before the system hung.
All in all this is very bad. I'll probably switch back to Fedora 11 and kernel 2.6.30 where I haven't had any problems with booting so far.

I tried a few more things. I switched from the 64bit to the 32bit version of Ubuntu 9.10. Until now I booted about 15 times and haven't had any problems yet. I'll post a comment if the 32 bit version hangs on boot too.

Just to confirm my findings: The 32 bit version of Ubuntu 9.10 keeps working without any problems, the 64 bit RC hangs on boot with acpi enabled most of the time.

I also have an HP HDX 9300. Each time I boot into 64bit Karmic Koala, I only have a 1/8 chance of a successful bootup. Trying to find a solution as quickly as possible. If successful, I'll be sure to post it.

Update: I'm still running Ubuntu 9.10 32 bit and it just boots fine. Recently I decided to install OpenSuse 11.2 64 bit. The installed version boots fine so far (which is really odd, since the 64 bit versions of both Ubuntu 9.10 and Fedora 12 don't), but during installation I saw one error message that I'd like to post here since it might be related to that bug. As you can see in the screenshot (titled "maybe related error message") the ata_piix message with some interrupt info is there (... -> IRQ 16), but the rest is something I haven't seen before: a hardware error for the CPUs. Could this be the issue I'm seeing with the other 64 bit distributions? (If so, why is it printed just during installation and never during a normal boot?)

Created attachment 24418 [details]
maybe related error message
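Several comments above judge a boot option by a handful of consecutive successful boots (6 boots with processor.max_cstate=1, 25 with acpi=off), and one of those verdicts was later reversed. The sketch below shows why a short streak is weak evidence for an intermittent hang. It is an illustration only and rests on assumptions the thread does not establish: that each boot hangs independently with a fixed probability, and that this probability is 1 in 8 (a made-up figure for the example). The function names are hypothetical.

```python
def chance_of_lucky_streak(hang_probability: float, boots: int) -> float:
    """Probability that `boots` consecutive boots all succeed by chance alone."""
    if not 0.0 <= hang_probability <= 1.0:
        raise ValueError("hang_probability must be between 0 and 1")
    return (1.0 - hang_probability) ** boots


def boots_needed(hang_probability: float, max_luck: float = 0.05) -> int:
    """Smallest streak length whose chance of happening by luck is <= max_luck."""
    if not 0.0 < hang_probability <= 1.0:
        raise ValueError("hang_probability must be greater than 0 and at most 1")
    boots = 1
    while chance_of_lucky_streak(hang_probability, boots) > max_luck:
        boots += 1
    return boots


# Assumed hang rate of 1 in 8 boots (illustrative, not measured in this report).
print(round(chance_of_lucky_streak(0.125, 6), 3))  # 0.449
print(boots_needed(0.125))                         # 23
```

Under these assumptions, six clean boots would occur by luck almost half the time even if the option changed nothing, while a streak of about 23 or more would be needed before luck becomes an unlikely explanation.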
Hi, Stefan. From the info in comment #18 it seems that a kernel panic happens, and it is caused by the MCE check.
>mce_machine_check
>mce_panic
I am not sure whether this is related to the BIOS. Can you try the boot option "maxcpus=1" and see whether it can be booted? Thanks.

It seems as if I can boot with maxcpus=1 (6 out of 6 boot attempts were successful on Ubuntu lucid Alpha 3 with kernel 2.6.32). The same Ubuntu alpha version boots fine with the 32 bit kernel.

Bad news! I recently switched to Ubuntu lucid 32 bit and my system hung a few times when I booted (uname -a: Linux hape 2.6.32-17-generic #26-Ubuntu SMP Fri Mar 19 23:58:53 UTC 2010 i686 GNU/Linux). This is really very bad news since I booted 2.6.31 a few hundred times and had no problems with 32 bit, but now 2.6.32-17 is causing problems with the 32 bit version too.
The boot process stopped saying something like:
uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
In a successful boot these are the next lines in dmesg:
uhci_hcd 0000:00:1a.0: setting latency timer to 64
uhci_hcd 0000:00:1a.0: UHCI Host Controller
uhci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1a.0: irq 16, io base 0x00007000
I'll remove the 64 bit only remark from the title.

One more comment: I've been using lucid since Alpha 2. I didn't notice any problems with booting until the beginning of this week (I'd say before 2.6.32-16 I didn't notice any problems).

I currently suspect that this issue is (at least for the 32bit version) not acpi related, but it might be an ata_piix issue. I've posted a new bug report regarding this issue on https://bugzilla.kernel.org/show_bug.cgi?id=15708 and will report back if that should be the cause.

Hi, Stefan. Thanks for the update. From the log info it seems that the hang issue is not related to ACPI. So this bug will be marked as a dup of bug 15708. Thanks.
*** This bug has been marked as a duplicate of bug 15708 ***

ykzhao, it isn't clear where the issue is. At this point, I'm highly skeptical that this is caused by anything in libata proper. It definitely looks like something at a much lower level is broken. It might not be acpi, but it's not like we have any evidence to conclusively rule out anything at this point. I don't really mind which bug we use to track the issue but it would be great if you can stick around.

Stefan, has the MCE error happened again? The problem could be hardware related and we might be just seeing essentially lucky and unlucky code / data / whatever layouts depending on the specific build. How many memory sticks do you have in the machine? If there is more than one, can you please remove one memory stick, see whether the problem is reproducible and if so swap with the other one and see whether anything changes? Thanks.