Bug 10377

Summary: Kernel usually freezes during boot when AC is unplugged - unless hpet=disable - Asus A6JC - 2.6.25 regression
Product: ACPI
Reporter: Roman Jarosz (kedgedev)
Component: Power-Processor
Assignee: Venkatesh Pallipadi (venki)
Status: REJECTED INSUFFICIENT_DATA
Severity: high
CC: acpi-bugzilla, bunk, crmafra, rui.zhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc8
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9832    
Attachments: Boot log when AC was plugged in
Boot freeze screenshot
2.6.25-rc8-git5 on DC
2.6.25-rc8-git5 on DC with CONFIG_CPU_IDLE=n
2.6.25-rc8-git5 on AC
kernel config
cpuidle test patch
Boot log with patch

Description Roman Jarosz 2008-04-01 16:23:55 UTC
Latest working kernel version:2.6.24.4
Earliest failing kernel version: 2.6.25-rc1
Distribution: Gentoo
Hardware Environment: NB Asus A6JC
2x Genuine Intel(R) CPU T2300 @ 1.66GHz
Software Environment: gcc 4.2.3

Problem Description:
The kernel freezes during boot if AC is unplugged. It doesn't freeze every time, but very often. It freezes after these two lines:
Apr  2 00:46:15 kedgecomp ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
Apr  2 00:46:15 kedgecomp ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15

If I boot with the debug parameter, it prints one more line and then freezes:
Apr  2 00:46:15 kedgecomp Switched to high resolution mode on CPU 1

If kernel is compiled without CONFIG_CPU_IDLE then it boots.

I've also tried to revert
http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9a0b841586c3c6c846effdbe75885c2ebc0031b0
or boot with processor.max_cstate=1 as Venkatesh Pallipadi suggested in bug 10093, but it didn't help

Also, if it doesn't freeze during boot with AC unplugged, it often
freezes when Gentoo prints "Setting system clock using the hardware clock UTC".
The clock script uses "/sbin/hwclock".

Any suggestions on how to fix it?

Regards,
Roman
Comment 1 Roman Jarosz 2008-04-01 16:26:29 UTC
Created attachment 15556 [details]
Boot log when AC was plugged in
Comment 2 Rafael J. Wysocki 2008-04-02 10:17:07 UTC
This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.
Comment 3 Roman Jarosz 2008-04-04 05:22:15 UTC
Still freezes with 2.6.25-rc8-git3
Comment 4 Zhang Rui 2008-04-06 20:10:41 UTC
can you get the log using a serial console? Or take a picture when the kernel freezes. :)
Comment 5 Roman Jarosz 2008-04-07 01:47:02 UTC
Created attachment 15643 [details]
Boot freeze screenshot

I don't have serial console so here's a screenshot.
This is with kernel 2.6.25-rc8-git5.
Comment 6 Len Brown 2008-04-09 18:40:05 UTC
When running on AC, this system does not export C3.
eg. the dmesg shows

CPU0 (power states: C1[C1] C2[C2])

but when booting on DC, the screenshot shows that C3 is present.

This suggests that processor.max_cstate=2 is a good candidate
for a workaround.  Please boot on DC with CONFIG_ACPI_PROCESSOR=y
(not =m) and this cmdline param and see if the hang goes away.
If it does, then CPU_IDLE's C3 code is implicated.

In any event, a hang in the device probe part of kernel boot
usually smells like an interrupt problem...
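
To compare the AC and DC boot logs for this, a minimal sketch that pulls the exported C-states out of dmesg text, assuming every relevant line follows the "CPU0 (power states: ...)" format quoted above. The function names are illustrative and not part of any kernel tool.

```python
import re

# Matches lines such as "CPU0 (power states: C1[C1] C2[C2])".
POWER_STATES_RE = re.compile(r"CPU(\d+) \(power states: ([^)]*)\)")


def exported_cstates(dmesg_text):
    """Return {cpu_number: [state names]} for every power-states line found."""
    result = {}
    for match in POWER_STATES_RE.finditer(dmesg_text):
        cpu = int(match.group(1))
        # Each entry looks like "C2[C2]"; keep the part before the bracket.
        result[cpu] = [entry.split("[")[0] for entry in match.group(2).split()]
    return result


def exports_c3(dmesg_text):
    """True if any CPU lists C3 among its power states."""
    return any("C3" in states for states in exported_cstates(dmesg_text).values())


# The line from the AC boot log quoted in this comment.
ac_line = "CPU0 (power states: C1[C1] C2[C2])"
print(exported_cstates(ac_line))
print(exports_c3(ac_line))
```

Running the same check on the DC log should show whether C3 appears only on battery.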
Comment 7 Len Brown 2008-04-09 18:43:17 UTC
With CONFIG_CPU_IDLE=n, and CONFIG_ACPI_PROCESSOR=y,
please try processor.bm_history=0
to encourage use of C3 to see if the non CPU_IDLE code can hit this too.
Comment 8 Roman Jarosz 2008-04-10 04:51:45 UTC
processor.max_cstate=2 didn't help; it still freezes.

With CONFIG_CPU_IDLE=n, CONFIG_ACPI_PROCESSOR=y and processor.bm_history=0 it doesn't freeze when running on DC.

After about 15 attempts I've managed to boot 2.6.25-rc8-git5 on DC without any parameters, so I'll attach the boot log.
Comment 9 Roman Jarosz 2008-04-10 04:53:43 UTC
Created attachment 15713 [details]
2.6.25-rc8-git5 on DC

This is the log when 2.6.25-rc8-git5 didn't freeze on DC
Comment 10 Roman Jarosz 2008-04-10 04:54:39 UTC
Created attachment 15714 [details]
2.6.25-rc8-git5 on DC with CONFIG_CPU_IDLE=n

Log with CONFIG_CPU_IDLE=n, and CONFIG_ACPI_PROCESSOR=y and processor.bm_history=0 when running on DC
Comment 11 Roman Jarosz 2008-04-10 04:55:20 UTC
Created attachment 15715 [details]
2.6.25-rc8-git5 on AC

2.6.25-rc8-git5 on DC without any parameters
Comment 12 Roman Jarosz 2008-04-10 04:56:05 UTC
(In reply to comment #11)
> Created an attachment (id=15715) [details]
> 2.6.25-rc8-git5 on AC
> 
> 2.6.25-rc8-git5 on DC without any parameters
> 
Should be 2.6.25-rc8-git5 on AC without any parameters
Comment 13 Venkatesh Pallipadi 2008-04-10 10:49:47 UTC
Can you attach the .config you are using...
Comment 14 Roman Jarosz 2008-04-10 12:35:22 UTC
Created attachment 15725 [details]
kernel config
Comment 15 Venkatesh Pallipadi 2008-04-10 13:55:28 UTC
Created attachment 15728 [details]
cpuidle test patch

Can you please try the attached patch and report back. If it hangs, picture of
last few messages will help.
Also, try using vga=6 boot option, which gives us more number of lines of text
on console (may need config changes with VIDEO_SELECT enabled).
Comment 16 Roman Jarosz 2008-04-10 15:13:53 UTC
Created attachment 15729 [details]
Boot log with patch

I've noticed that 
PCI: Setting latency timer of device ... 
and
ata_piix 0000:00:1f.1: version 2.12
lines are missing when the boot freezes when running on DC
Comment 17 Roman Jarosz 2008-04-13 08:18:48 UTC
It looks like the hpet=disable parameter has "fixed" the problem as Carlos R. Mafra suggested. (bug 10117)
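
For anyone collecting logs, a small sketch for confirming which of the parameters from this thread a given boot actually used. It only parses a command-line string; the helper names and the root= value in the example are made up for illustration.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def hpet_disabled(cmdline):
    """True if the command line carries the hpet=disable workaround."""
    return parse_cmdline(cmdline).get("hpet") == "disable"


# Hypothetical command line; only hpet=disable comes from this report.
example = "root=/dev/sda3 hpet=disable debug"
print(parse_cmdline(example))
print(hpet_disabled(example))
```

On a running system the string can be read from /proc/cmdline and passed to the same functions.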
dmesg logs are here http://kedge.wz.cz/kernel/
Comment 18 Roman Jarosz 2008-04-13 14:35:00 UTC
I've tried to bisect this and I think I found the commit that broke this.
I think it is:
commit b02aae9cf52956dfe1bec73f77f81a3d05d3902b
Author: Rene Herman <rene.herman@gmail.com>
Date:   Wed Jan 30 13:30:05 2008 +0100
    x86: provide a DMI based port 0x80 I/O delay override.

The source files touched by this commit have changed a lot since, so I didn't manage to revert only this patch. But 4c6b8b4d62fb4cb843c32db71e0a8301039908f3 seems to work.

I've also tried to pass io_delay=0x80 and io_delay=0xed to 2.6.25-rc9 but it didn't help.
Comment 19 Roman Jarosz 2008-04-13 15:02:09 UTC
OK, I was wrong: it still freezes, but not very often. It took 15 boots to freeze.
The kernel also prints 3 more lines:

ata1.00: ATA-6: HTS541080G9AT00, MB4OA60A, max UDMA/100
ata1.00: 156301488 sectors, multi 16: LBA48
ata1.01: ATAPI: HL-DT-ST DVDRAM GMA-4082N, HJ02, max UDMA/33

(b02aae9cf52956dfe1bec73f77f81a3d05d3902b and newer commit shows the 3 lines too)
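
Because the freeze is intermittent, a handful of clean boots is not enough to call a commit good during a bisect. A rough sketch of how many clean boots would be needed, assuming each boot freezes independently with a fixed probability. The 1-in-15 figure is only a guess based on the observation above, and the function name is illustrative.

```python
import math


def clean_boots_needed(freeze_probability, confidence=0.95):
    """Consecutive clean boots needed before a bad commit is unlikely to have hidden.

    A bad commit survives n boots without freezing with probability
    (1 - p) ** n, so we look for the smallest n with (1 - p) ** n <= 1 - confidence.
    """
    if not 0.0 < freeze_probability < 1.0:
        raise ValueError("freeze_probability must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    miss = 1.0 - confidence
    return math.ceil(math.log(miss) / math.log(1.0 - freeze_probability))


# Assumed freeze rate of roughly one boot in 15.
print(clean_boots_needed(1 / 15))
```

Under these assumptions the answer is in the dozens of boots per bisect step, which would explain how a bad commit can be mistaken for a good one.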
Comment 20 Venkatesh Pallipadi 2008-04-14 18:34:05 UTC
OK. These two bugs #10117 and #10377 are looking more and more alike. Marking them duplicate...

*** This bug has been marked as a duplicate of bug 10117 ***
Comment 21 Adrian Bunk 2008-04-15 14:33:34 UTC
This bug is not a duplicate of what was originally tracked in #10117.
Comment 22 Roman Jarosz 2008-04-17 02:14:58 UTC
Still freezes with 2.6.25 :(
Comment 23 Len Brown 2008-04-28 18:58:37 UTC
does this still happen when booted with maxcpus=1?
does this still happen when booted with idle=poll?
Comment 24 Roman Jarosz 2008-04-29 01:41:56 UTC
It doesn't freeze with maxcpus=1 and idle=poll
Comment 25 Roman Jarosz 2008-05-13 15:02:53 UTC
I've tried 2.6.26-rc1 and 2.6.26-rc2 and both versions freeze "every time", even with AC plugged in.

I've also tried to get some debug info with nmi_watchdog=1 but it doesn't freeze with this parameter. I'm willing to debug this, but I need some help.
Comment 26 Len Brown 2008-05-19 18:42:53 UTC
 
> I've tried 2.6.26-rc1 and 2.6.26-rc2 and both versions 
>freeze "every time", even with AC plugged in.

ouch!
does hpet=disable still fix all cases?
Comment 27 Roman Jarosz 2008-05-20 01:03:48 UTC
yes, hpet=disable does fix it.

btw. 2.6.26-rc3 freezes too
Comment 28 Zhang Rui 2008-08-28 01:29:19 UTC
(In reply to comment #27)
> yes, hpet=disable does fix it.
> 
> btw. 2.6.26-rc3 freezes too
You probably mean 2.6.26-rc3 freezes if not using hpet=disable here, right?

Does the problem still exist in the latest kernel?

(In reply to comment #24)
> It doesn't freeze with maxcpus=1 and idle=poll
idle=poll should work around this issue.
does it freeze with maxcpus=1 only? 
Comment 29 Roman Jarosz 2008-08-28 01:50:04 UTC
(In reply to comment #28)
> (In reply to comment #27)
> > yes, hpet=disable does fix it.
> > 
> > btw. 2.6.26-rc3 freezes too
> You probably mean 2.6.26-rc3 freezes if not using hpet=disable here, right?

Yes

> Does the problem still exist in the latest kernel?
> 
> (In reply to comment #24)
> > It doesn't freeze with maxcpus=1 and idle=poll
> idle=poll should work around this issue.
> does it freeze with maxcpus=1 only? 
> 
IIRC I tested maxcpus=1 and idle=poll separately and it didn't freeze in either case,
so no, it doesn't freeze with maxcpus=1 only.

Btw, I can't test it right now because I got a new laptop and
removed the system from the old one, but if anybody is willing
to debug this with me I can install Gentoo on the old laptop
again.
Comment 30 Shaohua 2008-10-14 22:31:10 UTC
Can you please try the latest kernel? There are a lot of timer/C-state related fixes in recent kernels.
Comment 31 Zhang Rui 2008-11-16 22:47:05 UTC
no response from the bug reporter.
Roman, please re-open it if the problem still exists in the latest kernel.