Bug 11240 - 2.6.26.1 boot hang unless "nolapic_timer" - acer ferrari 1100, AMD X2, SB600
Status: REJECTED INVALID
Alias: None
Product: ACPI
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 blocking
Assignee: ykzhao
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-03 04:01 UTC by Rus
Modified: 2008-11-06 19:02 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.26.1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
acpidump (text version) (199.53 KB, text/plain)
2008-08-04 00:57 UTC, Rus
acpidump (binary version) (43.02 KB, application/octet-stream)
2008-08-04 00:57 UTC, Rus
2.6.26.1 hang screenshot (391.36 KB, image/jpeg)
2008-08-04 00:58 UTC, Rus
acpidump diff (BIOS 1.06 -> 1.07) (1.11 KB, text/x-patch)
2008-08-04 02:10 UTC, Rus
dmesg 2.6.25.10 (29.32 KB, text/plain)
2008-08-04 02:10 UTC, Rus
2.6.26.1 hang screenshot with new 1.07 BIOS (72.01 KB, image/jpeg)
2008-08-04 08:07 UTC, Rus
2.6.26.1 hang screenshot with acpi.debug_layer=0x00010000 acpi.debug_level=0x17 (123.26 KB, image/jpeg)
2008-08-05 22:42 UTC, Rus
2.6.26.1 hang screenshot with acpi.debug_layer=0x00010000 acpi.debug_level=0x17 and initcall_debug (136.66 KB, image/jpeg)
2008-08-05 22:42 UTC, Rus
2.6.25.10 dmesg with acpi.debug_layer=0x00010000 acpi.debug_level=0x17 (44.80 KB, application/octet-stream)
2008-08-05 22:46 UTC, Rus
2.6.27-rc2 dmesg (36.71 KB, application/octet-stream)
2008-08-06 05:46 UTC, Rus
try the debug patch on the kernel of 2.6.25.10 (3.58 KB, patch)
2008-08-14 20:26 UTC, ykzhao
2.6.26.1 booted with nosmp kernel option (30.29 KB, application/octet-stream)
2008-08-15 04:26 UTC, Rus
2.6.26.1 booted with noapic maxcpus=1 (120.01 KB, image/jpeg)
2008-08-17 20:01 UTC, Rus
2.6.26.1 dmesg booted with nolapic_timer (31.24 KB, application/octet-stream)
2008-08-17 20:02 UTC, Rus

Description Rus 2008-08-03 04:01:56 UTC
Latest working kernel version: 2.6.25.10
Earliest failing kernel version: 2.6.26.1
Distribution: slackware-current
Hardware Environment: mobo acer ferrari 1100, AMD X2, SB600 4GB RAM
Software Environment:  Binutils - 2.18.50.0.8.20080709, gcc-4.3.1
Problem Description: Freeze on boot. When booted with initcall_debug, the hang occurs in acpi_scan_init. SysRq does not work.

Steps to reproduce: boot the kernel
Comment 1 Rus 2008-08-03 04:03:45 UTC
I can supply any additional info. The machine has no serial port, so I can only submit photos or network console logs.
Comment 2 ykzhao 2008-08-03 22:45:42 UTC
Will you please boot the system with acpi disabled and attach the output of acpidump?

It seems that this is a regression. Will you please use git-bisect to identify which commit causes this regression?

Will you please capture the picture when the system hangs in the boot phase?
Thanks.
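
As background on the git-bisect request: bisection is a binary search over the commits between the last known-good kernel and the first known-bad one. The sketch below is not git itself. It assumes a linear, oldest-to-newest list of commits (real kernel history is a graph, which git handles on its own), and the commit names and the culprit are made up for illustration.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good and commits[-1] is known bad. is_bad stands in for "build this
    commit, boot it, and see whether it hangs".
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(commits[mid]):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression is after mid
    return commits[bad], steps


# Hypothetical history of 1000 commits where commit 637 introduced the hang.
commits = ["c%d" % i for i in range(1000)]
culprit, steps = first_bad(commits, lambda c: int(c[1:]) >= 637)
print(culprit)  # prints c637
```

The number of test boots grows with the logarithm of the number of commits, which is why bisecting even a full release cycle is feasible on a single machine.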
Comment 3 Rus 2008-08-04 00:55:45 UTC
With acpi=off the 2.6.26.1 kernel doesn't find the SATA disk, so the boot fails. I've attached acpidumps made under the working 2.6.25.10 kernel. A picture (sorry for the quality) is attached too. It hangs at:

....
calling acpi_scan_init+0x0/0xed
....

I can't bisect, but I can insert any debug output in acpi_scan_init or patch the code to get the needed info.
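
When the initcall_debug output is available as text (for example from a network console) rather than as a photo, the hanging initcall can be picked out mechanically. This is a minimal sketch, assuming the log contains "calling <name>" lines and matching "initcall <name> returned ..." lines; the exact message format varies between kernel versions, and the pci_init line in the sample is invented for illustration.

```python
import re

# Assumed line formats; adjust the patterns to the kernel version in use.
CALL_RE = re.compile(r"calling\s+(\S+)")
RET_RE = re.compile(r"initcall\s+(\S+)\s+returned")


def unfinished_initcalls(log_text):
    """Return initcalls that were announced but never reported as returned."""
    pending = []
    for line in log_text.splitlines():
        ret = RET_RE.search(line)
        if ret:
            name = ret.group(1)
            if name in pending:
                pending.remove(name)
            continue
        call = CALL_RE.search(line)
        if call:
            pending.append(call.group(1))
    return pending


sample = """\
calling  pci_init+0x0/0x30
initcall pci_init+0x0/0x30 returned 0 after 0 msecs
calling  acpi_scan_init+0x0/0xed
"""
print(unfinished_initcalls(sample))  # prints ['acpi_scan_init+0x0/0xed']
```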
Comment 4 Rus 2008-08-04 00:57:27 UTC
Created attachment 17070
acpidump (text version)
Comment 5 Rus 2008-08-04 00:57:53 UTC
Created attachment 17071
acpidump (binary version)
Comment 6 Rus 2008-08-04 00:58:32 UTC
Created attachment 17072
2.6.26.1 hang screenshot
Comment 7 Rus 2008-08-04 01:01:45 UTC
Oh, I have found that Acer released a new BIOS version (1.07) for this machine on 29 July. Currently flashing; will post results.
Comment 8 ykzhao 2008-08-04 01:16:51 UTC
Will you please attach the output of dmesg on the kernel of 2.6.25.10?
Thanks.
Comment 9 Rus 2008-08-04 02:09:39 UTC
With the new BIOS, 2.6.26.1 still hangs in the same place. The acpidump diff for the new BIOS is attached. dmesg is attached too.
Comment 10 Rus 2008-08-04 02:10:22 UTC
Created attachment 17073
acpidump diff (BIOS 1.06 -> 1.07)
Comment 11 Rus 2008-08-04 02:10:46 UTC
Created attachment 17074
dmesg 2.6.25.10
Comment 12 Rus 2008-08-04 08:03:43 UTC
After flashing the new BIOS and a power-off/power-on cycle, I noticed that the 2.6.26.1 hang now occurs in another place. Screenshot attached.
Comment 13 Rus 2008-08-04 08:07:35 UTC
Created attachment 17079
2.6.26.1 hang screenshot with new 1.07 BIOS
Comment 14 ykzhao 2008-08-05 21:02:21 UTC
Will you please enable CONFIG_ACPI_DEBUG in the kernel configuration and boot the system with the options "acpi.debug_layer=0x00010000 acpi.debug_level=0x17"?
Please test it on both the 2.6.25.10 and 2.6.26.1 kernels.

If the system hangs, please attach a screenshot. If not, please attach the output of dmesg.

Note that it would be better to use the old BIOS.

Thanks.
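
Both acpi.debug_layer and acpi.debug_level are bitmasks, so a value such as 0x17 is a combination of several single-bit flags. The helper below is only an illustrative sketch that splits a mask into its set bits; the meaning of each bit is defined by the kernel's ACPI debug documentation for the version in use and is deliberately not reproduced here. The function name is made up.

```python
def set_bits(mask):
    """Return the individual single-bit flags that are set in mask."""
    return [1 << i for i in range(mask.bit_length()) if mask & (1 << i)]


print([hex(b) for b in set_bits(0x17)])        # prints ['0x1', '0x2', '0x4', '0x10']
print([hex(b) for b in set_bits(0x00010000)])  # prints ['0x10000']
```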
Comment 15 Rus 2008-08-05 22:26:24 UTC
It is not possible to downgrade the BIOS on this brain-dead laptop ;) But 2.6.25.10 works OK on both the new and the old BIOS. Compiling the kernel now, will post results shortly ...
Comment 16 Rus 2008-08-05 22:42:00 UTC
Created attachment 17097
2.6.26.1 hang screenshot with acpi.debug_layer=0x00010000 acpi.debug_level=0x17
Comment 17 Rus 2008-08-05 22:42:29 UTC
Created attachment 17098
2.6.26.1 hang screenshot with acpi.debug_layer=0x00010000 acpi.debug_level=0x17 and initcall_debug
Comment 18 Rus 2008-08-05 22:46:54 UTC
Created attachment 17099
2.6.25.10 dmesg with acpi.debug_layer=0x00010000 acpi.debug_level=0x17
Comment 19 Rus 2008-08-05 22:48:50 UTC
Tell me if you need netconsole logs of the 2.6.26.1 hangs with debug output; the screenshots may not be informative enough.
Comment 20 Rus 2008-08-06 05:45:12 UTC
Forget about netconsole, it is unlikely to work. Today I compiled a shiny new 2.6.27-rc2 and it boots perfectly (apart from a small ahci bug). dmesg is attached. I don't need 2.6.26.x now ;), so the bug may be closed as a rare case (?)
Comment 21 Rus 2008-08-06 05:46:26 UTC
Created attachment 17103
2.6.27-rc2 dmesg
Comment 22 Adrian Bunk 2008-08-06 09:02:37 UTC
Thanks for this update.
Comment 23 Andrew Morton 2008-08-07 11:12:24 UTC
I've reopened this because as far as I can tell the regression is
still present in 2.6.26.x.  harbour@sfinx.od.ua is unable to make progress
with http://bugzilla.kernel.org/show_bug.cgi?id=11262 because 2.6.26 doesn't
work.

Do we know which patch fixed 2.6.27.x?
Comment 24 ykzhao 2008-08-13 03:48:41 UTC
Hi, Rus
   From the 2.6.25.10 dmesg we can get the following messages:
   > Clockevents: could not switch to one-shot mode:<6>Clockevents:
   > could not switch to one-shot mode: lapic is not functional.
   > Could not switch to high resolution mode on CPU 1
   > lapic is not functional.
   > Could not switch to high resolution mode on CPU 0
   
   Will you please confirm whether a different .config file is used for each kernel?
   Please add the boot options "nohz=off highres=off" on the 2.6.26.x kernel and see whether the system still hangs.
   Thanks.
Comment 25 Rus 2008-08-13 19:16:59 UTC
2.6.25/26 can't enable high resolution timers on a Turion X2 with C1E, as the local APIC timers stop in the C1E state; please see my invalid bug report http://bugzilla.kernel.org/show_bug.cgi?id=10986. Highres started working on this hardware only from the 2.6.27 kernel.
 Anyway, I've tested 2.6.26.1 with "nohz=off highres=off" and it freezes in the same place. As the new kernels (>= 2.6.27*) work, I propose to close this bug as a rare hardware one (BIOS made for Vista only).
Comment 26 Rus 2008-08-13 19:17:42 UTC
Forgot to say - config for all (2.6.25/26/27) kernels is the same.
Comment 27 ykzhao 2008-08-14 02:38:34 UTC
Hi, Rus
    Thanks for the reminder and test.
    But from the screenshot in comment #17 it seems that the system doesn't hang in acpi_scan_init any more. Instead it hangs in the function of genl_init.
    Will you please double check it?
    Will you please use git-bisect to find which commit causes the regression between 2.6.25.10 and 2.6.26.1? Although the system can work well on the kernel of 2.6.27-rc2, maybe it will be better to find the root cause.
    
    Appreciate your efforts.
    thanks.
Comment 28 Rus 2008-08-14 03:57:33 UTC
Yes, it depends on the laptop BIOS version:

Bios v1.06 - kernel was hanging in acpi_scan_init
Bios v1.07 - kernel hangs in genl_init

Sorry, I can't bisect 2.6.26.1.
Comment 29 ykzhao 2008-08-14 20:13:58 UTC
Hi, Rus
    Thanks for the reply. But I am still confused about the hangs in acpi_scan_init (BIOS v1.06). After checking the change log for acpi_scan_init between 2.6.25.10 and 2.6.26.1, I see that only very few commits were merged. The difference in acpi_scan_init between 2.6.25.10 and 2.6.26.1 is that the _PSW/_DSW control method is now called in the boot phase.
    
    Will you please revert the following commit on the 2.6.26.1 kernel and see whether the problem still exists? (If possible, please try it on 1.06 BIOS)
   > commit 729b2bdbfa19dd9be98dbd49caf2773b3271cc24
   > Author: Zhao Yakui <yakui.zhao@intel.com>
   > Date:   Wed Mar 19 13:26:54 2008 +0800
   > ACPI : Disable the device's ability to wake the sleeping system in the boot phase
   
   Thanks.
Comment 30 ykzhao 2008-08-14 20:26:28 UTC
Created attachment 17254
try the debug patch on the kernel of 2.6.25.10

Hi, Rus
    Maybe it is not very easy to revert the commit 729b2bdbfa19dd9be98dbd49caf2773b3271cc24. 
    Will you please try the attached debug patch on the kernel of 2.6.25.10 and see whether the system hangs in the function of acpi_scan_init?
    In BIOS v1.07: The commit is already included in the 2.6.27-rc2/2.6.26.1 kernel. The system can be booted normally on the kernel of 2.6.27-rc2. Although the system hangs on 2.6.26.1 kernel, it hangs in the function of genl_init instead of acpi_scan_init.
    So it will be great that you can test it on BIOS v1.06.
        
    thanks.
Comment 31 Rus 2008-08-15 04:16:25 UTC
I've flashed back the v1.06 BIOS, applied the attached patch to 2.6.25.10 and
successfully booted it: no hangs. Double-checked 2.6.26.1: it still hangs, but now in genl_init _only_, even on v1.06! It seems that flashing the new BIOS changed something in the laptop that can't be reverted by flashing the old one.
Comment 32 Rus 2008-08-15 04:26:01 UTC
Found that 2.6.26.1 boots with the nosmp kernel boot option. dmesg attached.
Comment 33 Rus 2008-08-15 04:26:54 UTC
Created attachment 17265
2.6.26.1 booted with nosmp kernel option
Comment 34 ykzhao 2008-08-17 08:11:03 UTC
Hi, Rus
    Thanks for the test. It seems that the system will hang in the function of genl_init instead of acpi_scan_init.
    It is very interesting that the 2.6.26.1 kernel can be booted with the option of "nosmp".
    Will you please try the following options on 2.6.26.1 kernel? 
    a. noapic maxcpus=1 
    b. processor.max_cstate=1 (the processor driver in drivers/acpi/ should be built into the kernel)
    c. nolapic_timer 
    
    thanks.
    
Comment 35 Rus 2008-08-17 20:00:22 UTC
Not exactly: I got a single hang in acpi_scan_init again, but hangs in
genl_init occur more often.

a) hung in ide_scan_pcibus, screenshot attached
b) hung in genl_init as usual
c) booted OK, dmesg attached
Comment 36 Rus 2008-08-17 20:01:55 UTC
Created attachment 17286
2.6.26.1 booted with noapic maxcpus=1
Comment 37 Rus 2008-08-17 20:02:42 UTC
Created attachment 17287
2.6.26.1 dmesg booted with nolapic_timer
Comment 38 Len Brown 2008-08-22 11:10:58 UTC
Does "idle=poll" allow the 2.6.26 system to boot w/o hanging?
(if yes, then this is surely an issue with the lapic timer workaround)
Comment 39 Rus 2008-08-22 11:54:15 UTC
Yes, "idle=poll" allows a normal 2.6.26.1 boot too.
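
The boot-option experiments on 2.6.26.1 reported in the comments up to this point can be collected in one place. The outcomes below are copied from the thread; the table layout and the helper name are only illustrative.

```python
# (boot option, outcome on 2.6.26.1) as reported in the comments above
RESULTS = [
    ("acpi=off", "boot fails: SATA disk not found"),
    ("nohz=off highres=off", "hang"),
    ("nosmp", "boots"),
    ("noapic maxcpus=1", "hang in ide_scan_pcibus"),
    ("processor.max_cstate=1", "hang in genl_init"),
    ("nolapic_timer", "boots"),
    ("idle=poll", "boots"),
]


def booting_options(results):
    """Return the boot options with which the kernel booted normally."""
    return [option for option, outcome in results if outcome == "boots"]


print(booting_options(RESULTS))  # prints ['nosmp', 'nolapic_timer', 'idle=poll']
```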
Comment 40 ykzhao 2008-08-26 20:27:35 UTC
Hi, Len
   What you said is right. It seems that the system boots normally if the boot option "nolapic_timer" is added. Maybe this is an issue related to the lapic timer workaround. As there is such an issue, the C-state is also affected.
   How can we solve this problem? Is it appropriate to add a DMI check to disable the lapic timer on such laptops? Or should the max C-state be limited to C1?
   thanks.
   
Comment 41 ykzhao 2008-09-04 23:59:13 UTC
Hi, Rus
   Will you please try the latest vanilla kernel (2.6.27-rc5) and see whether the system works well?
   Please don't add the boot option of "nolapic_timer" or "processor.max_cstate=".
   Thanks.
Comment 42 Rus 2008-09-05 00:26:49 UTC
As I said, kernels starting from 2.6.27 work OK on this system. I'm running all recent rc's as they appear.
Comment 43 ykzhao 2008-09-05 01:02:18 UTC
   In fact this bug is related to the local APIC timer (in the 2.6.26.1 kernel). The boot option "nolapic_timer" makes the system work well.
   As the issue is already fixed in kernel 2.6.27, the bug will be marked as resolved.
Comment 44 Len Brown 2008-10-16 18:02:54 UTC
The reason this bug report is not closed is that we have not
identified why 2.6.26.1 broke, and why 2.6.27 works.

In the mean-time, 2.6.26.6 has been released.
Please test it, and if it works, we don't care about
2.6.26.1 any more and we can close this report.
Comment 45 Rus 2008-10-20 02:11:36 UTC
Sorry, I can't test 2.6.26.x on this hardware any more.
