Bug 11427 - AMD systems with c1e enabled freeze without "nolapic_timer" option
Status: CLOSED CODE_FIX
Alias: None
Product: Timers
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: john stultz
Reported: 2008-08-25 17:04 UTC by Chuck Ebbert
Modified: 2008-09-04 11:21 UTC
Kernel Version: 2.6.26
Regression: Yes


Attachments
pic/lapic/apic dump from system that stalls (5.03 KB, text/plain)
2008-08-25 19:57 UTC, Chuck Ebbert
boot messages with apic=verbose (32.63 KB, text/plain)
2008-08-27 07:57 UTC, Chuck Ebbert
/proc/timer_list from 2.6.25 (3.07 KB, text/plain)
2008-08-27 14:57 UTC, Chuck Ebbert
/proc/timer_list from 2.6.26 (3.66 KB, text/plain)
2008-08-27 14:58 UTC, Chuck Ebbert
Proposed patch (1.47 KB, text/plain)
2008-08-30 21:54 UTC, Chuck Ebbert

Description Chuck Ebbert 2008-08-25 17:04:46 UTC
Latest working kernel version:
2.6.25.11

Earliest failing kernel version:
2.6.26.3

Distribution:
Fedora 8

Hardware Environment:
Acer Aspire 5102
AMD RS480 chipset, Turion dual-core CPU

Software Environment:
x86 kernel compiled for i686 32-bit CPU

Problem Description:
The local APIC timer does not get disabled because it is set up and marked operational before C1E gets detected on the second CPU. The system stalls; you can sometimes make progress by pressing keys, but it can get stuck in __do_softirq() and must then be powered off manually.

In 2.6.25.11, c1e gets detected before the timers are set up.

Adding "nolapic_timer" option fixes it.
Comment 1 Andrew Morton 2008-08-25 17:14:20 UTC
Marked as a regression.  Ingo, Thomas, please take a peek?

Chuck, a bisection might help here.  Could be some ACPI thing.
Comment 2 Chuck Ebbert 2008-08-25 19:57:59 UTC
Created attachment 17448 [details]
pic/lapic/apic dump from system that stalls
Comment 3 Chuck Ebbert 2008-08-25 19:59:23 UTC
System goes into nohz mode by default and stalls, but will make progress if keys are pressed.  Adding 'nohz=off highres=off' makes it lock up completely after a while.
Comment 4 Thomas Gleixner 2008-08-26 00:50:49 UTC
Chuck, can you please provide a boot log of the failing kernel with
apic=verbose on the kernel command line ?

Thanks,
	tglx
Comment 5 Chuck Ebbert 2008-08-27 07:57:44 UTC
Created attachment 17481 [details]
boot messages with apic=verbose

(The attachment in comment #2 is the dump from using apic=debug together with Maciej's patch that re-enables that option.)
Comment 6 Thomas Gleixner 2008-08-27 11:10:38 UTC
You confuse me :)

comment #1
Local APIC timer does not get disabled because it is set up and marked
operational before c1e gets detected on the second CPU.

Bootlog:
AMD C1E detected late. 	Force timer broadcast.

Here we disable the local APIC timer when we detect C1E on the second CPU.
That code was not changed between .25 and .26

Can you please provide the output of /proc/timer_list for .25 and .26 ?
Comment 7 Chuck Ebbert 2008-08-27 14:54:43 UTC
> Bootlog:
> AMD C1E detected late.  Force timer broadcast.

All that does is set local_apic_timer_disabled = 1. But the timer setup code has already checked that flag earlier (in .26; .25 set up the timers later) and decided to set up the local APIC timer as a real timer instead of as a dummy one.
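
The ordering problem can be illustrated with a small user-space model. This is only a sketch, not the kernel code: apart from the flag name local_apic_timer_disabled, which is taken from this comment, every function and type name is a hypothetical stand-in.

```c
#include <stdbool.h>

/* Toy model only. The flag name comes from the bug report; the rest is
 * hypothetical and does not mirror the real kernel functions. */
static bool local_apic_timer_disabled;

enum timer_kind { TIMER_REAL, TIMER_DUMMY };

/* Stand-in for timer setup: the flag is read once, at setup time. */
static enum timer_kind setup_timer(void)
{
    return local_apic_timer_disabled ? TIMER_DUMMY : TIMER_REAL;
}

/* Stand-in for late C1E detection: it only sets the flag. */
static void detect_c1e_late(void)
{
    local_apic_timer_disabled = true;
}

/* Order described for 2.6.26: timer set up first, C1E detected afterwards.
 * The flag is set too late, so the timer stays a real timer. */
static enum timer_kind boot_setup_then_detect(void)
{
    local_apic_timer_disabled = false;
    enum timer_kind kind = setup_timer();
    detect_c1e_late();
    return kind;
}

/* Order described for 2.6.25: C1E detected first, timer set up afterwards.
 * Setup sees the flag and registers a dummy timer. */
static enum timer_kind boot_detect_then_setup(void)
{
    local_apic_timer_disabled = false;
    detect_c1e_late();
    return setup_timer();
}
```

In this model boot_setup_then_detect() yields TIMER_REAL and boot_detect_then_setup() yields TIMER_DUMMY, which matches the difference between the two kernels as described above.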
Comment 8 Chuck Ebbert 2008-08-27 14:57:30 UTC
Created attachment 17490 [details]
/proc/timer_list from 2.6.25
Comment 9 Chuck Ebbert 2008-08-27 14:58:07 UTC
Created attachment 17491 [details]
/proc/timer_list from 2.6.26
Comment 10 Thomas Gleixner 2008-08-27 15:30:32 UTC
Sigh, so we need the same logic as we have in the 64-bit tree: force timer broadcast on the boot CPU. Will cook a patch tomorrow morning.
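
A rough sketch of that idea, as a user-space model rather than the actual patch (the attached patch is not reproduced here). The flag name local_apic_timer_disabled comes from comment 7; all other names, including the force_existing parameter, are hypothetical and exist only to contrast the two behaviours.

```c
#include <stdbool.h>

#define NR_CPUS 2 /* dual-core system, as in the report */

enum timer_mode { TIMER_UNSET, TIMER_LOCAL, TIMER_BROADCAST };

static bool local_apic_timer_disabled;
static enum timer_mode cpu_timer[NR_CPUS];

/* Hypothetical per-CPU timer setup: reads the flag once. */
static void setup_cpu_timer(int cpu)
{
    cpu_timer[cpu] = local_apic_timer_disabled ? TIMER_BROADCAST : TIMER_LOCAL;
}

/* Hypothetical late C1E handler. With force_existing set, CPUs that were
 * already running on the local timer are switched to broadcast as well. */
static void c1e_detected_late(bool force_existing)
{
    local_apic_timer_disabled = true;
    if (!force_existing)
        return;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_timer[cpu] == TIMER_LOCAL)
            cpu_timer[cpu] = TIMER_BROADCAST;
}

/* Boot CPU sets up its timer, C1E is then detected while the second CPU
 * comes up, and finally the second CPU sets up its own timer. */
static void simulate_boot(bool force_existing)
{
    local_apic_timer_disabled = false;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cpu_timer[cpu] = TIMER_UNSET;

    setup_cpu_timer(0);
    c1e_detected_late(force_existing);
    setup_cpu_timer(1);
}
```

With simulate_boot(false) the boot CPU stays on TIMER_LOCAL, which models the stall; with simulate_boot(true) both CPUs end up on TIMER_BROADCAST.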
Comment 11 Chuck Ebbert 2008-08-30 21:54:52 UTC
Created attachment 17546 [details]
Proposed patch

Patch builds and doesn't break non-AMD machines.
Will test on the affected machine when I get a Fedora 8 kernel built.
Comment 12 Chuck Ebbert 2008-08-31 07:11:37 UTC
The patch works on my system. Thomas, can you ack/signoff on it so I can send it to -stable?
Comment 13 Thomas Gleixner 2008-08-31 14:09:31 UTC
> ------- Comment #12 from cebbert@redhat.com  2008-08-31 07:11 -------
> The patch works on my system. Thomas, can you ack/signoff on it so I can send
> it to -stable?

Please add

Acked-by: Thomas Gleixner <tglx@linutronix.de>

Thanks for debugging this,

      tglx
