Bug 5303 - AMD64 Erratum: Should not enable C2 when using APIC
Summary: AMD64 Erratum: Should not enable C2 when using APIC
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: BIOS
Hardware: i386 Linux
Importance: P2 normal
Assignee: Alexey Starikovskiy
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-09-24 06:27 UTC by Bertro Simul
Modified: 2008-01-09 12:01 UTC

See Also:
Kernel Version: 2.6.12, 2.6.13.2
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Lost ticks (225.93 KB, text/plain)
2005-09-24 06:39 UTC, Bertro Simul
dmidecode (14.28 KB, text/plain)
2005-09-29 12:57 UTC, Bertro Simul
Lost ticks with different HZ values (8.86 KB, text/plain)
2005-10-02 13:00 UTC, Bertro Simul
Workaround for buggy BIOS (against 2.6.15.4) (10.91 KB, patch)
2006-02-12 08:44 UTC, Bertro Simul
Set max_cstate for early Opterons (1.38 KB, patch)
2007-06-14 04:11 UTC, Alexey Starikovskiy

Description Bertro Simul 2005-09-24 06:27:07 UTC
Most recent kernel where this bug did not occur: 2.6.12
Distribution: Gentoo
Hardware Environment: Athlon 64 3500+
Problem Description:

The document at
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25759.pdf?info=EXLINK
says in section 78:

     
Comment 1 Bertro Simul 2005-09-24 06:39:23 UTC
Created attachment 6125 [details]
Lost ticks

For reference, acpi_processor_idle+0x143 is
instruction 14aa in the following snippet:

    14a5:	ed			in     (%dx),%eax
    14a6:	ed			in     (%dx),%eax
    14a7:	fb			sti
    14a8:	39 c8			cmp    %ecx,%eax
    14aa:	72 04			jb     14b0 <acpi_processor_idle+0x149>


Here, instruction 14a7 is local_irq_enable() at
drivers/acpi/processor_idle.c, line 294 (where the
processor has just left C2).
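
As a quick check of the arithmetic in the snippet above, the following sketch recovers the start address of acpi_processor_idle in this object file from the symbol+offset notation, and confirms that the jb target at 14b0 corresponds to the +0x149 printed by the disassembler. The helper name symbol_base is made up for illustration.

```python
def symbol_base(insn_addr: int, offset: int) -> int:
    """Return the start address of a function, given the address of an
    instruction inside it and that instruction's offset from the start."""
    return insn_addr - offset


# Values taken from the disassembly in this comment:
# instruction 14aa is acpi_processor_idle+0x143.
base = symbol_base(0x14AA, 0x143)
print(hex(base))           # start of acpi_processor_idle in this object file
print(hex(0x14B0 - base))  # offset of the jb target relative to that start
```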
Comment 2 Len Brown 2005-09-28 18:45:12 UTC
> time.c: Lost 1 timer tick(s)! rip acpi_processor_idle+0x143/0x31d [processor])

is this an SMP system?  It may be that the 2.6.13 support for SMP C2
has triggered a regression in going from 2.6.12 to 2.6.13.

Please confirm that this did or did not happen in 2.6.12, and if it did not
then please attach the output under 2.6.12 of /proc/acpi/processor/*/power

on a failing system, please verify that
# echo 1 > /sys/module/processor/parameters/max_cstate
makes the issue go away.  Alternatively you can boot with
"processor.max_cstate=1"
Comment 3 Bertro Simul 2005-09-29 01:27:11 UTC
No, this is not an SMP system. I
Comment 4 Andi Kleen 2005-09-29 01:46:13 UTC
I complained about this to AMD some time ago (ran into it with my no idle tick
work) and they said it was a BIOS bug that happens only on some BIOS
and is even supposed to be fixed. So first I would suggest a BIOS update.
C2 state does execute SMM code and that probably does something wrong. 
Possibly just needs a DMI blacklist. It's not a generic

Can people seeing this please start attaching dmidecode output?

Also it's a big sledgehammer to disable C2 because it may make the system
much hotter. If the delays are not too bad it might be better to disable the
message and eat the latency. Do you see the problem still with CONFIG_HZ_250?
With CONFIG_HZ_100?

I guess we need some test for gettimeofday accuracy to evaluate this. Perhaps
John has something handy.
Comment 5 Bertro Simul 2005-09-29 12:57:29 UTC
Created attachment 6189 [details]
dmidecode

The machine is a HP Pavilion k755.de, and the mainboard
has MS-7124 Ver. 1.0 printed on it. I couldn
Comment 6 Bertro Simul 2005-10-02 13:00:28 UTC
Created attachment 6212 [details]
Lost ticks with different HZ values

As promised here are the lost ticks messages
with different values for HZ, namely 100, 250,
and 1000. The kernel was 2.6.12.

I
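
To compare runs such as the HZ=100/250/1000 logs, the lost-tick messages can be tallied per kernel symbol. The sketch below assumes the message format quoted in comment 2; the function name and the second sample line are made up for illustration, and real logs may carry extra prefixes that the search-based regex simply skips over.

```python
import re
from collections import Counter

# Matches e.g. "Lost 1 timer tick(s)! rip acpi_processor_idle+0x143/0x31d"
LOST_TICK = re.compile(
    r"Lost (\d+) timer tick\(s\)! rip (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def tally_lost_ticks(lines):
    """Sum the number of lost ticks reported per kernel symbol."""
    totals = Counter()
    for line in lines:
        match = LOST_TICK.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals


# First line is the message quoted in comment 2; the second is invented.
sample = [
    "time.c: Lost 1 timer tick(s)! rip acpi_processor_idle+0x143/0x31d [processor])",
    "time.c: Lost 2 timer tick(s)! rip acpi_processor_idle+0x143/0x31d [processor])",
]
print(tally_lost_ticks(sample))
```

Running it over one log per HZ value would give a rough per-symbol comparison between configurations.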
Comment 7 Bertro Simul 2006-01-28 09:26:25 UTC
I would like to ask about the status of
this bug. Is it possible to get some information
from AMD about this, i.e., whether it
Comment 8 Andi Kleen 2006-01-28 20:53:09 UTC
The bug report is bogus because the Athlon 64 3500 doesn't even have
C2 - only the mobile parts support it. It's probably something else.

Venkatesh - feel free to assign it to me since it is clearly not your problem.
Comment 9 Bertro Simul 2006-01-29 00:42:26 UTC
Uhm, what? You mean the BIOS is so broken that it reports
C2 capabilities to the OS without checking that the
processor supports it? Maybe I should open the case and
see if there
Comment 10 Andi Kleen 2006-01-29 02:03:42 UTC
Ah, never mind - you really seem to have a mobile CPU in a laptop.
Those of course have C2. Just the desktop parts don't.

Perhaps Mark Langsdorf can comment on the problem if he's still reading email.
I would be reluctant to always disable c2 on these CPUs. If not I can ping
people at AMD next week and figure out what to do.

Again, Venki please assign to me. I can't do that myself unfortunately.
Comment 11 Mark Langsdorf 2006-01-30 11:54:30 UTC
Does the Linux kernel have an interrupt pending routine to handle pending
interrupts when going into C2?

I've asked some AMD hardware engineers to look at this and see if they can ask
more pertinent questions.
Comment 12 Andi Kleen 2006-01-30 12:31:15 UTC
The idle function starts with interrupts enabled. Anything pending
should be processed then. During the actual C2 and before that
(reading bus master activity, reading start time) interrupts are disabled.
Comment 13 Bertro Simul 2006-01-30 23:57:14 UTC
I would like to make it explicit that for testing
purposes I enabled interrupts directly before
the code that does an inb to enter C2 and I still
lost ticks. So the problem seems to be with the
BIOS or the hardware, not with Linux
Comment 14 Andi Kleen 2006-01-31 00:14:24 UTC
Good that you tested that already because I was about to suggest it :)

The inl() should cause an SMI calling into the SMM BIOS and that
code is known to often do strange and broken things. I guess we have
to wait for AMD people to comment.
Comment 15 Rahul Tikoo 2006-01-31 08:32:51 UTC
When the processor sees a C2 Stop Clock, it generates a Stop Grant message to 
the SB. In such a situation an interrupt that is pending can only be serviced 
after the processor sees a stop clock deassertion message. This might add
latency to the interrupt service and hence the warning messages reported by
Linux. There are 2 ways to get around this -
1. Interrupt pending message - AMD 64 architecture has implemented an MSR
which when set up by the BIOS enables our processor to generate an interrupt
pending hypertransport message to the SB when an interrupt is pending and a
stop clock is received by the processor. This message is followed by the stop
grant message by the processor. The SB is then supposed to generate a stop
clock deassert immediately allowing the processor to service this pending
interrupt.
2. Some new SB's do not require the interrupt pending message as they perceive 
that an interrupt is pending when a stop clock was sent and hence generate a 
wakeup event and/or stop clock deassert allowing the processor to handle this 
pending interrupt.

I would suggest getting in touch with your hardware manufacturer (HP?) and 
finding out if any of the above are supported.
Comment 16 Andi Kleen 2006-01-31 08:40:51 UTC
Thanks for the information. Unfortunately in practice it's common
that it's not possible to do anything about this from the platform
side (no new BIOS for old systems etc.) 

Can you recommend a less intrusive software workaround other than disabling C2?

I guess one could implement some logic to disable C2/C3 if a few ticks get lost
and see if that changes things.
Comment 17 Bertro Simul 2006-02-01 11:40:51 UTC
Rahul, the MSR you mentioned is c001_0055h, right?
Its value (read through /dev/cpu/0/msr) is
0x33300b0, which means that on a pending interrupt
the processor will write 0x33 to I/O port 0xb0; this
is indeed this BIOS
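
For anyone who wants to repeat this check, here is a minimal sketch of reading the MSR through /dev/cpu/0/msr and splitting the value into fields. The address and data fields follow the reading given in this comment (0x33 written to port 0xb0). The positions of the three flag bits (24 to 26) are an assumption from memory of the BKDG and should be verified against it before relying on them. Reading requires root and the msr driver.

```python
import os
import struct

MSR_INT_PENDING = 0xC0010055  # MSR number named in this thread


def read_msr(msr: int, cpu: int = 0) -> int:
    """Read a 64-bit MSR via the Linux msr driver (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device uses the MSR number as the file offset.
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def decode_int_pending(value: int) -> dict:
    """Split an Interrupt Pending MSR value into its fields."""
    return {
        "IOMsgAddr": value & 0xFFFF,        # I/O port for the message
        "IOMsgData": (value >> 16) & 0xFF,  # byte written to that port
        # ASSUMED bit positions below; check the BKDG field definitions.
        "IntrPndMsgDis": (value >> 24) & 1,
        "IntPndMsg": (value >> 25) & 1,
        "IORd": (value >> 26) & 1,
    }


# Decode the value reported in this comment (no hardware access needed).
for name, field in decode_int_pending(0x33300B0).items():
    print(f"{name} = {field:#x}")
```

On a live system, decode_int_pending(read_msr(MSR_INT_PENDING)) would show the current BIOS setup.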
Comment 18 Andi Kleen 2006-02-01 11:52:41 UTC
Keep it open for now - we do workarounds for common BIOS bugs
(and I know these lost ticks happen on various machines)

Also I would like to understand what your change actually does. I thought
the MSR Rahul mentioned was supposed to be written by hardware, not software,
but maybe I misunderstood things. Rahul?

Comment 19 Rahul Tikoo 2006-02-02 04:04:38 UTC
Int Pending MSR is indeed C001_0055h. Anyone with access to the AMD BKDG
should be able to get the field definitions also.
This register enables the processor to send a message to the I/O hub that 
results in the pending interrupt being serviced. The two message types are the 
IO space message and the HyperTransport INT_PENDING message. HyperTransport 
INT_PENDING message is defined by the HyperTransport 1.05c specification.
If the processor or the I/O hub does not support the INT_PENDING 
HyperTransport message, the IO space message should be selected by IntPndMsg. 
A check for a pending interrupt is performed at the end of an IO instruction. 
If there is a pending interrupt and STPCLK is asserted, the processor
executes a byte-size read or write to IO space defined with IORd, IOMsgAddr, 
and IOMsgData (used only for IO writes) to generate an SMI. The SMI wakes up 
the processor so the original pending
interrupt can be serviced. The SMI handler should not take any action if the 
SMI is generated by this mechanism. In order to prevent SMI generation with 
this mechanism in the SMI handler, IntrPndMsgDis bit should be set in the SMI
handler before the first IO instruction is executed, and it should be cleared
prior to resuming from SMM.
If the processor and the I/O hub support the INT_PENDING HyperTransport 
message, it should be selected by IntPndMsg bit. The check for a pending 
interrupt is performed when entering the stop grant state.

This MSR is writable by software and can be used to enable both of the above 
defined methods. SMI method needs a BIOS SMI handler implementation that 
supports it and the Int_Pending HT message needs a SB that is HT spec 1.05c 
compliant.

Andi, if you want to apply an OS workaround for such problems, one
implementation could be to enable int_pending message support in the OS. One
way I can think of doing this is to force c2/c3 transitions, enable
int_pending messages - SMI method or HT message method and check if there are
lost ticks. The method that does not yield lost ticks can be implemented on
that platform. I know I make it sound very simplistic whereas the
implementation might not be that simple.
Comment 20 Mark Langsdorf 2006-02-02 04:08:36 UTC
I will be on a 2 month sabbatical starting February 2nd, 2006 and not returning until April 12th.  I will not be reachable by email or phone while on sabbatical.

If you have technical concerns on general Linux issues, please direct them to David Keck and Jacob Shin (david.keck@amd.com and jacob.shin@amd.com).

If you have an issue dealing with Red Hat, please address it to Bhavana Nagendra (bhavana.nagendra@amd.com).

If you have an issue dealing with Xen, please address it to Tom Woller (thomas.woller@amd.com).

For questions about AMD's strategic relationship with Linux, please email Rich Brunner (richard.brunner@amd.com).

I will deal with selected emails when I return.

-Mark Langsdorf
Linux Validation Tools and Support
AMD, Inc.
Comment 21 Bertro Simul 2006-02-12 08:44:04 UTC
Created attachment 7301 [details]
Workaround for buggy BIOS (against 2.6.15.4)

First take at a workaround. It is enabled
by passing an option to the ACPI processor
module.

This patch tests for the setup of MSR C001_0055h
at entering acpi_processor_idle(). I could not
make it part of acpi_processor_start() because
I don
Comment 22 Andi Kleen 2006-10-22 06:45:51 UTC
Revisiting this.

I still would prefer to fix this in the BIOS instead of applying
this ugly (sorry, Bertro) patch. 

Mark, does AMD have an opinion on this issue?
 
Comment 23 Mark Langsdorf 2006-10-30 08:30:38 UTC
Fix in BIOS if at all possible continues to be AMD's position.
Comment 24 Joachim Deguara 2007-02-27 08:02:30 UTC
I saw this bug is still open.
The errata has been fixed in processors Opteron revision C3 and later.  I take
it Bertro had a bad BIOS and a CPU with this errata.  So fix should be that the
BIOS is always perfectly programmed (keep dreaming) and if we want to blacklist
then we could even do it on CPU revision to limit max_cstate=1.
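
Until such a blacklist exists, the limit can be inspected and applied by hand through the sysfs parameter mentioned in comment 2. The sketch below is only an illustration: the helper names are made up, writing needs root, and it assumes a kernel where the parameter is exposed and writable as described in that comment.

```python
from pathlib import Path

# Path taken from comment 2 of this bug.
MAX_CSTATE = Path("/sys/module/processor/parameters/max_cstate")


def read_max_cstate(path: Path = MAX_CSTATE) -> int:
    """Return the current max_cstate limit of the ACPI processor module."""
    return int(path.read_text().strip())


def limit_to_c1(path: Path = MAX_CSTATE) -> None:
    """Same effect as `echo 1 > .../max_cstate`; needs root."""
    path.write_text("1\n")
```

Both helpers take the path as a parameter so that they can be tried against a scratch file before touching the real sysfs entry.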
Comment 25 Alexey Starikovskiy 2007-06-14 01:51:03 UTC
Joachim,
I'd like to implement the mentioned blacklist. Could you be more specific about the revision?
The "Revision Guide for AMD Athlon..." does not mention anything like C3.
Could you give family/model/stepping numbers for these "Opteron revision C3"?

(In reply to comment #24)
> I saw this bug is still open.
> The errata has been fixed in processors Opteron revision C3 and later.  I
> take
> it Bertro had a bad BIOS and a CPU with this errata.  So fix should be that
> the
> BIOS is always perfectly programmed (keep dreaming) and if we want to
> blacklist
> then we could even do it on CPU revision to limit max_cstate=1.
Comment 26 Joachim Deguara 2007-06-14 02:42:02 UTC
Alexey, sorry for the confusion. I meant revision CG, and the erratum is number 78. The Revision Guide can be found here: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25759.pdf

Thanks for following up on this.
Comment 27 Alexey Starikovskiy 2007-06-14 04:11:43 UTC
Created attachment 11748 [details]
Set max_cstate for early Opterons

Here is a proposed patch to automatically set max_cstate for early Opterons, please check.
Comment 28 Andi Kleen 2007-06-14 07:54:27 UTC
The recent timer code disables APIC timer in some cases.
The c state should be only limited when the APIC timer is actually used.
This code is a lot different now than it was in the kernel where this was
originally reported.

I think the better way is to prefer irq 0 instead of APIC timer for the single core case on these systems and disable dyntick if C > 1 is available.

That's ok because there are no dual core systems with
such early steppings (only E+ is dualcore) and the multi-socket systems don't support anything deeper than C1 anyways.
Comment 29 Rahul Tikoo 2007-06-14 07:59:35 UTC
Subject: Out of Office AutoReply:  AMD64 Erratum: Should not
 enable C2 when using APIC

I will be out of office till 05/18 and will have intermittent email access. Email responses will be delayed.
Comment 30 Fu Michael 2007-11-05 19:28:10 UTC
(In reply to comment #27)
> Created an attachment (id=11748) [details]
> Set max_cstate for early Opterons
> 
> Here is a proposed patch to automatically set max_cstate for early Opterons,
> please check.
> 

Alex, any update for this patch?
Comment 31 Rahul Tikoo 2007-11-05 19:31:25 UTC
I will be out of office from 11/02 to 11/09 and will have intermittent email access. Email responses will be delayed. For urgent matters please call me at 9972093530. In my absence, the following people will provide coverage - 

CSS - Minesh Parekh - Minesh.Parekh@amd.com

DTE - Neel Subramani - Neelamegam.Subramani@amd.com

Management - Jay Hiremath - Jay.Hiremath@amd.com
Comment 32 Thomas Gleixner 2007-11-13 15:29:04 UTC
I'm going to add this to the local APIC timer disable quirks in 2.6.24-rc

Thanks,

       tglx
Comment 33 Bertro Simul 2007-11-16 01:43:49 UTC
Thomas Gleixner  wrote:

> I'm going to add this to the local APIC timer disable quirks in 2.6.24-rc


I’m not sure I understand correctly what you intend to do, but what does this mean for the other interrupts? Is erratum 78 specific to the local APIC timer or does it apply to the interrupts coming from the IOAPIC, too (this is how I understand the erratum)?
Comment 34 Thomas Gleixner 2007-11-16 14:59:35 UTC
On Fri, 16 Nov 2007, bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #33 from bertro_simul@yahoo.com  2007-11-16 01:43 -------
> Thomas Gleixner  wrote:
> 
> > I'm going to add this to the local APIC timer disable quirks in 2.6.24-rc
> 
> 
> I’m not sure I understand correctly what you intend to do, but what does this
> mean for the other interrupts? Is erratum 78 specific to the local APIC timer
> or does it apply to the interrupts coming from the IOAPIC, too (this is how I
> understand the erratum)?

Oops. Right, I confused this with some other problem. I read the
errata again and I think your patch is correct. I try to figure out
whether we can avoid the extra check in the acpi code and figure this
out right at boot time. I get this into mainline asap.

Thanks,

	tglx
Comment 35 Mark Langsdorf 2007-11-16 15:01:46 UTC
I will be on vacation from November 17th until November 22nd.  I will respond to all emails when I return.

In my absence, refer Linux issues to the OSRC at osrc@elbe.amd.com.  Personal email should be sent to mlangsdo@io.com.  

-Mark Langsdorf
Operating System Research Center
AMD
Comment 36 Bertro Simul 2007-12-08 04:05:03 UTC
Thomas,


I’ve noticed that a patch made it into 2.6.24-rc4
that cures the disease by killing the patient.

Is this the last word on this problem or will we
see a patch that provides a workaround for erratum 78
that makes C2 available again?
Comment 37 Rahul Tikoo 2007-12-08 04:07:32 UTC
I will be out of office from 12/03 to 12/10 and will have intermittent email and no cell phone access. Email responses will be delayed. In my absence, the following people will provide coverage - 

CSS - Minesh Parekh - Minesh.Parekh@amd.com

DTE - Neel Subramani - Neelamegam.Subramani@amd.com

Management - Jay Hiremath - Jay.Hiremath@amd.com
Comment 38 Thomas Gleixner 2007-12-09 22:02:35 UTC
> I’ve noticed that a patch made it into 2.6.24-rc4
> that cures the disease by killing the patient.
> 
> Is this the last word on this problem or will we
> see a patch that provides a workaround for erratum 78
> that makes C2 available again?

There is no known workaround. Sorry.

      tglx
Comment 39 Len Brown 2007-12-28 22:37:48 UTC
a re-worked version of Alexey's patch in comment #27
shipped in linux-2.6.24-rc4.
c1c306344669ca40255e36192b101060ffbb1271
(ACPI: Set max_cstate to 1 for early Opterons.)

yes, Bertro, this always disables C2 on your box,
in favor of fixing the interrupt issue.

no, i don't understand Andi's comment #28 either,
as i've seen no indication that timers different
than the lapic timer are immune from the issue.
(if they were, you could have used "nolapic" for a workaround)

closed.
Comment 40 Mark Langsdorf 2007-12-28 22:45:40 UTC
I will be on vacation from December 15th 2007 until January 3rd 2008.  I will respond to all emails when I return.

In my absence, refer Linux issues to the OSRC at osrc@elbe.amd.com.  Personal email should be sent to mlangsdo@io.com.  

-Mark Langsdorf
Operating System Research Center
AMD
Comment 41 Andi Kleen 2007-12-29 04:22:08 UTC
I don't think that patch was the correct solution. The problem is essentially
equivalent to APIC timer not working in C2, and that is handled by broadcasting e.g. on Intel. No need to really disable the C states completely.

If there are other APIC interrupts they can probably be delayed a bit
until the next broadcast

Alexey, can you expand on why you did it exactly this way?

I think it should be REOPENED, although I unfortunately don't have enough
bugzilla rights to do that.
Comment 42 Alexey Starikovskiy 2007-12-29 06:52:22 UTC
Andi, I referred to #24 as a guide.
Comment 43 Andi Kleen 2007-12-30 04:48:29 UTC
I think it would have been better to try broadcasting first and only
if that didn't help go to such drastic measures.
Comment 44 Alexey Starikovskiy 2007-12-30 08:25:42 UTC
Andi, was it you who dismissed "ugly" broadcast patch at #22?
Comment 45 Andi Kleen 2007-12-30 18:43:53 UTC
Yes, but the timer broadcast as currently implemented for Intel platforms
in the tree is quite different than that. Still not pretty, but bearable.
I also don't like disabling power saving modes automatically in general --
imho power saving is very important. And the broadcast should really already
help for this I hope.
Comment 46 Alexey Starikovskiy 2007-12-31 02:26:42 UTC
AMD recommends disabling power saving to avoid the erratum. The bug is two years old and nobody seems to care to implement anything better than this.
Comment 47 Thomas Gleixner 2008-01-05 03:31:45 UTC
We use timer broadcasting in C2 anyway, but this does not change the problem at all. We can not avoid a situation like this:

CPU goes idle
Timer interrupt happens while interrupts are disabled right before we go into C2
We wake up only when some other interrupt (keyboard, network whatever) comes in.

This is completely independent of local apic timer or hpet/pit broadcast mode.

To verify this we can simply backout the quirk (git commit c1c306344669ca40255e36192b101060ffbb1271) and add "noapictimer" to the kernel command line.
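
The sequence above can be put into a toy timeline model. This is only a sketch of the scenario as described in this comment, not of the hardware: times are in microseconds, both function names are made up, and the lost-tick count is a simplification that counts whole tick periods elapsed while the pending tick waits.

```python
def serviced_at(enter_c2_us: int, next_other_irq_us: int, erratum: bool) -> int:
    """Time at which a timer interrupt that is already pending when the
    CPU enters C2 gets serviced.

    Without the erratum the pending interrupt ends C2 right away; with it,
    the CPU sleeps until some other interrupt (keyboard, network) arrives.
    """
    return next_other_irq_us if erratum else enter_c2_us


def lost_ticks(delay_us: int, hz: int) -> int:
    """Whole tick periods that pass during the delay (toy approximation)."""
    period_us = 1_000_000 // hz
    return delay_us // period_us


# CPU enters C2 at t=100us, next unrelated interrupt arrives at t=3600us.
delay = serviced_at(100, 3600, erratum=True) - 100
print(lost_ticks(delay, 1000), lost_ticks(delay, 250), lost_ticks(delay, 100))
```

In this model the delay depends only on when the next unrelated interrupt arrives, which matches the point that the timer source (local APIC or broadcast) makes no difference.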
Comment 48 Andi Kleen 2008-01-09 12:01:05 UTC
You're right, since it applies to all APIC interrupts. I somehow assumed
it only applied to APIC timer interrupts, but that was wrong.
Objection withdrawn.
