Bug 11191 - 2.6.26-git8: spinlock lockup in c1e_idle()
Summary: 2.6.26-git8: spinlock lockup in c1e_idle()
Status: CLOSED DUPLICATE of bug 11418
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: All Linux
Importance: P1 normal
Assignee: Thomas Gleixner
URL:
Keywords:
Depends on:
Blocks: Regressions-2.6.26
Reported: 2008-07-31 06:16 UTC by Rafael J. Wysocki
Modified: 2008-09-04 15:35 UTC
CC List: 5 users

See Also:
Kernel Version: 2.6.26-git8
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
config-2.6.27-rc1-git4 (72.67 KB, text/plain)
2008-08-23 18:42 UTC, Mikhail Kshevetskiy
dmesg-2.6.26-rc1-git4 (111.47 KB, text/plain)
2008-08-23 18:45 UTC, Mikhail Kshevetskiy
syslog-2.6.27-rc2-git4 (71.29 KB, text/plain)
2008-08-23 18:46 UTC, Mikhail Kshevetskiy
lspci -tv (1.88 KB, text/plain)
2008-08-23 18:49 UTC, Mikhail Kshevetskiy
lspci -xvvv (32.89 KB, text/plain)
2008-08-23 18:49 UTC, Mikhail Kshevetskiy
/proc/timer_list (2.6.26.3) (3.76 KB, text/plain)
2008-08-23 18:58 UTC, Mikhail Kshevetskiy
dmesg-2.6.27-rc4-git6 (51.19 KB, text/plain)
2008-08-27 03:06 UTC, Mikhail Kshevetskiy
/proc/timer_list (2.6.27-rc4-git6, nohpet) (3.98 KB, text/plain)
2008-08-27 13:53 UTC, Mikhail Kshevetskiy
/proc/timer_list (2.6.27-rc4-git6, idle=poll) (4.00 KB, text/plain)
2008-08-27 13:54 UTC, Mikhail Kshevetskiy
combo patch of various clockevents fixes which might be related (5.33 KB, patch)
2008-09-03 08:25 UTC, Thomas Gleixner
patch to fix the problem (at least it worked for the HP tx1000 where I've tested) (1.60 KB, patch)
2008-09-03 13:16 UTC, herrmann.der.user

Description Rafael J. Wysocki 2008-07-31 06:16:00 UTC
Subject    : BUG: 2.6.26-git8: spinlock lockup in c1e_idle()
Submitter  : Mikhail Kshevetskiy <mikhail.kshevetskiy@gmail.com>
Date       : 2008-07-24 03:22
References : http://lkml.org/lkml/2008/7/23/317
Handled-By : Thomas Gleixner <tglx@linutronix.de>

This entry is being used for tracking a regression from 2.6.26.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2008-08-11 04:38:38 UTC
On Monday, 11 of August 2008, Mikhail Kshevetskiy wrote:
> As of 2.6.27-rc2-git4 the bug still exists, so nothing has changed.
> The syslog for 2.6.27-rc2-git4 is attached.
> 
> Mikhail
> 
> 
> On Sun, 10 Aug 2008 00:43:50 +0200 (CEST)
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.26.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=11191
> > Subject             : 2.6.26-git8: spinlock lockup in c1e_idle()
> > Submitter   : Mikhail Kshevetskiy <mikhail.kshevetskiy@gmail.com>
> > Date                : 2008-07-24 03:22 (17 days old)
> > References  : http://lkml.org/lkml/2008/7/23/317
> > Handled-By  : Thomas Gleixner <tglx@linutronix.de>
Comment 2 Rafael J. Wysocki 2008-08-18 14:51:30 UTC
On Monday, 18 of August 2008, Mikhail Kshevetskiy wrote:
> As of 2.6.26-rc3-git3 the bug still exists.
> It affects both i386 and x86_64 architectures.
> 
> Mikhail
> 
> On Sat, 16 Aug 2008 21:02:46 +0200 (CEST)
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.26.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=11191
> > Subject             : 2.6.26-git8: spinlock lockup in c1e_idle()
> > Submitter   : Mikhail Kshevetskiy <mikhail.kshevetskiy@gmail.com>
> > Date                : 2008-07-24 03:22 (24 days old)
> > References  : http://lkml.org/lkml/2008/7/23/317
> > Handled-By  : Thomas Gleixner <tglx@linutronix.de>
Comment 3 Mikhail Kshevetskiy 2008-08-23 18:42:31 UTC
Created attachment 17405 [details]
config-2.6.27-rc1-git4
Comment 4 Mikhail Kshevetskiy 2008-08-23 18:45:19 UTC
Created attachment 17406 [details]
dmesg-2.6.26-rc1-git4
Comment 5 Mikhail Kshevetskiy 2008-08-23 18:46:55 UTC
Created attachment 17407 [details]
syslog-2.6.27-rc2-git4
Comment 6 Mikhail Kshevetskiy 2008-08-23 18:49:22 UTC
Created attachment 17408 [details]
lspci -tv
Comment 7 Mikhail Kshevetskiy 2008-08-23 18:49:44 UTC
Created attachment 17409 [details]
lspci -xvvv
Comment 8 Mikhail Kshevetskiy 2008-08-23 18:55:50 UTC
As of 2.6.26-rc3-git3 the bug still exists.
It affects both i386 and x86_64 architectures.

Here is a summary of the kernel parameters I tried:
1) noapictimer       -- boot ok, tickless disabled
2) nohpet            -- boot ok (I suppose tickless is disabled)
3) clocksource=tsc   -- the same as with "nohpet"
4) idle=halt         -- need to press buttons to boot
5) none of the above -- the kernel freezes after switching to high resolution
                        mode on CPU0 and detects a spinlock lockup 5 minutes
                        later
Comment 9 Mikhail Kshevetskiy 2008-08-23 18:58:51 UTC
Created attachment 17410 [details]
/proc/timer_list (2.6.26.3)
Comment 10 Thomas Gleixner 2008-08-26 01:26:17 UTC
Is this problem still there with 2.6.27-rc4 ?
Comment 11 Mikhail Kshevetskiy 2008-08-27 03:06:39 UTC
Created attachment 17476 [details]
dmesg-2.6.27-rc4-git6
Comment 12 Mikhail Kshevetskiy 2008-08-27 03:32:18 UTC
Yep, the problem is still present.

I tried the following:
  1) replace default_idle() with poll_idle() in c1e_idle() -- [BUG PRESENT]
  2) "nohpet" or "clocksource=tsc" kernel parameters -- hpet disabled, bug not observed
  3) "noapictimer" or "highres=off nohz=off" kernel parameters -- hpet in periodic mode, tickless disabled, bug not observed
  4) "idle=poll" kernel parameter -- hpet in oneshot mode, tickless enabled, power saving is off, bug not observed
So it looks like the problem is linked to hpet breakage during clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_FORCE/ENTER/EXIT, &cpu) calls.
Comment 13 Thomas Gleixner 2008-08-27 11:28:44 UTC
Can you please provide the output of /proc/timer_list after you booted a kernel with "nohpet" on the kernel command line ?

Also can you please try to boot with "clocksource=acpi_pm" ?
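
Editor's note: when reproducing this kind of test, it helps to confirm which clocksource the running kernel actually selected. The following is a minimal sketch, assuming the usual sysfs file /sys/devices/system/clocksource/clocksource0/current_clocksource is present; the helper names read_first_line() and print_current_clocksource() are hypothetical and not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the first line of `path` into `buf`, without the
 * trailing newline. Returns 0 on success, -1 if the file cannot be read. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL);
    if (len == 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';   /* strip the newline, if any */
    return 0;
}

/* Print the clocksource in use (for example "hpet", "acpi_pm" or "tsc").
 * The sysfs path is an assumption about the running system. */
int print_current_clocksource(void)
{
    char name[64];

    if (read_first_line("/sys/devices/system/clocksource/clocksource0/"
                        "current_clocksource", name, sizeof(name)) != 0) {
        fprintf(stderr, "could not read current_clocksource\n");
        return -1;
    }
    printf("current clocksource: %s\n", name);
    return 0;
}
```

Calling print_current_clocksource() after booting with "clocksource=acpi_pm" or "nohpet" shows whether the parameter took effect; the clock event devices themselves are listed in /proc/timer_list.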
Comment 14 Mikhail Kshevetskiy 2008-08-27 13:53:24 UTC
Created attachment 17488 [details]
/proc/timer_list (2.6.27-rc4-git6, nohpet)
Comment 15 Mikhail Kshevetskiy 2008-08-27 13:54:24 UTC
Created attachment 17489 [details]
/proc/timer_list (2.6.27-rc4-git6, idle=poll)
Comment 16 Mikhail Kshevetskiy 2008-08-27 14:02:52 UTC
(In reply to comment #13)
> Can you please provide the output of /proc/timer_list after you booted a
> kernel with "nohpet" on the kernel command line ?

See the attachments (two cases):
1) nohpet    (this is your request)
2) idle=poll (for comparison with the 2.6.23 case)
 
> Also can you please try to boot with "clocksource=acpi_pm" ?

the same problem as in hpet case, so no results
Comment 17 Thomas Gleixner 2008-08-27 14:40:44 UTC
> see the attachments (two cases)
> 1) nohpet    (this is your request)

working highres just using PIT instead of HPET

> > Also can you please try to boot with "clocksource=acpi_pm" ?
> 
> the same problem as in hpet case, so no results

Well, it's a result. Using HPET on that box seems to be the real problem.

Maciej, any idea on that ?
Comment 18 Mikhail Kshevetskiy 2008-08-28 07:03:52 UTC
> Well, it's a result. Using HPET on that box seems to be the real problem.

I am not really sure that HPET is completely broken. It works in periodic mode at least. According to /proc/timer_list from 2.6.27-rc4-git6 booted with the "idle=poll" kernel parameter, there is a chance that HPET may also work in oneshot mode.

I want to boot with "idle=poll" (this will switch HPET to oneshot mode) and then switch the idle function to "c1e_idle". If no errors occur, then only the HPET mode switching needs to be fixed for my notebook.

Is there any way to switch idle function after booting?
Comment 19 Thomas Gleixner 2008-08-28 07:45:50 UTC
> > Well, it's a result. Using HPET on that box seems to be the real problem.
> 
> I am not really sure that HPET is broken completely. It works in periodic mode
> at least. According to /proc/timer_list from 2.6.27-rc4-git6 booted with
> "idle=poll" kernel parameter, there is a chance that HPET may work in oneshot
> mode also.
> 
> I want to boot with "idle=poll" (this will switch HPET to oneshot mode) and
> then switch idle function to "c1e_idle". If no errors occurred, then only
> HPET mode switching should be fixed for my notebook.

The mode switch of HPET is not the problem. The point is that you
disable the C1E functionality when you boot with idle=poll. Once you
switch to c1e_idle you will have the same problem again. It seems
that once the CPUs switch into C1E mode, the HPET becomes
dysfunctional.

If you run in periodic mode, then the CPUs might not switch into the
deepest saving modes which affect the HPET. We have no clue about the
internal details of the C1E implementation in the System Management
code, which is inside of your BIOS.
Comment 20 Thomas Gleixner 2008-08-28 07:51:22 UTC
Can the AMD folks please shed some light on the internals of the C1E magic ?
Comment 21 Rafael J. Wysocki 2008-09-01 14:23:54 UTC
On Monday, 1 of September 2008, Mikhail Kshevetskiy wrote:
> On Sat, 30 Aug 2008 21:50:13 +0200 (CEST)
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.26.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=11191
> > Subject             : 2.6.26-git8: spinlock lockup in c1e_idle()
> > Submitter   : Mikhail Kshevetskiy <mikhail.kshevetskiy@gmail.com>
> > Date                : 2008-07-24 03:22 (38 days old)
> > References  : http://lkml.org/lkml/2008/7/23/317
> > Handled-By  : Thomas Gleixner <tglx@linutronix.de>
> > 
> > 
> 
> not fixed yet
Comment 22 Mikhail Kshevetskiy 2008-09-02 12:14:07 UTC
> what notebook do you have? From the bugzilla it seems to be an Asus with
> nvidia chipset -- but what model?
>
> So far I don't know what the root cause is, and I was not able
> to reproduce the problem with different hardware -- but maybe I have a
> chance to get access to a similar notebook model.
> This would help to better debug the problem.

ASUS F3T, bios F3TAS.228
http://gentoo-wiki.com/Asus_F3T
Comment 23 Rafael J. Wysocki 2008-09-02 14:00:16 UTC
References : http://marc.info/?l=linux-kernel&m=122038418801942&w=4
Comment 24 herrmann.der.user 2008-09-03 05:21:16 UTC
No Asus F3T available here. And no HP dv6000 (see comment #23).

But I have access to an HP tx1000 with nvidia MCP51.
It shows same symptom (spinlock lockup).

Interestingly, hpet does not seem to work in periodic mode.
With a non-tickless kernel that uses hpet, the system hangs for several
minutes during boot.

Currently doing some debugging (regarding hpet) on that system.
Comment 25 Thomas Gleixner 2008-09-03 07:58:51 UTC
> Currently doing some debugging (regarding hpet) on that system.

Can you please apply those patches:

http://lkml.org/lkml/2008/9/3/75

and

http://lkml.org/lkml/2008/9/3/183

Thanks,
	tglx
Comment 26 Mikhail Kshevetskiy 2008-09-03 08:15:34 UTC
(In reply to comment #24)
> No Asus F3T available here. And no HP dv6000 (see comment #23).
> 
> But I have access to an HP tx1000 with nvidia MCP51.
> It shows same symptom (spinlock lockup).
> 
> Interestingly, hpet does not seem to work in periodic mode.
> With a non-tickless kernel, but using hpet system hangs for several
> minutes during boot.
> 
> Currently doing some debugging (regarding hpet) on that system.
> 
Try turning off highres timers as well. I use "highres=off nohz=off" and this works for me.
Comment 27 Thomas Gleixner 2008-09-03 08:23:55 UTC
> > Currently doing some debugging (regarding hpet) on that system.
> > 
> try to turn off highres timers also. I use "highres=off nohz=off" and this
> works for me.

That does not help much, if you want to debug why those two options
are _NOT_ working when enabled. :)

Thanks,

	tglx
Comment 28 Thomas Gleixner 2008-09-03 08:25:54 UTC
Created attachment 17595 [details]
combo patch of various clockevents fixes which might be related

That's the combo patch of all fixes which were made in the last couple of days in that area. Any feedback is welcome.

Thanks,
       tglx
Comment 29 herrmann.der.user 2008-09-03 11:39:16 UTC
Didn't try your combo patch (yet), but encountered the following:

On HP tx1000 with same chipset the kernel hangs for several minutes in a
loop when it tries to program hpet in one-shot mode:

int tick_program_event(ktime_t expires, int force)
...

        while (1) {
                int ret = clockevents_program_event(dev, expires, now);

                if (!ret || !force)
                        return ret;
                now = ktime_get();
                expires = ktime_add(now, ktime_set(0, dev->min_delta_ns));
        }

Some printks in hpet_legacy_next_event show

hpet: T0_CMP: 20cbcfc COUNTER: 20cbd01, delta: 2f 
hpet: T0_CMP: 20ccc27 COUNTER: 20ccc2b, delta: 2f
hpet: T0_CMP: 20cdb40 COUNTER: 20cdb45, delta: 2f
hpet: T0_CMP: 20cea7d COUNTER: 20cea81, delta: 2f

and this goes on and on and on and on (for at least more than 2 minutes)

The corresponding values in clockevents_program_event are:

delta (ns): 0x780, dev->min_delta_ns: 0x780,
dev->mult: 0x6666666, dev->shift: 0x20,

I've doubled min_delta_ns for hpet:

@@ -229,7 +229,7 @@ static void hpet_legacy_clockevent_register(void)
        /* Calculate the min / max delta */
        hpet_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFFFF,
                                                           &hpet_clockevent);
-       hpet_clockevent.min_delta_ns = clockevent_delta2ns(0x30,
+       hpet_clockevent.min_delta_ns = clockevent_delta2ns(0x60,
                                                           &hpet_clockevent);
 

and the system boots without delays and does not show spinlock lockups so far.
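
Editor's note: the numbers in this comment are consistent with each other, which can be checked with a short calculation. The sketch below assumes the usual mult/shift fixed-point scheme (ns = (ticks << shift) / mult and ticks = (ns * mult) >> shift) and ignores any clamping the kernel may apply; sketch_delta2ns() and sketch_ns2ticks() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Convert timer ticks to nanoseconds using a mult/shift pair. */
uint64_t sketch_delta2ns(uint32_t ticks, uint32_t mult, uint32_t shift)
{
    assert(mult != 0);
    return ((uint64_t)ticks << shift) / mult;
}

/* Convert nanoseconds back to timer ticks with the same mult/shift pair.
 * The result is truncated, so a round trip can lose one tick. */
uint64_t sketch_ns2ticks(uint64_t ns, uint32_t mult, uint32_t shift)
{
    return (ns * mult) >> shift;
}
```

With dev->mult = 0x6666666 and dev->shift = 0x20 (roughly a 25 MHz timer, about 40 ns per tick), sketch_delta2ns(0x30, ...) gives 0x780 ns, matching dev->min_delta_ns, and sketch_ns2ticks(0x780, ...) gives 0x2f ticks, matching the delta in the printk output. Doubling the latch to 0x60 gives 0xf00 ns, which converts back to 0x5f ticks.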
Comment 30 herrmann.der.user 2008-09-03 13:16:43 UTC
Created attachment 17598 [details]
patch to fix the problem (at least it worked for the HP tx1000 where I've tested)

    x86: hpet: increase min_delta_ns to increase chance of successful programming in hpet_legacy_next_event
    
    This fixes http://bugzilla.kernel.org/show_bug.cgi?id=11191
    and most probably http://bugzilla.kernel.org/show_bug.cgi?id=11418
    as well.
    
    With c1e_idle hpet is frequently reprogrammed (in one-shot mode).
    
    If the delta for next timer event is very small the T0 comparator
    value is too close to the current HPET counter value and Linux
    repeatedly tries to reprogram the comparator.
    
    On an HP tx1000 (with AMD Turion and nvidia MCP51) this caused
    
      BUG: spinlock lockup on CPU#0
    
    during boot. On other systems with other chipsets I've observed soft
    lockups, e.g.
    
      BUG: soft lockup - CPU#1 stuck for 89s! [uname:28197]
    
    Both symptoms vanished when I've increased min_delta_ns for hpet.
Comment 31 Thomas Gleixner 2008-09-03 14:41:04 UTC
> I've doubled min_delta_ns for hpet:
> 
> @@ -229,7 +229,7 @@ static void hpet_legacy_clockevent_register(void)
>         /* Calculate the min / max delta */
>         hpet_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFFFF,
>                                                            &hpet_clockevent);
> -       hpet_clockevent.min_delta_ns = clockevent_delta2ns(0x30,
> +       hpet_clockevent.min_delta_ns = clockevent_delta2ns(0x60,
>                                                            &hpet_clockevent);
> 
> 
> and the system boots without delays and does not show spinlock lockups so
> far.

Yeah, that was what we found out as well :) I just sent a patch queue
with various fixes to lkml. Unfortunately I forgot to CC you. Too tired :(

Thanks,
	tglx
Comment 32 Mikhail Kshevetskiy 2008-09-04 04:49:32 UTC
With the latest patches the kernel boots and works just fine.

thank you
Comment 33 herrmann.der.user 2008-09-04 04:52:21 UTC
Today I have tested Thomas' patchset as well.
It (of course) solved the spinlock lockup issue on the HP tx1000.
Comment 34 herrmann.der.user 2008-09-04 05:02:16 UTC
Minor correction to my comment #24
HPET works/worked in periodic mode without problems.
Comment 35 Rafael J. Wysocki 2008-09-04 14:57:28 UTC
Handled-By : Andreas Herrmann <andreas.herrmann3@amd.com>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=17598&action=view
Comment 36 Thomas Gleixner 2008-09-04 15:04:37 UTC
This is fixed by the complete patch set, which is scheduled for Linus. The combo patch is here:

http://bugzilla.kernel.org/show_bug.cgi?id=11418

Maybe we should mark those as duplicates.
Comment 37 Rafael J. Wysocki 2008-09-04 15:35:01 UTC

*** This bug has been marked as a duplicate of bug 11418 ***
