Subject    : BUG: 2.6.26-git8: spinlock lockup in c1e_idle()
Submitter  : Mikhail Kshevetskiy <mikhail.kshevetskiy@gmail.com>
Date       : 2008-07-24 03:22
References : http://lkml.org/lkml/2008/7/23/317
Handled-By : Thomas Gleixner <tglx@linutronix.de>

This entry is being used for tracking a regression from 2.6.26. Please don't close it until the problem is fixed in the mainline.
On Monday, 11 of August 2008, Mikhail Kshevetskiy wrote:
> As of 2.6.27-rc2-git4 the bug still exists, so nothing has changed.
> The syslog for 2.6.27-rc2-git4 is attached.
>
> Mikhail
>
>
> On Sun, 10 Aug 2008 00:43:50 +0200 (CEST)
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.26. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11191
> > Subject : 2.6.26-git8: spinlock lockup in c1e_idle()
> > Submitter : Mikhail Kshevetskiy <mikhail.kshevetskiy@gmail.com>
> > Date : 2008-07-24 03:22 (17 days old)
> > References : http://lkml.org/lkml/2008/7/23/317
> > Handled-By : Thomas Gleixner <tglx@linutronix.de>
On Monday, 18 of August 2008, Mikhail Kshevetskiy wrote:
> As of 2.6.26-rc3-git3 bug still exist.
> It affect both i386 and x86_64 architectures.
>
> Mikhail
>
> On Sat, 16 Aug 2008 21:02:46 +0200 (CEST)
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.26. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11191
> > Subject : 2.6.26-git8: spinlock lockup in c1e_idle()
> > Submitter : Mikhail Kshevetskiy <mikhail.kshevetskiy@gmail.com>
> > Date : 2008-07-24 03:22 (24 days old)
> > References : http://lkml.org/lkml/2008/7/23/317
> > Handled-By : Thomas Gleixner <tglx@linutronix.de>
Created attachment 17405 [details] config-2.6.27-rc1-git4
Created attachment 17406 [details] dmesg-2.6.26-rc1-git4
Created attachment 17407 [details] syslog-2.6.27-rc2-git4
Created attachment 17408 [details] lspci -tv
Created attachment 17409 [details] lspci -xvvv
As of 2.6.26-rc3-git3 the bug still exists. It affects both i386 and x86_64 architectures.

Here is a summary of the kernel parameters I tried:
1) noapictimer -- boots ok, tickless disabled
2) nohpet -- boots ok (I suppose tickless is disabled)
3) clocksource=tsc -- the same as with "nohpet"
4) idle=halt -- need to press buttons to boot
5) none of the above -- the kernel freezes after switching to high resolution mode on CPU0 and detects a spinlock lockup 5 minutes later
Created attachment 17410 [details] /proc/timer_list (2.6.26.3)
Is this problem still there with 2.6.27-rc4 ?
Created attachment 17476 [details] dmesg-2.6.27-rc4-git6
Yep, the problem is still present. I tried the following:
1) replace default_idle() with poll_idle() in c1e_idle() -- [BUG PRESENT]
2) "nohpet" or "clocksource=tsc" kernel parameters -- hpet disabled, bug not observed
3) "noapictimer" or "highres=off nohz=off" kernel parameters -- hpet in periodic mode, tickless disabled, bug not observed
4) "idle=poll" kernel parameter -- hpet in oneshot mode, tickless enabled, power saving is off, bug not observed

So it looks like the problem is linked to hpet breakage during the clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_FORCE/ENTER/EXIT, &cpu) calls.
Can you please provide the output of /proc/timer_list after you booted a kernel with "nohpet" on the kernel command line ? Also can you please try to boot with "clocksource=acpi_pm" ?
Created attachment 17488 [details] /proc/timer_list (2.6.27-rc4-git6, nohpet)
Created attachment 17489 [details] /proc/timer_list (2.6.27-rc4-git6, idle=poll)
(In reply to comment #13)
> Can you please provide the output of /proc/timer_list after you booted a
> kernel
> with "nohpet" on the kernel command line ?

See the attachments (two cases):
1) nohpet (this is your request)
2) idle=poll (to compare with the 2.6.23 case)

> Also can you please try to boot with "clocksource=acpi_pm" ?

The same problem as in the hpet case, so no results.
> see the attachmets (two cases)
> 1) nohpet (this is your request)

Working highres, just using PIT instead of HPET.

> > Also can you please try to boot with "clocksource=acpi_pm" ?
>
> the same problem as in hpet case, so no results

Well, it's a result. Using HPET on that box seems to be the real problem.

Maciej, any idea on that ?
> Well, it's an result. Using HPET on that box seems to be the real problem.

I am not really sure that HPET is completely broken. It works in periodic mode at least. According to /proc/timer_list from 2.6.27-rc4-git6 booted with the "idle=poll" kernel parameter, there is a chance that HPET may work in oneshot mode as well.

I want to boot with "idle=poll" (this will switch HPET to oneshot mode) and then switch the idle function to "c1e_idle". If no errors occur, then only the HPET mode switching needs to be fixed for my notebook.

Is there any way to switch the idle function after booting?
> > Well, it's an result. Using HPET on that box seems to be the real problem.
>
> I am not really sure that HPET is broken completely. It work in periodic mode
> at least. According to /proc/timer_list from 2.6.27-rc4-git6 booted with
> "idle=poll" kernel parameter, there is a chance that HPET may work in oneshot
> mode also.
>
> I want to boot with "idle=poll" (this will switch HPET to oneshot mode) and
> then switch idle function to "c1e_idle". If no errors occurred, then only
> HPET
> mode switching should be fixed for my notebook.

The mode switch of HPET is not the problem. The point is that you disable the C1E functionality when you boot with idle=poll. Once you switch to c1e_idle you will have the same problem again.

It seems that once the CPUs switch into C1E mode, the HPET becomes dysfunctional. If you run in periodic mode, then the CPUs might not switch into the deepest power saving modes, which affect the HPET.

We have no clue about the internal details of the C1E implementation in the System Management code, which is inside your BIOS.
Can the AMD folks please shed some light on the internals of the C1E magic ?
On Monday, 1 of September 2008, Mikhail Kshevetskiy wrote:
> On Sat, 30 Aug 2008 21:50:13 +0200 (CEST)
> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
>
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.26. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=11191
> > Subject : 2.6.26-git8: spinlock lockup in c1e_idle()
> > Submitter : Mikhail Kshevetskiy <mikhail.kshevetskiy@gmail.com>
> > Date : 2008-07-24 03:22 (38 days old)
> > References : http://lkml.org/lkml/2008/7/23/317
> > Handled-By : Thomas Gleixner <tglx@linutronix.de>
> >
> >
>
> not fixed yet
> what notebook do you have? From the bugzilla it seems to be an Asus with
> nvidia chipset -- but what model?
>
> So far I don't know what the root cause is, and I was not able
> to reproduce the problem with different hardware -- but maybe I have a
> chance to get access to a similar notebook model.
> This would help to better debug the problem.

ASUS F3T, bios F3TAS.228
http://gentoo-wiki.com/Asus_F3T
References : http://marc.info/?l=linux-kernel&m=122038418801942&w=4
No Asus F3T available here. And no HP dv6000 (see comment #23).

But I have access to an HP tx1000 with nvidia MCP51.
It shows the same symptom (spinlock lockup).

Interestingly, hpet does not seem to work in periodic mode.
With a non-tickless kernel that uses hpet, the system hangs for several minutes during boot.

Currently doing some debugging (regarding hpet) on that system.
> Currently doing some debugging (regarding hpet) on that system.

Can you please apply those patches:

http://lkml.org/lkml/2008/9/3/75
and
http://lkml.org/lkml/2008/9/3/183

Thanks,
tglx
(In reply to comment #24)
> No Asus F3T available here. And no HP dv6000 (see comment #23).
>
> But I have access to an HP tx1000 with nvidia MCP51.
> It shows same symptom (spinlock lockup).
>
> Interestingly, hpet does not seem to work in periodic mode.
> With a non-tickless kernel, but using hpet system hangs for several
> minutes during boot.
>
> Currently doing some debugging (regarding hpet) on that system.
>

Try turning off highres timers as well. I use "highres=off nohz=off" and this works for me.
> > Currently doing some debugging (regarding hpet) on that system.
> >
> try to turn off highres timers also. I use "highres=off nohz=off" and this
> works for me.

That does not help much, if you want to debug why those two options are _NOT_ working when enabled. :)

Thanks,
tglx
Created attachment 17595 [details]
combo patch of various clockevents fixes which might be related

That's the combo patch of all fixes which were made in the last couple of days in that area. Any feedback is welcome.

Thanks,
tglx
Didn't try your combo patch (yet), but encountered the following:

On HP tx1000 with same chipset the kernel hangs for several minutes in a loop when it tries to program hpet in one-shot mode:

  int tick_program_event(ktime_t expires, int force)
  ...
          while (1) {
                  int ret = clockevents_program_event(dev, expires, now);

                  if (!ret || !force)
                          return ret;
                  now = ktime_get();
                  expires = ktime_add(now, ktime_set(0, dev->min_delta_ns));
          }

Some printks in hpet_legacy_next_event show

  hpet: T0_CMP: 20cbcfc COUNTER: 20cbd01, delta: 2f
  hpet: T0_CMP: 20ccc27 COUNTER: 20ccc2b, delta: 2f
  hpet: T0_CMP: 20cdb40 COUNTER: 20cdb45, delta: 2f
  hpet: T0_CMP: 20cea7d COUNTER: 20cea81, delta: 2f

and this goes on and on and on and on (for at least more than 2 minutes)

The corresponding values in clockevents_program_event are:

  delta (ns): 0x780,
  dev->min_delta_ns: 0x780,
  dev->mult: 0x6666666,
  dev->shift: 0x20,

I've doubled min_delta_ns for hpet:

@@ -229,7 +229,7 @@ static void hpet_legacy_clockevent_register(void)
 	/* Calculate the min / max delta */
 	hpet_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFFFF,
 							   &hpet_clockevent);
-	hpet_clockevent.min_delta_ns = clockevent_delta2ns(0x30,
+	hpet_clockevent.min_delta_ns = clockevent_delta2ns(0x60,
 							   &hpet_clockevent);

and the system boots without delays and does not show spinlock lockups so far.
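The numbers quoted above are consistent with each other, which can be checked with a little arithmetic. The following is only a sketch: the helper names ns_to_ticks() and ticks_to_ns() are made up for illustration, and the formulas are the usual mult/shift scaling of the clockevents code (the clamping done by the real clockevent_delta2ns() is assumed away here).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: nanoseconds -> device ticks using the
 * mult/shift scaling, i.e. (ns * mult) >> shift.
 */
static uint64_t ns_to_ticks(uint64_t ns, uint32_t mult, uint32_t shift)
{
	return (ns * mult) >> shift;
}

/*
 * Hypothetical helper: device ticks -> nanoseconds, modelled on
 * clockevent_delta2ns(): (ticks << shift) / mult. Any clamping of
 * the result is intentionally left out of this sketch.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t mult, uint32_t shift)
{
	assert(mult != 0);	/* a zero multiplier would divide by zero */
	return (ticks << shift) / mult;
}
```

With mult 0x6666666 and shift 0x20, ticks_to_ns(0x30, ...) gives 0x780 ns, the min_delta_ns reported above, and ns_to_ticks(0x780, ...) rounds down to 0x2f, the delta seen in the printks. With the doubled value, ticks_to_ns(0x60, ...) gives 0xf00 ns, which converts back to 0x5f ticks.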
Created attachment 17598 [details]
patch to fix the problem (at least it worked for the HP tx1000 where I've tested)

x86: hpet: increase min_delta_ns to increase chance of successful programming in hpet_legacy_next_event

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=11191 and most probably http://bugzilla.kernel.org/show_bug.cgi?id=11418 as well.

With c1e_idle hpet is frequently reprogrammed (in one-shot mode). If the delta for next timer event is very small the T0 comparator value is too close to the current HPET counter value and Linux repeatedly tries to reprogram the comparator.

On an HP tx1000 (with AMD Turion and nvidia MCP51) this caused

  BUG: spinlock lockup on CPU#0

during boot. On other systems with other chipsets I've observed soft lockups, e.g.

  BUG: soft lockup - CPU#1 stuck for 89s! [uname:28197]

Both symptoms vanished when I've increased min_delta_ns for hpet.
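The retry behaviour described in this patch comment can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: struct toy_hpet and the toy_* functions are hypothetical, the success check only imitates the idea of comparing the counter against the comparator after programming it, and the access latency of 52 ticks is inferred from the printk lines in this report (delta 0x2f plus an overshoot of 4 to 5 ticks).

```c
#include <assert.h>
#include <stdint.h>

/* Toy HPET model (hypothetical, for illustration only). */
struct toy_hpet {
	uint32_t counter;	/* free-running main counter */
	uint32_t latency;	/* assumed ticks that elapse while programming */
};

/*
 * Program the comparator `delta` ticks ahead of the counter.
 * Returns 0 on success, -1 if the counter has already passed the
 * comparator by the time it is read back (event would be missed).
 */
static int toy_next_event(struct toy_hpet *h, uint32_t delta)
{
	uint32_t cmp = h->counter + delta;

	h->counter += h->latency;	/* time passes during register access */
	return ((int32_t)(h->counter - cmp) > 0) ? -1 : 0;
}

/*
 * Forced reprogramming: retry with the minimum delta until it sticks.
 * The real loop quoted in this report is unbounded; this sketch gives
 * up after max_tries. Returns the number of attempts, or -1 on failure.
 */
static int toy_program_forced(struct toy_hpet *h, uint32_t min_delta,
			      int max_tries)
{
	assert(max_tries > 0);
	for (int i = 1; i <= max_tries; i++) {
		if (toy_next_event(h, min_delta) == 0)
			return i;
	}
	return -1;
}
```

In this model a minimum delta of 0x2f ticks never succeeds when the latency is 52 ticks, no matter how often the loop retries, while a minimum delta of 0x5f ticks succeeds on the first attempt. The same reasoning applies to any minimum delta that is smaller than the time it takes to program the comparator.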
> I've doubled min_delta_ns for hpet:
>
> @@ -229,7 +229,7 @@ static void hpet_legacy_clockevent_register(void)
>  	/* Calculate the min / max delta */
>  	hpet_clockevent.max_delta_ns = clockevent_delta2ns(0x7FFFFFFF,
>  							   &hpet_clockevent);
> -	hpet_clockevent.min_delta_ns = clockevent_delta2ns(0x30,
> +	hpet_clockevent.min_delta_ns = clockevent_delta2ns(0x60,
>  							   &hpet_clockevent);
>
>
> and the system boots without delays and does not show spinlock lockups so
> far.

Yeah, that was what we found out as well :)

I just sent a patch queue with various fixes to lkml. Unfortunately I forgot to CC you. Too tired :(

Thanks,
tglx
With the latest patches the kernel boots and works just fine. Thank you.
Today I have tested Thomas' patchset as well. It (of course) solved the spinlock lockup issue on the HP tx1000.
Minor correction to my comment #24:
HPET works/worked in periodic mode without problems.
Handled-By : Andreas Herrmann <andreas.herrmann3@amd.com>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=17598&action=view
This is fixed by the complete patch set, which is scheduled for Linus. The combo patch is here:

http://bugzilla.kernel.org/show_bug.cgi?id=11418

Maybe we should mark those as duplicates.
*** This bug has been marked as a duplicate of bug 11418 ***