Bug 10152

Summary: Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
Product: Platform Specific/Hardware
Reporter: Rafael J. Wysocki (rjw)
Component: x86-64
Assignee: Thomas Gleixner (tglx)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, andi-bz, crazy, mingo, tglx
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc
Subsystem:
Regression: ---
Bisected commit-id:

Description Rafael J. Wysocki 2008-03-02 14:29:29 UTC
Subject         : Clocksource tsc is always unstable with 2.6.25-* kernels and CONFIG_NO_HZ=y on my box
Submitter       : Gabriel C <nix.or.die@googlemail.com>
Date            : 2008-02-24 01:31
References      : http://lkml.org/lkml/2008/2/23/380
References      : http://lkml.org/lkml/2008/2/24/281
Handled-By      : Thomas Gleixner <tglx@linutronix.de>

This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2008-03-18 17:46:27 UTC
References : http://lkml.org/lkml/2008/3/18/1

Caused by:

commit 1ada5cba6a0318f90e45b38557e7b5206a9cba38
Author: Andi Kleen <ak@suse.de>
Date:   Wed Jan 30 13:30:02 2008 +0100

    clocksource: make clocksource watchdog cycle through online CPUs
Comment 2 Andi Kleen 2008-03-19 03:31:25 UTC
Well if the cycling watchdog detects an inconsistency the clock perhaps
_ought_ to be marked unstable?  If it is really not consistent over CPUs
marking it unstable is the right thing to do, short of finding out what
makes it inconsistent.
Comment 4 Andi Kleen 2008-03-20 04:35:52 UTC
Hmm, why exactly was it reverted? It is unclear from the lkml reference,
and the git commit message is not very enlightening either. Also, nobody
CC'ed me on anything.

Unless there is some bug in the watchdog itself, reverting it will just hide
whatever problem it exposed, so it would be absolutely the wrong fix.

I suspect his machine really had some inconsistency between CPUs and you
just shot the messenger.
Comment 5 Andi Kleen 2008-03-20 04:36:46 UTC
My recommendation would be to reopen the bug, but I don't have the rights for that.
Comment 6 Rafael J. Wysocki 2008-03-20 07:19:17 UTC
Okay, but I'm removing it from the list of recent regressions.
Comment 7 Rafael J. Wysocki 2008-03-20 07:27:32 UTC
(In reply to comment #4)
> Hmm why exactly was it reverted?

The reporter observed undesired behavior that was not present with 2.6.24 and identified the commit that caused it, AFAICS. Still, the revert was from Andrew, so you had better ask him.
Comment 8 Andi Kleen 2008-03-20 07:41:12 UTC
OK Andrew, can you tell us why you reverted it? I think you just shot the messenger who exposed a previously hidden problem.

I think it is better assigned to Thomas. If he determines that my original
patch for the watchdog was broken, I would be happy to take a look at that, but it would surprise me if that were the case.
Comment 9 Gabriel C 2008-03-21 08:30:17 UTC
Andi, I CC'ed you on that (Andi Kleen <ak@suse.de>). Is that the wrong email?

Also, from then on you were CC'ed on every email I sent: http://lkml.org/lkml/2008/3/18/1
Comment 10 Thomas Gleixner 2008-03-21 09:05:02 UTC
we definitely want to look deeper into this. The patch looks pretty innocent and I definitely want to know why exactly it triggers on your machine.
Comment 11 Andi Kleen 2008-03-21 09:19:38 UTC
Re #9: I meant I was not cc'ed on whatever thread discussed the decision to revert the watchdog patch.
Comment 12 Gabriel C 2008-03-21 09:25:20 UTC
(In reply to comment #11)
> Re #9: I meant I was not cc'ed on whatever thread discussed the decision to
> revert the watchdog patch.
> 

Ah, OK, I got that wrong, sorry.
Comment 13 Gabriel C 2008-03-21 09:39:07 UTC
(In reply to comment #10)
> we definitely want to look deeper into this. The patch looks pretty innocent
> and I definitely want to know why exactly it triggers on your machine.
> 

Sure. 

As I said on LKML, I can test any sort of patches on that box at any time (well, whenever I'm home :) ).