Bug 35772

Summary: linux-image-2.6.38-2-686 (Debian): Horrible Time Skew, Eventual Near-zero Responsiveness
Product: ACPI
Reporter: Sabahattin Gucukoglu (mail-sender-e537f2)
Component: Other
Assignee: Lan Tianyu (tianyu.lan)
Status: CLOSED INSUFFICIENT_DATA
Severity: high
CC: ben, jrnieder, lenb, rui.zhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.39-rc6
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: Debian bug tracker log mailbox for #626427
Acpidump output, acpi=off
Dmesg output, acpi=off
ntpd log, 3.2.y kernel, acpi disabled and enabled
kernel log, 3.2.y kernel, acpi disabled and enabled

Description Sabahattin Gucukoglu 2011-05-24 22:49:03 UTC
Created attachment 59292 [details]
Debian bug tracker log mailbox for #626427

This is basically a forward of Debian bug #626427, whose log can be found here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=626427
(Also attached as plaintext mailbox.)

The braille display is away for now, but the system still managed to freeze, after a somewhat longer while, without BRLTTY running.  This is using 2.6.39 RC6.

In summary, the system slows right down after an unspecified time, more frequently with BRLTTY running, it seems.  Touching backspace at an empty shell line results in continuous beeping from the system speaker without pause, but dnscache still serves data, even though no new process or thread will start.  A hard reset is required to make the system usable again.  The other important manifestation is horrible clock drift.  It looks like a regression from 2.6.32 in the acpi_pm clocksource driver.  After the kernel upgrade, though, the clocksource selected automatically was tsc; in 2.6.32 it was acpi_pm, giving 0.13s drift, which I suppose is acceptable on my hardware.  Now the clock is something like 500s behind at every update with tsc, and shows a less drastic but still unusable back-and-forth with acpi_pm.
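
For anyone reproducing this, the clocksource the kernel picked can be read from sysfs. The following is a minimal sketch, assuming the usual /sys/devices/system/clocksource/clocksource0 directory is present; the function name read_clocksources is made up for illustration and is not part of any tool mentioned in this report.

```python
from pathlib import Path

# Usual sysfs location on Linux; taken as a parameter so the function
# can also be pointed at a saved copy of the two files.
SYSFS_CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_CLOCKSOURCE):
    """Return (current, available): the active clocksource name and the
    list of clocksource names the kernel offers."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

Calling read_clocksources() on the affected machine would show whether tsc or acpi_pm was selected for a given boot.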
Comment 1 Len Brown 2011-05-31 01:21:38 UTC
if you boot with idle=halt and use the tsc as the clocksource,
does the system time behave correctly?

clocksource=hpet does what?

clocksource=pit does what?

clocksource=jiffies does what?

the system is in pretty bad shape if acpi_pm must be used as a clocksource...
Comment 2 Sabahattin Gucukoglu 2011-06-12 02:26:46 UTC
(In reply to comment #1)
> if you boot with idle=halt and use the tsc as the clocksource,
> does the system time behave correctly?

Eventually.  It seems to take two or three "Skew change nnn exceeds limit" messages, with the clock being forced to the absolute time, but the clock then stops drifting enough for openntpd to even show adjustments.  However, the freezes still happen, just less frequently.

> clocksource=hpet does what?

Falls back to acpi_pm.

> clocksource=pit does what?

Works just like idle=halt clocksource=tsc, in clock behaviour.  Not sure about the freezing, though, yet.

> clocksource=jiffies does what?

As with clocksource=pit.

> the system is in pretty bad shape if acpi_pm must be used as a clocksource...

In what sense, hardware?

BTW: acpi_pm caused freezes very quickly.  It's next on my list to try idle=halt with all the clock sources besides tsc, if that'll help.
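
When comparing boots with different parameters, it helps to record which of the options discussed above were actually in effect. This is a rough sketch, assuming the command line is taken from /proc/cmdline; parse_time_options is a hypothetical helper name, and it only looks at the three options mentioned in this thread.

```python
def parse_time_options(cmdline):
    """Pick out clocksource=, idle= and acpi= from a kernel command line
    string and return them as a dict."""
    wanted = ("clocksource", "idle", "acpi")
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in wanted:
            # If an option is repeated, this dict keeps the last one seen.
            found[key] = value
    return found
```

For example, passing the contents of /proc/cmdline from a boot with "clocksource=tsc idle=halt" gives a dict with those two entries, which can be logged next to the ntpd output for that boot.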
Comment 3 Len Brown 2011-06-14 01:54:07 UTC
> DMI: Micronpc.com Millennia GS/694X-596B-977, BIOS 4.51 PG 10/20/99

This machine is really from 1999?

Please boot with "acpi=off" and report if things are different,
and attach the complete dmesg.

Please attach the output from acpidump.
Comment 4 Sabahattin Gucukoglu 2011-06-23 08:18:48 UTC
(In reply to comment #3)
> > DMI: Micronpc.com Millennia GS/694X-596B-977, BIOS 4.51 PG 10/20/99
> 
> This machine is really from 1999?

Really.  Earlier, if you count the upgraded mainboard and hard disk, by about a year.  Does this surprise you? :-)

Do you know about any BIOS updates for this thing?  Even the extended int13h calls are sketchy.

But it still works.  Why chuck it?

> Please boot with "acpi=off" and report if things are different,
> and attach the complete dmesg.
> 
> Please attach the output from acpidump.

Substantially no different from clocksource=tsc idle=halt: the skews settle down to nothing after about the third absolute clock set by ntpd (done because the drift is too great).  Then, after a while of decent operation, it "Freezes".
Comment 5 Sabahattin Gucukoglu 2011-06-23 08:24:34 UTC
Created attachment 63272 [details]
Acpidump output, acpi=off
Comment 6 Sabahattin Gucukoglu 2011-06-23 08:29:16 UTC
Created attachment 63282 [details]
Dmesg output, acpi=off
Comment 7 Zhang Rui 2012-01-18 05:17:16 UTC
It's great that the kernel bugzilla is back.

Can you please verify if the problem still exists in the latest upstream
kernel?
Comment 8 Zhang Rui 2012-05-24 07:59:13 UTC
bug closed as there is no response from the bug reporter.
please feel free to reopen it if the problem still exists in the latest upstream kernel.
Comment 9 Jonathan Nieder 2012-05-26 22:03:28 UTC
From Sabahattin Gucukoglu at http://bugs.debian.org/626427:

| OK.  Please understand that I'm a bit busy at the moment, so
| I can't devote as much time as I'd like.  However, I did a
| full upgrade to the latest kernel in Sid, 3.2.0-2-686-pae and
| got basically the same results, the same time skew, complete
| absence of any indication, and eventually the same lockup.
| Right down to the beeeeeeeeeeeeeeep of the console speaker.
| The available and current clocksource was set to acpi_pm.
| However, this showed up on one occasion when it seemed likely
| that it would happen again, only it didn't:
  May 26 07:48:03 Bloodstone vmlinux: [  672.068384] sched: RT throttling activated
|
| I've included all logs since the last report for Linux and
| ntpd.  Hopefully this is absolutely everything you guys
| need to work out what/if something's changed.
[...]
| (ATM this box is still my DNS cache and other replaceable
| things, so even if it's dying it's still strictly speaking in
| production :-) ).

The logs attached span three boots:

 14 March or so: probably Debian 3.2.10-1
 15 April: Debian 3.2.14-1
 26 May: Debian 3.2.18-1
Comment 10 Jonathan Nieder 2012-05-26 22:04:54 UTC
Created attachment 73419 [details]
ntpd log, 3.2.y kernel, acpi disabled and enabled
Comment 11 Jonathan Nieder 2012-05-26 22:05:18 UTC
Created attachment 73420 [details]
kernel log, 3.2.y kernel, acpi disabled and enabled
Comment 12 Jonathan Nieder 2012-05-26 22:10:16 UTC
As I wrote on the Debian bugtracker, it is not so weird for pre-2000
machines to have buggy ACPI implementations, so having to disable
acpi to get a stable clocksource does not seem so bad.

Hopefully the attached logs can (1) satisfy the curious and (2) help
to figure out if this is pointing to a bug in acpi_pm that might
also affect newer machines and if there is some easy way to detect
problematic machines to automatically switch to another time source.
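
One very rough way to flag "pre-2000" machines from a log is to look at the BIOS date on the dmesg DMI line quoted in comment 3. The sketch below is only an illustration of that idea, not something the kernel is claimed to do; the function names and the two-digit-year rule are assumptions.

```python
import re

# Matches a trailing "MM/DD/YY" or "MM/DD/YYYY" date on a dmesg DMI line.
_DMI_DATE = re.compile(r"(\d{2})/(\d{2})/(\d{2}|\d{4})\s*$")


def bios_year(dmi_line):
    """Return the four-digit BIOS year from a DMI line, or None if the
    line does not end in a date."""
    match = _DMI_DATE.search(dmi_line)
    if match is None:
        return None
    year = int(match.group(3))
    if year < 100:
        # Assumption: two-digit years 80..99 mean 19xx, 00..79 mean 20xx.
        year += 1900 if year >= 80 else 2000
    return year


def is_pre_2000_bios(dmi_line):
    """Hypothetical heuristic: True when the BIOS date is before 2000."""
    year = bios_year(dmi_line)
    return year is not None and year < 2000
```

Applied to the line "DMI: Micronpc.com Millennia GS/694X-596B-977, BIOS 4.51 PG 10/20/99", this would flag the machine as a candidate for a non-ACPI time source. Whether BIOS date alone is a good enough signal is exactly the open question here.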