Kernel Bug Tracker – Bug 35772
linux-image-2.6.38-2-686 (Debian): Horrible Time Skew, Eventual Near-zero Responsiveness
Last modified: 2012-05-26 22:10:16 UTC
Created attachment 59292 [details]
Debian bug tracker log mailbox for #626427
This is basically a forward of Debian bug #626427, whose log can be found here:
(Also attached as plaintext mailbox.)
The braille display is away for now, but it still managed to freeze after a while longer without BRLTTY running. This is using 2.6.39 RC6.
In summary, the system slows right down after an unspecified time, more frequently with BRLTTY running, it seems. Pressing backspace at an empty shell line results in continuous beeping from the system speaker without pause. dnscache still serves data, but no new process or thread will start. A hard reset is required to make the system usable again. The other important manifestation is horrible clock drift. It looks like a regression from 2.6.32 in the acpi_pm clocksource driver. After the kernel upgrade, though, the clocksource selected automatically was tsc; under 2.6.32 it was acpi_pm, giving 0.13s drift, which I suppose is acceptable on my hardware. Now the clock is something like 500s behind at every update with tsc, and with acpi_pm the drift is less drastic but goes back and forth and is still unusable.
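For anyone trying to reproduce this, it may help to record which clocksource the kernel actually selected before and after each test. The following is only a minimal sketch: it reads the standard sysfs files `current_clocksource` and `available_clocksource`, and the helper name `read_clocksources` is made up for illustration.

```python
from pathlib import Path

# Standard sysfs location of the clocksource controls on Linux.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources read from sysfs-style files."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    try:
        current, available = read_clocksources()
    except OSError as exc:
        print(f"could not read clocksource info: {exc}")
    else:
        print(f"current:   {current}")
        print(f"available: {' '.join(available)}")
```

The `base` argument exists only so the function can be pointed at a copy of the files, for example ones saved from an affected machine.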
if you boot with idle=halt and use the tsc as the clocksource,
does the system time behave correctly?
clocksource=hpet does what?
clocksource=pit does what?
clocksource=jiffies does what?
the system is in pretty bad shape if acpi_pm must be used as a clocksource...
(In reply to comment #1)
> if you boot with idle=halt and use the tsc as the clocksource,
> does the system time behave correctly?
Eventually. It seems to take two or three "Skew change nnn exceeds limit" messages, with the clock being forced to the absolute time each time, but after that the drift is no longer large enough for openntpd to even show adjustments. However, the freezes still happen, just less frequently.
> clocksource=hpet does what
Falls back to acpi_pm.
> clocksource=pit does what?
Works just like idle=halt clocksource=tsc, in clock behaviour. Not sure about the freezing, though, yet.
> clocksource=jiffies does what?
As with clocksource=pit.
> the system is in pretty bad shape if acpi_pm must be used as a clocksource...
In what sense, hardware?
BTW: acpi_pm caused freezes very quickly. It's next on my list to try idle=halt with all the clock sources besides tsc, if that'll help.
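Since several combinations of boot parameters are being compared here, it may be useful to log which ones were really in effect for a given boot. The sketch below extracts the options discussed in this report from a kernel command line such as the contents of `/proc/cmdline`. The function name `timing_params` is hypothetical, and the parsing assumes plain whitespace-separated `key=value` tokens without quoting.

```python
from pathlib import Path

# Boot parameters discussed in this report.
TIMING_PARAMS = ("clocksource", "idle", "acpi")


def timing_params(cmdline):
    """Extract the timing-related key=value options from a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Only keep tokens of the form key=value for the keys of interest.
        if sep and key in TIMING_PARAMS:
            found[key] = value
    return found


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError as exc:
        print(f"could not read /proc/cmdline: {exc}")
    else:
        print(timing_params(cmdline))
```

An empty result simply means none of the three options was passed, so the kernel picked its defaults.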
> DMI: Micronpc.com Millennia GS/694X-596B-977, BIOS 4.51 PG 10/20/99
This machine is really from 1999?
Please boot with "acpi=off" and report if things are different,
and attach the complete dmesg.
Please attach the output from acpidump.
(In reply to comment #3)
> > DMI: Micronpc.com Millennia GS/694X-596B-977, BIOS 4.51 PG 10/20/99
> This machine is really from 1999?
Really. Earlier, if you count the upgraded mainboard and hard disk, by about a year. Does this surprise you? :-)
Do you know about any BIOS updates for this thing? Even the extended int13h calls are sketchy.
But it still works. Why chuck it?
> Please boot with "acpi=off" and report if things are different,
> and attach the complete dmesg.
> Please attach the output from acpidump.
Substantially no different than clocksource=tsc idle=halt, that being that skews settle down to nothing after about the third absolute clock set by ntpd because the drift is too great. Then after a while of decent operation, it "Freezes".
Created attachment 63272 [details]
Acpidump output, acpi=off
Created attachment 63282 [details]
Dmesg output, acpi=off
It's great that the kernel bugzilla is back.
Can you please verify if the problem still exists in the latest upstream kernel?
Bug closed as there is no response from the bug reporter.
Please feel free to reopen it if the problem still exists in the latest upstream kernel.
From Sabahattin Gucukoglu at http://bugs.debian.org/626427:
| OK. Please understand that I'm a bit busy at the moment, so
| I can't devote as much time as I'd like. However, I did a
| full upgrade to the latest kernel in Sid, 3.2.0-2-686-pae and
| got basically the same results, the same time skew, complete
| absence of any indication, and eventually the same lockup.
| Right down to the beeeeeeeeeeeeeeep of the console speaker.
| The available and current clocksource was set to acpi_pm.
| However, this showed up on one occasion when it seemed likely
| that it would happen again, only it didn't:
May 26 07:48:03 Bloodstone vmlinux: [ 672.068384] sched: RT throttling activated
| I've included all logs since the last report for Linux and
| ntpd. Hopefully this is absolutely everything you guys
| need to work out what/if something's changed.
| (ATM this box is still my DNS cache and other replaceable
| things, so even if it's dying it's still strictly speaking in
| production :-) ).
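Regarding the "sched: RT throttling activated" line quoted above: that message relates to the real-time scheduling budget configured through `/proc/sys/kernel/sched_rt_runtime_us` and `/proc/sys/kernel/sched_rt_period_us`. As a rough sketch, and not a diagnosis of this bug, the budget in effect could be checked like this (the helper name `rt_budget` is made up for illustration):

```python
from pathlib import Path

PROC_KERNEL = Path("/proc/sys/kernel")


def rt_budget(runtime_us, period_us):
    """Return the fraction of each period available to real-time tasks.

    A negative runtime (-1) means throttling is disabled, reported as None.
    """
    if runtime_us < 0:
        return None
    return runtime_us / period_us


if __name__ == "__main__":
    try:
        runtime = int((PROC_KERNEL / "sched_rt_runtime_us").read_text())
        period = int((PROC_KERNEL / "sched_rt_period_us").read_text())
    except (OSError, ValueError) as exc:
        print(f"could not read RT scheduler settings: {exc}")
    else:
        budget = rt_budget(runtime, period)
        if budget is None:
            print("RT throttling disabled")
        else:
            print(f"RT tasks may use {budget:.0%} of each {period} us period")
```

Whether the throttling is a cause or just a symptom of the lockups here is not something this check can tell.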
The logs attached span three boots:
14 March or so: probably Debian 3.2.10-1
15 April: Debian 3.2.14-1
26 May: Debian 3.2.18-1
Created attachment 73419 [details]
ntpd log, 3.2.y kernel, acpi disabled and enabled
Created attachment 73420 [details]
kernel log, 3.2.y kernel, acpi disabled and enabled
As I wrote on the Debian bugtracker, it is not so weird for pre-2000
machines to have buggy ACPI implementations, so having to disable
acpi to get a stable clocksource does not seem so bad.
Hopefully the attached logs can (1) satisfy the curious and (2) help
to figure out if this is pointing to a bug in acpi_pm that might
also affect newer machines and if there is some easy way to detect
problematic machines to automatically switch to another time source.
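As one possible illustration of "some easy way to detect problematic machines", here is a purely hypothetical heuristic based on the BIOS date, which Linux exposes in `/sys/class/dmi/id/bios_date` (this machine reports 10/20/99). The cutoff year, the two-digit-year rule, and both function names are assumptions made for this sketch. It does not describe what the kernel does.

```python
from pathlib import Path


def bios_year(bios_date):
    """Parse a DMI BIOS date such as '10/20/99' or '03/15/2011' into a year."""
    parts = bios_date.strip().split("/")
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    year = int(parts[2])
    if year < 100:
        # Assumption: two-digit years 80-99 mean 19xx, anything else 20xx.
        year += 1900 if year >= 80 else 2000
    return year


def prefers_non_acpi_clock(bios_date, cutoff_year=2000):
    """Hypothetical policy: distrust acpi_pm when the BIOS predates cutoff_year."""
    year = bios_year(bios_date)
    return year is not None and year < cutoff_year


if __name__ == "__main__":
    try:
        date = Path("/sys/class/dmi/id/bios_date").read_text()
    except OSError as exc:
        print(f"could not read BIOS date: {exc}")
    else:
        print(date.strip(), "->", prefers_non_acpi_clock(date))
```

A date-based rule like this would be crude at best. The attached logs may suggest a better signal.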