Bug 8870 - legacy rtc makes computer freeze

Summary: legacy rtc makes computer freeze

Status:	CLOSED CODE_FIX

Alias:	None

Product:	Platform Specific/Hardware
Classification:	Unclassified
Component:	x86-64
Hardware:	All Linux

Importance:	P1 normal
Assignee:	platform_x86_64@kernel-bugs.osdl.org

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-08-09 09:20 UTC by Marcus
Modified:	2008-02-24 23:10 UTC
CC List:	11 users

See Also:
Kernel Version:	2.6.22
Subsystem:
Regression:	---
Bisected commit-id:

Attachments
dmesg from i386 kernel 2.6.22.5 (41.96 KB, text/plain) 2007-08-23 12:57 UTC, Marcus
dmesg from x86_64 kernel 2.6.22.5 (41.14 KB, text/plain) 2007-08-23 13:10 UTC, Marcus
/proc/interrupts for i386 kernel 2.6.22.5 (494 bytes, text/plain) 2007-08-23 13:13 UTC, Marcus
/proc/interrupts for x86_64 kernel 2.6.22.5 (478 bytes, text/plain) 2007-08-23 13:13 UTC, Marcus

Description Marcus 2007-08-09 09:20:19 UTC

Most recent kernel where this bug did not occur:2.6.22
Distribution: Ubuntu 64bit 7.4 or Gentoo
Hardware Environment: HP Pavilion dv9398eu, chipset: nVidia MCP51, processor: AMD Turion 64 X2, BIOS: F.38
Software Environment: Ubuntu 7.0
Problem Description:
The computer freezes almost every time I (or the boot scripts) run hwclock. The problem disappears with an i386 kernel, or if the noapic or acpi=off kernel option is used. hwclock also works if almost no modules are loaded. However, noapic and acpi=off introduce other problems, such as USB not working. If the rtc driver is disabled in the kernel, hwclock doesn't freeze the computer, but no time is reported.

Steps to reproduce: run hwclock

Comment 1 Andrew Morton 2007-08-09 12:07:08 UTC

According to your report, this bug is present in 2.6.22, and was not
present in 2.6.22.  Please clarify: Most recent kernel where this bug did not occur?

Comment 2 Marcus 2007-08-09 23:30:29 UTC

That was a mistake. I don't know the most recent kernel where this bug did not occur.

Comment 3 Marcus 2007-08-09 23:33:55 UTC

This bug is similar to 8186, but that bug is closed and considered solved, which I don't agree with.

Comment 4 Marcus 2007-08-16 10:47:23 UTC

I can test patches or provide more info. I want this problem resolved.

Comment 5 Andi Kleen 2007-08-22 17:33:31 UTC

Does the i386 kernel run with apic on too?

Perhaps provide us with full dmesg output of both the i386 kernel
and the x86_64 kernel.

You see the freeze with just running "hwclock" without options?

Comment 6 Marcus 2007-08-23 12:53:30 UTC

The i386 kernel works with apic on; the x86_64 kernel doesn't. No options are needed, only "hwclock".

Comment 7 Marcus 2007-08-23 12:57:18 UTC

Created attachment 12510
dmesg from i386 kernel 2.6.22.5

Comment 8 Marcus 2007-08-23 13:10:23 UTC

Created attachment 12511
dmesg from x86_64 kernel 2.6.22.5

Comment 9 Marcus 2007-08-23 13:13:11 UTC

Created attachment 12512
/proc/interrupts for i386 kernel 2.6.22.5

Comment 10 Marcus 2007-08-23 13:13:59 UTC

Created attachment 12513
/proc/interrupts for x86_64 kernel 2.6.22.5

Comment 11 Marcus 2007-09-01 09:19:52 UTC

Do you need anything else?

Comment 12 Andi Kleen 2007-09-04 07:16:40 UTC

Could you please also try the i386 hwclock binary on the 64bit kernel?
Does it cause the freeze also?

Comment 13 Marcus 2007-09-07 12:52:19 UTC

The 32bit hwclock seems to work on the 64bit kernel, but it doesn't report any time.

I logged the output from hwclock with the --debug option.

The first 7 output lines were the same for all configurations:

hwclock from util-linux-2.12r
Using /dev/rtc interface to clock.
Last drift adjustment done at 1187894090 seconds after 1969
Last calibration done at 1187894090 seconds after 1969
Hardware clock is on local time
Assuming hardware clock is kept in local time.
Waiting for clock tick...

32bit hwclock on 32bit kernel:

...got clock tick
Time read from Hardware Clock: 2007/09/07 21:01:44
Hw clock time : 2007/09/07 21:01:44 = 1189191704 seconds since 1969
Fri Sep  7 21:01:44 2007  -0.015954 seconds

64bit hwclock on 64bit kernel (when it doesn't freeze):

/dev/rtc does not have interrupt functions. Waiting in loop for time from /dev/rtc to change
...got clock tick
Time read from Hardware Clock: 2007/09/07 20:48:33
Hw clock time : 2007/09/07 20:48:33 = 1189190913 seconds since 1969
Fri Sep  7 20:48:33 2007  -0.767138 seconds

If the kernel freezes, the line "/dev/rtc does not have interrupt functions. Waiting in loop for time from /dev/rtc to change" is the last output.

32bit hwclock on 64bit kernel:

select() to /dev/rtc to wait for clock tick timed out
...got clock tick

I also noticed that the interrupt count is always zero for rtc-interrupt on the 64bit kernel.
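
For anyone who wants to repeat the interrupt-count check without reading /proc/interrupts by eye, here is a minimal Python sketch. The attached /proc/interrupts files are not reproduced in this thread, so the SAMPLE text below is hypothetical and only mimics the usual layout (a header with one column per CPU, then one line per IRQ). The function name is also made up for illustration.

```python
# Hypothetical sample in the usual /proc/interrupts layout; NOT the
# contents of the attachments on this bug.
SAMPLE = """\
           CPU0       CPU1
  0:      51234          0   IO-APIC-edge      timer
  1:        312          0   IO-APIC-edge      i8042
  8:          0          0   IO-APIC-edge      rtc
NMI:          0          0
ERR:          0
"""


def rtc_interrupt_counts(interrupts_text):
    """Return (irq_label, [count per CPU]) for the rtc line, or None."""
    lines = interrupts_text.splitlines()
    if not lines:
        return None
    # The header has one "CPUn" column per CPU.
    ncpu = len(lines[0].split())
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Lines such as NMI/ERR carry counts only, no device names.
        if len(fields) <= ncpu:
            continue
        names = [f.rstrip(",") for f in fields[ncpu:]]
        if "rtc" in names:
            return label.strip(), [int(f) for f in fields[:ncpu]]
    return None


if __name__ == "__main__":
    found = rtc_interrupt_counts(SAMPLE)
    if found is not None:
        irq, counts = found
        print("rtc on IRQ", irq, "total interrupts:", sum(counts))
```

To check a live system, pass the contents of /proc/interrupts instead of SAMPLE, once before and once after running hwclock. A total that stays at zero would match the observation above.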

Comment 14 Marcus 2007-10-31 12:24:08 UTC

Any news here?

I have found more users having problems with the RTC.

I also tried the same thing in Gentoo. There, both the 32bit and the 64bit hwclock make the computer freeze on a 64bit kernel.

The only difference is that the 32bit hwclock can make use of --directisa, which doesn't freeze the kernel.

Comment 15 Andrew Morton 2007-10-31 13:32:26 UTC

(lots of cc's added)

Guys, could you please take a look?  It's a box-killer...

Thanks.

Comment 16 David Brownell 2007-10-31 13:58:36 UTC

That "/dev/rtc does not have interrupt functions" thing may be a symptom of the same problem killing the box.  I see the 64 bit kernel is coming up with HPET in "broken" mode (using the RTC IRQ), and may have the drivers/char/rtc.c driver enabled (it seems to be listed in /proc/interrupts).

The "hpet_resources: 0xfed00000 is busy" is also suspicious...

Try disabling HPET and see if that helps.

(I don't know what happened to the patches which make HPET come up in "sane" mode whereby it doesn't *silently* interfere with the RTC.)

Comment 17 Marcus 2007-11-05 12:06:04 UTC

How do I disable HPET?  I managed to unset CONFIG_HPET, but CONFIG_HPET_TIMER can't be unset, because the scripts in the source tree seem to restore it to CONFIG_HPET_TIMER=y.

Unsetting CONFIG_HPET didn't make any difference.

Comment 18 David Brownell 2007-11-06 00:08:12 UTC

Hmm, I see that x86_64 forces HPET_TIMER on in all cases, which in turn forces on the HPET_EMULATE_RTC option whenever you use the legacy RTC driver, which in turn is what I think is causing your freeze.  But i386 puts that choice in user hands.  The x86_64 treatment looks, on first glance, broken in at least the "needless i386 incompatibility" mode.  (Behaves on my x86_64 system, but that has no HPET and doesn't use the legacy driver ...)

Try editing arch/x86/Kconfig.x86_64 to make the entry for HPET_TIMER give a prompt; i386 uses:  'bool "HPET Timer Support"'.  Then you should be able to disable HPET_TIMER.  I'd hope that would let you build a new and working kernel.  Or maybe HPET_TIMER should depend on HPET itself.  It seems there are too many HPET variables and they don't play together well.

Another thing you might try, again leaving HPET off:  just use the rtc-cmos driver, link it statically and disable the legacy drivers/char/rtc.c driver.

Comment 19 Marcus 2007-11-11 02:40:40 UTC

I managed to unset both CONFIG_HPET and CONFIG_HPET_TIMER, but that didn't help.

I haven't tried the rtc-cmos yet.

I found that the i386 kernel now freezes too. So I went back to the i386 kernel I used when I reported this bug and ran hwclock continuously while doing other things, and after a while the kernel froze. But it's still much easier to make the x86_64 kernel freeze. And it looks like the i386 kernel freezes more easily if I enable CONFIG_HIGH_RES_TIMERS.

Disabling CONFIG_RTC or adding noapic or acpi=off still seems to be the only way to make it stable.

Comment 20 David Brownell 2007-11-17 09:58:55 UTC

I see that 2.6.24-rc3 checked in some fixes, one of which fixed an x86_64 freeze related to NTP updating the clock.  See if that kernel works better for you.

Comment 21 David Brownell 2007-11-17 10:01:45 UTC

Whoops, correction.  Those fixes are in current GIT, but right AFTER the RC3 tag was applied.  So try current git, not RC3.

Comment 22 Ingo Molnar 2007-11-17 10:06:29 UTC

it's this commit:

 commit c399da0d97e06803e51085ec076b63a3168aad1b
 Author: David P. Reed <dpreed@reed.com>
 Date:   Wed Nov 14 17:47:35 2007 -0500

     x86: fix freeze in x86_64 RTC update code in time_64.c

pull the latest kernel and type "git-log c399da0d". If that lists the 
above commit then you've got the right fix included.
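
As an alternative to reading the log output by eye, the same check can be scripted. This is only a sketch: it assumes a git new enough to support `git merge-base --is-ancestor` and the `-C` option, and the helper name and its `run` parameter are made up for illustration. The commit id is the one quoted above.

```python
import subprocess

# Commit id quoted in the comment above.
FIX_COMMIT = "c399da0d97e06803e51085ec076b63a3168aad1b"


def commit_is_included(commit, ref="HEAD", repo=".", run=subprocess.run):
    """Return True if `commit` is reachable from `ref` in the tree at `repo`.

    `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B
    and 1 when it is not; any other exit code is treated as an error.
    The `run` parameter exists only so the call can be replaced in tests.
    """
    result = run(["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref])
    if result.returncode not in (0, 1):
        raise RuntimeError(
            "git merge-base failed with exit code %d" % result.returncode
        )
    return result.returncode == 0
```

Calling `commit_is_included(FIX_COMMIT, repo="/path/to/linux")` (the path is a placeholder) should return True once the pulled tree contains the fix.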

Comment 23 David Brownell 2008-01-23 23:10:46 UTC

taking myself off this bug as I haven't touched any of the relevant code...

Comment 24 Ramon Antonio Parada 2008-01-30 10:22:38 UTC

The problem seems to affect at least all HP dv6xxx and HP tx1xxx models, which share the nVidia MCP51 chipset. You can google for "dv6000 irq", "tx1000 irq" or "tx1000 timer" to see forum discussions with more than 50 pages of posts from HUNDREDS of users having related problems with this chipset.

I use 32-bit 2.6.24 and the problem persists.

Comment 25 H. Peter Anvin 2008-01-30 10:49:54 UTC

The actual common denominator seems to be an embedded controller which comes with incompetently programmed firmware from the manufacturer.  The chipset is not at fault.

Comment 26 H. Peter Anvin 2008-01-30 10:57:37 UTC

The problem is that the embedded controller apparently queues up writes to port 0x80, the debug port, which Linux (and a lot of other software) uses as a delay port.  Worse, it stops (waiting for a drain?) if its internal queue fills up.

A patch has been floating around LKML which hacks around it (changing the delay port to 0xed), but that's a platform-specific hack, with the resulting issues.

Comment 27 Thomas Gleixner 2008-01-30 11:05:07 UTC

> The problem is that the embedded controller apparently queues up writes to
> port 0x80, the debug port, which Linux (and a lot of other software) uses as
> a delay port.  Worse, it stops (waiting for a drain?) if its internal queue
> fills up.
>
> A patch has been floating around LKML which hacks around it (changing the
> delay port to 0xed), but that's a platform-specific hack, with the resulting
> issues.

The patch is in mainline now. commits

 b02aae9cf52956dfe1bec73f77f81a3d05d3902b
 ..
 f9fc58910ebc448b0b7d37af1bf57a896a78e9c4

Thanks,
	tglx

Comment 28 H. Peter Anvin 2008-01-30 11:26:56 UTC

It looks like d0049e71c6e14a3b0a5b8cedaa1325a1a91fecb0 was never meant to go into mainstream... this might be a problem.

Comment 29 Ramon Antonio Parada 2008-01-30 11:58:28 UTC

I don't know how to get a specific commit working.
I found the patch at http://lkml.org/lkml/2007/12/13/591

Could you help those of us using 32 bit deal with this now? Should I report a separate bug?

Comment 30 Ingo Molnar 2008-02-01 08:11:00 UTC

> Could you assist us using 32 bit how to deal with this now? Should 
> report another separated bug?

try the io_delay=0x80 boot option - does it change things?

Comment 31 Ramon Antonio Parada 2008-02-01 14:29:05 UTC

You mean io_delay=0xed?

I have tried 0x80, 0xed, and alternate (seen somewhere), but kernel 2.6.24 seems to ignore this argument. No change, and no confirmation message in dmesg.

I have also tried 2.6.24-git10, which includes a specific patch for this issue, and so far I haven't had a freeze.

I'm trying to check whether it completely solves the issue. I'm unable to check the hwclock problem reported by many people. If I use the framebuffer I do not get a freeze; if I don't use the framebuffer I always get a freeze during boot, so it's impossible to check.

Comment 32 Ramon Antonio Parada 2008-02-01 17:43:10 UTC

The io_delay option was included in 2.6.24-rc1 and removed for the final version. It was re-enabled in 2.6.24-git9.

Comment 33 Rene Herman 2008-02-03 07:16:27 UTC

No, it wasn't in 2.6.24-rc1. A minimal version of it was submitted to Linus for 2.6.24-rc6 but he didn't want it. It's in post-2.6.24 though, just as a temporary measure until the port 0x80 use has been sanitized (limited to ISA drivers), after which it won't be needed anymore. I'm therefore not sure when it's going to go away again, but it's not a long-term thing.

Comment 34 Marcus 2008-02-11 02:54:21 UTC

I have now tried 2.6.24-git21 and it looks like the problem is gone for both the 32bit and the 64bit kernel!!!

I also looked at /proc/sys/kernel/io_delay_type and it was set to 1. So the kernel understands that it should use 0xed instead of 0x80. Great job!!
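
For others who want to run the same check, a small Python sketch that reads the sysctl file and names the delay method. Only the reading "1 means port 0xed" comes from this thread; the other entries in the table are my recollection of the x86 io_delay code and should be treated as assumptions. The function names are made up for illustration.

```python
# Mapping of io_delay_type values to delay methods.
# 1 -> 0xed is confirmed by the comment above; 0, 2 and 3 are assumptions
# based on recollection of the kernel's io_delay code.
IO_DELAY_TYPES = {
    0: "port 0x80",
    1: "port 0xed",
    2: "udelay",
    3: "none",
}


def describe_io_delay_type(raw):
    """Turn the raw file contents (e.g. "1\\n") into a readable name."""
    value = int(raw.strip())
    return IO_DELAY_TYPES.get(value, "unknown (%d)" % value)


def read_io_delay_type(path="/proc/sys/kernel/io_delay_type"):
    """Read the sysctl file; it only exists on kernels with io_delay support."""
    with open(path) as f:
        return describe_io_delay_type(f.read())
```

On a machine where the quirk has been applied, `read_io_delay_type()` would be expected to report port 0xed, matching the value 1 seen above.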
