Bug 3927
Description
Enrico Scholz
2004-12-21 14:55:06 UTC
Does this occur when cpufreq is disabled?

Yes, it still happens when CONFIG_CPU_FREQ is turned off and/or when I boot into runlevel 1 (which does not start any cpufreq daemon). The 'Kernel Version' field seems to be limited in length, so: this still happens with 2.6.11-rc1.

Could you check /proc/interrupts to see how frequently you're getting timer ticks? You should see the timer interrupt count increase 1000 times a second (measured against your wrist-watch). CC'ing Andi to make sure he's aware of this.

They increase twice as fast (2000 times/sec).

I once had a similar bug on an i386 multiprocessor machine. The problem in that case was that the timer interrupt was misconfigured in the APIC and broadcast to all CPUs, and each CPU did timer processing, which made the time run NRCPUS times as fast. But this one must be different, since the machine has only a single CPU. What does grep time /var/log/boot.log say? If it says something like "time.c: Using 14.318180 MHz HPET timer." you are using HPET. If so, try "nohpet".

See http://www.tu-chemnitz.de/~ensc/hw/amd64/dmesg.txt:
| time.c: Using 1.193182 MHz PIT timer.
Playing with ACPI and APIC is difficult, as both are required to boot the machine. An i386 kernel does not work either, as it produces a lot of interrupts. FWIW, from time to time (especially during CPU- or IO-intensive tasks?) I see
| APIC error on CPU0: 40(40)
messages. For now, I have worked around the timer problem by ignoring every second timer interrupt, but this is probably not a very portable solution ;)

Can you post the full boot log?

Is http://www.tu-chemnitz.de/~ensc/hw/amd64/dmesg.txt not enough? If not, how can I create a better "boot log"? I won't have access to the machine before the weekend, so I cannot provide new logs at the moment.

ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
Most likely one of these overrides is somehow handled wrong. Does "noapic" help?
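The /proc/interrupts check suggested above can be scripted. The following is a minimal sketch in C, not code from this thread: the helper names are hypothetical, it assumes a single-CPU machine (only the first count column is read) and HZ=1000 as discussed here, and the elapsed time must come from an external reference such as a wrist-watch, because the kernel's own clock is the thing under suspicion.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the IRQ 0 count from one line of /proc/interrupts, e.g.
 * "  0:    5273574    IO-APIC-edge  timer".  Only the first count
 * column is read, so this assumes a single CPU (hypothetical helper).
 * Returns 1 and stores the count if the line is the IRQ 0 line. */
static int parse_irq0_count(const char *line, unsigned long long *count)
{
    int irq;
    unsigned long long n;

    if (sscanf(line, " %d: %llu", &irq, &n) != 2 || irq != 0)
        return 0;
    *count = n;
    return 1;
}

/* Timer interrupts per second between two samples taken `seconds` apart.
 * `seconds` must come from an external clock, not from the kernel. */
static double tick_rate(unsigned long long before, unsigned long long after,
                        double seconds)
{
    assert(seconds > 0.0 && after >= before);
    return (double)(after - before) / seconds;
}

/* Observed rate divided by the expected HZ: about 1.0 is healthy,
 * about 2.0 is the double-speed symptom in this bug. */
static double tick_ratio(double rate, int hz)
{
    assert(hz > 0);
    return rate / (double)hz;
}
```

For example, two IRQ 0 readings taken 10 seconds apart that differ by 20000 give tick_rate() = 2000.0 and tick_ratio(2000.0, 1000) = 2.0, which is the double-speed case reported above.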
Could be an ACPI issue.

Unfortunately, the machine does not boot with 'noapic' or 'acpi=off' :( But this kind of message seems to be common on AMD64; e.g. the first hit on Google shows http://lists.suse.com/archive/suse-amd64/2004-Jul/0104.html. The only unique message seems to be
| ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)

This looks similar to bugme bug #4442, as well as the lkml thread http://www.ussg.iu.edu/hypermail/linux/kernel/0504.0/0270.html. Booting with "noapic" might help. Could the submitter try that out?

Crud, scratch the noapic suggestion; I forgot that it had already been tried. Could the submitter try the patch found here: http://www.ussg.iu.edu/hypermail/linux/kernel/0504.0/1625.html

Yes, this patch or the alternative in http://www.ussg.iu.edu/hypermail/linux/kernel/0504.0/1862.html fixes the double timer frequency, but it disables the NMI watchdog. For reference, my board is an ATI RX480-SB400 found in an HP Pavilion k737.de.

Is this bug currently reproducible with 2.6.12? Does booting with "no_timer_check" change anything? Another similar issue is bug #3341.

*** Bug 4651 has been marked as a duplicate of this bug. ***
*** Bug 5031 has been marked as a duplicate of this bug. ***

This appears to be an interrupt configuration bug rather than a "timers" bug.

Created attachment 5596
disable_check_timer.patch vs 2.6.13-rc6
Please test this patch to see if it has any effect.
The patch disables the questionable call to check_timer()
when in ACPI+IOAPIC mode. This patch is not
production ready b/c some NMI gunk is also
(erroneously) in check_timer().
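The ACPI: INT_SRC_OVR lines quoted earlier in this thread tell the kernel that ISA IRQ 0 arrives on global IRQ (GSI) 2 and ISA IRQ 9 on GSI 21; without an override, ACPI assumes ISA IRQs are identity-mapped. A minimal C sketch that parses those boot-log lines and applies the mapping could look like this. It works on the dmesg text only, not on the MADT table the kernel actually reads, and all names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "ACPI: INT_SRC_OVR (...)" boot-log line, parsed (hypothetical type). */
struct int_src_ovr {
    unsigned bus, bus_irq, gsi;
    char polarity[16], trigger[16];   /* e.g. "dfl"/"dfl" or "low"/"level" */
};

/* Returns 1 if `line` is an INT_SRC_OVR line and fills *o. */
static int parse_int_src_ovr(const char *line, struct int_src_ovr *o)
{
    return sscanf(line,
                  "ACPI: INT_SRC_OVR (bus %u bus_irq %u global_irq %u %15[a-z] %15[a-z]",
                  &o->bus, &o->bus_irq, &o->gsi, o->polarity, o->trigger) == 5;
}

/* Map an ISA IRQ to the global IRQ it is delivered on: use an override
 * if one exists for it, otherwise assume the identity mapping. */
static unsigned isa_irq_to_gsi(unsigned isa_irq,
                               const struct int_src_ovr *ovr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (ovr[i].bus == 0 && ovr[i].bus_irq == isa_irq)
            return ovr[i].gsi;
    return isa_irq;
}

static int is_level_triggered(const struct int_src_ovr *o)
{
    return strcmp(o->trigger, "level") == 0;
}
```

With the two overrides from the boot log loaded, isa_irq_to_gsi(0, ...) returns 2, which is exactly the timer routing that later comments in this thread suspect is bogus on these boards.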
This patch is working AOK for me so far.

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 28
model name : Mobile AMD Sempron(tm) Processor 2800+
stepping : 0
cpu MHz : 800.129
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt 3dnowext 3dnow lahf_lm
bogomips : 1602.80

0000:00:00.0 Host bridge: ATI Technologies Inc: Unknown device 5950 (rev 01)
0000:00:01.0 PCI bridge: ATI Technologies Inc: Unknown device 5a3f
0000:00:13.0 USB Controller: ATI Technologies Inc: Unknown device 4374
0000:00:13.1 USB Controller: ATI Technologies Inc: Unknown device 4375
0000:00:13.2 USB Controller: ATI Technologies Inc: Unknown device 4373
0000:00:14.0 SMBus: ATI Technologies Inc: Unknown device 4372 (rev 11)
0000:00:14.1 IDE interface: ATI Technologies Inc: Unknown device 4376
0000:00:14.3 ISA bridge: ATI Technologies Inc: Unknown device 4377
0000:00:14.4 PCI bridge: ATI Technologies Inc: Unknown device 4371
0000:00:14.5 Multimedia audio controller: ATI Technologies Inc: Unknown device 4370 (rev 02)
0000:00:14.6 Modem: ATI Technologies Inc: Unknown device 4378 (rev 02)
0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
0000:01:05.0 VGA compatible controller: ATI Technologies Inc: Unknown device 5955
0000:05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
0000:05:09.0 CardBus bridge: Texas Instruments: Unknown device 8031

Can you show dmesg output with the patch?

Created attachment 5614
dmesg output from patch (debug off)
As requested.
This patch doesn't seem to work on an HP nx6125, which exhibits these symptoms (AMD64, ATI chipset). If I apply it, the last line I get from the kernel is
ACPI: Embedded Controller [C110] (gpe 17)
and then the machine hangs. I'll see if I can sort out a serial port on it and get a dump of the entire boot. Presumably something in check_timer() is necessary for this machine to work. (Attempting to boot with acpi=off or noapic oopses the kernel; I'll try to track that down at some later stage.)

After further checking: the patch in http://www.ussg.iu.edu/hypermail/linux/kernel/0504.0/1625.html works fine. Skipping check_timer() entirely doesn't.

Full bootlog for the failure with the patch, please.

Created attachment 5643
dmesg from nx6125 with patch applied
dmesg output from the failure case supplied. This is with the patch from comment 22. Boot freezes at this point. The patch from comment 14 works.

*** Bug 4092 has been marked as a duplicate of this bug. ***
*** Bug 4442 has been marked as a duplicate of this bug. ***

Keith, your dmesg shows that you turned off the APIC (with the

Created attachment 5705
patch, /proc/interrupts, and dmesg
I made some random changes and made the following observation.
If I don
Created attachment 5755
Patch that works around the problem
This patch fixes things on my nx6125, and doesn't seem to interfere with any
other code. This surely can't be the correct solution?
Andi: Any comments on the last patch? Matthew: Would you consider RFC'ing that patch to lkml to get wider testing and feedback?

Hmm, maybe. It looks a bit dubious, but it could be it. It would need a lot of testing, also on non-ATI chipsets.

Created attachment 5788
Lost ticks and APIC errors
The patch from comment #34 stops the double timer interrupts on my machine. I still get messages about lost ticks, fewer than with my patch but from more sources. Furthermore, I still receive occasional APIC errors (APIC error on CPU0: 40(40)), which my patch stopped. The lost ticks and the APIC errors may be related, because a lost tick is often (though not always) immediately preceded by an APIC error.

From the results so far I would guess the following: the timer is connected to pin 0 of the I/O APIC, and the output of the PIC is connected to pin 2 of the I/O APIC (so the timer override is bogus) and to LINT0 of the local APIC; however, masking LINT0 in check_timer() somehow has no effect. This setup would explain all the effects I encountered on my machine, namely why:
- booting without any kernel parameters causes double IRQ 0;
- disabling IRQ 0 in the PIC stops IRQ 0 altogether;
- enabling IRQ 1 in the PIC increases IRQ 0 when hitting the keyboard;
- booting with acpi_skip_timer_override still causes double IRQ 0;
- and booting with acpi_skip_timer_override and disabling IRQ 0 in the PIC results in normal IRQ 0.
Any idea how to verify or dismiss this hypothesis? What about opening the case and tracing the lines on the mainboard?

I am hitting this problem running an i386 2.6.13 kernel on a Compaq Presario V2312US. Adapting the patch from comment #34 to i386 solves it.

Ooh, progress. Can other reporters please test http://bugzilla.kernel.org/attachment.cgi?id=5755&action=view ?

I'm running an Acer Aspire 5024 with the same problem. The patch http://bugzilla.kernel.org/attachment.cgi?id=5755&action=view solves it.
I tried the patch from comment #34 and it caused my EM64T box to hang partway through booting.

Hmm, isn

This patch (http://bugme.osdl.org/attachment.cgi?id=5755&action=view) seems to solve the double-speed system clock over here. I still get the annoying lost ticks message:
Losing some ticks... checking if CPU frequency changed.
And the "standard" APIC error (27 lines' worth at the moment):
APIC error on CPU0: 40(40)
I'm not using frequency scaling. dmesg output or anything else is available upon request. The kernel is vanilla 2.6.13 other than the applied patch.

Created attachment 5940
kernel .config and boot log, /proc/interrupts
I've tried this patch on a Compaq V2000 laptop with AMD Turion 64 / ATI, but to
no avail. I patched Debian's 2.6.12-6 and tried booting this kernel both with
and without acpi_skip_timer_override.
I'm running a Compaq V2000 laptop with AMD Turion 64 and ATI. With this patch,
my timer is still running at double-time. I tried acpi_skip_timer_override as
well, but to no avail.
The attachment is my latest /var/log/boot, /proc/interrupts, and the kernel's
.config
How about 'noapic'?

Looks like I didn't do enough testing. It works fine with noapic, even without the latest of the posted patches.

bugme-daemon@kernel-bugs.osdl.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=3927
> ------- Additional Comments From zwane@arm.linux.org.uk 2005-09-08 18:02 -------
> How about 'noapic' ?
noapic doesn't work for everyone with this problem; sometimes it causes the machine not to boot (see also https://bugzilla.novell.com/show_bug.cgi?id=113323).

I suspect the best course of action is to extend check_ioapic to check for ATI bridges and then enable the change from comment #34. Still, the APIC errors caused by this are bad, so it's probably not a very good workaround. Best would be to figure out how Windows programs the hardware; that tends to be the most tested (= reliable) way.

Right. The basic problem seems to be that we're getting two timer ticks when we should be getting one. My patch disables one of these, but produces APIC errors. Presumably we actually want to disable the other one, if someone could work out where it is coming from. I have a copy of Windows installed on my test machine; is there any easy way to dump the APIC and legacy PIC state under Windows?

I have reports that this behaviour has also been observed in Windows XP. Has anyone in this bug report observed it on their systems with Windows?

Created attachment 5943
Extend existing quirk code to apply fix only to ATI chipsets
I attached a patch that selectively applies the only known fix to ATI
chipsets.
And I've seen the APIC errors others are reporting only once, before I added
/sbin/hdparm -a 8 -m 8 -u 1 -d 1 -c 1 /dev/hda
in /etc/rc.local
It's wrong to make this dependent on CONFIG_ACPI; it should be independent of ACPI. Also, for i386 it would need to be in a code path outside acpi/.

FWIW:
- Windows XP SP2 on my machine doesn

Created attachment 5959
PIC, local APIC, I/O APIC, and IDT on Windows XP SP2
It has been suggested to find out what Windows does on
a double timer machine. You can use Microsoft
I have to correct my statement in comment #54 about the APIC errors: they go away only if I boot with acpi_skip_timer_override and disable the PIC; booting with acpi_skip_timer_override and disabling the I/O APIC pin does not stop them.

If you are hitting this bug, please go to Bugzilla #3927 and post the output of
lspci -n -s 00:00.0
If your vendor:product pair has already been reported, please don't report it again.

$ lspci -n -s 00:00.0
00:00.0 Class 0600: 1002:5950

If it helps, here is my output:
cryos-lap ~ # lspci -n -s 00:00.0
0000:00:00.0 Class 0600: 1002:5951 (rev 01)

Using no_timer_check fixes this issue; I am also getting "APIC error on CPU0: 40(40)" errors. This is an Acer Ferrari 4005 laptop using the Turion/ATI chipset combination.

lspci -n -s 00:00.0 --> 00:00.0 Class 0600: 1002:5950
-- [Compaq Presario R4000 series (R4035)]
It has already been reported in AC#58. I just wanted to add that no_timer_check only partially fixes the problem. The timer now ticks correctly, but I get this message at boot:
..MP-BIOS bug: 8254 timer not connected to IO-APIC
failed.
timer doesn't work through the IO-APIC - disabling NMI Watchdog!
Uhhuh. NMI received for unknown reason 31.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
works.
Using local APIC timer interrupts.
Detected 12.464 MHz APIC timer.
testing NMI watchdog ... CPU#0: NMI appears to be stuck (1->1)!
--
With nmi_watchdog=0 as a kernel parameter, it says:
..MP-BIOS bug: 8254 timer not connected to IO-APIC
failed.
works.
Using local APIC timer interrupts.
Detected 12.464 MHz APIC timer.
testing NMI watchdog ... CPU#0: NMI appears to be stuck (0->0)!
--
Finally, I just wanted to report that the DSDT has many errors too (disassembling and recompiling with iasl gives 25 errors and 3 warnings). Perhaps the bug feeds on this. I'm trying to work around them now. I can post my DSDT and/or my recompile output if anyone asks.
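When collecting the lspci -n -s 00:00.0 reports requested above, the vendor:device pair can be pulled out of any of the output variants posted in this thread (with or without the 0000: domain prefix and the word Class). This is a hypothetical C helper, not part of lspci or the kernel; the device list holds only the two host-bridge IDs reported at this point (1002:5950 and 1002:5951) and is certainly not exhaustive. 0x1002 is the vendor ID the kernel calls PCI_VENDOR_ID_ATI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VENDOR_ATI 0x1002u   /* the ID the kernel calls PCI_VENDOR_ID_ATI */

/* Extract the vendor:device pair from one line of `lspci -n -s 00:00.0`
 * output, e.g. "0000:00:00.0 Class 0600: 1002:5951 (rev 01)".  Takes the
 * first token that is exactly four hex digits, a colon and four hex digits.
 * Hypothetical helper; returns 1 on success. */
static int parse_lspci_ids(const char *line, unsigned *vendor, unsigned *device)
{
    char buf[256];

    snprintf(buf, sizeof buf, "%s", line);
    for (char *tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        unsigned v, d;
        int end = 0;

        if (strlen(tok) == 9 && sscanf(tok, "%4x:%4x%n", &v, &d, &end) == 2 && end == 9) {
            *vendor = v;
            *device = d;
            return 1;
        }
    }
    return 0;
}

/* Host-bridge IDs reported in this thread so far; not an exhaustive list. */
static int is_reported_ati_host_bridge(unsigned vendor, unsigned device)
{
    static const unsigned known[] = { 0x5950, 0x5951 };

    if (vendor != VENDOR_ATI)
        return 0;
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (known[i] == device)
            return 1;
    return 0;
}
```

The slot field ("00:00.0" or "0000:00:00.0") and the class field ("0600:") never have the nine-character xxxx:xxxx shape, which is why the length check alone is enough to skip them.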
[root@]# lspci -s 00:00.0 -n
00:00.0 Class 0600: 1002:5950 (rev 01)
[root@]# lspci -s 00:00.0 -vvx
00:00.0 Host bridge: ATI Technologies Inc: Unknown device 5950 (rev 01)
    Subsystem: Hewlett-Packard Company: Unknown device 2a20
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
    Latency: 64
    Region 2: I/O ports at 4100 [disabled] [size=32]
    Region 3: Memory at <ignored> (64-bit, non-prefetchable) [size=512M]
00: 02 10 50 59 06 00 20 22 01 00 00 06 00 40 00 00
10: 00 00 00 00 00 00 00 00 01 41 00 00 04 00 00 e0
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 20 2a
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Yesterday I could not even get an audio CD to play, and when I did before, I think it was playing at high speed. Could the timer problem be causing other problems too? HP Pavilion a1130n, 3500+, Mandriva LE2005, kernel 2.6.11-6mdk. Stan

Downstream bug report: http://bugs.gentoo.org/show_bug.cgi?id=104789
# lspci -n -s 00:00.0
0000:00:00.0 Class 0600: 1002:5950 (rev 01)
# lspci -s 00:00.0 -vvx
0000:00:00.0 Host bridge: ATI Technologies Inc: Unknown device 5950 (rev 01)
    Subsystem: Hewlett-Packard Company: Unknown device 2a20
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
    Latency: 64
    Region 2: I/O ports at 4100 [disabled] [size=32]
    Region 3: Memory at <ignored> (64-bit, non-prefetchable)
00: 02 10 50 59 06 00 20 22 01 00 00 06 00 40 00 00
10: 00 00 00 00 00 00 00 00 01 41 00 00 04 00 00 e0
20: 00 00 00 00 00 00 00 00 00 00 00 00 3c 10 20 2a
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

*** Bug 5252 has been marked as a duplicate of this bug. ***

Created attachment 6061
New patch
How about this? It only works in the amd64 case, but should only trigger on the
affected machines.
(I've removed the timer override, too - it seems to be bogus, and on the nx6125
has the side effect that the ACPI thermal trip values are all 16 degrees C.
Thanks, HP)
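The patch above is meant to trigger only on affected machines, by looking for ATI devices. A later comment in this thread reports that apic=debug sets ioapic_force = 1 in setup.c, which bypasses the PCI bus walk, so the quirk silently never runs. The following C model of that gating is a hypothetical sketch: the struct, the function name and the "any ATI device on bus 0" rule are illustrative assumptions, not the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VENDOR_ATI 0x1002u   /* PCI_VENDOR_ID_ATI */

struct pci_id { unsigned vendor, device; };

/* Hypothetical stand-ins for the kernel state discussed in this thread. */
struct timer_quirk_state {
    bool ioapic_force;          /* reportedly set by apic=debug */
    bool disable_timer_pin_1;   /* the workaround the quirk would enable */
};

/* Model of the gate: walk the devices on bus 0 and enable the workaround
 * if any ATI device is present, unless the walk is skipped entirely. */
static void apply_ati_timer_quirk(struct timer_quirk_state *s,
                                  const struct pci_id *bus0, size_t n)
{
    if (s->ioapic_force)
        return;                 /* bus walk bypassed: quirk never runs */
    for (size_t i = 0; i < n; i++) {
        if (bus0[i].vendor == VENDOR_ATI) {
            s->disable_timer_pin_1 = true;
            return;
        }
    }
}
```

In this model, setting disable_timer_pin_1 to 1 by hand, as one reporter later did, is simply bypassing the gate altogether, which is why it worked even when the detection did not.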
I am testing kernel 2.6.13.2 from SUSE Kernel of the Day on SUSE 9.3:
altair:~ # uname -a
Linux altair 2.6.13.2-20050928132226-default #1 Wed Sep 28 13:22:26 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
The problem with the clock has disappeared.
altair:~ # cat /proc/interrupts
           CPU0
  0:    5273574    IO-APIC-edge  timer
  1:      26496    IO-APIC-edge  i8042
  8:      37612    IO-APIC-edge  rtc
 12:     314691    IO-APIC-edge  i8042
 15:     378468    IO-APIC-edge  ide1
169:          3   IO-APIC-level  acpi, ohci1394
193:      52556   IO-APIC-level  libata
209:    1499114   IO-APIC-level  eth0
217:    1692938   IO-APIC-level  ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
225:    1848465   IO-APIC-level  nvidia
233:       6510   IO-APIC-level  ATI IXP
NMI:       1369
LOC:    5273779
ERR:         37
MIS:          0
But there are still "Lost ticks and APIC errors":
Oct 2 14:38:55 altair kernel: APIC error on CPU0: 00(40)
Oct 2 14:42:38 altair kernel: APIC error on CPU0: 40(40)
Oct 2 14:46:50 altair kernel: APIC error on CPU0: 40(40)
Oct 2 14:46:50 altair kernel: Losing some ticks... checking if CPU frequency changed.
Oct 2 14:50:02 altair kernel: APIC error on CPU0: 40(40)
Oct 2 14:53:16 altair kernel: APIC error on CPU0: 40(40)
(powersave is disabled)

Hi, I've been trying to get this clock-too-fast problem solved on my Acer Ferrari 4005 for quite some time now (http://marc.theaimsgroup.com/?l=linux-kernel&m=112835858711786&w=2). It seems to work fine with this patch, without any further kernel parameter (kernel 2.6.13.3): http://bugzilla.kernel.org/attachment.cgi?id=5943&action=view
I also tried patch id=6061 and it didn't work here, because it seems to apply only to x86_64. JG

The patch in attachment 6061 (msg #65) applied to linux-2.6.13-1.1526_FC4 fixes
the double-speed clock problem for a Gateway MX7515 running in x86_64.
It also fixes a tapping speed problem with the Synaptics touchpad which made it
difficult to select text, etc. Many thanks Matthew.
Hi all, I get the same error. The patch in #6061 doesn't work for me, but if I initialize disable_timer_pin_1 to 1, it fixes the problem. I guess this means my chipset isn't being detected as ATI by the code. I really don't understand why, as my chipset's PCI ID is:
0000:00:00.0 Class 0600: 1002:5950 (rev 01)
This does map to PCI_VENDOR_ID_ATI. FYI, I'm using 2.6.13 (amd64) on an Optronix K9A200G-MLF board with an Athlon 64. Is this issue still being looked at for inclusion in the kernel? James

Additional update: the patch now works for me. I had apic=debug on my kernel command line, which sets ioapic_force = 1 in setup.c and therefore bypasses the PCI bus walk. James

Just some extra information for 2.6.14 kernels: the boot parameter disable_timer_pin_1 seems to work around this problem nicely in the 2.6.14 series.

I agree with what Richard Mace wrote. On my machine,
$ lspci -n -s 00:00.0
0000:00:00.0 0600: 1002:5950 (rev 10)
the double timer problem is solved by adding "disable_timer_pin_1" as a boot parameter when using the 2.6.14.1 kernel. In contrast, the "disable_timer_pin_1" parameter does not help with the 2.6.12 kernel (on the same machine).

The patch at http://www.firstfloor.org/~andi/timer-routing-1 for 2.6.15-rc2 should fix it. Please test.

This chipset can also be found on 32-bit systems (some Semprons ship with it), and the problem is also exhibited there. Your patch only seems to touch the x86_64 code?

I also have a 32-bit Sempron affected by this bug. Please reopen.

On 32-bit, just don't use the APIC. You don't need it there.

There are machines shipping that seem to require APIC support on amd64, and there are vendors shipping basically identical hardware with either an Opteron or a Sempron on board. Shipping distribution kernels without APIC support isn't a terribly appealing option, since there's a fairly good chance that at least some of these machines will be broken.

Hum...
I find that hard to believe, because the wireless doesn't work and the computer will not reboot or power down when the APIC is disabled. (Yes, I'm not confusing ACPI and APIC.)

The major distros (RH, SUSE) ship the default kernel with the APIC off. That has always been the case. I would be surprised if any distro did it differently, because many older 32-bit systems and laptops are extremely unhappy with the APIC on. Anyway, as my l-k email says, it would be possible to port this fix to i386 by detecting at runtime whether the machine is ACPI compliant (e.g. by looking for ACPI tables) and, if so, applying the changes I made using runtime switches. It's somewhere on my todo list, but very low, because 64-bit is my priority. But it doesn't belong in this bug, because it's marked "Other architectures" and that doesn't cover i386. I would appreciate it if 64-bit users (for whom this bug really is; you others are just piggybackers ;-) could report success or failure with this patch, though.

The problems with APICs on 32-bit systems were largely resolved around 2.6.9, when the kernel stopped trying to enable APICs that the BIOS didn't flag. We (Ubuntu) have had no significant problems shipping with it enabled.

Created attachment 6689
Use PIT based APIC calibration
Does the following patch (against 2.6.15rc2) help?
It changes the APIC calibration to use the PIT instead of the TSC
as reference.
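Calibration means counting how far the local APIC timer counts down while a known amount of reference time passes; the patch above swaps that reference from the TSC to the PIT. A minimal arithmetic sketch of both variants follows. The function names and the sample deltas are hypothetical, chosen only so that both reproduce the "Detected 12.464 MHz APIC timer." figure seen in boot logs in this thread; either result is only as trustworthy as its reference clock.

```c
#include <assert.h>
#include <stdint.h>

/* PIT as reference: the APIC timer counted down `apic_delta` while
 * `pit_ticks` timer interrupts arrived at `hz` per second. */
static uint64_t apic_hz_from_pit(uint64_t apic_delta, unsigned pit_ticks, unsigned hz)
{
    assert(pit_ticks > 0);
    return apic_delta * hz / pit_ticks;
}

/* TSC as reference: same measurement, but elapsed time is taken from
 * the TSC and its (previously measured) frequency `tsc_hz`.
 * The product can overflow for very large deltas; fine for a sketch. */
static uint64_t apic_hz_from_tsc(uint64_t apic_delta, uint64_t tsc_delta, uint64_t tsc_hz)
{
    assert(tsc_delta > 0);
    return apic_delta * tsc_hz / tsc_delta;
}
```

For illustration, apic_hz_from_pit(124640, 10, 1000) gives 12464000, while apic_hz_from_pit(124640, 20, 1000) gives 6232000: if the reference ticks came twice as fast as assumed, the calibration would read half the true rate. This is only arithmetic, not a claim about what these particular machines do.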
I'm having the same issues with a dual-core X2 box I've acquired. I'm having trouble with the box and 2.6.15-rc2: IDE hangs. Are these patches applicable to 2.6.14, specifically the Fedora Core 4 2.6.14 kernel? I can give them a shot with that kernel.

I am having the same problem with an HP nx6152 (AMD64/ATI) with a stock Debian Sarge 2.6.8-2-386 kernel. I also get "APIC error on CPU0: 40(40)" errors. The problem does not exist with the stock Debian Sarge 2.4.27-2-386 kernel, but with 2.4, ACPI does not work at all although it is enabled in .config (there is no /proc/acpi directory). Also, with the 2.4 kernel there are sometimes "spurious 8259A interrupt: IRQ7." messages in dmesg. no_timer_check did not help. With noapic or acpi=off the machine does not boot. Hope this helps a bit.

No, it doesn't help to know what happens or doesn't happen with ancient, non-standard kernels. Please test the patches I am posting; otherwise you're just wasting time on this bugzilla.

RESULTS OF PATCH (at http://www.firstfloor.org/~andi/timer-routing-1) APPLIED TO KERNEL 2.6.15-rc3
HARDWARE: HP nx6125, AMD Turion 64 ML-34, ATI Radeon Express 200M chipset
OUTCOME: Failure. Machine hangs on boot at ==> Floppy drive(s): fd0 is 1.44M
(I don't know whether this is a result of using 2.6.15-rc3 or of the above-mentioned patch. All patches applied cleanly.)

The patch in http://bugzilla.kernel.org/show_bug.cgi?id=3927#c73, cleanly applied to 2.6.15-rc2, hung my machine. The final line of output (all I was willing to copy by hand) was
ohci_hcd 0000:00:13.0: irq 177, io mem 0xfe02d000
As it is, 2.6.14.3 with no_timer_check boots and the system clock runs at the proper speed. I get these messages, though:
CPU: AMD Athlon(tm) 64 Processor 3200+ stepping 00
..MP-BIOS bug: 8254 timer not connected to IO-APIC
failed.
timer doesn't work through the IO-APIC - disabling NMI Watchdog!
Uhhuh. NMI received for unknown reason 2d.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
works.
Using local APIC timer interrupts.
Detected 12.436 MHz APIC timer.
testing NMI watchdog ... CPU#0: NMI appears to be stuck (1->1)!
softlockup thread 0 started up.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
PCI: Using MMCONFIG at e0000000
ACPI: Subsystem revision 20050902
ACPI-0339: *** Error: Looking up [\_SB_.PCI0.LPC0.LNK0] in namespace, AE_NOT_FOUND
search_node ffff81001fec5480 start_node ffff81001fec5480 return_node 0000000000000000
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
PCI: Ignoring BAR0-3 of IDE controller 0000:00:14.1

Re #85: can you please test whether vanilla rc3 works (IIRC rc3 has a USB problem that causes breakage)? If it doesn't, then it's not my patch. There is a USB patch floating around on linux-kernel that might help, or wait until tomorrow and use an up-to-date -git. #86: same, please. Test whether the kernel without the patch works, as a baseline. And did you test the patch from #81 in addition to the timer routing patch? Also, I wonder if you have the chipset with the miscalibrated PIT; if so, the patches from http://bugzilla.kernel.org/show_bug.cgi?id=3341 might help (they're only experimental and may require backing the other patches out). All: if none of that helps and you still have problems, could you shrink the console font and take a digital photo after the failure? It's hard to debug this without a fuller log. Please send logs with timer-routing and the #81 patch applied. Thanks for your support.

If the machine hangs during boot, try the acpi_skip_timer_override kernel command line parameter.

Quick report on Bertro's suggestion #88 (quickest to implement ;-).
BOOT param: acpi_skip_timer_override
HARDWARE: HP nx6125, Turion ML-34, ATI chipset
GOOD NEWS: Got 2.6.15-rc3 to boot (glacially slowly).
BAD NEWS: All thermal trips set to 16C. CPU got frequency-scaled to 800MHz. (Made a cup of tea during boot... seriously!) Fans blew like the solar wind...
My naive interpretation: perhaps the timer override defeats what Andi's patch is trying to do, but I'll leave that for Andi to comment on.

For reference, I have been able to work around this timer problem with the boot parameter "disable_timer_pin_1". That seems to be the best option for the nx6125, i.e., the one with the fewest side effects. I'll try some of the other suggestions later, when time permits...

Created attachment 6719
dmesg (HP nx6125, kernel 2.6.15-rc3 vanilla)
Andi, as requested, a boot with vanilla kernel 2.6.15-rc3. I have used the boot
parameter "disable_timer_pin_1", which works well for me on all kernels >=
2.6.14. Machine boots perfectly, fans working correctly, thermal trips set
correctly. I've attached a dmesg, as requested.
Thermal trips getting set to 16 degrees is a BIOS bug on the nx6125.

Well, booting with "disable_timer_pin_1" sets my thermal trips correctly *and* keeps my timer running correctly, so it does something right. The HP nx6125 also suffers from bug #5534, which (I'll stick my neck out here) is possibly related to this one.

I'm not convinced; you see bugs similar to 5534 (to varying extents) on most recent HPs.

I tried the patch from comment #73 with 2.6.15-rc3 (but not the PIT-based APIC calibration) and everything worked smoothly for me: the machine boots normally and the timer runs at normal speed. No there were no

Created attachment 6724
dmesg 2.6.15 rc3 with patch from #73
Per Andi's request in #87, vanilla 2.6.15-rc2 and 2.6.15-rc3 *without* the timer-routing-1 patch (#73) both boot. Without no_timer_check, the clock runs twice as fast. With no_timer_check, the clock runs normally. Booting either -rc2 or -rc3 *with* the timer-routing-1 patch hangs the box. I've attached a boot log from 2.6.15-rc3 without the no_timer_check option and without timer-routing-1. I've also attached a digital image (sorry if that's bad manners) of the last screenful of output from booting 2.6.15-rc3 _with_ timer-routing-1 applied. I haven't tried the patch in #81 yet; that's next. Does it apply on top of #73 or in lieu of it?

Created attachment 6728
dmesg 2.6.15-rc3 without timer-routing-1 patch
Created attachment 6729
Boot output from 2.6.15-rc3+timer-routing-1 patch
This is only the final screenful. Screen wouldn't scroll back.
An update to http://bugzilla.kernel.org/show_bug.cgi?id=3927#c96: 2.6.15-rc3 with the patches from http://bugzilla.kernel.org/show_bug.cgi?id=3927#c73 and http://bugzilla.kernel.org/show_bug.cgi?id=3927#c81 still hangs the machine at the same point. However, with acpi_skip_timer_override, the same kernel _doesn't_ hang and the clock runs correctly. I've attached the boot log.

Created attachment 6730
dmesg 2.6.15-rc3 with patches from #73, #81 and acpi_skip_timer_override
Created attachment 6736
dmesg dual core X2 ATI disable_timer_pin_1 no_timer_check hp a1250n 2.6.15-rc3-git1
This is a dmesg file for an HP a1250n dual-core AMD X2, kernel 2.6.15-rc3-git1,
with minimal APIC error messages.
I've just finished some overnight testing with 2.6.15-rc3-git1, and the results after 12 hours are pretty good. I did not include any of the patches in this bug. Running with disable_timer_pin_1 and no_timer_check seems to be working for me. It also cleared up the constant APIC errors I was having, specifically the following:
APIC error on CPU0: 00(40)
APIC error on CPU1: 00(40)
APIC error on CPU0: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU1: 40(40)
APIC error on CPU0: 40(40)
Without no_timer_check (but still with disable_timer_pin_1) I get thousands of these errors. Adding no_timer_check to disable_timer_pin_1, I get only a few: fewer than 50 in over 12 hours. I have attached my dmesg here: http://bugzilla.kernel.org/attachment.cgi?id=6736&action=view
Does anyone have a recommended way of verifying that the timer is running correctly? Sorry for the noise; these are my first bugs.

Andi, what's the status of this bug? I have the same symptom and am running an AMD64 chip in 32-bit mode; 2.6.15 (and all older 2.6.13, 2.6.14, etc.) kernels show this. Which of the many patches attached to this bug should I try? I thought it was going to be fixed in 2.6.15?

At the moment (Acer 4005WLMi on 2.6.14 32-bit), I have "disable_timer_pin_1" on the kernel command line, which resolves all problems and works beautifully so far. The only special thing I did, to suppress the occasional (and harmless) warnings shown in comment #102, was to add a line to apic.c that makes this printk conditional on the following:
if ((v & ~0x40) || (v1 != 0x40)) // ignore: 00(40), 40(40)
... and that's about it :-)

> The only special thing I did, for suppressing the occasional (and
> harmless)...
What makes you think this is harmless?
Right, I should have said: it seems harmless __so far__, and has caused no visible problems on my laptop. I have read that it's displayed when a misconfigured/spurious IRQ triggers, so of course, if there's a patch to try, I'm willing to test it and give feedback :-)

Could one of our kernel experts please explain what the real problem is, and why it occurs only with certain ATI cards? I posted this error in Feb/March 2005 and we still have this problem. No offense intended; I do not want any of the good people who put so much effort into helping us to be offended. I just want to understand the complication that keeps us waiting for a fix. It seems to me we are doing a lot of trial and error rather than finding the root cause of the issue and solving it. Perhaps a bug fix is needed from ATI and it has nothing to do with the kernel? Again, can someone explain to me why this happens and why it is so difficult to resolve? I have a computer science degree and can digest the technical explanation if you wish. Thanks, Artimess

Hi, I have discovered a BIOS update for my laptop: http://h10025.www1.hp.com/ewfrf/wc/softwareDownloadIndex?lc=en&lang=en&cc=us&os=228&product=1130607&dlc=en&softwareitem=ob-37062-1
It specifically mentions fixing the clock so it no longer runs twice as fast as it should with the APIC under Linux. I have tried it and it does fix the problem. This indicates that this can be fixed by a BIOS release from the manufacturer.

So this is not a Linux bug. If anyone wants to try to find out the real fix so we can add a quirk for this hardware for everyone else, I can try to help. Though ATI, or whoever provides BIOS builds for this hardware, should already know how to do it.

Warning: if you appended "disable_timer_pin_1" as a kernel parameter and can flash your BIOS with a fix like the one from HP/Compaq, remove the parameter before flashing. If you boot with the option after flashing, the kernel will always hang after loading the scheduler. Luckily, you don't need it anymore anyway.
Note: Compaq only provides support for BIOS flashes for my model through Windows.....

No, the patch that HP/Compaq provided does not do the job completely! You still get the famous APIC 40 error, though much less often than before. I noticed it has some other side effects, so I ended up returning to the older BIOS. I am sorry I did not make a note of them; if I remember, I will post them later. One observation I made is that the error occurs much less under SUSE than under other Linux distributions. Of the others, I tried Gentoo and Kanotix; I am waiting for the next release of Fedora to test it on that too. With Kanotix (Debian) without the BIOS patch, not only do you get all the errors and skewed time, but the keyboard also misbehaves and keys repeat, which is very annoying. I am wondering if there might be another way of fixing this issue by correcting the DSDT rather than messing with the kernel, but then again I am not an expert in either of them.

True, it doesn't fix "APIC error on CPU0: 40(40)"; maybe that's another bug? But I haven't seen any side effects. What does it mean? I didn't have any bad side effects with the BIOS flash either.

Hi, not sure if I should post here or open a new bug report; I decided here might be best. I have an HP Compaq r4000 (r4218ea to be exact). I have flashed my BIOS to version F1.B (using Windows XP, which I installed especially for doing that and nuked right after). It is an AMD64 ATI IXP system. I am using the Ubuntu Dapper amd-k8 2.6.15.12 kernel at the moment. So: when I boot with no parameters, the machine gets as far as the ACPI _STA _INI ......... stuff and hangs. If I pass apic=debug it boots (no idea why this should affect the ACPI scanning stuff). So far, so weird. The machine now boots, and if I pass report_lost_ticks I get loads of messages saying I am losing ticks. This may be normal, so I do not pass that option (I only tried it because I saw a post saying it might help).
Besides that, I am getting these lovely messages at quite frequent intervals in dmesg:

APIC error on CPU0: 00(40)
APIC error on CPU0: 40(40)

I see they are logged at KERN_DEBUG level. Should this not be KERN_ERR, since it's an error? If it really is a debug statement, should it not be printk'd only in APIC debug mode? Anyway, if I can help, let me know! If you need any data at all, tell me which attachments to create. Bye! PS: my laptop does not run at double speed as the title of this bug describes, but I feel this is where my issue belongs... If this bug is getting too long, maybe another one relating to this issue _should_ be created. It took me a while to get here :)

Please create a new bug for different issues; otherwise it gets far too difficult to follow.

Would it be possible to get the timer-routing-1 patch introduced in #73 (http://www.firstfloor.org/~andi/timer-routing-1) rediffed against 2.6.16-rc1? It has given me the best results when used with acpi_skip_timer_override, but it no longer applies to 2.6.16-rc1. Thanks. More generally, what is the status of this bug?

The patchkit in ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/ has a different solution that should work without any command line arguments and doesn't need the new timer routing.

Andi's patches work for me. Not being a quilt user, I just grabbed ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/x86_64-2.6.16-rc1-060118-2.bz2.

Andi, 2.6.16-rc1-git4 + apic-main-timer + apic-main-timer-ati solves the clock problem for me on a Compaq SR1710NX with the Asus A8AE-LE motherboard, thanks! I can't tell: is this a workaround for a mobo/BIOS/ACPI bug, or is it fixing things so they interact properly with these boxes? Thanks, -Eric

Kind of both. It uses a different timer to avoid the IRQ 0 issues. The local APIC timer is almost completely implemented by the CPU itself, so the chipset vendor has much less opportunity to screw things up.
But that timer is known to have some limitations on some other platforms, so it can't just be used everywhere. For now it's only enabled on ATI. Closing the bug now. I will still try to push that patch into 2.6.16. I should add that the actual timer routing problems are still unresolved. We actually got a description of the problem, with an analysis, from ATI itself, but as far as I can tell it didn't fully explain the problems (or rather, it should have worked with timer-routing applied at least). It's possible the BIOS had broken timer overrides too (ATI wouldn't be the first vendor where this happened). It was also complicated by the fact that ATI implemented a workaround in many BIOSes, but some of the workarounds implemented in Linux no longer worked when the workaround was enabled in the BIOS. With the APIC timer this doesn't really matter anymore, because it doesn't require IRQ0 routing at all. And of course people kept mixing other, unrelated timer problems in here, which also complicated this bug.

Undo mistaken reopen.

Andi, if the information provided by ATI is not covered by an NDA, I would be delighted if you could provide some details. Thanks a lot.

Well, it had some NDA notices on it. Although I didn't sign anything, I don't want to distribute it. But basically it described, in a long-winded way, that if the interrupt is enabled on both the PIC and the APIC, it will be delivered twice. I don't quite believe it's that simple, because timer-routing disabled the PIC completely and it still didn't help everybody. [In short, I think the analysis was incomplete at best.] It also doesn't explain why the timer fallback code chose to unmask both PIC and APIC in the first place; it only does that when the sane options (PIC only or APIC only) don't work. It also contained a recommendation for a BIOS-level workaround: set a magic bit in the northbridge, and the CPU will then ignore the messages from the PIC.
If the BIOS incorporated that fix and you then also disabled the APIC pins, as the earlier workaround patches did, nothing would be delivered and the system wouldn't boot (another document described this problem). At some point I tried to write a patch that checked for that bit and did nothing if it was set, and otherwise disabled the pins. But it also didn't work in all cases and ended up quite hackish. So in the end I chose to use the APIC timer. I had actually written that code before for a different purpose, so it wasn't much of a redevelopment.

I have two questions concerning the new fix Andi Kleen mentions:
- Does it also apply to 32-bit kernels? (It currently works fine for me since disable_timer_pin_1 was added.)
- Is the solution compatible with dynamic ticks? It currently works fine (32-bit mode) and displays this: "dyn-tick: Disabling APIC timer, using PIT reprogramming"
Thanks!

No, the patch is for 64-bit kernels. Dynamic ticks isn't a standard feature; if you want support for non-standard patches, you have to ask the patch author. On 32-bit you can just run with noapic.

noapic hangs my system (Acer Ferrari 4005) during boot. I've described the only thing that worked for me with a 32-bit kernel (I didn't test 2.6.15* yet) in comment #67. JG I

Your comment doesn't make sense, because noapic doesn't use any interrupt routing tables in the DSDT. Did you perhaps confuse it with acpi=off? I'm not a Windows expert, but my understanding is that XP doesn't use the APIC in many circumstances either.

As more anecdotal evidence that this bug was closed prematurely: I know two people besides myself with AMD Sempron laptops with ATI motherboards for whom this patch does not fix the problem. The Sempron is a 32-bit chip running on a motherboard that was seemingly designed to accept both Semprons and Athlon 64s. Applying the fix only to the 64-bit side leaves the Sempron users in the dark. Please reopen.

AK> ...
noapic doesn't use any interrupt routing tables in the DSDT.

Well, it does; have a look at the DSDT I posted here: http://bugzilla.kernel.org/attachment.cgi?id=4448&action=view. When the APIC is not used, ACPI uses the LNK[A-H] devices in the DSDT to determine which PIC interrupt pins the PCI devices are connected to. If, as in my case, the SATA controller is (claimed to be) connected to LNK0, but this link is not declared as a device (only LNKA through LNKH are), ACPI does not enable the controller.

OK, that sounds more like an ACPI bug. Perhaps you could complain to them? I would recommend you open a new bug for that; as far as I'm concerned, this one is done.

For what it's worth, apic-main-timer + apic-main-timer-ati + apic-timer-only-with-cx does not seem to help here. It reports finding the affected ATI chipset, but I still have to specify no_timer_check to get the clock to run at the normal rate. Fedora Core kernel-2.6.15-1.1884_FC5. Might this be due to the ACPI updates in that kernel (acpi-release-20060113-2.6.16-rc1.diff.bz2)? MSI RS482M4 board, ATI Radeon XPress 200 based.

Forgot to mention that Fedora Core kernel-2.6.15-1.1884_FC5 is based on 2.6.16-rc1-git4, despite its apparent 2.6.15 version number.

What happens when you apply the full patch and just specify "apicpmtimer"?

Unfortunately there are some overlapping changes between your patch and the Fedora kernel, and I am a little too lazy to wedge in all the rejects. But I have now added apic-pmtimer-calibrate to the mix to get the requested option, and the symptom is still there. Of your patches, I now have the following applied:

apic-main-timer
apic-pmtimer-calibrate
apic-main-timer-ati
apic-timer-only-with-cx
pmtimer-dont-touch-pit

The Fedora kernel additionally has these, which may be relevant:

acpi-release-20060113-2.6.16-rc1.diff.bz2
linux-2.6-x86-apic-off-by-default.patch

There may be some more, but nothing obvious stands out. I very much suspect the ACPI update.
Maybe relevant boot messages are below. According to /proc/interrupts, a lot of IRQ0 (timer) interrupts are routed to each CPU (most to CPU0, 30% to CPU1).

ATI board detected. Using APIC/PM timer.
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:11 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:11 APIC version 16
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 33, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 21 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.

Some more possibly interesting boot messages:

time.c: Using 3.579545 MHz PM timer.
time.c: Detected 2193.722 MHz processor.
Calibrating delay using timer specific routine.. 4396.40 BogoMIPS (lpj=8792803)
Security Framework v1.0.0 initialized
SELinux: Initializing.
SELinux: Starting in permissive mode
selinux_register_security: Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU 0(2) -> Node 0 -> Core 0
Using local APIC timer interrupts.
Detected 12.464 MHz APIC timer.
Booting processor 1/2 APIC 0x1
Initializing CPU#1
Calibrating delay using timer specific routine.. 2201.27 BogoMIPS (lpj=4402549)
Disabling vsyscall due to use of PM timer
time.c: Using PM based timekeeping.

Note the BogoMIPS of the second CPU. This is a dual-core CPU with both cores running on the same clock, and with no_timer_check they both report the higher value.

I am still having this problem in 2.6.16-rc2-mm1 unless I specify disable_timer_pin_1. Boot messages:

ATI board detected. Using APIC/PM timer.
...
time.c: Using 1.193182 MHz PIT timer.
time.c: Detected 1600.110 MHZ processor.
time.c: Using PIT/TSC based timekeeping.
The system is a Compaq v2312us notebook, host bridge ATI RS480, PCI ID 1002:5950. There is a BIOS update available, but I'd have to reinstall XP to apply it.

-mm* has completely different, rewritten time code that I don't support at all, and it probably still has all kinds of old bugs that were already fixed. I would recommend you check Linus' mainline.

I installed yesterday

It's a bug. I have a partial fix, but it still needs a little more work.

Perhaps this is a known issue, but 2.6.16-rc4 still runs at 2x for me.

Can people who still have problems with the timer please test 2.6.16-rc5?

Andi, I tested 2.6.16-rc5 on the HP nx6125 and it fixes the double timer issue for me. In fact, this is the first patch that seems to work on this machine. No more need to boot with disable_timer_pin_1. Great work!

Andi, works for me on my Compaq SR1710NX. Thanks!

Fedora Core kernel-2.6.15-1.1996_FC5 (2.6.16rc5-git3) works fine here: MSI RS482M4-IDL board (ATI Radeon Xpress 200 / RS482 + SB450), Athlon 64 X2 4200+ CPU (dual core).

Created attachment 7493
2.6.16-rc5 boot.log without disable_timer_pin_1
Andi: 2.6.16-rc5 still runs double-time without disable_timer_pin_1. I've attached the boot log as http://bugme.osdl.org/attachment.cgi?id=7493&action=view. Here's the output of trtc.c in this case:

1141270963:61429: rtc 256 int 0 (=0)
1141270963:813422: rtc 448 int 752 (=752)
1141270964:333099: rtc 16 int 520 (=520)
1141270964:813333: rtc 448 int 480 (=480)
1141270965:813244: rtc 448 int 1000 (=1000)
1141270966:332794: rtc 16 int 520 (=520)
1141270966:813157: rtc 448 int 480 (=480)
1141270967:813066: rtc 448 int 1000 (=1000)
1141270968:332490: rtc 16 int 520 (=520)
1141270968:812977: rtc 448 int 480 (=480)
1141270969:812889: rtc 448 int 1000 (=1000)
1141270970:332185: rtc 16 int 520 (=520)
1141270970:812800: rtc 448 int 480 (=480)
1141270971:812712: rtc 448 int 1000 (=1000)
1141270972:331881: rtc 16 int 520 (=520)
1141270972:812622: rtc 448 int 480 (=480)
1141270973:812533: rtc 448 int 1000 (=1000)
1141270974:331577: rtc 16 int 520 (=520)
1141270974:812444: rtc 448 int 480 (=480)

Thanks. That looks a bit unstable (normally the sums at the end should be roughly the same), but if it works, OK. And you're using HZ=1000, I guess?

Yes, HZ=1000:

[~]$ grep HZ /archive/kernel/linux-2.6/.config
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000

This bug looks to be related to http://bugme.osdl.org/show_bug.cgi?id=5573. It has also generated an Ubuntu HowTo: http://ubuntuforums.org/showthread.php?s=bb5681d829b8bfd25862caab2a63db20&t=75281

I observe this bug on my HP Pavilion a1250n. It has a dual-core Athlon 64 X2 3800+ and uses the ATI Radeon Xpress 200 chipset (RS482 northbridge, SB400 southbridge). It is running Fedora Core 4's kernel-smp-2.6.15-1.1831_FC4 for x86-64 (no, not a kernel.org kernel). I tried the following things to make the clock behave:

1. In the BIOS config, tried to disable "spread spectrum" as per #41 in the Ubuntu HowTo. There was no such setting to change.
2. Updated to HP BIOS 3.40 [no change]
3. Booted with notsc [no change]
4. Booted with acpi_skip_timer_override [no change]
5. Booted with disable_timer_pin_1 [worked!]

Even though the clock problem is gone, a couple of symptoms suggest related APIC or interrupt routing problems to me:

- After 30 hours of uptime (assuming the clock isn't lying), I see 29 errors like this: "APIC error on CPU0: 40(40)". All are detected on CPU0 for some reason. The first such error, and only the first, is slightly different: "APIC error on CPU0: 00(40)". Googling shows that this APIC error appears in dmesg output on systems with the RS480 or RS482.
- /proc/interrupts shows a lot of parport0 interrupts, even though nothing is connected to the parallel port:
    7:     374934   26822359    IO-APIC-edge  parport0
- /proc/interrupts shows a lot of USB interrupts. The only thing connected to a USB port is the built-in flash card reader (without any cards loaded):
  225:    8589443          0   IO-APIC-level  ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
- The total number of interrupts fielded by each CPU is extremely close, and I see no reason for this to be the case. Could the parport0 interrupts somehow be invented to balance the two CPUs??? Notice that the number of CPU1 timer interrupts equals the number of CPU0 parport0 interrupts, and that the reverse is nearly true:

           CPU0       CPU1
  0:   26822411     374934    IO-APIC-edge  timer
  1:      20918          0    IO-APIC-edge  i8042
  7:     374934   26822359    IO-APIC-edge  parport0
  8:          0          0    IO-APIC-edge  rtc
 12:     359851          0    IO-APIC-edge  i8042
 14:    1948929          0    IO-APIC-edge  ide0
169:          2          0   IO-APIC-level  acpi, ohci1394
201:     114362          0   IO-APIC-level  libata
209:     159867          0   IO-APIC-level  eth0
217:          1          0   IO-APIC-level  ATI IXP
225:    8589443          0   IO-APIC-level  ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
NMI:       1877       1118
LOC:   27198396   27198373
ERR:         29
MIS:          0

Created attachment 7495
dmesg output from system described in #149
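The mirrored timer/parport0 counts noted in comment #149 are easy to check mechanically. Below is a minimal Python sketch that parses the /proc/interrupts snapshot from that comment and compares IRQ 0 with IRQ 7. `parse_interrupts` is a made-up helper name. The parser assumes the layout seen in this bug: an IRQ label, one count per CPU, the interrupt type, then the device names. It only confirms the observation and does not explain it.

```python
# Minimal sketch: parse a /proc/interrupts snapshot and compare two IRQ rows.
# The snapshot is copied from comment #149; parse_interrupts is a hypothetical
# helper, not part of any kernel tool.

SNAPSHOT = """\
           CPU0       CPU1
  0:   26822411     374934    IO-APIC-edge  timer
  1:      20918          0    IO-APIC-edge  i8042
  7:     374934   26822359    IO-APIC-edge  parport0
  8:          0          0    IO-APIC-edge  rtc
 12:     359851          0    IO-APIC-edge  i8042
 14:    1948929          0    IO-APIC-edge  ide0
169:          2          0   IO-APIC-level  acpi, ohci1394
201:     114362          0   IO-APIC-level  libata
209:     159867          0   IO-APIC-level  eth0
217:          1          0   IO-APIC-level  ATI IXP
225:    8589443          0   IO-APIC-level  ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
NMI:       1877       1118
LOC:   27198396   27198373
ERR:         29
MIS:          0
"""


def parse_interrupts(text):
    """Return {irq_number: (device_names, [count_cpu0, count_cpu1, ...])}."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        fields = line.split()
        # Skip summary rows such as NMI/LOC/ERR/MIS (non-numeric label).
        if not fields or not fields[0].rstrip(":").isdigit():
            continue
        irq = int(fields[0].rstrip(":"))
        counts = [int(f) for f in fields[1:1 + ncpus]]
        # fields[1 + ncpus] is the interrupt type (e.g. IO-APIC-edge);
        # everything after it is the device name list.
        name = " ".join(fields[2 + ncpus:])
        table[irq] = (name, counts)  # keyed by IRQ: i8042 appears twice
    return table


irqs = parse_interrupts(SNAPSHOT)
_, timer = irqs[0]    # IRQ 0: timer
_, parport = irqs[7]  # IRQ 7: parport0
print("timer    per CPU:", timer)
print("parport0 per CPU:", parport)
print("timer on CPU1 == parport0 on CPU0:", timer[1] == parport[0])
print("timer on CPU0 - parport0 on CPU1:", timer[0] - parport[1])
```

The table is keyed by IRQ number rather than device name because the same name (i8042 here) can appear on more than one line.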
Hm. I tried 2.6.16-rc5 on an MSI RS482M-IL (a Radeon Xpress 200) with an AMD Sempron 3000+ in i386 mode. The kernel was compiled for the K8. The timer runs twice as fast. I am still confused: does the patch only work on x86_64?

[17179638.956000] powernow-k8: Found 1 AMD Athlon 64 / Opteron processors (version 1.60.0)
[17179638.972000] powernow-k8: BIOS error - no PSB or ACPI _PSS objects
0000:00:00.0 Host bridge: ATI Technologies Inc: Unknown device 5950 (rev 10)
0000:00:01.0 PCI bridge: ATI Technologies Inc: Unknown device 5a3f

An x86 patch was merged into 2.6.16-rc; see "[PATCH] i386: port ATI timer fix from x86_64 to i386 II". I don't know whether there was a separate bug number for x86, but a lot of those people were watching this one. Also, is there a bug report for the ACPI problems with this chipset yet?

I tried 2.6.16-rc on an MSI RS482M-IL (a Radeon Xpress 200) with an AMD Sempron 3000+ in i386 mode. The kernel was compiled for the K8. The clock is now normal, so this is a success for me. Thanks, Andi Kleen! kern.log says:

Mar 14 20:50:54 alunheim kernel: ATI board detected. Disabling timer routing over 8254.
Mar 14 20:50:54 alunheim kernel: ENABLING IO-APIC IRQs
Mar 14 20:50:54 alunheim kernel: ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
Mar 14 20:50:54 alunheim kernel: ..MP-BIOS bug: 8254 timer not connected to IO-APIC
Mar 14 20:50:54 alunheim kernel: ...trying to set up timer (IRQ0) through the 8259A ... failed.
Mar 14 20:50:54 alunheim kernel: ...trying to set up timer as Virtual Wire IRQ... works.

I still sometimes get:

Mar 14 21:00:54 alunheim kernel: APIC error on CPU0: 00(40)
Mar 14 21:01:12 alunheim kernel: APIC error on CPU0: 40(40)
Mar 14 21:02:41 alunheim kernel: APIC error on CPU0: 40(40)
Mar 14 21:06:35 alunheim kernel: APIC error on CPU0: 40(40)
Mar 14 21:09:30 alunheim kernel: APIC error on CPU0: 40(40)

Unless I'm doing something wrong, 2.6.16-rc6 still does the double-speed boogie unless I pass disable_timer_pin_1 to the kernel.
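The "APIC error on CPU0: 00(40)" lines quoted above come from the kernel's APIC error handler. As far as I can tell from the 2.6-era handler (smp_error_interrupt), it prints the local APIC Error Status Register (ESR) twice: once as read before the handler writes to the register, and once as read after. It logs at KERN_DEBUG, which matches the observation earlier in this bug.

The Python sketch below decodes those values. The bit names are an assumption, taken from the comment in that handler and the Intel SDM description of the ESR. `decode_esr` and `scan_log` are illustrative names. Under that table, 0x40 is bit 6, "Received illegal vector". That tells you which kind of error the APIC flagged, not why the chipset triggers it.

```python
import re

# Bit names for the local APIC Error Status Register (ESR), following the
# comment in the 2.6-era Linux APIC error handler (see also the Intel SDM).
ESR_BITS = [
    "Send CS error",             # bit 0
    "Receive CS error",          # bit 1
    "Send accept error",         # bit 2
    "Receive accept error",      # bit 3
    "Reserved",                  # bit 4
    "Send illegal vector",       # bit 5
    "Received illegal vector",   # bit 6
    "Illegal register address",  # bit 7
]

# Matches e.g. "APIC error on CPU0: 00(40)"; both numbers are hex.
APIC_ERR_RE = re.compile(r"APIC error on CPU(\d+): ([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)")

# Lines copied from the kern.log excerpt above.
SAMPLE_LOG = """\
Mar 14 21:00:54 alunheim kernel: APIC error on CPU0: 00(40)
Mar 14 21:01:12 alunheim kernel: APIC error on CPU0: 40(40)
Mar 14 21:02:41 alunheim kernel: APIC error on CPU0: 40(40)
Mar 14 21:06:35 alunheim kernel: APIC error on CPU0: 40(40)
Mar 14 21:09:30 alunheim kernel: APIC error on CPU0: 40(40)
"""


def decode_esr(value):
    """Return the names of the ESR bits (0-7) that are set in value."""
    return [name for bit, name in enumerate(ESR_BITS) if value & (1 << bit)]


def scan_log(text):
    """Yield (cpu, first_value, second_value) for every APIC error line."""
    for m in APIC_ERR_RE.finditer(text):
        yield int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16)


for cpu, first, second in scan_log(SAMPLE_LOG):
    print(f"CPU{cpu}: {first:02x}({second:02x}) ->", decode_esr(second))
```

The regular expression also accepts the single-digit form "0(40)" reported later in this bug.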
Kurt, there is a bug in the i386 patch that went into 2.6.16-rc6. Are you running i386 without ACPI? If so, try the _first_ patch in this message: http://marc.theaimsgroup.com/?l=linux-kernel&m=114238639009970&q=raw

Hmm, it works for everybody else now, as far as I know. Full boot log?

Created attachment 7696
Full boot log of 2.6.16 and timer running at double speed
On stock 2.6.16, disable_timer_pin_1 is required to keep the clock from running
at double speed. This boot log is from a boot without disable_timer_pin_1.
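The basic diagnostic behind this bug is the /proc/interrupts check suggested at its start, and it boils down to one ratio. Take the timer interrupts counted over an interval, divide by the length of that interval measured on an independent clock, and divide again by CONFIG_HZ.

Below is a minimal Python sketch of that arithmetic. `row_counts` and `tick_ratio` are hypothetical helper names, and the row layout is taken from the snapshots quoted in this bug, so it may differ on other kernels. With the figures reported at the start of the bug (2000 timer interrupts per second at HZ=1000), the ratio is 2.0, which is the double-speed clock. A healthy system should give about 1.0 for the row that drives the kernel tick.

```python
# Minimal sketch: the /proc/interrupts tick-rate check from the start of this
# bug. row_counts and tick_ratio are hypothetical helper names.

# Header, IRQ 0 and LOC rows copied from the snapshot in comment #149.
SNAPSHOT = """\
           CPU0       CPU1
  0:   26822411     374934    IO-APIC-edge  timer
LOC:   27198396   27198373
"""


def row_counts(text, label):
    """Return the per-CPU counts of the row labelled `label` ("0", "LOC", ...)."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == label + ":":
            return [int(x) for x in fields[1:1 + ncpus]]
    raise KeyError(f"no row {label!r} in /proc/interrupts text")


def tick_ratio(count_before, count_after, elapsed_seconds, hz):
    """Observed interrupt rate divided by the expected rate `hz`.

    elapsed_seconds must come from a clock independent of the kernel tick
    (a wristwatch, say): if the tick is doubled, the kernel's own notion of
    time is doubled too and would hide the problem.
    """
    return (count_after - count_before) / elapsed_seconds / hz


print(row_counts(SNAPSHOT, "0"))       # per-CPU IRQ 0 counts
# Figures reported at the start of this bug: 2000 timer interrupts per
# (wristwatch) second with HZ=1000.
print(tick_ratio(0, 2000, 1.0, 1000))  # 2.0 -> clock runs twice as fast
```

On a live system, read /proc/interrupts twice and note the elapsed time on a watch. Then pass `sum(row_counts(before, "0"))` and `sum(row_counts(after, "0"))` to `tick_ratio`.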
Can you add dmidecode output too, please?

Created attachment 7697
dmidecode output for system in #157
As of 2.6.17-rc1, the double-speed clock problem has disappeared on my system; it is no longer necessary to pass disable_timer_pin_1. I still see "APIC error on CPU0: 0(40)" messages in syslog (well, /var/log/messages and dmesg). So the problem was fixed between 2.6.16 and 2.6.17-rc1.

I'm getting "APIC error on CPU0: 0(40)" on my 2.6.16-gentoo (Turion64 / ATI X200) all the time (less often than before, and I don't pass any kernel parameters now). I think #c160 is not entirely correct in this respect. The double clock problem is fixed, though.

*** Bug 5211 has been marked as a duplicate of this bug. ***

Nothing has changed for me regarding this bug. I still have to use noapic with kernel version 2.6.17_rc4. During a boot without noapic, I noticed these lines in the kernel log (this is from a 2.6.16 kernel, by the way):

Apr 12 11:25:36 [kernel] ATI board detected. Disabling timer routing over 8254.
Apr 12 11:25:36 [kernel] OEM ID: ATI Product ID: RS480 APIC at: 0xFEE00000
Apr 12 11:25:36 [kernel] Losing some ticks... checking if CPU frequency changed.
Apr 12 11:26:08 [kernel] spurious 8259A interrupt: IRQ7.

My motherboard is an MSI RS482M4, though, not an RS480.

I have the same problem. I am using kernel 2.6.15 on an MSI RS482M4 with an ATI chipset and an Athlon X2 3800+.

You need 2.6.16 or later. 2.6.15 has the old timer code and will produce a double clock rate on the MSI RS482M4 boards. The Fedora Core x86_64 kernel 2.6.16-1.2096_FC5 works just fine here on an MSI RS482M4-IDL with an Athlon64 X2 4200+. I have not verified 32-bit i686 kernels, and there may still be problems in the i386 arch on these boards, but x86_64 should work fine on the RS482M4 with 2.6.16 and later.

No, it does not work fine. I have an Athlon64 3200+ on my MSI RS482M4 and use x86_64-pc-linux-gnu-4.1.0 to compile my kernels. Today I tried 2.6.16-gentoo-r6 and 2.6.17_rc3, and both still suffer from this problem. I still have to use noapic to avoid the double clock rate.
Maybe this is a BIOS version issue? I'm on 1.1. See: http://www.msi.com.tw/program/support/bios/bos/spt_bos_detail.php?UID=693&kind=1

Maybe. I am using the 1.2 BIOS, but I do not think it is BIOS related. It could also be slight differences in ACPI usage/patches, I suppose. The Fedora kernels do carry quite a few ACPI patches, but nothing sticks out as obviously relevant to the timers. It might also be SMP/UP related: the Fedora x86_64 kernels are all SMP-enabled, and I suppose this could make some difference in how the timer is handled. I am sorry, but I am too lazy to compile a pristine 2.6.16 or 2.6.17 kernel from kernel.org sources just to verify this, now that the Fedora kernel works just fine. Maybe you could try the Fedora x86_64 kernel 2.6.16-1.2096_FC5 on your Gentoo system. It should just be a matter of unpacking the RPM with rpm2cpio and installing it like any other kernel for a quick test. If that works, you have something to compare with to try to isolate why your kernel isn't working. If it fails, there is some other difference outside the kernel.

I've found that kernel at: http://download.fedora.redhat.com/pub/fedora/linux/core/updates/5/x86_64/ or better yet: http://download.fedora.redhat.com/pub/fedora/linux/core/updates/5/x86_64/kernel-xen0-2.6.16-1.2096_FC5.x86_64.rpm

I've rpm2tar'ed the RPM for the Fedora kernel and ran make oldconfig with the config from my current 2.6.16-gentoo-r6. Unfortunately, when I tried to run make, I got:

  CHK     include/linux/version.h
  SPLIT   include/linux/autoconf.h -> include/config/*
  CC      scripts/mod/empty.o
  HOSTCC  scripts/mod/mk_elfconfig
  MKELF   scripts/mod/elfconfig.h
  HOSTCC  scripts/mod/file2alias.o
  HOSTCC  scripts/mod/modpost.o
  HOSTCC  scripts/mod/sumversion.o
  HOSTLD  scripts/mod/modpost
  HOSTCC  scripts/kallsyms
  HOSTCC  scripts/pnmtologo
  HOSTCC  scripts/conmakehash
  HOSTCC  scripts/bin2c
make[1]: *** No rule to make target `init/main.o', needed by `init/built-in.o'. Stop.
make: *** [init] Error 2

I'm not sure why this would not work, but perhaps it would be easier for you to try a vanilla kernel.

It's a binary kernel, already compiled. Just drop it into the proper places and get your boot loader in shape to boot it. Note: the xen kernel is for Xen, not real hardware. The Fedora RPM you should be looking for is kernel-2.6.16-... As I am not having any issues with my kernel, I am not too motivated to mess around with this.

Right, I missed the xen part :( The non-xen kernel: http://download.fedora.redhat.com/pub/fedora/linux/core/updates/5/x86_64/kernel-2.6.16-1.2096_FC5.x86_64.rpm No need to compile; OK, will try again.

Folks, this is not the forum for random distribution userland handholding. If someone still has a genuinely new problem with 2.6.17 and ATI timers, please open a new bug. I'm not interested in distribution kernels, which are off topic here.

I still have a problem with 2.6.17 and ATI timers, and the problem is exactly the one described in the summary of this bug, so I don't think I should open a new bug.

I think you should, because I'm going to ignore this one. It is far too chaotic for anything good to come out of it.

Okay, you're the boss :) The new bug is: http://bugzilla.kernel.org/show_bug.cgi?id=6497

Is any issue discussed in this bug still present in kernel 2.6.19-rc6? As I said, new bugs should probably be opened.

Off-topic, I know, but this basically worked with Ubuntu kernels 2.6.15, 2.6.17 and 2.6.19, and is now broken with 2.6.20. I don't see anything in the Ubuntu changelogs indicating that Ubuntu caused this. So the problem may have resurfaced in the 2.6.20 kernel, at least on some boards that had already been working for a year or so. Anyway, this is just FYI. If I have time to compile from kernel.org sources, I'll open a new bug about this. Meanwhile, this one should probably be closed to make room for bugs about the current situation.

Thanks for closing. The continuation bug is bug 7789.
*** Bug 6099 has been marked as a duplicate of this bug. ***