Most recent kernel where this bug did not occur: 2.6.9
Distribution: Debian Sarge
Hardware Environment: MSI K7N2 Delta + Athlon XP 3200+ 2.0GHz (downclocked)
Software Environment:
Problem Description:
ntpd fails to work starting with 2.6.10 on this hardware. Also, drift-test.py from John Stultz shows a very high drift starting with 2.6.10.

I have tested 7 different vanilla kernels on the same suspect hardware:

                2.6.8  : ntpd working : drift from -77ppm to -144ppm
                2.6.9  : ntpd working : drift from -99ppm to -231ppm
                2.6.10 : ntpd failed  : drift from -37825ppm to -29912ppm
                2.6.12 : ntpd failed  : drift from -43429ppm to -45251ppm
 CONFIG_HZ=100  2.6.14 : ntpd failed  : drift from -7598ppm to -4410ppm
 CONFIG_HZ=250  2.6.14 : ntpd failed  : drift from -13519ppm to -12538ppm
 CONFIG_HZ=1000 2.6.14 : ntpd failed  : drift from -14497ppm to -19543ppm

Steps to reproduce:
Use 'ntpdate <server>', then start ntpd. With ntpq, use the 'pe', 'assID' and 'rv <id>' commands and wait at least 5 polls; ntpd should then show the 'sys.peer' condition if it works, or 'rejected' if it fails.

See the 'NTP broken with 2.6.14' thread on the LKML.
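To put the drift figures above in perspective, here is a minimal Python sketch of the arithmetic. It is not drift-test.py (whose internals are not shown in this report); the function names are illustrative, and the 500 ppm limit is the maximum frequency correction ntpd is generally documented to apply.

```python
SECONDS_PER_DAY = 86_400
NTPD_MAX_SLEW_PPM = 500  # assumed ntpd frequency-correction limit


def drift_ppm(reference_elapsed_s, local_elapsed_s):
    """Drift of the local clock relative to a reference, in parts per million."""
    return (local_elapsed_s - reference_elapsed_s) / reference_elapsed_s * 1e6


def seconds_per_day(ppm):
    """Clock error accumulated over one day at a constant drift rate."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def ntpd_can_correct(ppm):
    """True if the drift is within the range ntpd can compensate for."""
    return abs(ppm) <= NTPD_MAX_SLEW_PPM


if __name__ == "__main__":
    # Worst-case values from two of the kernels listed in the report.
    for kernel, ppm in [("2.6.9", -231), ("2.6.10", -37825)]:
        print(kernel, round(seconds_per_day(ppm), 1), "s/day,",
              "correctable" if ntpd_can_correct(ppm) else "not correctable")
```

Under these assumptions, the 2.6.9 figures stay inside the range ntpd can discipline, while every figure from 2.6.10 onward is far outside it, which is consistent with the peers being marked 'rejected'.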
Created attachment 6462 [details] linux 2.6.9 kernel log
Created attachment 6463 [details] 2.6.9 kernel log
Created attachment 6464 [details] 2.6.10 kernel log
Looking at the dmesg differences, I suspect this is related to changes in the ACPI layer. Here are some snippets of the diff.

2.6.9 vs 2.6.10
-talla kernel: ACPI: BIOS IRQ0 pin2 override ignored.
 talla kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
 talla kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
 talla kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 15 global_irq 15 high edge)
+talla kernel: ACPI: IRQ0 used by override.
+talla kernel: ACPI: IRQ2 used by override.
 talla kernel: ACPI: IRQ9 used by override.
 talla kernel: ACPI: IRQ14 used by override.
 talla kernel: ACPI: IRQ15 used by override.
 talla kernel: Enabling APIC mode: Flat. Using 1 I/O APICs
 talla kernel: Using ACPI (MADT) for SMP configuration information
 talla kernel: Built 1 zonelists
-talla kernel: mapped APIC to ffffd000 (fee00000)
-talla kernel: mapped IOAPIC to ffffc000 (fec00000)
 talla kernel: ENABLING IO-APIC IRQs
-talla kernel: ..TIMER: vector=0x31 pin1=0 pin2=-1
+talla kernel: ..TIMER: vector=0x31 pin1=2 pin2=-1
+talla kernel: ..MP-BIOS bug: 8254 timer not connected to IO-APIC
+talla kernel: ...trying to set up timer (IRQ0) through the 8259A ... failed.
+talla kernel: ...trying to set up timer as Virtual Wire IRQ... failed.
+talla kernel: ...trying to set up timer as ExtINT IRQ... works.
 talla kernel: NET: Registered protocol family 16
 talla kernel: PCI: PCI BIOS revision 2.10 entry at 0xfbbb0, last bus=3
 talla kernel: PCI: Using configuration type 1
 talla kernel: mtrr: v2.0 (20020519)
-talla kernel: ACPI: Subsystem revision 20041105
+talla kernel: ACPI: Subsystem revision 20040816
From mail to lkml. Here are the ioapic-related changes from bkcvs gitweb from 2.6.9 to 2.6.10:

http://kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=0b517c442f66f9b1e280ca49d4b215cc3681d4e5;hp=60a7a584ad5a266afa5d7fde5f2828894e615c17

http://kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=eb3f18413cb759662b34230674fb6f07c9e16e56;hp=e87e2e7669129dc0e8b2959c656650d7ea5c066f
Ack. The diff above in comment #4 is backward. The -'s are 2.6.10 and the +'s are 2.6.9. I mistakenly saved the files with the wrong names.

This aligns with Len Brown's note on lkml:
"NFORCE2 on an ACPI-enabled kernel should automatically invoke the acpi_skip_timer_override BIOS workaround -- as the NFORCE family of chip-sets have the timer interrupt attached to pin-0, but some of them shipped with a bogus BIOS over-ride telling Linux the timer is on pin-2."

I'm still not sure why the problem crops up after this fix was included. The reporter is having pretty severe BIOS issues, so until they are resolved, I'm thinking we should mark this as INVALID.

Jean-Christian: Please reopen this if the problem persists after you get your BIOS issues sorted. Thanks again for the great testing and feedback on lkml!
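For anyone comparing their own logs, the timer routing discussed above can be read straight from the "..TIMER:" line. The following is a small Python sketch, assuming the log line has the same format as in the diff in comment #4; the helper name timer_pins is made up for illustration.

```python
import re

# Matches e.g. "..TIMER: vector=0x31 pin1=2 pin2=-1" (format taken from the logs above).
TIMER_RE = re.compile(
    r"\.\.TIMER: vector=(0x[0-9a-fA-F]+) pin1=(-?\d+) pin2=(-?\d+)"
)


def timer_pins(log_text):
    """Return (pin1, pin2) from the first ..TIMER line, or None if absent."""
    match = TIMER_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(2)), int(match.group(3))


if __name__ == "__main__":
    line = "talla kernel: ..TIMER: vector=0x31 pin1=2 pin2=-1"
    print(timer_pins(line))
```

On the NFORCE boards described in the quote, pin1=0 would indicate that the BIOS override was skipped, and pin1=2 that it was honored.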
Just as a followup, Jean-Christian resolved his BIOS confusion and updated to the current BIOS. Now this issue does not appear. I'm marking this as closed.