Bug 12114 - AthlonXP-M
Summary: AthlonXP-M
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: cpufreq
Hardware: All Linux
Importance: P1 normal
Assignee: cpufreq
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-11-27 20:07 UTC by Jérôme Poulin
Modified: 2011-03-10 16:22 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.27-gentoo-r4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Two other oops (6.98 KB, text/plain)
2009-03-31 00:08 UTC, Xavier Douville
Enforce using of broadcast timers through boot param processor (958 bytes, patch)
2011-03-10 09:26 UTC, Thomas Renninger

Description Jérôme Poulin 2008-11-27 20:07:37 UTC
Latest working kernel version: <2.6.24 (didn't try below, got the card about a year ago.)
Earliest failing kernel version: 2.6.27
Distribution: Gentoo
Hardware Environment: VIA motherboard, D-Link gigabit card
Ethernet controller: D-Link System Inc DGE-530T Gigabit Ethernet Adapter (rev 11) (rev 11)
The card is uplinked directly to another computer. It used jumbo frames of maximum size (9000 bytes), but I have reduced the MTU to the normal 1500 for testing purposes and it still crashes.
Software Environment: Uplinked to my media center, which currently just plays MP3s from the host computer and is remote controlled via Sonata / MPD.
Problem Description:
Music plays, then suddenly stops. I rmmod skge, modprobe skge, then run the network script to bring the interface up, and it works for 1-30 minutes before stopping again. I sometimes get an OOPS: http://pastebin.ca/1269135
Comment 1 Jérôme Poulin 2008-11-27 20:16:13 UTC
I would like to add more information to this bug. After looking at the pastebin, I found out fglrx was loaded during my test, but I also tried with the module removed and the bug is still present without fglrx.
The bug appears to occur more often when there is also traffic on my internal network card, via-rhine.
Comment 2 Dominique Larchey-Wendling 2009-01-09 02:48:26 UTC
I am experiencing a similar problem on 2.6.27 and 2.6.28 where my server
becomes suddenly unreachable .. I do not have anything in my logs. The
server does not hang ... simply the network connection does not work
anymore ... the server shuts down cleanly when I push the off button.

The last time I tried, the server was unreachable after about 2 days with
a 2.6.28. It seems much quicker with a 2.6.27.

With a 2.6.26, everything is ok and the server runs for months without
trouble. Here is my network adapter :

....# /usr/sbin/lshw -C network
  *-network:0
       description: Ethernet interface
       product: DGE-530T Gigabit Ethernet Adapter (rev 11)
       vendor: D-Link System Inc
       physical id: a
       bus info: pci@0000:00:0a.0
       logical name: eth0
       version: 11
       serial: 00:17:9a:c1:a7:d2
       size: 1GB/s
       capacity: 1GB/s
       width: 32 bits
       clock: 66MHz
       capabilities: pm vpd bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=skge driverversion=1.13 duplex=full firmware=N/A ip=192.168.1.1 latency=32 link=yes maxlatency=31 mingnt=23 module=skge multicast=yes port=twisted pair speed=1GB/s

The motherboard is VIA based (Shuttle FX41). The processor is an Athlon Mobile 1.8 GHz.

Have you made any progress wrt this bug ?
Comment 3 Dominique Larchey-Wendling 2009-03-27 08:55:06 UTC
With a 2.6.29, my server has been up and reachable for nearly 3 days without interruption. It seems skge works again. I don't know what happened.
Comment 4 Dominique Larchey-Wendling 2009-03-29 16:39:02 UTC
Well ... after 5 days of 2.6.29, the server has become unreachable ... same symptoms but it lasted longer ...
Comment 5 Xavier Douville 2009-03-31 00:08:08 UTC
Created attachment 20748
Two other oops

I think I have a similar problem with my server. I use Debian Lenny, kernel 2.6.26. The first time it crashed my system.
Comment 6 Dominique Larchey-Wendling 2009-03-31 07:55:59 UTC
Well 2.6.26 is fine for me ... it works for weeks. Problems started with 2.6.27. Note that my problem is not a crash but just network unreachability of the server. It reboots just fine when I push the on/off button. Since it is not connected to any screen/keyboard, I cannot check what is going on when the network is down ...
Comment 7 Dominique Larchey-Wendling 2009-04-20 10:43:39 UTC
I think I have identified the source of my problem. Maybe it has nothing to do with skge ... I do not know. My CPU is an Athlon XP-M and it has an "unstable TSC". I do not know if TSC instability (delta sometimes over 60 seconds) is another symptom or the source of the problem, but since I removed frequency scaling, the TSC is stable and I have been able to run 2.6.29.1 for days without any problem. I do not know what changed from 2.6.26 to 2.6.27 with respect to the TSC, but it may have caused the TSC instability to have consequences it did not have before.
Comment 8 Len Brown 2011-01-18 08:07:20 UTC
The driver oopses were watchdog timeouts,
so it does appear that this is a timer-related issue rather than a crash.

I don't know much about Athlon, but I believe that the idle
power states and the frequency scaling are interrelated.

Possibly when you disabled frequency scaling, you also
disabled their idle states, which is where you might have
problems with the TSC and/or the LAPIC timer.

Anyway, please re-open this if there is still a problem
with the latest stable kernel.
Comment 9 Dominique Larchey-Wendling 2011-03-07 12:58:38 UTC
Two years later, here are my observations regarding the TSC/network/skge problem.

After I disabled CPU frequency scaling AND the processor option in ACPI (ACPI_PROCESSOR=n), the TSC was no longer unstable on a 2.6.27 kernel (patched with OpenVZ). The file server has been running flawlessly for more than a year without a reboot, sometimes under heavy load for weeks at a time. No more stalled network interface leading to an inaccessible server (the server is headless).

Two weeks ago, I updated the server to FC14 and a 2.6.32 kernel (patched with OpenVZ). As I did not recall the whole TSC issue, I inadvertently changed the ACPI_PROCESSOR flag to y(es), but as I remembered that frequency scaling was an issue, I left that option untouched, i.e. no frequency scaling.

As a consequence, the network stalling problem re-appeared. After a few days of uptime (1 to 3 days), the file server network interface does not respond any more, but there is no crash, no oops. I noticed the following message in the logs :

Mar  7 09:59:47 xxxxx klogd: [    4.100585] Marking TSC unstable due to TSC halts in idle

And thus tsc is removed from 

/sys/devices/system/clocksource/clocksource0/available_clocksource 

Only acpi_pm remains. But nevertheless, even though TSC is removed as a clock source, it seems that the TSC functionality is not completely ignored and has a bad impact on the behaviour of the skge driver.

After I re-read this bugzilla thread, I remembered the ACPI_PROCESSOR issue and I disabled it again. Now TSC is apparently stable and serves as "reliable" clock source. Also, hopefully, the skge driver will work again. Otherwise I will come back here.

I cannot test later kernels like 2.6.36 or 2.6.37 because no openvz patch exists for such kernels. I don't know if "unstable TSC" management has evolved in recent kernels. What remains a mystery for me is how an unstable but disabled TSC (as a clock source) impacts the skge network driver.
Comment 10 Dominique Larchey-Wendling 2011-03-07 17:23:30 UTC
Another remark. It seems that the PR_GET_TSC and PR_SET_TSC operations of prctl were implemented in the kernel starting with 2.6.26. Maybe a userland process is involved here.

ntp was also messed up with the unstable TSC. It got unsynchronized very quickly and of course, before networking went down.

Maybe the same issue as the netdev_watchdog timeout ...
Comment 11 Thomas Renninger 2011-03-07 22:02:21 UTC
Does notsc param help?
Not sure it works anymore as expected. I very recently saw an oops early in calibrate_native_tsc, even with notsc param. As this was in an early -rcX kernel and getting more recent sources fixed the issue, I didn't look at it anymore.
Better check dmesg and clocksource sysfs file that tsc isn't used:
/sys/devices/system/clocksource/clocksource0/current_clocksource
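
The check above can be sketched in Python as follows. This is only a minimal sketch, assuming the sysfs paths quoted in this thread; the helper names (parse_clocksources, tsc_in_use, report) are made up for illustration and are not part of any kernel tool.

```python
from pathlib import Path

# sysfs directory quoted in this thread; helper names are illustrative only.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(text):
    """Split the space-separated content of a clocksource sysfs file."""
    return text.split()


def tsc_in_use(current_text):
    """Return True if the current clocksource is the TSC."""
    return current_text.strip() == "tsc"


def report(base=CLOCKSOURCE_DIR):
    """Print the current and available clocksources, if the files exist."""
    current = base / "current_clocksource"
    available = base / "available_clocksource"
    if not current.exists() or not available.exists():
        print("clocksource sysfs files not found")
        return
    cur = current.read_text()
    print("current:  ", cur.strip())
    print("available:", ", ".join(parse_clocksources(available.read_text())))
    print("TSC in use:", tsc_in_use(cur))


if __name__ == "__main__":
    report()
```

If tsc is missing from the available list, the kernel has already marked it unstable, which matches the "Only acpi_pm remains" observation in comment 9.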

Hm, it looks like skge does not like the latencies or whatever is introduced by the deep sleep state. There are not many older AMD machines supporting deeper sleep states through ACPI processor. I know single core Turion did. AthlonXP-M (Mobile?) sounds like it supports C2 and the message:
"Marking TSC unstable due to TSC halts in idle"
tells us it does (2.6.37):
                /* TSC could halt in idle, so notify users */
                if (state > ACPI_STATE_C1)
                        mark_tsc_unstable("TSC halts in idle");

Summary: Sleep state(s) make your machine unstable, disabling them looks like the way to go (processor.max_cstate=1).
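
To confirm whether a running system was actually booted with that parameter, something like the following Python sketch could be used. It only parses /proc/cmdline; the function names are hypothetical, and treating the last occurrence of a repeated parameter as the effective one is an assumption of this sketch.

```python
def kernel_param(cmdline, name):
    """Return the value of name=value from a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            value = val  # assumption: the last occurrence wins in this sketch
    return value


def max_cstate_limited(cmdline):
    """True if processor.max_cstate is set to 1 or lower."""
    value = kernel_param(cmdline, "processor.max_cstate")
    return value is not None and value.isdigit() and int(value) <= 1


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""
    print("processor.max_cstate =", kernel_param(cmdline, "processor.max_cstate"))
    print("deep C-states disabled:", max_cstate_limited(cmdline))
```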

You may want to try to set:
local_apic_timer_c2_ok = 0
I can't see where this variable is used; it always seems to be one.
You could just try to hardcode it in drivers/acpi/processor_idle.c:lapic_timer_check_state():
-        u8 type = local_apic_timer_c2_ok ? ACPI_STATE_C3 : ACPI_STATE_C2;
+        u8 type;
+        local_apic_timer_c2_ok = 0;
+        type = local_apic_timer_c2_ok ? ACPI_STATE_C3 : ACPI_STATE_C2;
Be careful, this is not a real patch.

This only makes sense if you have a C-state of type C2, which is a bit ugly to find out.
Comment 12 Dominique Larchey-Wendling 2011-03-08 10:02:42 UTC
Thanks very much for your input. I think my Athlon is an XP-M(obile) Barton 2400+.

# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : Mobile AMD Athl
stepping        : 0
cpu MHz         : 1802.161
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow
bogomips        : 3604.32
clflush size    : 32
cache_alignment : 32
address sizes   : 34 bits physical, 32 bits virtual
power management: ts fid vid

If I understand well, you say that it is possible to keep sleep state C1 but C2 is a mess ?
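
As a side note, the TSC-related entries of the flags line can be picked out with a short Python sketch like the one below. The sample is the flags line from the /proc/cpuinfo output above; constant_tsc and nonstop_tsc are the flag names Linux prints for CPUs whose TSC runs at a constant rate and keeps counting in deep idle states, and neither appears here. The function names are illustrative only.

```python
# Flags line copied from the /proc/cpuinfo output in this comment.
SAMPLE_FLAGS_LINE = (
    "flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge "
    "mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow"
)


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def tsc_summary(flags):
    """Summarise TSC-related flags: presence, constant rate, runs in idle."""
    return {
        "tsc": "tsc" in flags,
        "constant_tsc": "constant_tsc" in flags,
        "nonstop_tsc": "nonstop_tsc" in flags,
    }


if __name__ == "__main__":
    print(tsc_summary(cpu_flags(SAMPLE_FLAGS_LINE)))
```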
Comment 13 Thomas Renninger 2011-03-08 11:26:35 UTC
More interesting would be:
for x in /sys/devices/system/cpu/cpu*/cpuidle/state*/*; do echo $x; cat $x; done
If your kernel still provides:
cat /proc/acpi/processor/*/power
That would be interesting as well.
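
The same cpuidle information can be collected with a small Python sketch, assuming the sysfs layout named above (cpuN/cpuidle/stateM/attribute). The function name read_cpuidle_states is hypothetical.

```python
from pathlib import Path


def read_cpuidle_states(cpu_dir):
    """Map state directory name -> {attribute: value} for one CPU."""
    states = {}
    cpuidle = Path(cpu_dir) / "cpuidle"
    if not cpuidle.is_dir():
        return states  # kernel without cpuidle, or no states registered
    for state_dir in sorted(cpuidle.glob("state*")):
        attrs = {}
        for attr in sorted(state_dir.iterdir()):
            if attr.is_file():
                try:
                    attrs[attr.name] = attr.read_text().strip()
                except OSError:
                    attrs[attr.name] = None  # unreadable attribute
        states[state_dir.name] = attrs
    return states


if __name__ == "__main__":
    for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
        for state, attrs in read_cpuidle_states(cpu).items():
            print(cpu.name, state, attrs)
```

A state whose name is C2 (or deeper) with a non-zero usage count would indicate that the CPU really enters the deeper sleep state.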
Comment 14 Dominique Larchey-Wendling 2011-03-09 15:39:38 UTC
Here is the output of cpuid ... the following seems a bit strange

RDMSR and WRMSR support               = true
....
RDTSCP                                = false

TSC definitely exists on my XP-M, but maybe the intent was to discourage its use because it is not reliable when the C2 and C3 sleep states are used? So I am going to try the processor.max_cstate=1 command line option as well.

Additionally, I compiled a 2.6.32 kernel with ACPI_PROCESSOR and cpu_freq but without TSC (by commenting out CONFIG_X86_TSC=y in the kernel .config file). I will try it soon to see if the system can work without the TSC. I am not sure it is really disabled in this case. I will try prctl also.

-----------------------------------------------------------------

# cpuid -k
CPU 0:
   vendor_id = "AuthenticAMD"
   version information (1/eax):
      processor type  = primary processor (0)
      family          = Intel Pentium Pro/II/III/Celeron/Core/Core 2/Atom, AMD Athlon/Duron, Cyrix M2, VIA C3 (6)
      model           = 0xa (10)
      stepping id     = 0x0 (0)
      extended family = 0x0 (0)
      extended model  = 0x0 (0)
      (simple synth)  = AMD Athlon XP / Athlon MP / Sempron / mobile Athlon XP-M / mobile Athlon XP-M (LV) (Barton A2)
   miscellaneous (1/ebx):
      process local APIC physical ID = 0x0 (0)
      cpu count                      = 0x0 (0)
      CLFLUSH line size              = 0x0 (0)
      brand index                    = 0x0 (0)
   brand id = 0x00 (0): unknown
   feature information (1/edx):
      x87 FPU on chip                        = true
      virtual-8086 mode enhancement          = true
      debugging extensions                   = true
      page size extensions                   = true
      time stamp counter                     = true
      RDMSR and WRMSR support                = true
      physical address extensions            = true
      machine check exception                = true
      CMPXCHG8B inst.                        = true
      APIC on chip                           = true
      SYSENTER and SYSEXIT                   = true
      memory type range registers            = true
      PTE global bit                         = true
      machine check architecture             = true
      conditional move/compare instruction   = true
      page attribute table                   = true
      page size extension                    = true
      processor serial number                = false
      CLFLUSH instruction                    = false
      debug store                            = false
      thermal monitor and clock ctrl         = false
      MMX Technology                         = true
      FXSAVE/FXRSTOR                         = true
      SSE extensions                         = true
      SSE2 extensions                        = false
      self snoop                             = false
      hyper-threading / multi-core supported = false
      therm. monitor                         = false
      IA64                                   = false
      pending break event                    = false
   feature information (1/ecx):
      PNI/SSE3: Prescott New Instructions     = false
      PCLMULDQ instruction                    = false
      64-bit debug store                      = false
      MONITOR/MWAIT                           = false
      CPL-qualified debug store               = false
      VMX: virtual machine extensions         = false
      SMX: safer mode extensions              = false
      Enhanced Intel SpeedStep Technology     = false
      thermal monitor 2                       = false
      SSSE3 extensions                        = false
      context ID: adaptive or shared L1 data  = false
      FMA instruction                         = false
      CMPXCHG16B instruction                  = false
      xTPR disable                            = false
      perfmon and debug                       = false
      process context identifiers             = false
      direct cache access                     = false
      SSE4.1 extensions                       = false
      SSE4.2 extensions                       = false
      extended xAPIC support                  = false
      MOVBE instruction                       = false
      POPCNT instruction                      = false
      time stamp counter deadline             = false
      AES instruction                         = false
      XSAVE/XSTOR states                      = false
      OS-enabled XSAVE/XSTOR                  = false
      AVX: advanced vector extensions         = false
      F16C half-precision convert instruction = false
      hypervisor guest status                 = false
   extended processor signature (0x80000001/eax):
      family/generation = AMD Athlon/Duron (7)
      model             = 0xa (10)
      stepping id       = 0x0 (0)
      extended family   = 0x0 (0)
      extended model    = 0x0 (0)
      (simple synth) = AMD Athlon XP / Athlon MP / Sempron / mobile Athlon XP-M / mobile Athlon XP-M (LV) (Barton A2)
   extended feature flags (0x80000001/edx):
      x87 FPU on chip                       = true
      virtual-8086 mode enhancement         = true
      debugging extensions                  = true
      page size extensions                  = true
      time stamp counter                    = true
      RDMSR and WRMSR support               = true
      physical address extensions           = true
      machine check exception               = true
      CMPXCHG8B inst.                       = true
      APIC on chip                          = true
      SYSCALL and SYSRET instructions       = true
      memory type range registers           = true
      global paging extension               = true
      machine check architecture            = true
      conditional move/compare instruction  = true
      page attribute table                  = true
      page size extension                   = true
      multiprocessing capable               = true
      no-execute page protection            = false
      AMD multimedia instruction extensions = true
      MMX Technology                        = true
      FXSAVE/FXRSTOR                        = true
      SSE extensions                        = false
      1-GB large page support               = false
      RDTSCP                                = false
      long mode (AA-64)                     = false
      3DNow! instruction extensions         = true
      3DNow! instructions                   = true
   extended brand id (0x80000001/ebx):
      raw     = 0x0 (0)
      BrandId = 0x0 (0)
   AMD feature flags (0x80000001/ecx):
      LAHF/SAHF supported in 64-bit mode = false
      CMP Legacy                         = false
      SVM: secure virtual machine        = false
      extended APIC space                = false
      AltMovCr8                          = false
      LZCNT advanced bit manipulation    = false
      SSE4A support                      = false
      misaligned SSE mode                = false
      PREFETCH/PREFETCHW instructions    = false
      OS visible workaround              = false
      instruction based sampling         = false
      XOP support                        = false
      SKINIT/STGI support                = false
      watchdog timer support             = false
      lightweight profiling support      = false
      4-operand FMA instruction          = false
      NodeId MSR C001100C                = false
      TBM support                        = false
      topology extensions                = false
   brand = "Mobile AMD Athl"
   L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
      instruction # entries     = 0x8 (8)
      instruction associativity = 0xff (255)
      data # entries            = 0x8 (8)
      data associativity        = 0x4 (4)
   L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
      instruction # entries     = 0x10 (16)
      instruction associativity = 0xff (255)
      data # entries            = 0x20 (32)
      data associativity        = 0xff (255)
   L1 data cache information (0x80000005/ecx):
      line size (bytes) = 0x40 (64)
      lines per tag     = 0x1 (1)
      associativity     = 0x2 (2)
      size (Kb)         = 0x40 (64)
   L1 instruction cache information (0x80000005/edx):
      line size (bytes) = 0x40 (64)
      lines per tag     = 0x1 (1)
      associativity     = 0x2 (2)
      size (Kb)         = 0x40 (64)
   L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
      instruction # entries     = 0x0 (0)
      instruction associativity = L2 off (0)
      data # entries            = 0x0 (0)
      data associativity        = L2 off (0)
   L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
      instruction # entries     = 0x100 (256)
      instruction associativity = 4-way (4)
      data # entries            = 0x100 (256)
      data associativity        = 4-way (4)
   L2 unified cache information (0x80000006/ecx):
      line size (bytes) = 0x40 (64)
      lines per tag     = 0x1 (1)
      associativity     = 16-way (8)
      size (Kb)         = 0x200 (512)
   L3 cache information (0x80000006/edx):
      line size (bytes)     = 0x0 (0)
      lines per tag         = 0x0 (0)
      associativity         = L2 off (0)
      size (in 512Kb units) = 0x0 (0)
   Advanced Power Management Features (0x80000007/edx):
      temperature sensing diode      = true
      frequency ID (FID) control     = true
      voltage ID (VID) control       = true
      thermal trip (TTP)             = false
      thermal monitor (TM)           = false
      software thermal control (STC) = false
      100 MHz multiplier control     = false
      hardware P-State control       = false
      TscInvariant                   = false
   Physical Address and Linear Address Size (0x80000008/eax):
      maximum physical address bits         = 0x22 (34)
      maximum linear (virtual) address bits = 0x20 (32)
      maximum guest physical address bits   = 0x0 (0)
   Logical CPU cores (0x80000008/ecx):
      number of CPU cores - 1 = 0x0 (0)
      ApicIdCoreIdSize        = 0x0 (0)
   (multi-processing synth): none
   (multi-processing method): AMD
   (synth) = AMD Athlon XP / Athlon MP / Sempron / mobile Athlon XP-M / mobile Athlon XP-M (LV) (Barton A2)
Comment 15 Dominique Larchey-Wendling 2011-03-09 17:09:30 UTC
With CONFIG_X86_TSC disabled, using the TSC is still possible from userland via prctl :

# gcc disable-tsc-test.c -o test
# ./test 
rdtsc() == 1221707443605
prctl(PR_GET_TSC, &tsc_val); tsc_val == PR_TSC_ENABLE
rdtsc() == 1221707595418
prctl(PR_SET_TSC, PR_TSC_ENABLE)
rdtsc() == 1221707621154
prctl(PR_SET_TSC, PR_TSC_SIGSEGV)
rdtsc() == [ SIG_SEGV ]
prctl(PR_GET_TSC, &tsc_val); tsc_val == PR_TSC_SIGSEGV
prctl(PR_SET_TSC, PR_TSC_ENABLE)
rdtsc() == 1221707702746
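
For reference, the PR_GET_TSC query shown in the output above can be approximated from Python with ctypes. This is a sketch only and is not the disable-tsc-test.c program used here: the constant values are the ones defined in <linux/prctl.h>, the helper names are made up, and the call returns None on systems where prctl or PR_GET_TSC is unavailable (it is x86-specific).

```python
import ctypes

# Values from <linux/prctl.h>.
PR_GET_TSC = 25
PR_SET_TSC = 26
PR_TSC_ENABLE = 1    # rdtsc is allowed
PR_TSC_SIGSEGV = 2   # rdtsc raises SIGSEGV

_NAMES = {PR_TSC_ENABLE: "PR_TSC_ENABLE", PR_TSC_SIGSEGV: "PR_TSC_SIGSEGV"}


def tsc_mode_name(value):
    """Translate a PR_GET_TSC result into its symbolic name."""
    return _NAMES.get(value, "unknown (%d)" % value)


def get_tsc_mode():
    """Return the calling process's TSC mode, or None if prctl fails."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        mode = ctypes.c_int(0)
        rc = libc.prctl(
            ctypes.c_int(PR_GET_TSC),
            ctypes.byref(mode),
            ctypes.c_ulong(0),
            ctypes.c_ulong(0),
            ctypes.c_ulong(0),
        )
    except (OSError, AttributeError, TypeError):
        return None
    return mode.value if rc == 0 else None


if __name__ == "__main__":
    mode = get_tsc_mode()
    print("PR_GET_TSC:", "unavailable" if mode is None else tsc_mode_name(mode))
```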

----------------------------------------------------------------

I get an incorrect bogomips evaluation in /proc/cpuinfo, about half of what should be reported. TSC is of course not available as a clocksource and there are no TSC related messages in dmesg. However, I can find oops like these in early dmesg

[    0.020000] Calibrating delay loop... 1795.68 BogoMIPS (lpj=8978432)
[    0.290000] ------------[ cut here ]------------
[    0.290000] WARNING: at /home/larchey/linux-2.6.32-ovz/arch/x86/include/asm/tsc.h:27 calibrate_delay+0x291/0x367()
[    0.290000] Hardware name: KM266-8235
[    0.290000] Modules linked in:
[    0.290000] Pid: 0, comm: swapper Not tainted 2.6.32.25-ovz-notsc-hard #4
[    0.290000] Call Trace:
[    0.290000]  [<c129240c>] ? printk+0x18/0x1c
[    0.290000]  [<c102498d>] warn_slowpath_common+0x6d/0xa0
[    0.290000]  [<c13c1e7b>] ? calibrate_delay+0x291/0x367
[    0.290000]  [<c13c1e7b>] ? calibrate_delay+0x291/0x367
[    0.290000]  [<c10249d5>] warn_slowpath_null+0x15/0x20
[    0.290000]  [<c13c1e7b>] calibrate_delay+0x291/0x367
[    0.290000]  [<c13a631e>] ? tsc_init+0xf/0x186
[    0.290000]  [<c103e9bb>] ? ktime_get+0x5b/0xf0
[    0.290000]  [<c13a0652>] start_kernel+0x244/0x2b3
[    0.290000]  [<c13a01a2>] ? unknown_bootoption+0x0/0x193
[    0.290000]  [<c13a009e>] i386_start_kernel+0x9e/0xa5
[    0.290000] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.300000] Cycles are stuck! Some statistics will not be available.
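
The BogoMIPS figure in the first line of this log follows directly from the lpj value. The sketch below reproduces the arithmetic as the kernel's calibrate_delay() message is understood to format it; HZ=100 is an assumption (it is not stated in this thread, though it is consistent with the 10 ms steps in the early timestamps), and the function name is illustrative.

```python
def bogomips(lpj, hz):
    """Format BogoMIPS from loops-per-jiffy the way the boot message does:
    integer part lpj / (500000 / HZ), two decimals from lpj / (5000 / HZ)."""
    whole = lpj // (500000 // hz)
    frac = (lpj // (5000 // hz)) % 100
    return "%d.%02d" % (whole, frac)


if __name__ == "__main__":
    # lpj taken from the dmesg line above; HZ=100 is an assumption.
    print(bogomips(8978432, 100))  # prints 1795.68
```

The result, 1795.68, is about half of the 3604.32 reported in /proc/cpuinfo in comment 12, which was obtained with the TSC enabled.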

--------------------------------------------------------------------------------

# ls  /proc/acpi/processor/*/power
/proc/acpi/processor/CPU0/power
# cat /proc/acpi/processor/*/power
active state:            C0
max_cstate:              C1
maximum allowed latency: 2000000000 usec
states:
    C1:                  type[C1] promotion[--] demotion[--] latency[000] usage[00000000] duration[00000000000000000000]

--------------------------------------------------------------------------------

# ls -al /sys/devices/system/cpu/cpu*/cpuidle/state*/*
ls: cannot access /sys/devices/system/cpu/cpu*/cpuidle/state*/*: No such file or directory

# cat /sys/devices/system/cpu/cpuidle/current_driver 
acpi_idle
Comment 16 Thomas Renninger 2011-03-10 09:20:39 UTC
> I can find oops like these in early dmesg
Yep, something is borked if one tries to disable tsc. Same with notsc boot param.
No time to look at it right now, though.

That your system only shows C1 is strange.
The message you showed:
> Marking TSC unstable due to TSC halts in idle
should only show up if C2 or deeper states (or C1E, but this should not exist on older machines?) are supported.
Compare with drivers/acpi/processor_idle.c:
                if (state > ACPI_STATE_C1)
                        mark_tsc_unstable("TSC halts in idle");
Could it be that you tried with processor.max_cstate=1?

Can you double check that you did not use processor.max_cstate=1 and that you are running an unmodified kernel? If there still is only C1 in /proc/acpi/processor/*/power then it is something else.
Ok, local_apic_timer_c2_ok is set to 0 already by default and needs to get enabled via boot param, this cannot be it.
Also a wiki page about this processor tells that the new "mobile" (only AthlonXP "M") feature is to support Powernow!. So it really might have to do with tsc+powernow!.
Comment 17 Thomas Renninger 2011-03-10 09:26:00 UTC
Created attachment 50542
Enforce using of broadcast timers through boot param processor

Enforce using of broadcast timers through boot param processor:
processor.timer_broadcast=1

Might help...
Comment 18 Dominique Larchey-Wendling 2011-03-10 11:35:35 UTC
(In reply to comment #16)

> That your system only shows C1 is strange.
> The message you showed:
> > Marking TSC unstable due to TSC halts in idle
> should only show up if C2 or deeper states (or C1E, but this should not exist
> on older machines?) are supported.
> Compare with drivers/acpi/processor_idle.c:
>                 if (state > ACPI_STATE_C1)
>                         mark_tsc_unstable("TSC halts in idle");
> Could it be that you tried with processor.max_cstate=1?

Yes yes I did use this command line parameter. So the fact that my system only shows C1 is normal in this case I suppose. I will try to boot the system without the processor.max_cstate=1 parameter.

> Also a wiki page about this processor tells that the new "mobile" (only
> AthlonXP "M") feature is to support Powernow!. So it really might have to do
> with tsc+powernow!.

Powernow does not work on my system. 

----------------------------------------------------------------------------

[  121.157837] Fast TSC calibration using PIT
[  121.170872] powernow: Trying ACPI perflib
[  121.170930] powernow: ACPI perflib can not be used on this platform
[  121.170973] powernow: ACPI and legacy methods failed

-----------------------------------------------------------------------------

I bought the XP-M on Ebay. My mainboard (a FX41 from Shuttle) was not supposed to run mobile Athlon so I think the BIOS does not support the XP-M. I will try patches from 

http://www.yggdrasl.demon.co.uk/code/

but I need to guess the correct parameters to manually configure Powernow on my Athlon.
Comment 19 Dominique Larchey-Wendling 2011-03-10 12:47:32 UTC
So with my 2.6.32 compiled without TSC (CONFIG_X86_TSC undefined) but with ACPI_PROCESSOR=m and running without the "processor.max_cstate=1" command line parameter, here are the results :

-----------------------------------------------------------------------------

$  ls  /proc/acpi/processor/*/power
/proc/acpi/processor/CPU0/power

$ cat /proc/acpi/processor/*/power
active state:            C0
max_cstate:              C8
maximum allowed latency: 2000000000 usec
states:
    C1:                  type[C1] promotion[--] demotion[--] latency[000] usage[00002176] duration[00000000000000000000]
    C2:                  type[C2] promotion[--] demotion[--] latency[090] usage[00020235] duration[00000000001322893600]
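
The state lines above have a regular name[value] layout, so they can be parsed with a short Python sketch like the following. The sample text is copied from this output; the function names are hypothetical and the parser is not a complete treatment of the /proc/acpi/processor/*/power format.

```python
import re

# State lines copied from the output above.
SAMPLE_POWER = """\
active state:            C0
max_cstate:              C8
states:
    C1:                  type[C1] promotion[--] demotion[--] latency[000] usage[00002176] duration[00000000000000000000]
    C2:                  type[C2] promotion[--] demotion[--] latency[090] usage[00020235] duration[00000000001322893600]
"""

_STATE_RE = re.compile(r"^\s*(C\d+):\s+(.*)$")
_FIELD_RE = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_power(text):
    """Return {state: {field: value}} from /proc/acpi/processor/*/power text."""
    states = {}
    for line in text.splitlines():
        m = _STATE_RE.match(line)
        if not m:
            continue  # header lines such as "active state:" are skipped
        fields = dict(_FIELD_RE.findall(m.group(2)))
        for key in ("latency", "usage", "duration"):
            if fields.get(key, "").isdigit():
                fields[key] = int(fields[key])
        states[m.group(1)] = fields
    return states


def deepest_used_state(states):
    """Name of the deepest C-state with a non-zero usage count, or None."""
    used = [
        name for name, f in states.items()
        if isinstance(f.get("usage"), int) and f["usage"] > 0
    ]
    return max(used, key=lambda n: int(n[1:])) if used else None


if __name__ == "__main__":
    print(deepest_used_state(parse_power(SAMPLE_POWER)))
```

On the sample, C2 has by far the larger usage count (20235 against 2176 for C1), so the CPU spends most of its idle time in the state suspected of causing the trouble.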

-----------------------------------------------------------------------------

$ for x in `ls /sys/devices/system/cpu/cpu*/cpuidle/state*/*` ; do echo $x ; cat $x ; done

/sys/devices/system/cpu/cpu0/cpuidle/state0/desc
CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency
0
/sys/devices/system/cpu/cpu0/cpuidle/state0/name
C0
/sys/devices/system/cpu/cpu0/cpuidle/state0/power
4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/time
0
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage
0
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc
<null>
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency
0
/sys/devices/system/cpu/cpu0/cpuidle/state1/name
C1
/sys/devices/system/cpu/cpu0/cpuidle/state1/power
0
/sys/devices/system/cpu/cpu0/cpuidle/state1/time
1704275
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage
2178
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc
<null>
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency
90
/sys/devices/system/cpu/cpu0/cpuidle/state2/name
C2
/sys/devices/system/cpu/cpu0/cpuidle/state2/power
0
/sys/devices/system/cpu/cpu0/cpuidle/state2/time
416769058
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage
22022
Comment 20 Thomas Renninger 2011-03-10 13:21:01 UTC
I expect C2 is causing the trouble. processor.max_cstate=1 is the workaround that should help.
Patch/test from comment #17 won't help.
As this machine is a bit older it might not even have an hpet timer?
(dmesg |grep -i hpet). If apic (and pic?) and tsc do not work due to the sleep state and the machine does not have an hpet, there aren't many timers left for time keeping?
But I won't be able to help you further there, you'd need to debug which timer still ticks in C2 (and whether there still is one functioning at all) and whether timer interrupts are still fired at the right times.
nohz=off is an option you might want to try with C2 enabled.
Comment 21 Dominique Larchey-Wendling 2011-03-10 16:22:11 UTC
You are right the machine is a bit old. There is no HPET. The processor.max_cstate=1 is indeed a simple workaround. powernow_k7.ko seems to work with manual configuration but 

modprobe powernow-k7 fsb=133 overwrite_table=1 multiplier=75,105,135 switch_latency=650

has the immediate consequence that the TSC becomes unstable (not because of C2 but because of the multiplier/frequency change).

Mar 10 12:43:21 minipc klogd: [61408.385984] powernow: PowerNOW! Technology present. Can scale: frequency and voltage.
Mar 10 12:43:21 minipc klogd: [61408.386033] powernow: Overwriting PST table with manual settings
Mar 10 12:43:21 minipc klogd: [61408.386055] VID: 0xb (1.450V)
Mar 10 12:43:21 minipc klogd: [61408.386068] VID: 0xb (1.450V)
Mar 10 12:43:21 minipc klogd: [61408.386081] VID: 0xb (1.450V)
Mar 10 12:43:22 minipc klogd: [61408.386133] powernow: Minimum speed 997 MHz. Maximum speed 1795 MHz.
Mar 10 12:43:22 minipc klogd: [61408.386192] Marking TSC unstable due to cpufreq changes
Mar 10 12:43:22 minipc klogd: [61408.388262] Switching to clocksource acpi_pm
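
The minimum and maximum speeds in this log match the module parameters if the multiplier values are read as tenths (75 meaning 7.5x) and the result is truncated to whole MHz. That reading is an assumption, though it agrees with the 997 MHz and 1795 MHz reported above. A minimal Python sketch of the arithmetic, with an illustrative function name:

```python
def pst_speeds_mhz(fsb_mhz, multipliers_x10):
    """Core speeds in whole MHz for multipliers given in tenths."""
    fsb_khz = fsb_mhz * 1000
    # Work in kHz with integer arithmetic, then truncate to MHz.
    return [fsb_khz * m // 10 // 1000 for m in multipliers_x10]


if __name__ == "__main__":
    # fsb=133 multiplier=75,105,135 from the modprobe line above.
    print(pst_speeds_mhz(133, [75, 105, 135]))  # prints [997, 1396, 1795]
```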

Since my server is not overheated and a working timer seems mandatory, I will stick to the workaround 

processor.max_cstate=1 

Thank you very much for your help.
