Bug 10561
Summary: | p4_clockmod: repeated lockups - classmate PC | ||
---|---|---|---|
Product: | Power Management | Reporter: | Omair Eshkenazi (stimpson) |
Component: | cpufreq | Assignee: | acpi_power-processor |
Status: | REJECTED INSUFFICIENT_DATA | ||
Severity: | normal | CC: | bunk, davej, lenb, rui.zhang, tglx, venki |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.24.5 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
Lots of info from /proc that might help
Output of dmesg -s 1M Info about system time being slowed down. |
Description
Omair Eshkenazi
2008-04-27 02:07:32 UTC
Created attachment 15931 [details]
Lots of info from /proc that might help
Please attach the output of "dmesg -s 1000000" for both booting with HPET and booting without HPET. Created attachment 15955 [details]
Output of dmesg -s 1M
Can you please provide the output of /sys/devices/system/clocksource/clocksource0/current_clocksource for both cases ? Thanks, tglx When using hpet it says hpet and when not it says tsc, but I have a feeling I saw it say acpi_pm in some cases. My previous comment was incorrect. It says tsc weather hpet is enabled or not. However, I just tried and modprobing processor, battery and ac made it change to hpet! (battery and ac might be irrelevant and it just needed time to update -- I can rmmod them but not processor - it says "Module processor is in use".) This is with hpet enabled, so with hpet disabled it will probably be acpi_pm, as I've noticed before. (I also tried modprobing p4-clockmod to see if it has any effect, but it hasn't -- it still says hpet and it just locked up...) > This is with hpet enabled, so with hpet disabled it will probably be acpi_pm, > as I've noticed before. Is the switch to hpet resp. pm_timer noted in dmesg ? > (I also tried modprobing p4-clockmod to see if it has any effect, but it > hasn't > -- it still says hpet and it just locked up...) Hmm, that's a tough one :( Venki, any pointers ? Can you reproduce this when the p4_clockmod driver is not used?
(and why are you using p4_clockmod on a modern processor in the first place
instead of acpi_cpufreq?)
But it may be that p4_clockmod has nothing to do with this,
and that it is purely an hpet issue. For we switch to using
the hpet when the processor module is loaded -- as this
box has C2 we will stop using the tsc when processor is loaded
and use the next best timer, which will be the hpet if allowed...
In particular, the BIOS doesn't export the HPET on this box,
we are enabling via chipset probing:
> Force enabled HPET at base address 0xfed00000
Another thing to try is to load the processor module
with processor.max_cstate=1 (via /etc/modprobe.conf)
which will disable C2 and allow the TSC to be used instead
of the hpet. This should give you the same results as
hpet=disable, and perhaps you should be able to happily
run p4_clockmod in this case and prove that it is not
related to the hang.
Created attachment 15974 [details]
Info about system time being slowed down.
| Can you reproduce this when the p4_clockmod driver is not used?
No, I tried that in the past and again now -- I waited a long time and also run X and everything seems fine. As I mentioned at the beginning, I *can* remove p4-clockmod (and speedstep_lib) and the lockups will continue, but it seems to trigger something that causes the lockups...
| (and why are you using p4_clockmod on a modern processor in the first place
instead of acpi_cpufreq?)
acpi-cpufreq won't load on this machine: "no such device".
Another thing I tried is, after it had already locked up once, I echoed acpi_pm into /sys/devices/system/clocksource/clocksource0/current_clocksource, which caused a message to appear about acpi_pm being enabled (so it worked ok).
However, it still locked up afterwards...
| Another thing to try is to load the processor module
| with processor.max_cstate=1 (via /etc/modprobe.conf)
You mean "option processor max_cstate=1" (the other notation is for builtins).
When probing processor I got a message about it being limited to max C-state 1.
It also caused the time/tsc problem (below) to go away so I got no "tsc unstable" message and it didn't lock up.
However, unlike booting with hpet=disable, in this case it kept using tsc, rather than use acpi_pm.
If I'm not mistaken, it also means the processor can't sleep... so it's not very power-efficient.
I discovered something about the "Clocksource tsc unstable" messages:
When I boot (with hpet enabled or disabled), tsc is used at first.
Then I modprobe processor and there's a period of from a few seconds to a few minutes before that message appears (after which the clocksource used changes to hpet or acpi_pm, depending on how I booted).
Now, during that period, is seems like time pretty much comes to a halt! For every 5 seconds that pass in the real world, the system time changes by about 20 ms.
I've gathered some data that shows how it develops and am attaching it (more info inside).
Note that this is the same whether I boot with hpet enabled or disabled, as hpet/acpi_pm aren't used yet at this stage.
What happens when you boot with clocksource=hpet or clocksource=acpi_pm on the kernel command line ? Thanks, tglx > What happens when you boot with clocksource=hpet or clocksource=acpi_pm on
> the
> kernel command line ?
Whichever one I boot with is, indeed, used from early on in the boot process (when tsc was previously initialized) and goes on being used after processor is loaded.
As a result, the period (after loading processor) when the system clock is slowed down does not exist.
However, after loading processor I still get the "tsc clocksource unstable" message (so I guess it's alive somewhere).
Loading p4-clockmod still causes lockups when hpet is used...
> ------- Comment #11 from stimpson@phys.huji.ac.il 2008-04-30 03:31 -------
> > What happens when you boot with clocksource=hpet or clocksource=acpi_pm on
> the
> > kernel command line ?
>
> Whichever one I boot with is, indeed, used from early on in the boot process
> (when tsc was previously initialized) and goes on being used after processor
> is
> loaded.
> As a result, the period (after loading processor) when the system clock is
> slowed down does not exist.
> However, after loading processor I still get the "tsc clocksource unstable"
> message (so I guess it's alive somewhere).
> Loading p4-clockmod still causes lockups when hpet is used...
Hmm. When you use clocksource=acpi_pm then it works with p4-clockmod, right ?
Can you please provide the output of /proc/timer_list for that scenario ?
Thanks,
tglx
> When you use clocksource=acpi_pm then it works with p4-clockmod, right ?
No it locks up. I guess it's like when I boot with no parameter and echo acpi_pm into the current_clocksource file (as I did a couple of days ago).
I'll try and get you some stats from /proc/time_list, just in case it helps, but it seems to be completely locked up at the moment so I'll have to forcefully power it off and get back to you later.
On Wed, 30 Apr 2008, bugme-daemon@bugzilla.kernel.org wrote: > ------- Comment #13 from stimpson@phys.huji.ac.il 2008-04-30 04:21 ------- > > When you use clocksource=acpi_pm then it works with p4-clockmod, right ? > > No it locks up. I guess it's like when I boot with no parameter and echo > acpi_pm into the current_clocksource file (as I did a couple of days ago). > I'll try and get you some stats from /proc/time_list, just in case it helps, > but it seems to be completely locked up at the moment so I'll have to > forcefully power it off and get back to you later. Ok, so the only way to make it work is hpet=disable. I fear we can't do much about that then. hpet is not supported by the BIOS so it might be a chipset/BIOS bug in combination with the clock scaling. Thanks, tglx > hpet is not supported by the BIOS so it might
> be a chipset/BIOS bug in combination with the clock scaling.
Hmm. That was kind of my basic assumption (hence the "funky" in the title) and I was hoping something could be done to batter handle such problems (distro kernels have hpet enabled, so some users might encounter it and not know what the cause is...).
Oh, well, on to my next bug report...
> Ok, so the only way to make it work is hpet=disable.
Isn't the other solution to simply not load p4_clockmod?
(and continue using the hpet, as is done by default)
I thought that dave was going to delete p4_clockmod from the source
tree about this time last year, I'm not sure why it still exists.
> I thought that dave was going to delete p4_clockmod from the source
> tree about this time last year, I'm not sure why it still exists.
But if it is removed, what will us mortals use?
These new Intel mobile processors don't support EST and so the only driver that works with them in recent kernels is p4_clockmod...
re: comment #9 > You mean "option processor max_cstate=1" (the other notation is for > builtins). > When probing processor I got a message about it being limited to max C-state > 1. > It also caused the time/tsc problem (below) to go away so I got no "tsc > unstable" message and it didn't lock up. > However, unlike booting with hpet=disable, in this case it kept using tsc, > rather than use acpi_pm. > If I'm not mistaken, it also means the processor can't sleep... so it's not > very power-efficient. Actually this is probably fine. It will give you C1E, which is likely 90+ % of the value of C2, and it will not disable the use of the TSC as a timesource. I'd be interested if you can detect a battery life difference between booting with max_cstate=1 and without. But this business of the HPET and the TSC is not directly related to the actual failure, as in comment #13 you stated that this can happen when using acpi_pm as the clocksource, eliminating HPET and TSC from the picture. The problem is with p4_clockmod. I'd be interested if you can measure any benefit from using p4_clockmod. My recommendation would be to simply not use it. bug closed as there is no response from the bug reporter. please re-open it if you can provide the info required in comment #18. Sorry, I had no internet access for six months... will email Len about this soon. that's okay. you don't need to email len, please just re-open the bug once you can do the test and attach the test result here. :) |