Bug 11348 - performance regression at 2.6.26 - Pentium III 700, motherboard Gigabyte GA-6BX7
Summary: performance regression at 2.6.26 - Pentium III 700, motherboard Gigabyte GA-6BX7
Status: REJECTED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Processor (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Zhang Rui
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-15 04:21 UTC by Zbigniew
Modified: 2008-12-08 13:27 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.26.2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmidecode's output (my current hardware) (8.91 KB, text/plain)
2008-09-10 00:47 UTC, Zbigniew
Details
patch: disable c state on old cpus (1.05 KB, patch)
2008-11-04 22:00 UTC, Zhang Rui
Details | Diff
ACPI dump of GA-6BX7 made on 2008-12-07 (11.93 KB, application/x-gzip)
2008-12-07 07:19 UTC, Zbigniew
Details
Dmesg made on 2008-12-07 (8.74 KB, application/x-gzip)
2008-12-07 07:20 UTC, Zbigniew
Details

Description Zbigniew 2008-08-15 04:21:20 UTC
Latest working kernel version: 2.6.25.2
Earliest failing kernel version: 2.6.26
Distribution: Debian
Hardware Environment: Pentium III 700, motherboard Gigabyte GA-6BX7
Software Environment: Debian Etch
Problem Description:

I noticed, that using new kernels (tested 2.6.26 and 2.6.26.2), there are 
typical for ACPI-related problems troubles with responsitivity. For example: 
if I'm opening "bloated" web page using Opera, one can't move the mouse 
cursor smoothly around the screen; there is a "jitter" during the movement, 
and any output (like GKrellm's diagrams created in real time) for that 2-3 
seconds is "frozen". As I've found googling around, these are rather typical 
ACPI-related problems, and indeed: forcing "acpi=off" was the cure.

Additionally I noticed, that there are no such problems while using older 
kernel 2.6.25.2 - so something bad happened during the period "between" 
2.6.25.2 and 2.6.26 (but I didn't try later 2.6.25.x kernels).

Following the tip of Andi Klein, I tried processor.max_cstate=1 parameter; it made the situation a bit better - I mean: no "jitter" during mouse movement anymore, but still "momentary freeze" of GKrellm output (which isn't the case when using 2.6.25.2 kernel).


Steps to reproduce: by using kernel 2.6.26 in the conditions described above
Comment 1 Andi Kleen 2008-08-15 23:15:55 UTC
Can you please attach full dmesg?

Hmm actually I notice your old CPU now and non mobile P3 shouldn't have anything deeper than C1 anyways, so processor.max_cstate=1 should be a nop. 
Comment 2 Zbigniew 2008-08-17 03:43:29 UTC
Making additional test (with max_cstate=1), unfortunately, I see no difference (I don't know, was it autosugestion before, or some specific conditions(?)) - responsitivity is as poor, as without this.
I just added into /etc/modutils/processor a line:
options processor max_cstate=1 (...and then made "update-modules").

I'm not sure, whether it's related in any way, or not - but I forgot to mention one more thing:

With new 2.6.26.x kernels several times I could see a problem by shutdown: 
the system stays at the line "Asking all remaining processes to terminate", 
not advancing to the next line "Killing all remaining processes". Of course, 
I've got to press the reset button (which means unclean shutdown and checking the filesystem by next boot), instead of waiting endlessly.

Never had such problem with any of the previous kernels.
Comment 3 Zhang Rui 2008-08-17 18:41:59 UTC
as this is a regression, could you please try git-bisect to narrow down the problem please?
Comment 4 Andi Kleen 2008-08-17 18:43:47 UTC
I guess bisect would only be useful once the problem is stabilized.
Comment 5 Len Brown 2008-08-22 10:45:55 UTC
After the problem is observed, please
cat /proc/interrupts
grep . /sys/firmware/acpi/interrupts/*
and paste the output here.
so we can see if this is some sort of event storm.
If yes, you may also observe with top(1) if any threads/processes
are hogging the cpu time during the performance problem.
Comment 6 Zbigniew 2008-08-22 13:48:00 UTC
It's rather impossible (because of the "throttling" of io, as I wrote already). I tried to, anyway - but pay attention, that in fact it has been done really AFTER the problem has been observed, not during - and below is the result of cat /proc/interrupts :

           CPU0       
  0:      27516    XT-PIC-XT        timer
  1:        287    XT-PIC-XT        i8042
  2:          0    XT-PIC-XT        cascade
  3:          3    XT-PIC-XT        ohci1394
  4:          1    XT-PIC-XT      
  6:          5    XT-PIC-XT        floppy
  9:          0    XT-PIC-XT        acpi
 10:      17598    XT-PIC-XT        sata_sil
 11:        728    XT-PIC-XT        uhci_hcd:usb1, eth0
 12:       8723    XT-PIC-XT        i8042
 14:       1204    XT-PIC-XT        ata_piix
 15:       7358    XT-PIC-XT        ata_piix, bttv0, EMU10K1, radeon@pci:0000:01:00.0
NMI:          0   Non-maskable interrupts
LOC:          0   Local timer interrupts
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0

...and here's output of grep . /sys/firmware/acpi/interrupts/* :

/sys/firmware/acpi/interrupts/error:0
/sys/firmware/acpi/interrupts/ff_gbl_lock:0
/sys/firmware/acpi/interrupts/ff_pmtimer:0
/sys/firmware/acpi/interrupts/ff_pwr_btn:0
/sys/firmware/acpi/interrupts/ff_rt_clk:0
/sys/firmware/acpi/interrupts/ff_slp_btn:0
/sys/firmware/acpi/interrupts/gpe00:0
/sys/firmware/acpi/interrupts/gpe01:0
/sys/firmware/acpi/interrupts/gpe02:0
/sys/firmware/acpi/interrupts/gpe03:0
/sys/firmware/acpi/interrupts/gpe04:0
/sys/firmware/acpi/interrupts/gpe05:0
/sys/firmware/acpi/interrupts/gpe06:0
/sys/firmware/acpi/interrupts/gpe07:0
/sys/firmware/acpi/interrupts/gpe08:0
/sys/firmware/acpi/interrupts/gpe09:0
/sys/firmware/acpi/interrupts/gpe0A:0
/sys/firmware/acpi/interrupts/gpe0B:0
/sys/firmware/acpi/interrupts/gpe0C:0
/sys/firmware/acpi/interrupts/gpe0D:0
/sys/firmware/acpi/interrupts/gpe0E:0
/sys/firmware/acpi/interrupts/gpe0F:0
/sys/firmware/acpi/interrupts/gpe_all:0
/sys/firmware/acpi/interrupts/sci:0

...but, alas, it's not possible to trace anything using "top" ("frozen" output for duration of the "throttling").

BTW: Did you mention the comments from the mail at the list, sent at Tue, 19 Aug 2008 13:48:07 +0200 by Dan Vratil <progdan@progdansoft.com>, with the subject "ACPI failure"? He wrote: "Then in one version (i think it was 2.6.25.6, but not sure) ACPI tables were updated to be compatible with Windows Vista, but it has broken down my ACPI".

I think, he just described the same problem.
Comment 7 Zbigniew 2008-08-23 06:24:53 UTC
Maybe "vmstat" output will give you a clue? I tried to take its output to a file (vmstat 1 10 > file.txt), while reloading that "bloated" web page.

1. Output from within 2.6.26.2:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0  25932  17532 113644    0    0   738   125  431  567 25 12 38 25
 0  0      0  25812  17532 113644    0    0     0     0  540  613 25  4 71  0
 1  0      0  25812  17532 113644    0    0     0     0  389  720  9  2 89  0
 8  0      0  25028  17672 113596    0    0     8   196 5580 3881 91  3  5  1
 2  0      0  24968  17760 113604    0    0     4   164 1112  243 90  5  5  0
 4  0      0  24968  17760 113588    0    0     0     0  933  536 73  3 24  0
 0  0      0  24908  17760 113588    0    0     0     0  518  429 25  3 72  0
 0  0      0  24968  17760 113588    0    0     0     0  278  175  3  1 96  0
 0  0      0  25000  17760 113672    0    0     0     0  278  309  7  2 91  0
 0  0      0  24880  17964 113672    0    0     0   344  322  198  3  0 96  1

1. Output from within 2.6.25.2:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0  11844  30020 113044    0    0   901   195  349  584 27 13 25 36
 1  0      0  11492  30020 113044    0    0     0     0  688 1215 92  3  5  0
 1  0      0  11552  30020 113080    0    0     0     0  841 1585 92  8  0  0
 2  0      0  11492  30020 113080    0    0     0     0  796 1178 93  7  0  0
 1  0      0  11492  30020 113080    0    0     0     0  748 1028 99  1  0  0
 2  0      0  11516  30208 113080    0    0     0   264  901  935 96  4  0  0
 1  0      0  11516  30208 113080    0    0     0     0  984  706 94  6  0  0
 1  0      0  11516  30208 113076    0    0     0     0  838  978 93  4  3  0
 1  0      0  11336  30208 113076    0    0     0     0  687 1116 65  5 30  0
 1  0      0  11464  30208 113076    0    0     0     0  516  806 30  3 67  0


Pay attention at a "peak" in "interrupts per second" when running 2.6.26.2... (5580 !)
Comment 8 Zbigniew 2008-08-23 06:43:50 UTC
I forgot to add, that after finishing the test under 2.6.26.2 I had mentioned earlier problem with no passing by "Asking all remaining processes to terminate" (when shutting the system down) again... I had to make cold reset.

I don't know, is it related to described bug - but it seems to be serious problem as well.
Comment 9 Zhang Rui 2008-09-01 18:47:11 UTC
does boot parameter "processor.max_cstate=1" help?
Comment 10 Zbigniew 2008-09-02 01:13:46 UTC
No, it doesn't, because:

1) I've got "processor" as a module, and because of this "processor.max_cstate=1" is ignored as a boot parameter.

2) Adding into /etc/modprobe.d/options a line: "options processor max_cstate=1" didn't help either. I wrote about this already - but I checked it again right now, and again: it means no change.
Comment 11 ykzhao 2008-09-02 01:35:39 UTC
If the processor is compiled as module, please remove the processor module manually and install the processor module manually.In such case the module parameter of "max_cstate=1" is added when inserting processor module.

Of course you can set "CONFIG_ACPI_PROCESSOR=y" in kernel configuration and try the boot option of "processor.max_cstate=1" again.

Thanks.
Comment 12 Zbigniew 2008-09-02 02:43:02 UTC
1) It's not possible to remove processor module (each time "ERROR: Module processor is in use").
2) I cannot compile kernel just to change this one & only option - it takes 5 hours on my machine.
3) What makes you think, that adding the desired option into /etc/modprobe.d/options isn't enough? I've checked that option several times already - it doesn't make anything better - and I really don't know, what do you expect.
4) What is a problem with bringing back the functionality, which was quite OK including 2.6.25(.2)? Just to bring back well-tested, fully functional code - and dispose of that new "improvements".
Comment 13 Zbigniew 2008-09-09 16:56:09 UTC
Today I made another try compiling just kernel alone (without any modules), with configuration having "processor" built-in. With that ACPI-module compiled "for steady" into the kernel and "processor.max_cstate=1" as boot parameter, the problem has been successfully blocked.
Comment 14 Shaohua 2008-09-09 20:40:37 UTC
Can you please tell us the output of 
cat /proc/acpi/processor/*/power
Comment 15 Zbigniew 2008-09-10 00:10:51 UTC
active state:            C0
max_cstate:              C1
bus master activity:     00000000
maximum allowed latency: 2000000000 usec
states:
    C1:                  type[C1] promotion[--] demotion[--] latency[000] usage[00014918] duration[00000000000000000000]
    C2:                  type[C2] promotion[--] demotion[--] latency[090] usage[00000000] duration[00000000000000000000]
Comment 16 Zbigniew 2008-09-10 00:47:37 UTC
Created attachment 17709 [details]
dmidecode's output (my current hardware)

dmidecode's output attached
Comment 17 Zhang Rui 2008-11-04 22:00:25 UTC
Created attachment 18682 [details]
patch: disable c state on old cpus

please try this patch.
Comment 18 Zbigniew 2008-11-14 14:09:57 UTC
Yes, tried it today with kernel 2.6.27.5 - and it seems to fix the problem. Thanks! :)
Comment 19 Zhang Rui 2008-11-16 19:47:21 UTC
len,
Shaohua and I agree that we should disable C-state on old processors, to avoid some potential risks, as C-state on these old processors is not well supported,
what do you think?
Comment 20 Len Brown 2008-11-24 18:43:48 UTC
with the unpatched kernel, please verify that booting with "idle=poll"
makes the problem go away.

please update to 2.6.27 so you can get access to the following
new parameter, and please see if it also helps:
"idle=halt"

please attach the output from dmesg -s64000 from your unpatched
kernel, and please attach one copy of the output from acpidump.

re-opening this bug, b/c we can not disable C-states
for all processors before p4.
Comment 21 Len Brown 2008-11-24 18:49:04 UTC
also...
if you're building that 2.6.27 kernel from scratch,
please be sure to include CONFIG_CPU_IDLE=y

please run powertop and report if you're getting
an excessive number of wakeup's per seconds
when idle, for this bug may be a duplicate of bug 10914
which was seen in 2.6.26 and mysteriously gone in 2.6.27.
Comment 22 Zbigniew 2008-12-07 07:19:21 UTC
Created attachment 19194 [details]
ACPI dump of GA-6BX7 made on 2008-12-07
Comment 23 Zbigniew 2008-12-07 07:20:57 UTC
Created attachment 19195 [details]
Dmesg made on 2008-12-07

Dmesg made on 2008-12-07, Gobolinux, kernel 2.6.27.8 compiled with gcc 4.3.x, glibc 2.8
Comment 24 Zbigniew 2008-12-07 07:22:50 UTC
Today I've compiled newest 2.6.27.8 kernel with no additional ACPI-related patches, but there wasn't any need to test "idle=poll" neither "idle=halt", because... the problem couldn't be noticed anymore.

It's still the same hardware, with small differences:
- newer SATA controller (SiI3512 instead of 3112)
- new harddisk
- the compilation have been done using newer gcc and newer glibc (2.8), Gobolinux distribution

...but AFAIK the differences listed couldn't have any (significant) influence?

Is it possible, that some other improvements in new kernel made the problem disappear? Or perhaps it still persist - but just it isn't so much annoying, that it's harder to notice?

If there's a need to make additional check, just let me know.

More details:

> if you're building that 2.6.27 kernel from scratch,
> please be sure to include CONFIG_CPU_IDLE=y

# shared options
#
# CONFIG_X86_ACPI_CPUFREQ_PROC_INTF is not set
CONFIG_X86_SPEEDSTEP_LIB=m
CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK=y
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y

Last lines of lsmod:

thermal                16284  0
button                  7056  0
processor              34988  2 thermal


> please run powertop and report if you're getting
> an excessive number of wakeup's per seconds
> when idle, for this bug may be a duplicate of bug 10914
> which was seen in 2.6.26 and mysteriously gone in 2.6.27.

Output of powertop:

     PowerTOP version 1.9       (C) 2007 Intel Corporation

Cn                Avg residency       P-states (frequencies)
C0 (CPU aktywny)        (99,9%)
C1                0,0ms ( 0,0%)
C2                0,0ms ( 0,1%)

Wakeups-from-idle per second : 547,4    interval: 5,0s

Top causes for wakeups:
  45,3% ( 75,0)       <interrupt> : bttv0, EMU10K1, radeon@pci:0000:01:00.0
  13,0% ( 21,6)       <interrupt> : ide0 
  12,1% ( 20,0)   <kernel module> : bttv_input_init (bttv_input_timer)
   8,3% ( 13,8)       <interrupt> : PS/2 keyboard/mouse/touchpad 
   6,3% ( 10,4)           gkrellm : schedule_timeout (process_timeout)
   5,6% (  9,2)                 X : do_setitimer (it_real_fn)

Attached acpi-dump (file acpi20081207.dump.gz) and dmesg (dmesg20081207.gz).
Comment 25 Zhang Rui 2008-12-07 19:11:35 UTC
(In reply to comment #24)
> Today I've compiled newest 2.6.27.8 kernel with no additional ACPI-related
> patches, but there wasn't any need to test "idle=poll" neither "idle=halt",
> because... the problem couldn't be noticed anymore.
> 
right, there are a lot of C state improvement recently, 

please re-open this bug if you can reproduce the problem in the latest kernel release. :)
Comment 26 Zbigniew 2008-12-08 13:27:28 UTC
To be quite sure, that the problem really is gone, I made today additional test, compiling the newest kernel within the older environment (Debian Etch), and made the test using older both HDD and controller. And indeed: the problem seems to be gone by now.

Big thanks to all of you, gentlemen, for all your efforts! :)

Note You need to log in before you can comment on or make changes to this bug.