Latest working kernel version: 2.6.25.2 Earliest failing kernel version: 2.6.26 Distribution: Debian Hardware Environment: Pentium III 700, motherboard Gigabyte GA-6BX7 Software Environment: Debian Etch Problem Description: I noticed, that using new kernels (tested 2.6.26 and 2.6.26.2), there are typical for ACPI-related problems troubles with responsitivity. For example: if I'm opening "bloated" web page using Opera, one can't move the mouse cursor smoothly around the screen; there is a "jitter" during the movement, and any output (like GKrellm's diagrams created in real time) for that 2-3 seconds is "frozen". As I've found googling around, these are rather typical ACPI-related problems, and indeed: forcing "acpi=off" was the cure. Additionally I noticed, that there are no such problems while using older kernel 2.6.25.2 - so something bad happened during the period "between" 2.6.25.2 and 2.6.26 (but I didn't try later 2.6.25.x kernels). Following the tip of Andi Klein, I tried processor.max_cstate=1 parameter; it made the situation a bit better - I mean: no "jitter" during mouse movement anymore, but still "momentary freeze" of GKrellm output (which isn't the case when using 2.6.25.2 kernel). Steps to reproduce: by using kernel 2.6.26 in the conditions described above
Can you please attach full dmesg? Hmm actually I notice your old CPU now and non mobile P3 shouldn't have anything deeper than C1 anyways, so processor.max_cstate=1 should be a nop.
Making additional test (with max_cstate=1), unfortunately, I see no difference (I don't know, was it autosugestion before, or some specific conditions(?)) - responsitivity is as poor, as without this. I just added into /etc/modutils/processor a line: options processor max_cstate=1 (...and then made "update-modules"). I'm not sure, whether it's related in any way, or not - but I forgot to mention one more thing: With new 2.6.26.x kernels several times I could see a problem by shutdown: the system stays at the line "Asking all remaining processes to terminate", not advancing to the next line "Killing all remaining processes". Of course, I've got to press the reset button (which means unclean shutdown and checking the filesystem by next boot), instead of waiting endlessly. Never had such problem with any of the previous kernels.
as this is a regression, could you please try git-bisect to narrow down the problem please?
I guess bisect would only be useful once the problem is stabilized.
After the problem is observed, please cat /proc/interrupts grep . /sys/firmware/acpi/interrupts/* and paste the output here. so we can see if this is some sort of event storm. If yes, you may also observe with top(1) if any threads/processes are hogging the cpu time during the performance problem.
It's rather impossible (because of the "throttling" of io, as I wrote already). I tried to, anyway - but pay attention, that in fact it has been done really AFTER the problem has been observed, not during - and below is the result of cat /proc/interrupts : CPU0 0: 27516 XT-PIC-XT timer 1: 287 XT-PIC-XT i8042 2: 0 XT-PIC-XT cascade 3: 3 XT-PIC-XT ohci1394 4: 1 XT-PIC-XT 6: 5 XT-PIC-XT floppy 9: 0 XT-PIC-XT acpi 10: 17598 XT-PIC-XT sata_sil 11: 728 XT-PIC-XT uhci_hcd:usb1, eth0 12: 8723 XT-PIC-XT i8042 14: 1204 XT-PIC-XT ata_piix 15: 7358 XT-PIC-XT ata_piix, bttv0, EMU10K1, radeon@pci:0000:01:00.0 NMI: 0 Non-maskable interrupts LOC: 0 Local timer interrupts TRM: 0 Thermal event interrupts SPU: 0 Spurious interrupts ERR: 0 MIS: 0 ...and here's output of grep . /sys/firmware/acpi/interrupts/* : /sys/firmware/acpi/interrupts/error:0 /sys/firmware/acpi/interrupts/ff_gbl_lock:0 /sys/firmware/acpi/interrupts/ff_pmtimer:0 /sys/firmware/acpi/interrupts/ff_pwr_btn:0 /sys/firmware/acpi/interrupts/ff_rt_clk:0 /sys/firmware/acpi/interrupts/ff_slp_btn:0 /sys/firmware/acpi/interrupts/gpe00:0 /sys/firmware/acpi/interrupts/gpe01:0 /sys/firmware/acpi/interrupts/gpe02:0 /sys/firmware/acpi/interrupts/gpe03:0 /sys/firmware/acpi/interrupts/gpe04:0 /sys/firmware/acpi/interrupts/gpe05:0 /sys/firmware/acpi/interrupts/gpe06:0 /sys/firmware/acpi/interrupts/gpe07:0 /sys/firmware/acpi/interrupts/gpe08:0 /sys/firmware/acpi/interrupts/gpe09:0 /sys/firmware/acpi/interrupts/gpe0A:0 /sys/firmware/acpi/interrupts/gpe0B:0 /sys/firmware/acpi/interrupts/gpe0C:0 /sys/firmware/acpi/interrupts/gpe0D:0 /sys/firmware/acpi/interrupts/gpe0E:0 /sys/firmware/acpi/interrupts/gpe0F:0 /sys/firmware/acpi/interrupts/gpe_all:0 /sys/firmware/acpi/interrupts/sci:0 ...but, alas, it's not possible to trace anything using "top" ("frozen" output for duration of the "throttling"). BTW: Did you mention the comments from the mail at the list, sent at Tue, 19 Aug 2008 13:48:07 +0200 by Dan Vratil <progdan@progdansoft.com>, with the subject "ACPI failure"? He wrote: "Then in one version (i think it was 2.6.25.6, but not sure) ACPI tables were updated to be compatible with Windows Vista, but it has broken down my ACPI". I think, he just described the same problem.
Maybe "vmstat" output will give you a clue? I tried to take its output to a file (vmstat 1 10 > file.txt), while reloading that "bloated" web page. 1. Output from within 2.6.26.2: procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 2 0 0 25932 17532 113644 0 0 738 125 431 567 25 12 38 25 0 0 0 25812 17532 113644 0 0 0 0 540 613 25 4 71 0 1 0 0 25812 17532 113644 0 0 0 0 389 720 9 2 89 0 8 0 0 25028 17672 113596 0 0 8 196 5580 3881 91 3 5 1 2 0 0 24968 17760 113604 0 0 4 164 1112 243 90 5 5 0 4 0 0 24968 17760 113588 0 0 0 0 933 536 73 3 24 0 0 0 0 24908 17760 113588 0 0 0 0 518 429 25 3 72 0 0 0 0 24968 17760 113588 0 0 0 0 278 175 3 1 96 0 0 0 0 25000 17760 113672 0 0 0 0 278 309 7 2 91 0 0 0 0 24880 17964 113672 0 0 0 344 322 198 3 0 96 1 1. Output from within 2.6.25.2: procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 1 0 0 11844 30020 113044 0 0 901 195 349 584 27 13 25 36 1 0 0 11492 30020 113044 0 0 0 0 688 1215 92 3 5 0 1 0 0 11552 30020 113080 0 0 0 0 841 1585 92 8 0 0 2 0 0 11492 30020 113080 0 0 0 0 796 1178 93 7 0 0 1 0 0 11492 30020 113080 0 0 0 0 748 1028 99 1 0 0 2 0 0 11516 30208 113080 0 0 0 264 901 935 96 4 0 0 1 0 0 11516 30208 113080 0 0 0 0 984 706 94 6 0 0 1 0 0 11516 30208 113076 0 0 0 0 838 978 93 4 3 0 1 0 0 11336 30208 113076 0 0 0 0 687 1116 65 5 30 0 1 0 0 11464 30208 113076 0 0 0 0 516 806 30 3 67 0 Pay attention at a "peak" in "interrupts per second" when running 2.6.26.2... (5580 !)
I forgot to add, that after finishing the test under 2.6.26.2 I had mentioned earlier problem with no passing by "Asking all remaining processes to terminate" (when shutting the system down) again... I had to make cold reset. I don't know, is it related to described bug - but it seems to be serious problem as well.
does boot parameter "processor.max_cstate=1" help?
No, it doesn't, because: 1) I've got "processor" as a module, and because of this "processor.max_cstate=1" is ignored as a boot parameter. 2) Adding into /etc/modprobe.d/options a line: "options processor max_cstate=1" didn't help either. I wrote about this already - but I checked it again right now, and again: it means no change.
If the processor is compiled as module, please remove the processor module manually and install the processor module manually.In such case the module parameter of "max_cstate=1" is added when inserting processor module. Of course you can set "CONFIG_ACPI_PROCESSOR=y" in kernel configuration and try the boot option of "processor.max_cstate=1" again. Thanks.
1) It's not possible to remove processor module (each time "ERROR: Module processor is in use"). 2) I cannot compile kernel just to change this one & only option - it takes 5 hours on my machine. 3) What makes you think, that adding the desired option into /etc/modprobe.d/options isn't enough? I've checked that option several times already - it doesn't make anything better - and I really don't know, what do you expect. 4) What is a problem with bringing back the functionality, which was quite OK including 2.6.25(.2)? Just to bring back well-tested, fully functional code - and dispose of that new "improvements".
Today I made another try compiling just kernel alone (without any modules), with configuration having "processor" built-in. With that ACPI-module compiled "for steady" into the kernel and "processor.max_cstate=1" as boot parameter, the problem has been successfully blocked.
Can you please tell us the output of cat /proc/acpi/processor/*/power
active state: C0 max_cstate: C1 bus master activity: 00000000 maximum allowed latency: 2000000000 usec states: C1: type[C1] promotion[--] demotion[--] latency[000] usage[00014918] duration[00000000000000000000] C2: type[C2] promotion[--] demotion[--] latency[090] usage[00000000] duration[00000000000000000000]
Created attachment 17709 [details] dmidecode's output (my current hardware) dmidecode's output attached
Created attachment 18682 [details] patch: disable c state on old cpus please try this patch.
Yes, tried it today with kernel 2.6.27.5 - and it seems to fix the problem. Thanks! :)
len, Shaohua and I agree that we should disable C-state on old processors, to avoid some potential risks, as C-state on these old processors is not well supported, what do you think?
with the unpatched kernel, please verify that booting with "idle=poll" makes the problem go away. please update to 2.6.27 so you can get access to the following new parameter, and please see if it also helps: "idle=halt" please attach the output from dmesg -s64000 from your unpatched kernel, and please attach one copy of the output from acpidump. re-opening this bug, b/c we can not disable C-states for all processors before p4.
also... if you're building that 2.6.27 kernel from scratch, please be sure to include CONFIG_CPU_IDLE=y please run powertop and report if you're getting an excessive number of wakeup's per seconds when idle, for this bug may be a duplicate of bug 10914 which was seen in 2.6.26 and mysteriously gone in 2.6.27.
Created attachment 19194 [details] ACPI dump of GA-6BX7 made on 2008-12-07
Created attachment 19195 [details] Dmesg made on 2008-12-07 Dmesg made on 2008-12-07, Gobolinux, kernel 2.6.27.8 compiled with gcc 4.3.x, glibc 2.8
Today I've compiled newest 2.6.27.8 kernel with no additional ACPI-related patches, but there wasn't any need to test "idle=poll" neither "idle=halt", because... the problem couldn't be noticed anymore. It's still the same hardware, with small differences: - newer SATA controller (SiI3512 instead of 3112) - new harddisk - the compilation have been done using newer gcc and newer glibc (2.8), Gobolinux distribution ...but AFAIK the differences listed couldn't have any (significant) influence? Is it possible, that some other improvements in new kernel made the problem disappear? Or perhaps it still persist - but just it isn't so much annoying, that it's harder to notice? If there's a need to make additional check, just let me know. More details: > if you're building that 2.6.27 kernel from scratch, > please be sure to include CONFIG_CPU_IDLE=y # shared options # # CONFIG_X86_ACPI_CPUFREQ_PROC_INTF is not set CONFIG_X86_SPEEDSTEP_LIB=m CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK=y CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_GOV_LADDER=y CONFIG_CPU_IDLE_GOV_MENU=y Last lines of lsmod: thermal 16284 0 button 7056 0 processor 34988 2 thermal > please run powertop and report if you're getting > an excessive number of wakeup's per seconds > when idle, for this bug may be a duplicate of bug 10914 > which was seen in 2.6.26 and mysteriously gone in 2.6.27. Output of powertop: PowerTOP version 1.9 (C) 2007 Intel Corporation Cn Avg residency P-states (frequencies) C0 (CPU aktywny) (99,9%) C1 0,0ms ( 0,0%) C2 0,0ms ( 0,1%) Wakeups-from-idle per second : 547,4 interval: 5,0s Top causes for wakeups: 45,3% ( 75,0) <interrupt> : bttv0, EMU10K1, radeon@pci:0000:01:00.0 13,0% ( 21,6) <interrupt> : ide0 12,1% ( 20,0) <kernel module> : bttv_input_init (bttv_input_timer) 8,3% ( 13,8) <interrupt> : PS/2 keyboard/mouse/touchpad 6,3% ( 10,4) gkrellm : schedule_timeout (process_timeout) 5,6% ( 9,2) X : do_setitimer (it_real_fn) Attached acpi-dump (file acpi20081207.dump.gz) and dmesg (dmesg20081207.gz).
(In reply to comment #24) > Today I've compiled newest 2.6.27.8 kernel with no additional ACPI-related > patches, but there wasn't any need to test "idle=poll" neither "idle=halt", > because... the problem couldn't be noticed anymore. > right, there are a lot of C state improvement recently, please re-open this bug if you can reproduce the problem in the latest kernel release. :)
To be quite sure, that the problem really is gone, I made today additional test, compiling the newest kernel within the older environment (Debian Etch), and made the test using older both HDD and controller. And indeed: the problem seems to be gone by now. Big thanks to all of you, gentlemen, for all your efforts! :)