When 'Intel(R) C-STATE Tech' is enabled in BIOS on Asus P5Q TURBO motherboard with 'Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz' processor, after loading "processor" module reading from filesystem produces corrupt data and system totally freezes in several minutes.

The bug is reproducible with Debian's 2.6.32, 2.6.31 and 2.6.30 amd64 kernels, but never happened on Debian's 2.6.26 kernel.

The bug was originally reported at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=569012

dmesg output when loading "processor"

with enabled 'Intel(R) C-STATE Tech':
[ 476.918852] ACPI: SSDT 00000000cff880d0 00235 (v01 DpgPmm P001Ist 00000011 INTL 20060113)
[ 476.919311] ACPI: SSDT 00000000cff889d0 004B2 (v01 PmRef P001Cst 00003001 INTL 20060113)
[ 476.919855] Monitor-Mwait will be used to enter C-1 state
[ 476.919874] Monitor-Mwait will be used to enter C-2 state
[ 476.919888] Monitor-Mwait will be used to enter C-3 state
[ 476.919892] Marking TSC unstable due to TSC halts in idle
[ 476.919982] processor LNXCPU:00: registered as cooling_device0
[ 476.920367] ACPI: SSDT 00000000cff88310 00235 (v01 DpgPmm P002Ist 00000012 INTL 20060113)
[ 476.920707] ACPI: SSDT 00000000cff88e90 00085 (v01 PmRef P002Cst 00003000 INTL 20060113)
[ 476.921274] Switching to clocksource hpet
[ 476.921434] processor LNXCPU:01: registered as cooling_device1
[ 476.921824] ACPI: SSDT 00000000cff88550 00235 (v01 DpgPmm P003Ist 00000012 INTL 20060113)
[ 476.922172] ACPI: SSDT 00000000cff88f20 00085 (v01 PmRef P003Cst 00003000 INTL 20060113)
[ 476.922866] processor LNXCPU:02: registered as cooling_device2
[ 476.923252] ACPI: SSDT 00000000cff88790 00235 (v01 DpgPmm P004Ist 00000012 INTL 20060113)
[ 476.923608] ACPI: SSDT 00000000cff88fb0 00085 (v01 PmRef P004Cst 00003000 INTL 20060113)
[ 476.924277] processor LNXCPU:03: registered as cooling_device3

with disabled 'Intel(R) C-STATE Tech':
[ 9.440094] ACPI: SSDT 00000000cff880d0 00235 (v01 DpgPmm P001Ist 00000011 INTL 20060113)
[ 9.440625] processor LNXCPU:00: registered as cooling_device0
[ 9.441014] ACPI: SSDT 00000000cff88310 00235 (v01 DpgPmm P002Ist 00000012 INTL 20060113)
[ 9.441511] processor LNXCPU:01: registered as cooling_device1
[ 9.441888] ACPI: SSDT 00000000cff88550 00235 (v01 DpgPmm P003Ist 00000012 INTL 20060113)
[ 9.442394] processor LNXCPU:02: registered as cooling_device2
[ 9.442778] ACPI: SSDT 00000000cff88790 00235 (v01 DpgPmm P004Ist 00000012 INTL 20060113)
[ 9.443276] processor LNXCPU:03: registered as cooling_device3
please attach the full dmesg output after loading the processor driver, with C-state enabled.
Created attachment 25142
kern.log fragment with C-state enabled with linux 2.6.26 and 2.6.32
(In reply to comment #0)
> When 'Intel(R) C-STATE Tech' is enabled in BIOS on Asus P5Q TURBO motherboard
> with 'Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz' processor, after loading
> "processor" module reading from filesystem produces corrupt data and system
> totally freezes in several minutes.
>
> The bug is reproducible with Debian's 2.6.32, 2.6.31 and 2.6.30 amd64 kernels,
> but never happened on Debian's 2.6.26 kernel.
>
From reading the system log you attached, I think the kernel freezes in 2.6.26 but doesn't freeze in 2.6.32. Could you please double-check?
(In reply to comment #3)
> (In reply to comment #0)
> > When 'Intel(R) C-STATE Tech' is enabled in BIOS on Asus P5Q TURBO motherboard
> > with 'Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz' processor, after loading
> > "processor" module reading from filesystem produces corrupt data and system
> > totally freezes in several minutes.
> >
> > The bug is reproducible with Debian's 2.6.32, 2.6.31 and 2.6.30 amd64 kernels,
> > but never happened on Debian's 2.6.26 kernel.
> >
> By reading the system log you attached, I think the kernel freezes in 2.6.26
> kernel but doesn't freeze in 2.6.32.
> could you please make a double check?

With 2.6.26 the system was working for several months, but it was running into http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518643 (which takes days to weeks to reproduce, and has very different symptoms). Any attempt to run newer kernel (>=2.6.30) resulted in sudden total freeze (no reaction to anything, nothing in log/screen) in several minutes.

The attached log fragment shows that 2.6.26 was shut down correctly (Kernel logging (proc) stopped) but doesn't show that for 2.6.32 as it wasn't.

And I was able to reproduce file misreading and system freezing on 2.6.32 booting with init=/bin/bash and doing "modprobe processor" many times (no messages on the screen upon freezing either).

(I won't be physically near that machine until Wednesday.)
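For anyone who wants to check for the file misreading without relying on modprobe failures, a minimal sketch follows. It is not the method used in this report; it is a hypothetical helper (the names digest and count_mismatches are made up here) that hashes the same file repeatedly and counts reads whose SHA-256 differs from the first one. Repeated reads are usually served from the page cache, so a result of zero does not prove that reads from disk are fine.

```python
import hashlib


def digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def count_mismatches(path, rounds=20):
    """Re-read the file `rounds` times and count digests that differ
    from the digest of the first read (hypothetical check, see lead-in)."""
    reference = digest(path)
    return sum(1 for _ in range(rounds) if digest(path) != reference)
```

On a healthy system count_mismatches("/path/to/some/large/file") should return 0; any non-zero value means the same file produced different bytes on different reads.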
(In reply to comment #4)
> Any attempt to run newer
> kernel (>=2.6.30) resulted in sudden total freeze (no reaction to anything,
> nothing in log/screen) in several minutes.

(That is, with C-state enabled in bios. With C-state disabled 2.6.32 has uptime of a week and no symptom of any problem.)
Are you running with KVM on either the new or old kernels?

Please attach the .config for the latest working and earliest failing kernels.

If you could isolate the issue to a specific release between 2.6.26 and 2.6.30, that might be helpful.
(In reply to comment #6)
> are you running with KVM on either the new or old kernels?

With KVM on both versions. But this bug is reproducible before kvm modules are loaded.
Created attachment 25214
Working kernel config (2.6.26)
Created attachment 25215
Failing kernel config (2.6.29)
Okay, disabling C-states in the BIOS makes the latest kernel work.

How about with C-states enabled in the BIOS, if you boot with processor.max_cstate=1, and if that works, then with 2? (The assumption is that 3 will do nothing and you will fail there.)

Please post the output from

cd /sys/devices/system/cpu/cpu0/cpuidle
grep . */*

Please attach the output from acpidump.

Also, please attach the output from

# acpidump -a 0xcff889d0 -l 0x004B2 > acpidump.ssdt
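If it is easier to collect the cpuidle data for several CPUs at once, something along these lines should give the same kind of output as the grep above. This is only a sketch: the function name dump_cpuidle is made up here, and it assumes every file under each stateN directory is a small readable text attribute.

```python
from pathlib import Path


def dump_cpuidle(base="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return 'stateN/attr:value' lines, similar to `grep . */*` run in base."""
    lines = []
    for state_dir in sorted(Path(base).glob("state*")):
        if not state_dir.is_dir():
            continue
        for attr in sorted(p for p in state_dir.iterdir() if p.is_file()):
            try:
                text = attr.read_text()
            except OSError:
                # Skip attributes that cannot be read on this kernel.
                continue
            for row in text.splitlines():
                if row:  # like `grep .`, ignore empty lines
                    lines.append(f"{state_dir.name}/{attr.name}:{row}")
    return lines
```

Calling print("\n".join(dump_cpuidle())) prints the cpu0 states; passing the cpuidle directory of another CPU as base covers the remaining cores.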
When processor is loaded with max_cstate=1 everything seems to work fine. With max_cstate=2 the bug is reproducible (tested with Debian's 2.6.33-1~experimental.2).

(I had to specify max_cstate parameter in modprobe call or via /etc/modprobe.d, as appending processor.max_cstate in grub didn't work.)

Output of "grep . */*" in /sys/devices/system/cpu/cpu0/cpuidle:

state0/desc:CPUIDLE CORE POLL IDLE
state0/latency:0
state0/name:C0
state0/power:4294967295
state0/time:44780
state0/usage:22
state1/desc:ACPI FFH INTEL MWAIT 0x0
state1/latency:1
state1/name:C1
state1/power:1000
state1/time:92
state1/usage:11
state2/desc:ACPI FFH INTEL MWAIT 0x10
state2/latency:1
state2/name:C2
state2/power:500
state2/time:468253
state2/usage:4531
state3/desc:ACPI FFH INTEL MWAIT 0x30
state3/latency:57
state3/name:C3
state3/power:100
state3/time:432614274
state3/usage:347642
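To make the numbers above easier to compare, here is a small sketch that parses "stateN/attr:value" lines and computes what fraction of the accounted time each state got. The helper names parse_cpuidle and time_share are made up for illustration, and SAMPLE only repeats the name, time and usage values from the output above.

```python
SAMPLE = """\
state0/name:C0
state0/time:44780
state0/usage:22
state1/name:C1
state1/time:92
state1/usage:11
state2/name:C2
state2/time:468253
state2/usage:4531
state3/name:C3
state3/time:432614274
state3/usage:347642
"""


def parse_cpuidle(text):
    """Turn 'stateN/attr:value' lines into {stateN: {attr: value}}."""
    states = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ':' only, since desc values may contain more.
        path, _, value = line.partition(":")
        state, _, attr = path.partition("/")
        states.setdefault(state, {})[attr] = value
    return states


def time_share(states):
    """Map each state name to its fraction of the summed 'time' values."""
    total = sum(int(s["time"]) for s in states.values())
    return {s["name"]: int(s["time"]) / total for s in states.values()}


if __name__ == "__main__":
    for name, share in time_share(parse_cpuidle(SAMPLE)).items():
        print(f"{name}: {share:.4%}")
```

With the values above, C3 accounts for more than 99% of the recorded time, so the CPU is spending nearly all of its idle time in the deepest state.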
Created attachment 25338
acpidump output

Under 2.6.33 with C-state enabled.

There were also

Wrong checksum for OEMB
Wrong checksum for OEMB!

on stderr.
Created attachment 25339
Output of "acpidump -a 0xcff889d0 -l 0x004B2"

Under 2.6.33 with C-state enabled.
does the problem still exist in the latest upstream kernel?
please feel free to re-open it if the bug still exists in the latest upstream kernel.