Bug 15296 - memory corruption and system freezing after loading processor module - regression in 2.6.29, 2.6.26 works
Summary: memory corruption and system freezing after loading processor module - regression in 2.6.29, 2.6.26 works
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Processor
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Venkatesh Pallipadi
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-14 09:07 UTC by Stepan Golosunov
Modified: 2010-06-30 08:40 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.32
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
kern.log fragment with C-state enabled with linux 2.6.26 and 2.6.32 (144.63 KB, text/plain)
2010-02-21 15:14 UTC, Stepan Golosunov
Working kernel config (2.6.26) (83.66 KB, text/plain)
2010-02-25 12:13 UTC, Stepan Golosunov
Failing kernel config (2.6.29) (93.63 KB, text/plain)
2010-02-25 12:15 UTC, Stepan Golosunov
acpidump output (220.08 KB, text/plain)
2010-03-03 14:14 UTC, Stepan Golosunov
Output of "acpidump -a 0xcff889d0 -l 0x004B2" (1.17 KB, application/octet-stream)
2010-03-03 14:17 UTC, Stepan Golosunov

Description Stepan Golosunov 2010-02-14 09:07:09 UTC
When 'Intel(R) C-STATE Tech' is enabled in the BIOS on an Asus P5Q TURBO motherboard with an 'Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz' processor, loading the "processor" module causes reads from the filesystem to return corrupt data, and the system freezes completely within several minutes.

The bug is reproducible with Debian's 2.6.32, 2.6.31 and 2.6.30 amd64 kernels, but has never occurred with Debian's 2.6.26 kernel.

The bug was originally reported at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=569012

dmesg output when loading "processor" with 'Intel(R) C-STATE Tech' enabled:
[  476.918852] ACPI: SSDT 00000000cff880d0 00235 (v01 DpgPmm  P001Ist 00000011 INTL 20060113)
[  476.919311] ACPI: SSDT 00000000cff889d0 004B2 (v01  PmRef  P001Cst 00003001 INTL 20060113)
[  476.919855] Monitor-Mwait will be used to enter C-1 state
[  476.919874] Monitor-Mwait will be used to enter C-2 state
[  476.919888] Monitor-Mwait will be used to enter C-3 state
[  476.919892] Marking TSC unstable due to TSC halts in idle
[  476.919982] processor LNXCPU:00: registered as cooling_device0
[  476.920367] ACPI: SSDT 00000000cff88310 00235 (v01 DpgPmm  P002Ist 00000012 INTL 20060113)
[  476.920707] ACPI: SSDT 00000000cff88e90 00085 (v01  PmRef  P002Cst 00003000 INTL 20060113)
[  476.921274] Switching to clocksource hpet
[  476.921434] processor LNXCPU:01: registered as cooling_device1
[  476.921824] ACPI: SSDT 00000000cff88550 00235 (v01 DpgPmm  P003Ist 00000012 INTL 20060113)
[  476.922172] ACPI: SSDT 00000000cff88f20 00085 (v01  PmRef  P003Cst 00003000 INTL 20060113)
[  476.922866] processor LNXCPU:02: registered as cooling_device2
[  476.923252] ACPI: SSDT 00000000cff88790 00235 (v01 DpgPmm  P004Ist 00000012 INTL 20060113)
[  476.923608] ACPI: SSDT 00000000cff88fb0 00085 (v01  PmRef  P004Cst 00003000 INTL 20060113)
[  476.924277] processor LNXCPU:03: registered as cooling_device3

With 'Intel(R) C-STATE Tech' disabled:
[    9.440094] ACPI: SSDT 00000000cff880d0 00235 (v01 DpgPmm  P001Ist 00000011 INTL 20060113)
[    9.440625] processor LNXCPU:00: registered as cooling_device0
[    9.441014] ACPI: SSDT 00000000cff88310 00235 (v01 DpgPmm  P002Ist 00000012 INTL 20060113)
[    9.441511] processor LNXCPU:01: registered as cooling_device1
[    9.441888] ACPI: SSDT 00000000cff88550 00235 (v01 DpgPmm  P003Ist 00000012 INTL 20060113)
[    9.442394] processor LNXCPU:02: registered as cooling_device2
[    9.442778] ACPI: SSDT 00000000cff88790 00235 (v01 DpgPmm  P004Ist 00000012 INTL 20060113)
[    9.443276] processor LNXCPU:03: registered as cooling_device3
Comment 1 Zhang Rui 2010-02-21 09:18:06 UTC
Please attach the full dmesg output after loading the processor driver, with C-states enabled.
Comment 2 Stepan Golosunov 2010-02-21 15:14:02 UTC
Created attachment 25142
kern.log fragment with C-state enabled with linux 2.6.26 and 2.6.32
Comment 3 Zhang Rui 2010-02-22 05:45:24 UTC
(In reply to comment #0)
> When 'Intel(R) C-STATE Tech' is enabled in BIOS on Asus P5Q TURBO motherboard
> with 'Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz' processor, after
> loading "processor" module reading from filesystem produces corrupt data and
> system totally freezes in several minutes.
> 
> The bug is reproducible with Debian's 2.6.32, 2.6.31 and 2.6.30 amd64
> kernels, but never happened on Debian's 2.6.26 kernel.
> 
Reading the system log you attached, I think the kernel freezes with 2.6.26 but not with 2.6.32.
Could you please double-check?
Comment 4 Stepan Golosunov 2010-02-22 08:50:34 UTC
(In reply to comment #3)
> (In reply to comment #0)
> > When 'Intel(R) C-STATE Tech' is enabled in BIOS on Asus P5Q TURBO
> > motherboard with 'Intel(R) Core(TM)2 Quad CPU    Q9400  @ 2.66GHz'
> > processor, after loading "processor" module reading from filesystem
> > produces corrupt data and system totally freezes in several minutes.
> > 
> > The bug is reproducible with Debian's 2.6.32, 2.6.31 and 2.6.30 amd64
> > kernels, but never happened on Debian's 2.6.26 kernel.
> > 
> By reading the system log you attached, I think the kernel freezes in 2.6.26
> kernel but doesn't freeze in 2.6.32.
> could you please make a double check?

With 2.6.26 the system ran for several months, although it did run into http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518643 (which takes days to weeks to reproduce and has very different symptoms). Every attempt to run a newer kernel (>=2.6.30) ended in a sudden total freeze (no response to anything, nothing in the log or on screen) within several minutes.

The attached log fragment shows that 2.6.26 was shut down cleanly ("Kernel logging (proc) stopped"), but shows no such message for 2.6.32, because it was not shut down cleanly.

I was also able to reproduce the corrupted file reads and the freeze on 2.6.32 by booting with init=/bin/bash and running "modprobe processor" repeatedly (again, no messages on screen when it froze).

(I won't be physically near that machine until Wednesday.)
Comment 5 Stepan Golosunov 2010-02-22 09:11:37 UTC
(In reply to comment #4)
> Any attempt to run newer
> kernel (>=2.6.30) resulted in sudden total freeze (no reaction to anything,
> nothing in log/screen) in several minutes.

(That is, with C-states enabled in the BIOS. With C-states disabled, 2.6.32 has been up for a week with no sign of any problem.)
Comment 6 Len Brown 2010-02-23 03:00:06 UTC
Are you running with KVM on either the new or old kernels?
Please attach the .config for the latest working and earliest failing kernels.
If you could isolate the issue to a specific release between 2.6.26
and 2.6.30, that might be helpful.
Comment 7 Stepan Golosunov 2010-02-23 08:46:52 UTC
(In reply to comment #6)
> are you running with KVM on either the new or old kernels?

Yes, with KVM on both versions. But this bug is reproducible before the kvm modules are loaded.
Comment 8 Stepan Golosunov 2010-02-25 12:13:13 UTC
Created attachment 25214
Working kernel config (2.6.26)
Comment 9 Stepan Golosunov 2010-02-25 12:15:42 UTC
Created attachment 25215
Failing kernel config (2.6.29)
Comment 10 Len Brown 2010-03-02 03:04:17 UTC
Okay, disabling C-states in the BIOS makes the latest kernel work.
How about with C-states enabled in the BIOS, if you boot with
processor.max_cstate=1
and, if that works, then with 2?
(The assumption is that 3 will change nothing, since C3 is the deepest state on this machine, and you will fail there.)
Please post the output of:

cd /sys/devices/system/cpu/cpu0/cpuidle
grep . */*

Please attach the output of acpidump.

Also, please attach the output of:

# acpidump -a 0xcff889d0 -l 0x004B2 > acpidump.ssdt
Comment 11 Stepan Golosunov 2010-03-03 14:06:19 UTC
When processor is loaded with max_cstate=1, everything seems to work fine.
With max_cstate=2, the bug is reproducible (tested with Debian's 2.6.33-1~experimental.2).
(I had to pass the max_cstate parameter in the modprobe call or via /etc/modprobe.d, as appending processor.max_cstate to the kernel command line in GRUB didn't work.)
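For reference, a minimal sketch of the two ways of passing the parameter mentioned above, assuming a modprobe-based setup. The file name /etc/modprobe.d/processor.conf is a hypothetical choice; only the "options <module> <param>=<value>" syntax and the module/parameter names come from this report.

```
# /etc/modprobe.d/processor.conf (hypothetical file name)
# Limit the ACPI processor driver to C1, the setting reported to work above.
options processor max_cstate=1

# Equivalent one-off form on the command line:
#   modprobe processor max_cstate=1
```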

Output of "grep . */*" in /sys/devices/system/cpu/cpu0/cpuidle:

state0/desc:CPUIDLE CORE POLL IDLE
state0/latency:0
state0/name:C0
state0/power:4294967295
state0/time:44780
state0/usage:22
state1/desc:ACPI FFH INTEL MWAIT 0x0
state1/latency:1
state1/name:C1
state1/power:1000
state1/time:92
state1/usage:11
state2/desc:ACPI FFH INTEL MWAIT 0x10
state2/latency:1
state2/name:C2
state2/power:500
state2/time:468253
state2/usage:4531
state3/desc:ACPI FFH INTEL MWAIT 0x30
state3/latency:57
state3/name:C3
state3/power:100
state3/time:432614274
state3/usage:347642
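As a rough aid for reading these numbers, here is a small sketch that computes each state's share of the summed "time" values. The function name cpuidle_share is hypothetical, and it assumes the "stateN/field:value" format shown above. With the values posted, C3 accounts for about 99.88% of the recorded idle time on cpu0.

```shell
#!/bin/sh
# Hypothetical helper: reads "grep . */*" output from a cpuidle directory on
# stdin and prints each state's share (in percent) of the total "time" value.
cpuidle_share() {
    awk -F'[/:]' '
        # Keep only lines of the form "stateN/time:VALUE".
        $2 == "time" { t[$1] = $3; total += $3 }
        END {
            if (total > 0)
                for (s in t) printf "%s %.2f\n", s, 100 * t[s] / total
        }' | sort    # awk array order is unspecified, so sort by state name
}
```

Usage: `cd /sys/devices/system/cpu/cpu0/cpuidle && grep . */* | cpuidle_share`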
Comment 12 Stepan Golosunov 2010-03-03 14:14:53 UTC
Created attachment 25338
acpidump output

Under 2.6.33 with C-state enabled.

There were also
Wrong checksum for OEMB
Wrong checksum for OEMB!

on stderr.
Comment 13 Stepan Golosunov 2010-03-03 14:17:22 UTC
Created attachment 25339
Output of "acpidump -a 0xcff889d0 -l 0x004B2"

Under 2.6.33 with C-state enabled.
Comment 14 Zhang Rui 2010-06-29 05:30:30 UTC
Does the problem still exist in the latest upstream kernel?
Comment 15 Zhang Rui 2010-06-30 08:40:00 UTC
Please feel free to reopen it if the bug still exists in the latest upstream kernel.
