Bug 14700

Summary: Broken system because of a bad ACPI commit
Product: ACPI
Reporter: Petri Lehtinen (petri)
Component: Power-Processor
Assignee: acpi_power-processor
Severity: normal
CC: lenb, rui.zhang, yakui.zhao
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31
Tree: Mainline
Regression: Yes
Attachments: Don't disable ARB_DISABLE when the mode id is less than 0x0f

Description Petri Lehtinen 2009-11-27 12:40:55 UTC
After upgrading to the 2.6.31 kernel, the system is very unstable.
Sometimes it doesn't boot. When it does, the laptop keyboard doesn't
work, and after a few minutes the system hangs with ATA errors.

I bisected this problem down to the following commit:

commit ee1ca48fae7e575d5e399d4fdcfe0afc1212a64c
Author: Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
Date:   Thu May 21 17:09:10 2009 -0700

    ACPI: Disable ARB_DISABLE on platforms where it is not needed

    ARB_DISABLE is a NOP on all of the recent Intel platforms.

    For such platforms, reduce contention on c3_lock
    by skipping the fake ARB_DISABLE.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

With this commit reverted, the system works fine with 2.6.31.

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 14
model name      : Intel(R) Celeron(R) M CPU        410  @ 1.46GHz
stepping        : 8
cpu MHz         : 1463.194
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 ss tm pbe nx constant_tsc up arch_perfmon bts pni monitor tm2 xtpr pdcm
bogomips        : 2926.38
clflush size    : 64
power management:

If I read the code correctly, "all of the recent Intel platforms" in
the commit message means those with family == 6 and model >= 14.
My processor's model is 14, so could this be an off-by-one error?
Comment 1 ykzhao 2009-11-30 00:56:00 UTC
   Will you please try each of the following boot options and see whether the box boots correctly?
   a. processor.max_cstate=1
   b. idle=poll

Comment 2 ykzhao 2009-11-30 03:25:47 UTC
Created attachment 23973 [details]
Don't disable ARB_DISABLE when the mode id is less than 0x0f

Will you please try the latest kernel (2.6.32-rc7/rc8) and see whether the box boots correctly?

If it still doesn't boot, please try the attached debug patch and see whether the box boots correctly.

Comment 3 Petri Lehtinen 2009-11-30 19:54:42 UTC
With both 2.6.31 and 2.6.32-rc8 I had the following results:

 no extra kernel params  --> doesn't boot
 processor.max_cstate=1  --> works OK
 idle=poll               --> works OK

I'm unable to test with the patch right now, I'll do it later.
Comment 4 Petri Lehtinen 2009-12-01 20:36:01 UTC
2.6.32-rc8 plus the patch works fine.

BTW, the machine in question is Acer Travelmate 2440, just in case you need it for the commit message or something.
Comment 5 Zhang Rui 2009-12-04 02:27:37 UTC
Is the patch in comment #2 an acceptable solution for the upstream kernel?
If yes, please resend it to linux devel.
Comment 6 Len Brown 2009-12-16 05:42:42 UTC
commit 03a05ed1152944000151d57b71000de287a1eb02
Author: Zhao Yakui <yakui.zhao@intel.com>
Date:   Fri Dec 11 15:17:20 2009 +0800

    ACPI: Use the ARB_DISABLE for the CPU which model id is less than 0x0f.

is queued in the acpi tree for linux-2.6.33
Comment 7 Len Brown 2009-12-17 03:41:09 UTC
shipped in linux-2.6.33 before -rc1