I have a Samsung N510 @nynet netbook.
Starting in the 2.6.35 development cycle, the system hangs occasionally, especially on boot.
This bug is present in 2.6.36 and 2.6.35.7.

Pressing the powerbutton or a key on the keyboard makes the system continue.
The bug is quite reliably reproducible.

I did a bisection and ended up on the following commit:
2671717265ae6e720a9ba5f13fbec3a718983b65

Booting the system with "intel_idle.max_cstate=0" makes it not hang.
The effects are strongest with intel_idle.max_cstate=2.

Mr Brown gave me some homework to do, here it comes:

failing .config
.. attached

output from lspci
.. hopefully can be attached

output from cat /proc/cpuinfo
.. hopefully can be attached

output from acpidump
.. hopefully can be attached

output from dmidecode
.. hopefully can be attached

for a 2.6.36 boot with intel_idle.max_cstate=0 (and acpi_idle boot)
output from grep . /sys/devices/system/cpu/cpuidle/*
output from grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
dmesg
.. hopefully can be attached

try intel_idle.max_cstate=1, include dmesg
and increase the '1' until it fails
My guess is that 1 will work, but some higher number will start failing.
=> max_cstate=2 fails the strongest
... max_cstate=4 has not such a strong effect

without any other bootparams, try "nolapic_timer"
=> yes, works .. does not hang .. see (hopefully) attached dmesg ending in ".nolapic"
Created attachment 34632 [details] dmesg for 2.6.35.7 no maximum cstate
Created attachment 34642 [details] dmesg for 2.6.36 no maximum cstate
Created attachment 34652 [details] dmesg for 2.6.36 maximum cstate=0
Created attachment 34662 [details] dmesg for 2.6.36 maximum cstate=1
Created attachment 34672 [details] dmesg for 2.6.36 maximum cstate=2
Created attachment 34682 [details] dmesg for 2.6.36 maximum cstate=3
Created attachment 34692 [details] dmesg for 2.6.36 maximum cstate=4
Created attachment 34702 [details] dmesg for 2.6.36 nolapic_timer
Created attachment 34712 [details] .config for 2.6.36
Created attachment 34722 [details] lspci -vk for 2.6.36
Created attachment 34732 [details] cat /proc/cpuinfo
Created attachment 34742 [details] acpidump for 2.6.36
Created attachment 34752 [details] dmidecode dump
Created attachment 34762 [details] grep sysfiles for info about idle for max_cstate=0

output from grep . /sys/devices/system/cpu/cpuidle/*
output from grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
> pressing the powerbutton
> or a key on the keyboard makes the system continue

Does a single press get the system "un-stuck" so that it runs normally from then on, or do you have to keep giving it button events to keep it from stalling again?

> max_cstate=2 fails the strongest
> ... max_cstate=4 has not such a strong effect

Hmm, this may be because with max_cstate=2 we use C2 a lot, and the LAPIC timer is failing in C2.

But if not limited to C2, we rarely use C2, and lapic_timer_reliable_states instructs us not to use that timer in C4, where we know it stops:

intel_idle: lapic_timer_reliable_states 0x6

> without any other bootparams, try "nolapic_timer"
> yes, works .. does not hang ..

Okay, that is a "smoking gun".
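To make the 0x6 in that log line concrete, here is a small Python sketch that decodes the mask. It assumes that bit N of lapic_timer_reliable_states stands for C-state N (bit 1 for C1, bit 2 for C2), which fits the explanation above but is an assumption about the driver, not a copy of its code. The helper name reliable_cstates is made up for illustration.

```python
def reliable_cstates(mask, max_cstate=8):
    """List the C-state numbers whose bit is set in the mask.

    Assumption (not taken from the driver source): bit N of the mask
    corresponds to C-state N.
    """
    return [n for n in range(max_cstate + 1) if mask & (1 << n)]


# Value from the log line "intel_idle: lapic_timer_reliable_states 0x6"
mask = 0x6
print(reliable_cstates(mask))  # [1, 2]
```

Under that reading, the LAPIC timer is trusted in C1 and C2 but not in C4. That would be consistent with max_cstate=2 hanging hardest if the timer in fact stops in C2 on this machine.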
Created attachment 34772 [details] patch vs 2.6.36 to avoid using the LAPIC timer in ATM-C2

Please test this patch using no cmdline parameters.

Please show the output from
grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*
(In reply to comment #15)
> > pressing the powerbutton
> > or a key on the keyboard makes the system continue
>
> does a single press get the system "un-stuck" and it
> runs normally from then on, or do you have to continue
> to give it button events to keep it from stalling again?

I have to press it every time it stalls ..
.. so even for shutdown I have to press a key ..

> > max_cstate=2 fails the strongest
> > ... max_cstate=4 has not such a strong effect
>
> Hmm, this may be because in max_cstate=2, we use
> C2 a lot, and the LAPIC timer is failing in C2.
>
> But if not limited to C2, we rarely use C2
> and lapic_timer_reliable_states instructs us
> to not use that timer in C4 where we know it stops:
>
> intel_idle: lapic_timer_reliable_states 0x6
>
> > without any other bootparams, try "nolapic_timer"
> > yes, works .. does not hang ..
>
> Okay, that is a "smoking gun".

hopefully .. :)
(In reply to comment #16)
> Created an attachment (id=34772) [details]
> patch vs 2.6.36 to avoid using the LAPIC timer in ATM-C2
>
> Please test this patch using no cmdline parameters.

seems to work

> Please show the output from
> grep . /sys/devices/system/cpu/cpu*/cpuidle/*/*

/sys/devices/system/cpu/cpu0/cpuidle/state0/desc:CPUIDLE CORE POLL IDLE
/sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:C0
/sys/devices/system/cpu/cpu0/cpuidle/state0/power:4294967295
/sys/devices/system/cpu/cpu0/cpuidle/state0/time:32
/sys/devices/system/cpu/cpu0/cpuidle/state0/usage:1
/sys/devices/system/cpu/cpu0/cpuidle/state1/desc:ACPI FFH INTEL MWAIT 0x0
/sys/devices/system/cpu/cpu0/cpuidle/state1/latency:1
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu0/cpuidle/state1/power:4294967294
/sys/devices/system/cpu/cpu0/cpuidle/state1/time:1213
/sys/devices/system/cpu/cpu0/cpuidle/state1/usage:15
/sys/devices/system/cpu/cpu0/cpuidle/state2/desc:ACPI FFH INTEL MWAIT 0x10
/sys/devices/system/cpu/cpu0/cpuidle/state2/latency:1
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2
/sys/devices/system/cpu/cpu0/cpuidle/state2/power:4294967293
/sys/devices/system/cpu/cpu0/cpuidle/state2/time:179795525
/sys/devices/system/cpu/cpu0/cpuidle/state2/usage:2205

the patch is disabling lapic_timer for all ATOM platforms .. or just the nVidia MCP7 ?
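For anyone comparing several of these dumps, a short Python sketch like the one below can turn the "path:value" lines that grep prints into a per-state summary. The function names parse_cpuidle and residency_share are hypothetical, and the sample is only a subset of the values quoted above; this is an illustration, not a tool used in this report.

```python
import re
from collections import defaultdict

# Matches the tail of a line such as
# /sys/devices/system/cpu/cpu0/cpuidle/state2/time:179795525
LINE_RE = re.compile(r"/(cpu\d+)/cpuidle/(state\d+)/(\w+):(.*)$")


def parse_cpuidle(lines):
    """Group 'path:value' lines into {cpu: {state: {attribute: value}}}."""
    table = defaultdict(lambda: defaultdict(dict))
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match:
            cpu, state, attr, value = match.groups()
            table[cpu][state][attr] = value
    return table


def residency_share(states):
    """Map each state name to its fraction of the summed 'time' values."""
    times = {s["name"]: int(s["time"]) for s in states.values()}
    total = sum(times.values())
    return {name: (t / total if total else 0.0) for name, t in times.items()}


# Subset of the output quoted in this comment (cpu0 only).
sample = [
    "/sys/devices/system/cpu/cpu0/cpuidle/state0/name:C0",
    "/sys/devices/system/cpu/cpu0/cpuidle/state0/time:32",
    "/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1",
    "/sys/devices/system/cpu/cpu0/cpuidle/state1/time:1213",
    "/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2",
    "/sys/devices/system/cpu/cpu0/cpuidle/state2/time:179795525",
]

if __name__ == "__main__":
    shares = residency_share(parse_cpuidle(sample)["cpu0"])
    for name, share in shares.items():
        print(f"{name}: {share:.6f}")
```

On the values quoted here, C2 accounts for nearly all of the recorded idle time on cpu0, so C2 is still being entered with the patch applied.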
The patch is disabling lapic_timer for C2 on all Atom. (note, this is what acpi_idle has been doing all along) This isn't how the chip is designed to be hooked up, but there is an additional sighting over at bug 20172 where "nolapic_timer" prevents a hang on an Atom with an Intel chip-set -- so maybe the issue is more widespread than just this nvidia chipset.
shipped in linux-2.6.37-rc1
closed

commit c25d29952b2a8c9aaf00e081c9162a0e383030cd
Author: Len Brown <len.brown@intel.com>
Date:   Sat Oct 23 23:25:53 2010 -0400

    intel_idle: do not use the LAPIC timer for ATOM C2