Bug 38262

Summary: Kernels 2.6.39, 2.6.39.1, 2.6.39.2 do not start on machine with HT
Product: ACPI
Reporter: WZab (wzab)
Component: Config-Processors
Assignee: Jonathan Nieder (jrnieder)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: high
CC: jrnieder, pinkbyte, reaper.thresher, rui.zhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.39 (both 2.6.39.1, 2.6.39.2), 3.0.0, 3.0.1, 3.0.3, 3.3.1
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:    
Bug Blocks: 32012    
Attachments: Configuration of my 3.0.3 kernel
ACPI dump (with HT disabled in the BIOS)
acpidump with HT enabled
Result of acpidump from my affected system, booted with HT on (and with nocst=1 to allow booting)
patch from acpica tree

Description WZab 2011-06-25 18:29:33 UTC
The machine has an HT-enabled processor. It worked perfectly with kernels up to 2.6.37.xxx, became unstable with 2.6.38.xxx, and fails to boot with 2.6.39.xxx.
Detailed information about the problems was reported to LKML and is available in the following threads (kernel configs and detailed logs for booting with different BIOS settings and kernel parameters are attached to the LKML messages):

2.6.39.1: http://groups.google.com/group/kernelarchive/browse_thread/thread/adaeb363d2eadf24

2.6.39.2: http://groups.google.com/group/kernelarchive/browse_thread/thread/b5fdfedd74668fbc

I have also tried the standard Debian distribution kernel, which boots correctly right after power-up, but fails to start after reboot.
Detailed information is available here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631597

After submitting the Debian bug report, I was advised to file a bug report here.
Comment 1 WZab 2011-06-27 07:58:56 UTC
It seems that the problem is associated with SMT (HT) handling on the x86 platform.
Maybe some resources are allocated "per core" instead of "per thread", and this leads to a conflict?
Unfortunately I was not able to isolate the main problem (even though I've switched off most debug mechanisms in the kernel).
The bugs reported in the logs attached to the quoted threads are secondary results of data corruption occurring somewhere early in system startup...
Comment 2 reaper.thresher 2011-06-30 21:37:51 UTC
I have the same issue: the only way to boot is to use acpi=off or disable HT in the BIOS (acpi=ht still hangs my system with HT enabled). The issue is still present in 3.0rc5.
Comment 3 WZab 2011-07-27 11:53:28 UTC
I have just downloaded and compiled the stable 3.0 version of the Linux kernel.
The problem still persists in 3.0 stable.
Comment 4 WZab 2011-07-27 12:16:22 UTC
I have also found a similar bug report: https://bbs.archlinux.org/viewtopic.php?id=121367 (the subject says "solved", but in fact the problem was only worked around by switching HT off).

So it seems that the problem is more common, but probably specific to Intel 865 motherboards...
Comment 5 WZab 2011-08-19 08:25:46 UTC
The problem still persists in kernel 3.0.1
Comment 6 reaper.thresher 2011-08-19 12:17:42 UTC
I can confirm: the issue is still present in 3.0.3
Comment 7 WZab 2011-08-22 06:15:38 UTC
Yesterday I compiled version 3.0.3. I made one change to the configuration: I switched off the "ATA/ATAPI/MFM/RLL support (DEPRECATED)" option.
After the compilation, I rebooted the machine with HT off and it worked correctly. Later I rebooted the machine with HT on in the BIOS and it still worked correctly. I rebooted the machine two more times due to power surges, and it still booted and worked correctly.
This morning I tried to boot the machine with HT on, and it failed to boot. I had to switch HT off in the BIOS to get it working again.

I have analyzed logs from yesterday's sessions, and I have found only one bug:

Report in /var/log/messages:
[...]
Aug 21 23:53:20 wzab kernel: NET: Registered protocol family 1
Aug 21 23:53:20 wzab kernel: modprobe[1063]: segfault at ea90 ip 0000ea90 sp c0474ee4 error 4
Aug 21 23:53:20 wzab kernel: no vm86_info: BAD
Aug 21 23:53:20 wzab kernel: note: modprobe[1063] exited with preempt_count 5
Aug 21 23:53:20 wzab kernel: paging request at b8530bd8
Aug 21 23:53:20 wzab kernel: *pde = 00000000 
Aug 21 23:53:20 wzab kernel: Modules linked in: processor(+) unix
Aug 21 23:53:20 wzab kernel: 
Aug 21 23:53:20 wzab kernel: Pid: 1065, comm: udevd Not tainted 3.0.3 #1                  /D865GBF                        
Aug 21 23:53:20 wzab kernel: EIP: 0060:[<c013328d>] EFLAGS: 00010002 CPU: 0
Aug 21 23:53:20 wzab kernel: EIP is at vprintk+0x12d/0x410
Aug 21 23:53:20 wzab kernel: EAX: c05100f8 EBX: 00000003 ECX: 00000001 EDX: c0105716
Aug 21 23:53:20 wzab kernel: ESI: c058ed80 EDI: c01030d2 EBP: 00000000 ESP: f5a47bf8
Aug 21 23:53:20 wzab kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Aug 21 23:53:20 wzab kernel:  f5a47c80 f611ea80 f6209548 f611ea80 f5a47c22 00000000 c0103090 c03f1eae
Aug 21 23:53:20 wzab kernel:  c0105716 00001b74 00000000 00000000 f5929170 00000060 00000000 c010007b
Aug 21 23:53:20 wzab kernel:  0000007b 000000d8 c0103090 ffffffff c03f26b1 00000060 00010012 00000000
Aug 21 23:53:20 wzab kernel:  [<c0103090>] ? do_alignment_check+0xa0/0xa0
Aug 21 23:53:20 wzab kernel:  [<c03f1eae>] ? error_code+0x5a/0x60
Aug 21 23:53:20 wzab kernel:  [<c0105716>] ? oops_begin+0x6/0x90
Aug 21 23:53:20 wzab kernel:  [<c0103090>] ? do_alignment_check+0xa0/0xa0
Aug 21 23:53:20 wzab kernel:  [<c03f26b1>] ? __entry_text_end+0x74/0x8f
Aug 21 23:53:20 wzab kernel:  [<c0105716>] ? oops_begin+0x6/0x90
Aug 21 23:53:20 wzab kernel:  [<c03ed20a>] ? printk+0x17/0x1b
Aug 21 23:53:20 wzab kernel:  [<c03ecb25>] ? no_context+0x74/0x137
Aug 21 23:53:20 wzab kernel:  [<c03ecd53>] ? bad_area+0x30/0x35
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel:  [<c011fe0b>] ? do_page_fault+0x36b/0x3d0
Aug 21 23:53:20 wzab dhcpd: Internet Systems Consortium DHCP Server 4.1.1-P1
Aug 21 23:53:20 wzab kernel:  [<c011fe0b>] ? do_page_fault+0x36b/0x3d0
Aug 21 23:53:20 wzab dhcpd: Copyright 2004-2010 Internet Systems Consortium.
Aug 21 23:53:20 wzab kernel:  [<c03f1589>] ? _raw_spin_lock_irqsave+0x19/0x40
Aug 21 23:53:20 wzab dhcpd: All rights reserved.
Aug 21 23:53:20 wzab kernel:  [<c02611e6>] ? number.isra.3+0x316/0x330
Aug 21 23:53:20 wzab dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Aug 21 23:53:20 wzab kernel:  [<c012ca18>] ? get_parent_ip+0x8/0x20
Aug 21 23:53:20 wzab kernel:  [<c031ca18>] ? ata_scsi_verify_xlat+0x208/0x3a0
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel:  [<c03f1eae>] ? error_code+0x5a/0x60
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel:  [<c0108b73>] ? native_sched_clock+0x23/0x80
Aug 21 23:53:20 wzab kernel:  [<c012387a>] ? resched_task+0x3a/0x60
Aug 21 23:53:20 wzab kernel:  [<c0125d83>] ? check_preempt_wakeup+0x133/0x1c0
Aug 21 23:53:20 wzab kernel:  [<c01272aa>] ? check_preempt_curr+0x6a/0x80
Aug 21 23:53:20 wzab kernel:  [<c0127313>] ? ttwu_do_wakeup+0x13/0xc0
Aug 21 23:53:20 wzab kernel:  [<c03f13ce>] ? _raw_spin_unlock_irqrestore+0xe/0x30
Aug 21 23:53:20 wzab kernel:  [<c012f05b>] ? try_to_wake_up+0x17b/0x200
Aug 21 23:53:20 wzab kernel:  [<c0259880>] ? cpumask_next_and+0x20/0x30
Aug 21 23:53:20 wzab kernel:  [<c01272aa>] ? check_preempt_curr+0x6a/0x80
Aug 21 23:53:20 wzab kernel:  [<c0173912>] ? __perf_event_task_sched_out+0x32/0x250
Aug 21 23:53:20 wzab kernel:  [<c0126e6b>] ? pick_next_task_fair+0x8b/0xe0
Aug 21 23:53:20 wzab kernel:  [<c02570ee>] ? __cfq_exit_single_io_context+0x6e/0xb0
Aug 21 23:53:20 wzab kernel:  [<c0135e1b>] ? do_exit+0x49b/0x6d0
Aug 21 23:53:20 wzab kernel:  [<c0105716>] ? oops_begin+0x6/0x90
Aug 21 23:53:20 wzab kernel:  [<c011da3c>] ? save_v86_state+0x12c/0x170
Aug 21 23:53:20 wzab kernel:  [<c0105716>] ? oops_begin+0x6/0x90
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel:  [<c03f1a7e>] ? work_notifysig_v86+0x6/0x18
Aug 21 23:53:20 wzab kernel:  [<c0105716>] ? oops_begin+0x6/0x90
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel: ---[ end trace 2b8e51024c1fe4c5 ]---
Aug 21 23:53:20 wzab kernel: note: udevd[1065] exited with preempt_count 3
Aug 21 23:53:20 wzab kernel: pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Aug 21 23:53:20 wzab kernel: agpgart-intel 0000:00:00.0: Intel 865 Chipset
Aug 21 23:53:20 wzab kernel: agpgart-intel 0000:00:00.0: detected gtt size: 131072K total, 131072K mappable
Aug 21 23:53:20 wzab kernel: agpgart-intel 0000:00:00.0: detected 16384K stolen memory
Aug 21 23:53:20 wzab kernel: agpgart-intel 0000:00:00.0: AGP aperture is 128M @ 0xf0000000
[...]

report in /var/log/syslog:
[...]
Aug 21 23:53:20 wzab kernel: Freeing unused kernel memory: 416k freed
Aug 21 23:53:20 wzab kernel: NET: Registered protocol family 1
Aug 21 23:53:20 wzab kernel: ACPI: acpi_idle registered with cpuidle
Aug 21 23:53:20 wzab kernel: BUG: unable to handle kernel 
Aug 21 23:53:20 wzab kernel: BUG: unable to handle kernel 
Aug 21 23:53:20 wzab kernel: modprobe[1063]: segfault at ea90 ip 0000ea90 sp c0474ee4 error 4
Aug 21 23:53:20 wzab kernel: no vm86_info: BAD
Aug 21 23:53:20 wzab kernel: note: modprobe[1063] exited with preempt_count 5
Aug 21 23:53:20 wzab kernel: paging request at b8530bd8
Aug 21 23:53:20 wzab kernel: IP: [<c013328d>] vprintk+0x12d/0x410
Aug 21 23:53:20 wzab kernel: *pde = 00000000 
Aug 21 23:53:20 wzab kernel: Oops: 0000 [#1] PREEMPT SMP 
Aug 21 23:53:20 wzab kernel: Modules linked in: processor(+) unix
Aug 21 23:53:20 wzab kernel: 
Aug 21 23:53:20 wzab kernel: Pid: 1065, comm: udevd Not tainted 3.0.3 #1                  /D865GBF                        
Aug 21 23:53:20 wzab kernel: EIP: 0060:[<c013328d>] EFLAGS: 00010002 CPU: 0
Aug 21 23:53:20 wzab kernel: EIP is at vprintk+0x12d/0x410
Aug 21 23:53:20 wzab kernel: EAX: c05100f8 EBX: 00000003 ECX: 00000001 EDX: c0105716
Aug 21 23:53:20 wzab kernel: ESI: c058ed80 EDI: c01030d2 EBP: 00000000 ESP: f5a47bf8
Aug 21 23:53:20 wzab kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Aug 21 23:53:20 wzab kernel: Process udevd (pid: 1065, ti=f5a46000 task=f606ea90 task.ti=f5a4c000)
Aug 21 23:53:20 wzab kernel: Stack:
Aug 21 23:53:20 wzab kernel:  f5a47c80 f611ea80 f6209548 f611ea80 f5a47c22 00000000 c0103090 c03f1eae
Aug 21 23:53:20 wzab kernel:  c0105716 00001b74 00000000 00000000 f5929170 00000060 00000000 c010007b
Aug 21 23:53:20 wzab kernel:  0000007b 000000d8 c0103090 ffffffff c03f26b1 00000060 00010012 00000000
Aug 21 23:53:20 wzab kernel: Call Trace:
Aug 21 23:53:20 wzab kernel:  [<c0103090>] ? do_alignment_check+0xa0/0xa0
Aug 21 23:53:20 wzab kernel:  [<c03f1eae>] ? error_code+0x5a/0x60
Aug 21 23:53:20 wzab kernel:  [<c0105716>] ? oops_begin+0x6/0x90
Aug 21 23:53:20 wzab kernel:  [<c0103090>] ? do_alignment_check+0xa0/0xa0
Aug 21 23:53:20 wzab kernel:  [<c03f26b1>] ? __entry_text_end+0x74/0x8f
Aug 21 23:53:20 wzab kernel:  [<c0105716>] ? oops_begin+0x6/0x90
Aug 21 23:53:20 wzab kernel:  [<c03ed20a>] ? printk+0x17/0x1b
Aug 21 23:53:20 wzab kernel:  [<c03ecb25>] ? no_context+0x74/0x137
Aug 21 23:53:20 wzab kernel:  [<c03ecd53>] ? bad_area+0x30/0x35
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel:  [<c011fe0b>] ? do_page_fault+0x36b/0x3d0
Aug 21 23:53:20 wzab dhcpd: Internet Systems Consortium DHCP Server 4.1.1-P1
Aug 21 23:53:20 wzab kernel:  [<c011fe0b>] ? do_page_fault+0x36b/0x3d0
Aug 21 23:53:20 wzab dhcpd: Copyright 2004-2010 Internet Systems Consortium.
Aug 21 23:53:20 wzab kernel:  [<c03f1589>] ? _raw_spin_lock_irqsave+0x19/0x40
Aug 21 23:53:20 wzab dhcpd: All rights reserved.
Aug 21 23:53:20 wzab kernel:  [<c02611e6>] ? number.isra.3+0x316/0x330
Aug 21 23:53:20 wzab dhcpd: For info, please visit https://www.isc.org/software/dhcp/
Aug 21 23:53:20 wzab kernel:  [<c012ca18>] ? get_parent_ip+0x8/0x20
Aug 21 23:53:20 wzab kernel:  [<c031ca18>] ? ata_scsi_verify_xlat+0x208/0x3a0
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel:  [<c03f1eae>] ? error_code+0x5a/0x60
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel:  [<c0108b73>] ? native_sched_clock+0x23/0x80
Aug 21 23:53:20 wzab kernel:  [<c012387a>] ? resched_task+0x3a/0x60
Aug 21 23:53:20 wzab kernel:  [<c0125d83>] ? check_preempt_wakeup+0x133/0x1c0
Aug 21 23:53:20 wzab kernel:  [<c01272aa>] ? check_preempt_curr+0x6a/0x80
Aug 21 23:53:20 wzab kernel:  [<c0127313>] ? ttwu_do_wakeup+0x13/0xc0
Aug 21 23:53:20 wzab kernel:  [<c03f13ce>] ? _raw_spin_unlock_irqrestore+0xe/0x30
Aug 21 23:53:20 wzab kernel:  [<c012f05b>] ? try_to_wake_up+0x17b/0x200
Aug 21 23:53:20 wzab kernel:  [<c0259880>] ? cpumask_next_and+0x20/0x30
Aug 21 23:53:20 wzab kernel:  [<c01272aa>] ? check_preempt_curr+0x6a/0x80
Aug 21 23:53:20 wzab kernel:  [<c0173912>] ? __perf_event_task_sched_out+0x32/0x250
Aug 21 23:53:20 wzab kernel:  [<c0126e6b>] ? pick_next_task_fair+0x8b/0xe0
Aug 21 23:53:20 wzab kernel:  [<c02570ee>] ? __cfq_exit_single_io_context+0x6e/0xb0
Aug 21 23:53:20 wzab kernel:  [<c0135e1b>] ? do_exit+0x49b/0x6d0
Aug 21 23:53:20 wzab kernel:  [<c0105716>] ? oops_begin+0x6/0x90
Aug 21 23:53:20 wzab kernel:  [<c011da3c>] ? save_v86_state+0x12c/0x170
Aug 21 23:53:20 wzab kernel:  [<c0105716>] ? oops_begin+0x6/0x90
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel:  [<c03f1a7e>] ? work_notifysig_v86+0x6/0x18
Aug 21 23:53:20 wzab kernel:  [<c0105716>] ? oops_begin+0x6/0x90
Aug 21 23:53:20 wzab kernel:  [<c011faa0>] ? vmalloc_sync_all+0xf0/0xf0
Aug 21 23:53:20 wzab kernel: Code: 0f be c0 e8 26 fb ff ff 80 3e 0a 74 79 83 c6 01 80 3e 00 75 99 e8 44 fc ff ff 85 c0 0f 84 70 01 00 00 a1 68 5f 3f c0 8b 54 24 20 <0f> a3 10 19 c0 85 c0 75 23 a1 28 ad 58 c0 85 c0 75 14 e9 3c 01 
Aug 21 23:53:20 wzab kernel: EIP: [<c013328d>] vprintk+0x12d/0x410 SS:ESP 0068:f5a47bf8
Aug 21 23:53:20 wzab kernel: CR2: 00000000b8530bd8
Aug 21 23:53:20 wzab kernel: ---[ end trace 2b8e51024c1fe4c5 ]---
Aug 21 23:53:20 wzab kernel: note: udevd[1065] exited with preempt_count 3
Aug 21 23:53:20 wzab kernel: pci_hotplug: PCI Hot Plug PCI Core version: 0.5
Aug 21 23:53:20 wzab kernel: agpgart-intel 0000:00:00.0: Intel 865 Chipset
Aug 21 23:53:20 wzab kernel: agpgart-intel 0000:00:00.0: detected gtt size: 131072K total, 131072K mappable
Aug 21 23:53:20 wzab kernel: agpgart-intel 0000:00:00.0: detected 16384K stolen memory
Aug 21 23:53:20 wzab kernel: agpgart-intel 0000:00:00.0: AGP aperture is 128M @ 0xf0000000
Aug 21 23:53:20 wzab kernel: shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[...]
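
For readers of the logs above: the "error 4" on the modprobe segfault line and the "0000" on the "Oops:" line are x86 page-fault error codes. The sketch below decodes them using the architectural bit meanings; the helper name describe_pf_error and the demo function are hypothetical and only serve as illustration, they are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits (architectural definition). */
enum {
    PF_PROT  = 1u << 0, /* 0: page not present, 1: protection violation */
    PF_WRITE = 1u << 1, /* 0: read access,      1: write access */
    PF_USER  = 1u << 2, /* 0: kernel mode,      1: user mode */
    PF_RSVD  = 1u << 3, /* reserved bit set in a paging-structure entry */
    PF_INSTR = 1u << 4  /* fault happened on an instruction fetch */
};

/* Hypothetical helper: format a readable description of `code` into `buf`. */
const char *describe_pf_error(unsigned code, char *buf, size_t len)
{
    snprintf(buf, len, "%s-mode %s, %s%s",
             (code & PF_USER) ? "user" : "kernel",
             (code & PF_INSTR) ? "instruction fetch"
                               : (code & PF_WRITE) ? "write" : "read",
             (code & PF_PROT) ? "protection violation" : "page not present",
             (code & PF_RSVD) ? ", reserved bit set" : "");
    return buf;
}

/* Usage: decode the two codes that appear in the logs above. */
void demo_pf_decode(void)
{
    char buf[96];

    /* "modprobe[1063]: segfault ... error 4" */
    assert(strcmp(describe_pf_error(0x4, buf, sizeof buf),
                  "user-mode read, page not present") == 0);
    /* "Oops: 0000 [#1] PREEMPT SMP" */
    assert(strcmp(describe_pf_error(0x0, buf, sizeof buf),
                  "kernel-mode read, page not present") == 0);
}
```

Both faults are therefore plain reads of unmapped addresses, one taken in user mode and one in kernel mode.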
Comment 8 WZab 2011-08-22 06:18:19 UTC
Created attachment 69602 [details]
Configuration of my 3.0.3 kernel 

This is the configuration of my 3.0.3 kernel, which initially worked correctly with HT on, but on the next day (?!) stopped working with HT.
Comment 9 Sergey Popov 2011-08-24 17:58:02 UTC
Same problem on my motherboard, an Intel D865GBF. The last kernel with which HyperThreading can be used on my board is 2.6.38.8. I did not check 2.6.39, but 2.6.39.1 panics with a long log (two screens or more). The panic can be avoided by appending "acpi=off" or "nolapic" to the kernel line in the bootloader, but with either option HyperThreading is not available :(

P.S. Sorry for bad english
Comment 10 WZab 2011-08-30 22:16:14 UTC
Looking for possible solutions of the reported problem, I have found: https://bugzilla.redhat.com/show_bug.cgi?id=727865 
Following the suggestions discussed in this thread, I have tried to boot the kernel with the "processor.nocst=1" parameter and it worked!

Certainly it is only a dirty workaround, but finally I can have my system working with the 3.0.3 kernel and with HT on.
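
When comparing boots with and without the workaround, it can help to confirm that the parameter really reached the kernel. The following is a minimal sketch, assuming a Linux system that exposes the boot command line in /proc/cmdline; the function names cmdline_has_param and booted_with_param are hypothetical, and quoted parameters containing spaces are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `param` occurs as a whole whitespace-separated token in `cmdline`. */
bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0)
        return false;
    while ((p = strstr(p, param)) != NULL) {
        bool starts_token = (p == cmdline) || p[-1] == ' ' || p[-1] == '\t';
        char end = p[plen];
        bool ends_token = end == '\0' || end == ' ' || end == '\t' || end == '\n';

        if (starts_token && ends_token)
            return true;
        p++; /* keep searching after a partial match */
    }
    return false;
}

/* Check the running kernel's command line; returns false if it is unreadable. */
bool booted_with_param(const char *param)
{
    char buf[4096];
    bool found = false;
    FILE *f = fopen("/proc/cmdline", "r");

    if (f == NULL)
        return false;
    if (fgets(buf, sizeof buf, f) != NULL)
        found = cmdline_has_param(buf, param);
    fclose(f);
    return found;
}

/* Usage: the token must match exactly, so "nocst=10" is not "nocst=1". */
void demo_cmdline_check(void)
{
    assert(cmdline_has_param("root=/dev/sda1 ro processor.nocst=1\n",
                             "processor.nocst=1"));
    assert(!cmdline_has_param("root=/dev/sda1 ro processor.nocst=10",
                              "processor.nocst=1"));
}
```

Calling booted_with_param("processor.nocst=1") after boot then tells whether the current session is running with the workaround.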
Comment 11 WZab 2011-08-31 07:13:49 UTC
Interestingly, this problem was already reported in 2007: http://www.mail-archive.com/acpi-bugzilla@lists.sourceforge.net/msg06921.html

Unfortunately I was not able to find it before...
Comment 12 Sergey Popov 2011-08-31 09:25:23 UTC
hm, i will try, but i do not understand clearly - what acpi functions will not work properly with this workaround?
Comment 13 WZab 2011-08-31 09:57:57 UTC
Well, according to the report from 2007, the BIOS incorrectly reports that ACPI is CST capable. So I think that with this parameter ACPI should work correctly. The only drawback is that you have to add an extra parameter to boot correctly.
However, I may be wrong.
Comment 14 Zhang Rui 2012-01-18 05:21:27 UTC
It's great that the kernel bugzilla is back.

Can you please verify if the problem still exists in the latest upstream
kernel?
Comment 15 Jonathan Nieder 2012-01-18 08:25:30 UTC
Thanks for writing.  Quick summary:

Symptoms: NULL pointer dereferences, segfaults, hangs at boot
Tested kernel versions: 2.6.39-rc7, 3.1.6, and 3.2-rc4, among others
Regression: yes, 2.6.38 and 2.6.32.y work fine
Motherboards: D865GBF, D865GRH, D865PERL, D865PERLK
Workarounds: processor.nocst=1, or disable hyperthreading in BIOS

Presumably the _CST table on these systems is somehow problematic. acpidumps both with and without hyperthreading disabled are available from the downstream bug reports; see

 - http://bugs.debian.org/627019
 - https://bugzilla.redhat.com/show_bug.cgi?id=727865

(WZab or Pinkbyte, attaching acpidumps with and without hyperthreading here would be useful to avoid having to hunt around so much.)
Comment 16 reaper.thresher 2012-01-18 14:44:49 UTC
Created attachment 72108 [details]
ACPI dump (with HT disabled in the BIOS)

I'm not sure this will be helpful, but I've attached an acpidump from my box running 3.1.7. The issue still persists for me and I can't boot with HT enabled in the BIOS (hence the dump is made with HT disabled).

I hope this helps!
Comment 17 Jonathan Nieder 2012-01-19 18:53:02 UTC
Created attachment 72131 [details]
acpidump with HT enabled

From http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=627019#54
Comment 18 WZab 2012-01-20 22:59:03 UTC
Created attachment 72144 [details]
Result of acpidump from my affected system, booted with HT on (and with nocst=1 to allow booting)
Comment 19 WZab 2012-01-20 23:30:28 UTC
Now I've started my system with kernel 3.2.0, with HT on and without the processor.nocst=1 option, and it seems to be stable.

Unfortunately I can't perform a more detailed log check because, due to another bug (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655152 ), my logs are flooded with "[drm:intel_prepare_page_flip] *ERROR* Prepared flip multiple times" messages :-(.
Comment 20 WZab 2012-01-20 23:44:12 UTC
Oooops, my last comment was too optimistic.
Without the "processor.nocst=1" option my system booted only once (during the first reboot with CTRL+ALT+DEL after correct operation with this option set).
When I tried to restart it again (without power cycling, only with CTRL+ALT+DEL from text console) it crashed during boot.
After pressing the reset button and booting without processor.nocst=1 - the system crashes regularly during the boot.

So obviously the problem still exists in version 3.2.0.

(Regarding the second problem, I have found that disabling the desktop effects in the KDE Plasma desktop eliminates the repeated "[drm:intel_prepare_page_flip] *ERROR* Prepared flip multiple times" messages.)
Comment 21 Jonathan Nieder 2012-03-24 03:13:54 UTC
Created attachment 72696 [details]
patch from acpica tree

Does the attached patch help?
Comment 22 WZab 2012-03-25 11:18:25 UTC
Yes, it helps.
I now use kernel 3.2.11.
Without this patch and without the boot parameter "processor.nocst=1" the kernel hangs during boot.
With the patch applied, the kernel boots and works correctly.

In fact, I introduced the changes by hand, as the line numbers in 3.2.11 have changed:

# diff -c tbfadt.c~ tbfadt.c
*** tbfadt.c~   2012-03-13 18:05:09.000000000 +0100
--- tbfadt.c    2012-03-25 12:24:31.000000000 +0200
***************
*** 350,358 ****
        u32 address32;
        u32 i;
  
-       /* Update the local FADT table header length */
- 
-       acpi_gbl_FADT.header.length = sizeof(struct acpi_table_fadt);
  
        /*
         * Expand the 32-bit FACS and DSDT addresses to 64-bit as necessary.
--- 350,355 ----
***************
*** 395,400 ****
--- 392,401 ----
                acpi_gbl_FADT.boot_flags = 0;
        }
  
+       /* Update the local FADT table header length */
+ 
+       acpi_gbl_FADT.header.length = sizeof(struct acpi_table_fadt);
+ 
        /*
         * Expand the ACPI 1.0 32-bit addresses to the ACPI 2.0 64-bit "X"
         * generic address structures as necessary. Later code will always use
Comment 23 Jonathan Nieder 2012-03-25 14:17:57 UTC
Thanks for testing.
Comment 24 WZab 2012-04-03 19:21:08 UTC
Unfortunately the newest 3.3.1 kernel still has the same problem.
It is necessary to apply the patch as below (the idea is the same, but the line numbers have changed):

# diff -c tbfadt.c~ tbfadt.c 
*** tbfadt.c~   2012-04-02 19:32:52.000000000 +0200
--- tbfadt.c    2012-04-03 21:19:40.000000000 +0200
***************
*** 363,371 ****
        u32 address32;
        u32 i;
  
-       /* Update the local FADT table header length */
- 
-       acpi_gbl_FADT.header.length = sizeof(struct acpi_table_fadt);
  
        /*
         * Expand the 32-bit FACS and DSDT addresses to 64-bit as necessary.
--- 363,368 ----
***************
*** 408,413 ****
--- 405,414 ----
                acpi_gbl_FADT.boot_flags = 0;
        }
  
+       /* Update the local FADT table header length */
+ 
+       acpi_gbl_FADT.header.length = sizeof(struct acpi_table_fadt);
+ 
        /*
         * Expand the ACPI 1.0 32-bit addresses to the ACPI 2.0 64-bit "X"
         * generic address structures as necessary. Later code will always use