Bug 11541
Description
Jakub Liput
2008-09-11 09:33:20 UTC
Will you please add the following boot options and see whether the system can be booted normally?
a. idle=poll
b. nolapic_timer
c. processor.max_cstate=1 (It is better to build the processor driver into the kernel rather than as a module; please set CONFIG_ACPI_PROCESSOR=y in the kernel configuration.)
d. nohz=off (disables the tickless feature)
It would be great if you could attach the output of acpidump.
Thanks.
Created attachment 17747 [details]
acpidump - kernel 2.6.27-rc6 (acpi=off)
Executed on Linux 2.6.27-rc5 with "acpi=off" boot option.
(In reply to comment #1)
> Will you please add the following boot options and see whether the system can
> be booted normally?
> a. idle=poll
> b. nolapic_timer
> c. processor.max_cstate=1 ( Had better compile the processor as the
> built-in
> kernel module. Please set CONFIG_ACPI_PROCESSOR=y in kernel configuration.)
> d. nohz=off (Disable tickless-feature)
>
> It will be great if you can attach the output of acpidump.
> Thanks.
Unfortunately the kernel doesn't boot with any of these options, only with "acpi=off". I've attached the acpidump in the previous message.
Will you please add the boot option "initcall_debug" and capture a picture of the screen when the system hangs?
Thanks.
Created attachment 17759 [details]
Photo of a screen where Linux freezes
Photo of a screen where Linux freezes. (Linux 2.6.27-rc5)
(I didn't know any other way to capture this screen. Sorry if there is a better one :) )
Created attachment 17798 [details]
try the custom DSDT
Will you please try the attached DSDT and see whether the system still hangs?
Instructions for using a custom DSDT can be found at: http://www.lesswatts.org/projects/acpi/faq.php
In the attached DSDT the GETD method is no longer called in the _INI object of the VGA device. From the DSDT it seems that the GETD method triggers an SMI operation.
Thanks.
Created attachment 17805 [details]
screenshot #2
Thanks for this DSDT - it solved some problems, but unfortunately kernel still hangs - this time in another place.
I've attached screenshots from Linux 2.6.27-rc6 with "initcall_debug" and without.
As You can see, system freezes after "io scheduler cfq..." - I've tried to compile Linux without CFQ and other IO Schedulers, but it still doesn't work.
When I boot system without ACPI, it works. After "io scheduler cfq registered (default)" next three lines are:
pci 0000:02:00.0 Boot video device
pcieport-driver 0000:00:02.0: setting latency timer to 64
pcieport-driver 0000:00:02.0 found MSI capability
(...)
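Once the initcall_debug output is available as text rather than as a photo, the initcall that never returned can be picked out mechanically. The following is only a minimal sketch: the function name find_pending_initcalls is made up for illustration, and the sample lines follow the "calling ..." / "initcall ... returned ..." format of the initcall_debug output transcribed in this thread. A hand-typed log with mismatched names will produce false positives.

```python
import re

# The two line types printed when booting with initcall_debug.
CALLING = re.compile(r"^calling\s+(\S+?)\+0x")
RETURNED = re.compile(r"^initcall\s+(\S+?)\+0x\S*\s+returned")


def find_pending_initcalls(lines):
    """Return the initcalls that were started but never reported a return."""
    pending = []
    for line in lines:
        line = line.strip()
        started = CALLING.match(line)
        if started:
            pending.append(started.group(1))
            continue
        finished = RETURNED.match(line)
        if finished and finished.group(1) in pending:
            pending.remove(finished.group(1))
    return pending


sample = """\
calling cfq_init+0x0/0x84
io scheduler cfq registered (default)
initcall cfq_init+0x0/0x84 returned 0 after 0 msecs
calling percpu_counter_startup+0x0/0xd
initcall percpu_counter_startup+0x0/0xd returned 0 after 0 msecs
calling pci_init+0x0/0x28
""".splitlines()

print(find_pending_initcalls(sample))  # ['pci_init']
```

For these lines the only initcall left without a matching "returned" line is pci_init, which is where the screen stops.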
Please attach the output of dmesg -s64000 from the latest kernel that boots with ACPI enabled. It would be helpful if you could isolate exactly which release broke this laptop, or ideally use git bisect to find the commit that caused the regression.
Hi, Jakub
Will you please add each of the following boot options and see whether the system still hangs? The custom DSDT is still required, and the "initcall_debug" boot option should always be added.
a. nolapic_timer
b. idle=poll
c. processor.max_cstate=1
d. nohz=off (disables the tickless feature)
If the system still hangs, please attach a picture of the screen for every boot option.
Thanks.
Created attachment 17838 [details]
dmesg from 2.6.21.7 with ACPI
I've tested some kernels:
- the latest version that runs with ACPI is 2.6.21.7 (tested with and without DSDT modifications)
- the earliest version that doesn't run with ACPI is 2.6.22-rc1
I've attached dmesg from 2.6.21.7 (without DSDT modification).
Created attachment 17839 [details]
new four screenshots, where system hangs (with DSDT)
Hi, Yakui
Unfortunately system still hangs. I've attached four screenshots (created with various options).
Hi, Jakub
Sorry for the late response.
From the log in comment #10 the 2.6.21.7 kernel can be booted with ACPI.
After checking the ACPI tables I find that the ECDT table is incorrect.
>Namepath : \\_SB.PCI0.SBRG.EC0
In fact the namepath should be \_SB.PCI0.SBRG.EC0.
Because of the above error the EC device is not initialized before evaluating the _INI object on the 2.6.21.7 kernel. But from the 2.6.22-rc1 kernel onward the EC device is initialized before evaluating the _INI object, because the ECDT table exists.
I don't know why initializing the EC device before evaluating the _INI object breaks the system. Maybe it is related to the BIOS.
Hi, Jakub
From the attached screenshots it seems that the system hangs in the function pci_init, in which some quirk functions are called for the corresponding PCI devices. I also can't understand why the PCI quirk function is related to the EC initialization order.
In kernel 2.6.21.7 the EC device is initialized after evaluating the _INI object. But from kernel 2.6.22-rc1 the EC device is initialized on your laptop before evaluating the _INI object. (The SMI is triggered in the _INI object of the VGA device.)
From the screenshot in comment #5 we can see that the system hangs while evaluating the _INI object after the EC device has been initialized.
Maybe this is related to the BIOS. Please confirm whether the system can be fixed by upgrading the BIOS.
Thanks.
Hi, Jakub
From the attached screenshots in comment #11 it seems that, once the custom DSDT is used, the system hangs in the function pci_init, in which some quirk functions are called for the corresponding PCI devices. (In the custom DSDT the _INI object of the VGA device is skipped.) I also can't understand why the PCI quirk function is related to the EC initialization order.
In kernel 2.6.21.7 the EC device is initialized after evaluating the _INI object. But from kernel 2.6.22-rc1 the EC device is initialized on your laptop before evaluating the _INI object. (The SMI is triggered in the _INI object of the VGA device.)
From the screenshot in comment #5 we can see that the system hangs while evaluating the _INI object after the EC device has been initialized. (The custom DSDT table is not used in that case.)
Maybe this is related to the BIOS. Please confirm whether the system can be fixed by upgrading the BIOS.
Thanks.
Created attachment 18086 [details]
debug patch: EC device is initialized after evaluating the _INI object
In the 2.6.21.7 kernel the EC device is initialized after evaluating the _INI object, and the system boots normally.
Will you please try this debug patch on the latest kernel and see whether the system can be booted?
In this debug patch the EC device is forced to be initialized after evaluating the _INI object.
Thanks.
Hello,
Sorry for the late reply, but these days I've had problems with my internet connection.
Thanks for the patch, but unfortunately it doesn't fix the bug. I tried compiling both with and without the custom DSDT. Here are the messages from the screen where the system hangs (these days I also don't have a digital camera):
with custom DSDT:
calling init_udf_fs+0xx/0x49
initcall init_udf_fs+0x0/0x49 returned 0 after 0 msecs
calling ipc_init+0x0/0x1c
mgmni has been set to 1731
initcall ipc_init+0x0/0x1c returned 0 after 0 msecs
calling ipc_sysctl_init+0x0/0xd
initcall ipc_sysctl_init+0x0/0xd returned 0 after 0 msecs
calling init_mqueue_fs+0x0/0xa5
initcall init_mqueue_fs+0x0/0xa5 returned 0 after 0 msecs
calling noop_init+0x0/0xd
io scheduler noop registered
initcall noop_init+0x0/0xd returned 0 after 0 msecs
calling as_init+0x0/0xd
io scheduler deadline registered
initcall deadline_init+0x0/0xd returned 0 after 0 msecs
calling cfq_init+0x0/0x84
io scheduler cfq registered (default)
initcall cfq_init+0x0/0x84 returned 0 after 0 msecs
calling percpu_counter_startup+0x0/0xd
initcall percpu_counter_startup+0x0/0xd returned 0 after 0 msecs
calling pci_init+0x0/0x28
_
without custom DSDT:
calling param_sysfs_init+0x0/0x147
initcall param_sysfs_init+0x0/0x147 returned 0 after 11 msecs
calling pm_sysrq_init+0x0/0x12
initcall pm_sysrq_init+0x0/0x12 returned 0 after 0 msecs
calling readahead_init+0x0/0x29
initcall readahead_init+0x0/0x29 returned 0 after 0 msecs
calling init_bio+0x0/0xa0
initcall init_bio+0x0/0xa0 returned 0 after 0 msecs
calling blk_settings_init+0x0/0x19
initcall blk_settings_init+0x0/0x19 returned 0 after 0 msecs
calling blk_ioc_init+0x0/0x22
initcall blk_ioc_init+0x0/0x22 returned 0 after 0 msecs
calling genhd_device_init+0x0/0x3d
initcall genhd_device_init+0x0/0x3d returned 0 after 0 msecs
calling pci_slot_init+0x0/0x37
initcall pci_slot_init+0x0/0x37 returned 0 after 0 msecs
calling fbmem_init+0x0/0x74
initcall fbmem_init+0x0/0x74 returned 0 after 0 msecs
calling acpi_init+0x0/0x220
evgpeblk-0957 [00] ev_create_gpe_block : GPE 00 to 1F [_GPE] 4 regs on int 0x9
Completing Region/Field/Buffer/Package initialization:..........................
................................................................................
........................................
Initialized 50/50 Regions 17/17 Fields 32/32 Buffers 45/45 Packages (1494 nodes)
Initializing Device/Processor/Thermal objects by executing _INI methods:..._
Unfortunately, there is no BIOS update available: on the Asus website there is only BIOS 204, which I already have.
I am experiencing this problem too. Is there any info that you still need?
I am experiencing this problem too on an M51Ta notebook (AFAIK the same hardware but with an HD3650 video card). Yesterday I noticed that BIOS 207 has been released for the M51Tr (unfortunately not for the M51Ta, even though I think it should be the same code...). Has anyone tried whether this update solves the problem?
Hello. I've tried the new BIOS (207) on my M51Tr, but unfortunately it doesn't solve the problem.
Created attachment 18423 [details]
Ignore the _INI object in course of ACPI initialization
Will you please try the debug patch and see whether the problem still exists?
Thanks.
Created attachment 18481 [details]
dmesg of PC-BSD install cd on M51Ta
I tried, just for curiosity, to boot my M51Ta with PC-BSD 7.0.1 install cd and (surprisingly?) it works with ACPI support.
I don't know if this info can be useful to solve this bug in Linux, but I'm attaching the output of dmesg hoping that it could be.
(btw I didn't test with an installed PC-BSD, both because I lack a free partition to install it on and because the video card is not supported by the graphical installer)
(In reply to comment #20)
> Will you please try the debug patch and see whether the problem still exists?
> Thanks.
Thanks for the patch, but unfortunately it still doesn't fix the bug (compiled with Linux 2.6.27.3). My dmesg seems to be the same as the last dmesg that I attached (with ACPI):
(...)
calling init_udf_fs+0xx/0x49
initcall init_udf_fs+0x0/0x49 returned 0 after 0 msecs
calling ipc_init+0x0/0x1c
mgmni has been set to 1731
initcall ipc_init+0x0/0x1c returned 0 after 0 msecs
calling ipc_sysctl_init+0x0/0xd
initcall ipc_sysctl_init+0x0/0xd returned 0 after 0 msecs
calling init_mqueue_fs+0x0/0xa5
initcall init_mqueue_fs+0x0/0xa5 returned 0 after 0 msecs
calling noop_init+0x0/0xd
io scheduler noop registered
initcall noop_init+0x0/0xd returned 0 after 0 msecs
calling as_init+0x0/0xd
io scheduler deadline registered
initcall deadline_init+0x0/0xd returned 0 after 0 msecs
calling cfq_init+0x0/0x84
io scheduler cfq registered (default)
initcall cfq_init+0x0/0x84 returned 0 after 0 msecs
calling percpu_counter_startup+0x0/0xd
initcall percpu_counter_startup+0x0/0xd returned 0 after 0 msecs
calling pci_init+0x0/0x28
*** Bug 11737 has been marked as a duplicate of this bug. ***
The problem doesn't exist in 2.6.21.7.
The problem exists in 2.6.22-rc1.
Can anybody please try to use git-bisect to find the commit that introduced this bug?
How to use git-bisect is described at:
http://www.lesswatts.org/projects/acpi/debug.php
Hello, I have the same laptop (Asus M51Tr) and I think I have the same problem. Without acpi=off, the kernel freezes at (sorry, I'm new on this website and I don't know how to create an "attachment", so let me give you my picture on my webspace): http://snacksou33.free.fr/28112008161.jpg
Whatever the option (idle=poll, nolapic_timer, processor.max_cstate=1, nohz=off), it still freezes. I'm on the 2.6.26-1-amd64 kernel. I didn't find any DSDT for this laptop. So, is there a fix for this problem now, or....?
My apologies for my English, it's not my native language...
Thks.
Please try booting with "pci=nommconf"
Thanks, but on the M51Ta it does not work ... :(
Neither on mine. It still freezes on the same instruction...
(In reply to comment #24)
> the problem doesn't exist in 2.6.21.7.
> the problem exists in 2.6.22-rc1.
> can anybody try to use git-bisect to find out the commit that introduces this
> bug please?
>
> how to use git-bisect can be found at:
> http://www.lesswatts.org/projects/acpi/debug.php
>
Hello,
I suppose this is the commit which has broken ACPI on the M51Ta:
[2b1f6278d77c1f2f669346fc2bb48012b5e9495a]
Created attachment 19075 [details]
git-bisect log m51ta
git-bisect log
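For readers who have not used git-bisect before, the idea behind it is a binary search between a known-good and a known-bad revision. The sketch below illustrates only that search logic; it does not call git, and the history list, the culprit and the boots_badly test are hypothetical placeholders rather than real kernel commits.

```python
def first_bad(commits, is_bad):
    """Binary-search an ordered history for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (the same assumption that
    git-bisect relies on).
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1  # one kernel build and boot test per step
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tested


# Hypothetical history: placeholder names, not real kernel commits.
history = ["known-good"] + [f"c{i}" for i in range(1, 16)] + ["known-bad"]
culprit = "c11"  # pretend this change broke booting with ACPI


def boots_badly(commit):
    return history.index(commit) >= history.index(culprit)


print(first_bad(history, boots_badly))  # ('c11', 4)
```

With 17 revisions the culprit is found after 4 test boots, which is why bisecting stays practical even across the thousands of commits between two kernel releases.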
Hello, I've tried the nosmp and nolapic options and then I can boot. But I've got only one CPU, and I think my battery doesn't last as long as before (less autonomy), which is useless for a laptop :(
Ondrej, thanks for the bisect.
If 2b1f6278d77c1f2f669346fc2bb48012b5e9495a is at fault, then this is an MTRR-related issue. Does your system boot a kernel built with CONFIG_MTRR=n?
Confirmed. I can boot the kernel with ACPI when the kernel is compiled with CONFIG_MTRR=n. Should I post some further debug info?
I also confirm that with CONFIG_MTRR=n and kernel 2.6.27.7 on an Asus M51Tr it works great :) both CPUs, automatic halt, etc... Thanks a lot for this fix ;-) See you.
Thanks for the test and confirmation.
It seems that the system can work well with ACPI enabled if CONFIG_MTRR is disabled in the kernel configuration. So it will be assigned to the MTRR category of memory management.
Don't know if this is on Rafael's regression list. cc Rafael.
/usr/src/linux-2.6.28-gentoo/arch/x86/kernel/cpu/mtrr/generic.c:218:
mtrr_state.have_fixed = 0;/*rba(lo >> 8) & 1;rba*/
aton ~ # cat /proc/mtrr
reg00: base=0x000000000 ( 0MB), size= 1024MB, count=1: write-back
reg01: base=0x040000000 ( 1024MB), size= 512MB, count=1: write-back
reg02: base=0x060000000 ( 1536MB), size= 256MB, count=1: write-back
####################
/proc/acpi/video/VGA/LCDD/EDID
<not supported>
####################
/proc/acpi/video/VGA/LCDD/brightness
levels: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
current: 0
####################
/proc/acpi/video/VGA/LCDD/state
state: 0x1f
query: 0x01
####################
/proc/acpi/video/VGA/LCDD/info
device_id: 0x0110
type: UNKNOWN
known by bios: no
####################
/proc/acpi/video/VGA/DVID/EDID
<not supported>
####################
/proc/acpi/video/VGA/DVID/brightness
<not supported>
####################
/proc/acpi/video/VGA/DVID/state
state: 0x1d
query: 0x00
####################
/proc/acpi/video/VGA/DVID/info
device_id: 0x0210
type: UNKNOWN
known by bios: no
####################
/proc/acpi/video/VGA/CRTD/EDID
<not supported>
####################
/proc/acpi/video/VGA/CRTD/brightness
<not supported>
####################
/proc/acpi/video/VGA/CRTD/state
state: 0x1d
query: 0x00
####################
/proc/acpi/video/VGA/CRTD/info
device_id: 0x0100
type: UNKNOWN
known by bios: no
####################
/proc/acpi/video/VGA/DOS
DOS setting: <0>
####################
/proc/acpi/video/VGA/POST
<not supported>
####################
/proc/acpi/video/VGA/POST_info
<not supported>
####################
/proc/acpi/video/VGA/ROM
####################
/proc/acpi/video/VGA/info
Switching heads: yes
Video ROM: no
Device to be POSTed on boot: no
####################
/proc/acpi/wakeup
Device S-state Status Sysfs node
SBAZ S4 disabled pci:0000:00:14.2
UHC1 S4 disabled pci:0000:00:12.0
UHC2 S4 disabled pci:0000:00:12.1
UHC3 S4 disabled pci:0000:00:12.2
USB4 S4 disabled pci:0000:00:13.0
UHC5 S4 disabled pci:0000:00:13.1
UHC6 S4 disabled pci:0000:00:13.2
UHC7 S4 disabled pci:0000:00:14.5
PCE4 S4 disabled pci:0000:00:04.0
PCE5 S4 disabled pci:0000:00:05.0
GLAN S4 disabled pci:0000:04:00.0
PCE6 S4 disabled pci:0000:00:06.0
PCE2 S4 disabled pci:0000:00:02.0
P0PC S4 disabled pci:0000:00:14.4
PWRB S4 *enabled
SLPB S4 *enabled
####################
/proc/acpi/sleep
S0 S3 S4 S5
####################
/proc/acpi/thermal_zone/THRM/polling_frequency
<polling disabled>
####################
/proc/acpi/thermal_zone/THRM/cooling_mode
0 - Active; 1 - Passive
####################
/proc/acpi/thermal_zone/THRM/trip_points
critical (S5): 108 C
passive: 105 C: tc1=2 tc2=10 tsp=100 devices=P001 P002
####################
/proc/acpi/thermal_zone/THRM/temperature
temperature: 59 C
####################
/proc/acpi/thermal_zone/THRM/state
state: ok
####################
/proc/acpi/processor/P002/power
active state: C0
max_cstate: C8
bus master activity: 00000000
maximum allowed latency: 2000000000 usec
####################
/proc/acpi/processor/P002/limit
<not supported>
####################
/proc/acpi/processor/P002/throttling
<not supported>
####################
/proc/acpi/processor/P002/info
processor id: 1
acpi id: 2
bus mastering control: yes
power management: no
throttling control: no
limit interface: no
####################
/proc/acpi/processor/P001/power
active state: C0
max_cstate: C8
bus master activity: 00000000
maximum allowed latency: 2000000000 usec
C1: type[C1] promotion[--] demotion[--] latency[000] usage[00000000] duration[00000000000000000000]
####################
/proc/acpi/processor/P001/limit
active limit: P0:T0
user limit: P0:T0
thermal limit: P0:T0
####################
/proc/acpi/processor/P001/throttling
state count: 8
active state: T0
state available: T0 to T7
*T0: 100%
T1: 87%
T2: 75%
T3: 62%
T4: 50%
T5: 37%
T6: 25%
T7: 12%
####################
/proc/acpi/processor/P001/info
processor id: 0
acpi id: 1
bus mastering control: yes
power management: no
throttling control: yes
limit interface: yes
####################
/proc/acpi/button/lid/LID/state
state: open
####################
/proc/acpi/button/lid/LID/info
type: Lid Switch
####################
/proc/acpi/button/sleep/SLPB/info
type: Sleep Button (CM)
####################
/proc/acpi/button/power/PWRB/info
type: Power Button (CM)
####################
/proc/acpi/button/power/PWRF/info
type: Power Button (FF)
####################
/proc/acpi/battery/BAT0/alarm
alarm: unsupported
####################
/proc/acpi/battery/BAT0/state
present: yes
capacity state: ok
charging state: charged
present rate: 0 mW
remaining capacity: 47344 mWh
present voltage: 12403 mV
####################
/proc/acpi/battery/BAT0/info
present: yes
design capacity: 52800 mWh
last full capacity: 48345 mWh
battery technology: rechargeable
design voltage: 11100 mV
design capacity warning: 5280 mWh
design capacity low: 1584 mWh
capacity granularity 1: 528 mWh
capacity granularity 2: 528 mWh
model number: F3---24
serial number:
battery type: LIon
OEM info: ASUSTEK
####################
/proc/acpi/ac_adapter/AC0/state
state: on-line
####################
/proc/acpi/event
cat: /proc/acpi/event: Device or resource busy
####################
/proc/acpi/fadt
####################
/proc/acpi/dsdt
ANVI pASMIh` 1SMBB zh 1SMBW zh SMBK zh ECRW zh BS_A [ TOPM ROMS MG1B MG1L MG2B MG2L HPTA CPB0 CPB1 CPB2 CPB3 ASSB AAXB SMIF MG3B MG3L MH1B MH1L Windows 2000 Windows 2001 Windows 2001 SP1 Windows 2001 SP2 Windows 2001.1 Windows 2001.1 SP1 Windows 2006 Microsoft Windows NT Microsoft WindowsME: Millennium Edition @0MI__ MD__ HI__ HD__ @&MCI_ MCD_ [ H PCE4 SCAP SCTL N _PS0 K _PS0 SWHD p O CBS0 XCFG ['MUTE XCFG piXCFG['MUTE XCFG {XCFGja}aiXCFG['MUTE SBRV WBOV RPIN {h SWTC pUKER`p L RBEP @ WBEP A ECAV RFOV p RBOV p (DCPS pGPWS` ISMI phSMCM ASMI phALPRp RAMB AVOL FADR FSIZ DBR1 DBR2 DBR3 DBR4 LCDV LCDR BIPA RTCW ALPR PSTN GNBF HDDF Windows 2001 Windows 2001 SP1 Windows 2001 SP2 Windows 2006 Microsoft Windows Microsoft WindowsME: Millennium Edition Microsoft Windows NT GGCC BATShp^/ "CHGS p/ BLK1 BLK2 BLK3 BLK4 BLK5 BLK6 BLK7 BLK8 p^^/ BSMI phBIPAISMI BLK1 BLK2 BLK3 BLK4 BLK5 BLK6 BLK7 BLK8 LDI0 LDI1 p THRI THR1 p NMFN p ODPI ODPC ODPM {ODPM MDL1 MDL2 MDL3 MDL4 p BDL1 BDL2 BDL3 BDL4 p EDL1 EDL2 EDL3 EDL4 p LDI0 LDI1 p -DCPS p/ PRJW p !OWLD phWRSTph/ SBTL /
####################
/proc/acpi/info
version: 20080926
####################
/proc/acpi/embedded_controller/EC0/info
gpe: 0x0e
ports: 0x66, 0x62
use global lock: no
Created attachment 19546 [details]
dsdt.txt
aton ~ # hexdump -C /proc/acpi/fadt
00000000 46 41 43 50 84 00 00 00 02 ad 30 38 32 39 30 38 |FACP......082908|
00000010 46 41 43 50 30 39 35 37 29 08 08 20 4d 53 46 54 |FACP0957).. MSFT|
00000020 97 00 00 00 00 e0 f9 6f 80 06 f9 6f 01 02 09 00 |.......o...o....|
00000030 b0 00 00 00 e1 1e 00 e2 00 08 00 00 00 00 00 00 |................|
00000040 04 08 00 00 00 00 00 00 ff 08 00 00 08 08 00 00 |................|
00000050 20 08 00 00 00 00 00 00 04 02 01 04 08 00 00 00 | ...............|
00000060 65 00 e9 03 00 04 10 00 01 03 0d 00 32 03 00 00 |e...........2...|
00000070 a5 c1 01 00 01 08 00 00 f9 0c 00 00 00 00 00 00 |................|
00000080 06 00 00 00 |....|
00000084
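The first 36 bytes of this hexdump are the common ACPI table header, so they can be decoded by hand or with a few lines of code. The sketch below assumes the standard header layout (signature, length, revision, checksum, OEM ID, OEM table ID, OEM revision, creator ID, creator revision, all little-endian); the variable names are illustrative only.

```python
import struct

# First 36 bytes of the hexdump above: the common ACPI table header.
header = bytes.fromhex(
    "46414350" "84000000" "02" "ad"   # signature, length, revision, checksum
    "303832393038"                    # OEM ID
    "4641435030393537"                # OEM table ID
    "29080820"                        # OEM revision
    "4d534654"                        # creator ID
    "97000000"                        # creator revision
)

(signature, length, revision, checksum,
 oem_id, oem_table_id, oem_revision,
 creator_id, creator_revision) = struct.unpack("<4sIBB6s8sI4sI", header)

print("signature:       ", signature.decode("ascii"))
print("length:          ", hex(length))
print("revision:        ", revision)
print("OEM ID:          ", oem_id.decode("ascii"))
print("OEM table ID:    ", oem_table_id.decode("ascii"))
print("OEM revision:    ", f"{oem_revision:08x}")
print("creator ID:      ", creator_id.decode("ascii"))
print("creator revision:", f"{creator_revision:x}")
```

The decoded length of 0x84 bytes matches the final offset 00000084 printed by hexdump, and the fields are the same ones the kernel prints on its "ACPI: FACP ..." boot line.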
Hello,
I have an ASUS M51Ta laptop with an AMD Turion Ultra processor (ZM-84) - AMD Family 11h.
I can confirm that this problem is present in the 2.6.27.x kernel.
Tested: Fedora 10, Mandriva 2009, OpenSuSE 11.1, Ubuntu 8.10
Filed corresponding bug reports:
https://bugzilla.redhat.com/show_bug.cgi?id=477978
https://qa.mandriva.com/show_bug.cgi?id=46639
https://bugzilla.novell.com/show_bug.cgi?id=462637
My understanding is that "AMD Family 11h" is supported in Linux kernel 2.6.28, so the problem should be fixed by now.
Reference to Release Notes:
6. Tracing/Profiling
OProfile
* Add support for AMD Family 11h
Can someone confirm that the problem has been fixed in kernel 2.6.28?
If yes, are there any plans to backport the changes to kernel 2.6.27, as it is used by the latest versions of all major distributions?
The following bugs filed for OpenSuSE may be related to this:
1) System freeze after 10.3 -> 11.0 upgrade for ACPI conflict
https://bugzilla.novell.com/show_bug.cgi?id=411797
2) C1 state unsupported on modern AMD mobile CPUs
https://bugzilla.novell.com/show_bug.cgi?id=432809
3) AMD Sempron 3500/3600+ needs highres=off and nohz=off boot params
https://bugzilla.novell.com/show_bug.cgi?id=396220
Created attachment 19657 [details]
ASUS M51Ta - lspci
ASUS M51Ta - lspci output
* AMD Turion Ultra ZM-84
* RS780 Host Bridge
* Advanced Micro Devices [AMD] Family 11h HyperTransport Configuration (rev 40)
Created attachment 19658 [details]
ASUS M51Ta ( AMD Turion Ultra CPU, RS780 Host Bridge) - dmidecode
* AMD Turion Ultra (ZM-84) processor
* AMD RS780 Host Bridge
* Host bridge: Advanced Micro Devices [AMD] Family 11h HyperTransport Configuration (rev 40)
Created attachment 19659 [details]
ASUS M51Ta ( AMD Turion Ultra, RS780 HB) - Fedora 10 dmesg output with ACPI=OFF
ASUS M51Ta ( AMD Turion Ultra CPU, RS780 Host Bridge) - Fedora 10 dmesg output with ACPI=OFF
* AMD Turion Ultra (ZM-84) processor
* AMD RS780 Host Bridge
* Host bridge: Advanced Micro Devices [AMD] Family 11h HyperTransport Configuration (rev 40)
With ACPI=OFF, the system heats up enormously.
I tested Ubuntu 9.04 Alpha 2 (kernel 2.6.28) on an ASUS M51Ta laptop.
Although it has kernel 2.6.28, which is supposed to support AMD Family 11h processors, the boot procedure hangs.
I forced the parameter "maxcpus=1" and was able to boot with it.
ACPI was working, but the processor was at 100% of CPU frequency (active state: T0).
Here are the details of what was available after booting with 1 CPU:
# ls /proc/acpi
ac_adapter battery button dsdt embedded_controller event fadt fan info power_resource processor sleep thermal_zone video wakeup
# cat /proc/acpi/processor/P001/info
processor id: 0
acpi id: 1
bus mastering control: yes
power management: no
throttling control: yes
limit interface: yes
# cat /proc/acpi/processor/P001/limit
active limit: P0:T0
user limit: P0:T0
thermal limit: P0:T0
# cat /proc/acpi/processor/P001/power
active state: C0
max_cstate: C8
bus master activity: 00000000
maximum allowed latency: 2000000000 usec
states:
C1: type[C1] promotion[--] demotion[--] latency[000] usage[00000000] duration[00000000000000000000]
# cat /proc/acpi/processor/P001/throttling
state count: 8
active state: T0
state available: T0 to T7
states:
*T0: 100%
T1: 87%
T2: 75%
T3: 62%
T4: 50%
T5: 37%
T6: 25%
T7: 12%
Created attachment 19675 [details]
acpidump for ASUS M51Ta (AMD Turion Ultra)
The ACPI dump was made after booting with the 2.6.28 kernel (Ubuntu 9.04 Alpha 2), option "maxcpus=1".
from /proc/cpuinfo:
processor : 0
vendor_id : AuthenticAMD
cpu family : 17
power management: ts ttp tm stc 100mhzsteps hwpstate
CPU works at full frequency, and system heats - but less than with ACPI=off
# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +73.0°C (crit = +108.0°C)
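A note on the "cpu family : 17" line: /proc/cpuinfo prints the family in decimal, while AMD names its families in hexadecimal, so 17 is the same thing as "Family 11h". A small sketch of the conversion follows; the helper name cpu_family_hex is made up for illustration.

```python
def cpu_family_hex(cpuinfo_text):
    """Return the 'cpu family' value from /proc/cpuinfo-style text in AMD's hex notation."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "cpu family":
            return f"{int(value):X}h"
    return None


sample = """\
processor : 0
vendor_id : AuthenticAMD
cpu family : 17
"""

print(cpu_family_hex(sample))  # 11h
```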
Created attachment 19676 [details]
DMESG from Ubuntu 9.04 Alpha 2 (kernel 2.6.28) boot, option "maxcpus=1"
from DMESG:
[ 0.000000] BIOS EBDA/lowmem at: 0009fc00/0009fc00
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.28-3-generic (buildd@palmer) (gcc version 4.3.3 20081210 (prerelease) (Ubuntu 4.3.2-2ubuntu8) ) #4-Ubuntu SMP Fri Dec 12 22:48:15 UTC 2008 (Ubuntu 2.6.28-3.4-generic)
...
[ 0.000000] ACPI: RSDP 000FA3C0, 0014 (r0 ACPIAM)
[ 0.000000] ACPI: RSDT AFF80000, 004C (r1 _ASUS_ Notebook 20080909 MSFT 97)
[ 0.000000] ACPI: FACP AFF80200, 0084 (r2 090908 FACP1039 20080909 MSFT 97)
[ 0.000000] ACPI: DSDT AFF80680, BC7E (r1 M51Ta M51Ta001 1 INTL 20051117)
[ 0.000000] ACPI: FACS AFF8E000, 0040
[ 0.000000] ACPI: APIC AFF80390, 005C (r1 090908 APIC1039 20080909 MSFT 97)
[ 0.000000] ACPI: MCFG AFF80430, 003C (r1 090908 OEMMCFG 20080909 MSFT 97)
[ 0.000000] ACPI: SLIC AFF80470, 0176 (r1 _ASUS_ Notebook 20080909 MSFT 97)
[ 0.000000] ACPI: ECDT AFF80620, 0055 (r1 090908 OEMECDT 20080909 MSFT 97)
[ 0.000000] ACPI: DBGP AFF803F0, 0034 (r1 090908 DBGP1039 20080909 MSFT 97)
[ 0.000000] ACPI: BOOT AFF805F0, 0028 (r1 090908 BOOT1039 20080909 MSFT 97)
[ 0.000000] ACPI: OEMB AFF8E040, 0071 (r1 090908 OEMB1039 20080909 MSFT 97)
[ 0.000000] ACPI: HPET AFF8C300, 0038 (r1 090908 OEMHPET 20080909 MSFT 97)
[ 0.000000] ACPI: SSDT AFF8C340, 0386 (r1 AMI POWERNOW 1 AMD 1)
[ 0.000000] ACPI: Local APIC address 0xfee00000
...
[ 0.432001] ACPI: EC: EC description table is found, configuring boot EC
[ 0.436657] ACPI: BIOS _OSI(Linux) query ignored
[ 0.449340] ACPI: Interpreter enabled
[ 0.449344] ACPI: (supports S0 S3 S4 S5)
[ 0.449363] ACPI: Using IOAPIC for interrupt routing
...
[ 1.887814] fuse init (API version 7.10)
[ 1.989660] processor ACPI_CPU:00: registered as cooling_device0
[ 1.989664] ACPI: Processor [P001] (supports 8 throttling states)
[ 1.989716] processor ACPI_CPU:01: registered as cooling_device1
[ 1.991474] ACPI: EC: non-query interrupt received, switching to interrupt mode
[ 1.994688] thermal LNXTHERM:01: registered as thermal_zone0
[ 1.996097] ACPI: Thermal Zone [THRM] (71 C)
...
[ 96.704020] ACPI: Power Button (FF) [PWRF]
[ 96.704151] input: Power Button (CM) as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input4
[ 96.720042] ACPI: Power Button (CM) [PWRB]
[ 96.720125] input: Sleep Button (CM) as /devices/LNXSYSTM:00/device:00/PNP0C0E:00/input/input5
[ 96.736047] ACPI: Sleep Button (CM) [SLPB]
[ 96.736154] input: Lid Switch as /devices/LNXSYSTM:00/device:00/PNP0C0D:00/input/input6
[ 96.738589] ACPI: Lid Switch [LID]
[ 96.801855] asus-laptop: Asus Laptop Support version 0.42
[ 96.805245] asus-laptop: M51Ta model detected
[ 96.843516] asus-laptop: Brightness ignored, must be controlled by ACPI video driver
[ 96.843568] Registered led device: asus::mail
[ 97.104049] usb 6-2: new full speed USB device using ohci_hcd and address 3
[ 97.278579] usb 6-2: configuration #1 chosen from 1 choice
[ 97.318573] ACPI: AC Adapter [AC0] (on-line)
[ 97.491998] ACPI: Battery Slot [BAT0] (battery present)
[ 97.617063] acpi device:06: registered as cooling_device2
[ 97.617226] input: Video Bus as /devices/LNXSYSTM:00/device:00/PNP0A03:00/device:02/device:03/input/input7
[ 97.632042] ACPI: Video Device [VGA] (multi-head: yes rom: no post: no)
...
[ 111.238166] ACPI: WMI: Mapper loaded
[ 112.473484] powernow-k8: Found 1 AMD Turion(tm) X2 Ultra Dual-Core Mobile ZM-84 processors (1 cpu cores) (version 2.20.00)
[ 112.473530] powernow-k8: 0 : pstate 0 (2300 MHz)
[ 112.473532] powernow-k8: 1 : pstate 1 (1200 MHz)
[ 112.473533] powernow-k8: 2 : pstate 2 (600 MHz)
The last section looks strange: the processor has 8 power states
(state available: T0 to T7)
Is there a powernow-k10 module available?
AMD Turion Ultra is from "AMD Family 11h" family of processors.
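One possible reading of the numbers above: the eight entries T0 to T7 are ACPI throttling states (clock duty cycles), while the three entries printed by powernow-k8 are performance states (frequencies), so the two lists do not have to match. The sketch below reproduces the listed throttling percentages under the assumption of eight equal duty-cycle steps with integer truncation; that formula is an inference from the values shown, not taken from the kernel source.

```python
def throttling_percentages(state_count):
    """Duty-cycle percentage per T-state, assuming equal steps and truncation."""
    return [100 * (state_count - n) // state_count for n in range(state_count)]


t_states = throttling_percentages(8)
p_states_mhz = [2300, 1200, 600]  # pstate 0..2 from the powernow-k8 lines above

for n, pct in enumerate(t_states):
    print(f"T{n}: {pct}%")
for n, mhz in enumerate(p_states_mhz):
    print(f"P{n}: {mhz} MHz")
```

The T-state list comes out as 100, 87, 75, 62, 50, 37, 25 and 12 percent, the same values as in the /proc/acpi/processor/P001/throttling listing.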
(In reply to comment #12)
> Hi, Jakub
> Sorry for the late response.
> From the log in comment #10 the 2.6.21.7 kernel can be booted with ACPI.
> After checking the ACPI table I find that the ECDT table is incorrect.
> >Namepath : \\_SB.PCI0.SBRG.EC0
> In fact the namepath should be \_SB.PCI0.SBRG.EC0.
> Because of the above error the EC device is not initialized before
> evaluating the _INI object on the 2.6.21.7 kernel.
> But from the 2.6.22-rc1 kernel the EC device will be initialized before
> evaluating the _INI object as there exists the ECDT table.
>
> I don't know why the system is broken by initializing EC device before
> evaluating the _INI object.
> Maybe it is related with the BIOS.
>
Hi ykzhao:
There is no problem with the ECDT. You can find an example for "EC_ID" in the ACPI spec 3.0b, page 123.
The example is "\\_SB.PCI0.ISA.EC" !
The Linux kernel cannot handle more than one backslash prefix, but that is a Linux issue.
Please refer to this link:
http://www.acpica.org/bugzilla/show_bug.cgi?id=739
An old kernel is the solution. With the Debian net install (2.6.18-6, amd64) everything is OK (ACPI, SMP) on the Asus M51Ta, but the wifi drivers don't work, because they were only released for 2.6.27 and up ;/ (madwifi<2.6.27). On all new kernels I still get the bug with ACPI/SMP. I suggest using an older kernel ;)) Sorry for my English, by the way :))
First: Thanks for the MTRR disabling solution!
Second: Excuse my impatience, but now that the buggy commit has been found, is there a chance that this bug will be fixed in the 2.6.28.x kernel series? (with MTRR support, of course)
PS. Galek - this is not a solution; do You want to use the same kernel version for years ;> ? On the other hand, there is the solution of disabling MTRR support, but everyone wants to be able to use new versions of their favorite distributions out of the box...
Noooo. If your hardware is fully supported by the kernel, do you upgrade it to a new version? I don't! I use 2.6.27 only for wifi support and the new ATI drivers. For you, is a kernel upgrade the challenge? ;) You are right, probably the only solution is waiting for 2.6.28.x. Working without SMP or ACPI makes me tired (@gentoo) ;/
For me, a new kernel means better hardware support, new features and sometimes better performance. Maybe You don't need to upgrade Your kernel, but this is not the place for discussions like that. The point is better support for ACPI on our notebooks in new kernel releases. Sorry for the off topic.
Created attachment 19761 [details]
DSDT.DAT for ASUS M51Ta extracted with acpixtract utility from iASL package
Results of executing this DSDT.DAT in acpiexec (from the iASL package)
Download: http://www.acpica.org/downloads/
C:\Users\Vadim\Documents\ACPI\iasl-win-20081204>acpiexec dsdt.dat
Intel ACPI Component Architecture
AML Execution/Debug Utility version 20081204 [Dec 4 2008]
Loading Acpi table from file dsdt.dat
ACPI: RSDP @ 0x0044EEC0/0x0028 (v001 I_TEST)
ACPI: RSDT @ 0x001400F0/0x0040 (v000 0x00000000 0x00000000)
ACPI: TEST @ 0x0044EF00/0x0024 (v001 0x00000000 0x00000000)
ACPI: BAD! @ 0x0044EFC0/0x0024 (v001 0x00000000 0x00000000)
ACPI: FACP @ 0x0044ECE0/0x00F4 (v003 0x00000000 0x00000000)
ACPI: DSDT @ 0x007E0048/0xBC7E (v001 M51Ta M51Ta001 0x00000001 INTL 0x20051117)
ACPI: FACS @ 0x0044EDE0/0x0040
ACPI: TEST @ 0x0044EF00/0x0024 (v001 0x00000000 0x00000000)
ACPI: SSDT @ 0x004437C0/0x0030 (v001 Intel Many 0x00000001 INTL 0x20030424)
ACPI: SSDT @ 0x004437F0/0x0030 (v001 Intel Many 0x00000001 INTL 0x20030424)
ACPI: OEM1 @ 0x00443850/0x0038 (v001 Intel Many 0x00000001 INTL 0x20030918)
Parsing all Control Methods:
Table [DSDT](id 0001) - 1656 Objects with 77 Devices 536 Methods 64 Regions
Parsing all Control Methods:
Table [SSDT](id 0002) - 1 Objects with 0 Devices 1 Methods 0 Regions
Parsing all Control Methods:
Table [SSDT](id 0003) - 1 Objects with 0 Devices 1 Methods 0 Regions
tbxface-0620 [02] TbLoadNamespace : ACPI Tables successfully acquired
evgpeblk-1120 [04] EvCreateGpeBlock : GPE 00 to 3F [_GPE] 8 regs on int 0x0
evgpeblk-1120 [04] EvCreateGpeBlock : GPE 60 to 77 [_GPE] 3 regs on int 0x0
Completing Region/Field/Buffer/Package initialization:..........................
................................................................................
..................................................
Initialized 64/64 Regions 17/17 Fields 37/37 Buffers 38/38 Packages (1667 nodes)
Initializing Device/Processor/Thermal objects by executing _INI methods:......
Executed 6 _INI methods requiring 0 _STA executions (examined 82 objects)
evgpeblk-1223 [03] EvInitializeGpeBlock : Found 5 Wake, Enabled 3 Runtime GPEs in this block
evgpeblk-1223 [03] EvInitializeGpeBlock : Found 0 Wake, Enabled 0 Runtime GPEs in this block
-
***
So it looks like the DSDT table is OK. At least acpiexec can "execute" it.
I can add that the system (ASUS M51Ta) boots with ACPI under FreeBSD.
Therefore the problem is indeed inside the Linux kernel.
Hope it can be fixed soon.
in freebsd acpi is fully suported -------------wiki/acpi "Microsoft Windows 98 was the first operating system with full support for ACPI, with FreeBSD, the Linux kernel, NetBSD, OpenBSD and PC versions of Solaris all having at least some support for ACPI". Freebsd & ASUS M51TA = missing 3d ATI ;/ sorry for my english ;/ Can you boot the machine MTRR compiled in, but with the offending git commit 2b1f6278d77c1f2f669346fc2bb48012b5e9495a reverted and add the boot param: mtrr.show and attach dmesg oupput, please. Created attachment 19896 [details]
Fix mtrr debug/show boot parameter
The mtrr.show parameter is currently broken.
This patch has been sent upstream.
Please do as described and additionally apply this patch:
- remove the found, offending patch
- add this patch
- add boot param mtrr.show
This should make some reasonable debug info show up in dmesg.
Please attach dmesg output of the boot then.
I'm trying to do this, but it's the first time I'm using GIT... I downloaded the latest git tree to some directory and I'm running:
bash-3.1# git revert 2b1f6278d77c1f2f669346fc2bb48012b5e9495a
warning: too many files, skipping inexact rename detection
CONFLICT (delete/modify): arch/i386/kernel/cpu/mtrr/main.c deleted in HEAD and modified in 2b1f627... [PATCH] x86: Save the MTRRs of the BSP before booting an AP. Version 2b1f627... [PATCH] x86: Save the MTRRs of the BSP before booting an AP of arch/i386/kernel/cpu/mtrr/main.c left in tree.
CONFLICT (delete/modify): arch/i386/kernel/smpboot.c deleted in HEAD and modified in 2b1f627... [PATCH] x86: Save the MTRRs of the BSP before booting an AP. Version 2b1f627... [PATCH] x86: Save the MTRRs of the BSP before booting an AP of arch/i386/kernel/smpboot.c left in tree.
CONFLICT (delete/modify): arch/x86_64/kernel/smpboot.c deleted in HEAD and modified in 2b1f627... [PATCH] x86: Save the MTRRs of the BSP before booting an AP. Version 2b1f627... [PATCH] x86: Save the MTRRs of the BSP before booting an AP of arch/x86_64/kernel/smpboot.c left in tree.
CONFLICT (delete/modify): include/asm-i386/mtrr.h deleted in HEAD and modified in 2b1f627... [PATCH] x86: Save the MTRRs of the BSP before booting an AP. Version 2b1f627... [PATCH] x86: Save the MTRRs of the BSP before booting an AP of include/asm-i386/mtrr.h left in tree.
CONFLICT (delete/modify): include/asm-x86_64/mtrr.h deleted in HEAD and modified in 2b1f627... [PATCH] x86: Save the MTRRs of the BSP before booting an AP. Version 2b1f627... [PATCH] x86: Save the MTRRs of the BSP before booting an AP of include/asm-x86_64/mtrr.h left in tree.
Automatic revert failed. After resolving the conflicts, mark the corrected paths with 'git add <paths>' or 'git rm <paths>' and commit the result.
There are some errors, so I think it's not working. I can still search the internet for a solution, but it will take some time. Could you help me with this? Thanks in advance. Sorry for my lack of skills.
PS. I suspect that some changes in the files are causing this, but, as I mentioned before, I am using GIT for the first time.
Created attachment 19921 [details]
Instead of reverting the offending patch, you may want to add this one on top and boot with bootparam: disable_mtrr_save
This also goes around the offending code.
Do not forget to add the mtrr debug param enhancement patch and also add mtrr.show, so that we get some additional info into the dmesg log.
I hope this one applies cleanly to the latest git kernel; it should.
Created attachment 19927 [details]
dmesg
Dmesg from M51ta, patches mentioned have been applied
/proc/mtrr:
reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 512MB, count=1: write-back
reg02: base=0x0a0000000 ( 2560MB), size= 256MB, count=1: write-back
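For readers who want to check such a /proc/mtrr dump quickly, here is a minimal Python sketch that parses the three lines quoted above and sums the write-back coverage. The regular expression and the helper names (`parse_mtrr`, `covered_mb`) are illustrative assumptions, not part of any kernel tool, and the pattern is only fitted to the spacing shown in this comment (sizes in MB).

```python
import re

# Pattern fitted to the /proc/mtrr lines quoted in this comment (sizes in MB only).
MTRR_LINE = re.compile(
    r"reg(?P<reg>\d+): base=0x(?P<base>[0-9a-fA-F]+) \(\s*(?P<base_mb>\d+)MB\), "
    r"size=\s*(?P<size_mb>\d+)MB, count=(?P<count>\d+): (?P<type>[\w-]+)"
)

SAMPLE = """\
reg00: base=0x000000000 ( 0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 512MB, count=1: write-back
reg02: base=0x0a0000000 ( 2560MB), size= 256MB, count=1: write-back
"""

MB = 1024 * 1024


def parse_mtrr(text):
    """Return a list of (reg, base, size_bytes, mem_type) tuples."""
    entries = []
    for line in text.splitlines():
        m = MTRR_LINE.match(line.strip())
        if m is None:
            continue  # ignore anything that is not a register line
        entries.append((
            int(m.group("reg")),
            int(m.group("base"), 16),
            int(m.group("size_mb")) * MB,
            m.group("type"),
        ))
    return entries


def covered_mb(entries, mem_type="write-back"):
    """Sum the sizes (in MB) of all ranges with the given memory type."""
    return sum(size for _, _, size, t in entries if t == mem_type) // MB


if __name__ == "__main__":
    regs = parse_mtrr(SAMPLE)
    for reg, base, size, mem_type in regs:
        print(f"reg{reg:02d}: 0x{base:09x}-0x{base + size - 1:09x} {mem_type}")
    print("write-back total:", covered_mb(regs), "MB")
```

On the quoted dump the three variable ranges are contiguous, so the write-back region runs from 0 up to 2816 MB.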
Hello Thomas, I can test with my laptop (ASUS M51Ta) and get more MTRR debug info if you build the required kernel for OpenSuSE 11.1.
Does it make sense to test those options on an Intel-based system, or is MTRR an AMD-specific feature? Same question about older AMD systems (Turion 64 X2 - TL, TK processors - "Trinidad", "Tyler"): http://en.wikipedia.org/wiki/List_of_AMD_Turion_microprocessors
Regarding MTRR: Memory Type Range Registers http://en.wikipedia.org/wiki/Mtrr
Memory Type Range Registers (MTRRs) are a set of Processor Supplementary Capabilities control registers that provide system software with control of how accesses to memory ranges by the CPU are cached. It uses a set of programmable model-specific registers (MSRs) which are special registers provided by most modern CPUs. Possible access modes to memory ranges can be:
* uncached
* write-through
* write-combining
* write-protect
* write-back
. . .
Successor
Newer (primarily 64-bit) x86 CPU's support a more advanced technique called Page Attribute Tables that allow for per-table setting of these modes, instead of having a limited number of low-granularity registers to deal with modern memory sizes that can be as high as 8GB even on a laptop, and several times that amount on a desktop system. Details on how MTRRs work in detail are described in the processor manuals from CPU vendors.
http://en.wikipedia.org/wiki/Page_Attribute_Table
The Page Attribute Table (also known as Page Allocation Table) is a Processor Supplementary Capability extension to the page table format of certain x86 and x86-64 microprocessors. Like Memory Type Range Registers (MTRRs), they allow for fine-grained control over how areas of memory are cached. Unlike MTRRs, which provide the ability to manipulate the behavior of caching for a limited number of fixed physical address ranges, Page Attribute Tables allow for such behavior to be specified on a per-page basis, greatly increasing the ability of the operating system to select the most efficient behavior for any given task.
I am also not familiar with mtrr. But as the code was introduced by one of our guys who unfortunately does not work for SUSE anymore, I am looking deeper at this (also because I want to know more about the topic anyway, and the best way to learn is reading code and debugging). I have started reading a bit and looking at the code. I am rather busy right now; give me some days and I will go through the output and try to find out something.
*** Bug 11714 has been marked as a duplicate of this bug. ***
I have also been experiencing a kernel panic that can be suppressed either by using the acpi=ht boot parameter or by compiling without the MTRR code (CONFIG_MTRR=n) and have just closed bug #11714 as a duplicate of this one. Please let me know if there is any information I can provide that will help fix this.
BIOS 207 is now available for M51TA from http://www.asustreiber.de/downloads/file/1167-m51taas-207 I am running 2.6.29rc5 with MTRR=ON (!) and the notebook boots with acpi enabled! (without acpi=off, nolapic etc.) ! Nevertheless acpi does not seem to work properly: the battery doesn't last for more than 1 hour. I don't know why this new bios is not available from the asus download page (yet).
(In reply to comment #64)
> BIOS 207 is now available for M51TA from
> http://www.asustreiber.de/downloads/file/1167-m51taas-207
> I am running 2.6.29rc5 with MTRR=ON (!) and the notebook boots with acpi
> enabled! (without acpi=off, nolapic etc.) !
I tried the above BIOS and it works for me too without special boot parameters and with MTRR enabled, on kernel 2.6.27.
I experienced poor battery performance too, while using the laptop on battery in the past (with mtrr-disabled kernels). I suspect this may be due to some issue in handling the power state of the second video card, and in powernow-k8 not supporting (by now) voltage scaling but only frequency scaling.
I tried with the latest version of the BIOS available for my motherboard (ASUS M3A-H/HDMI, version 1301) but unfortunately, I still get a kernel panic unless I use acpi=ht or a kernel compiled without MTRR code.
> and in powernow-k8 not supporting (by now) voltage scaling but only frequency
> scaling.
Either powernow-k8 can do both, frequency and voltage scaling, or it does not work at all. Why do you think frequency scaling is broken? Can you provide dmesg output when loading the powernow-k8 driver?
Are you sure the battery lifetime regression comes from the new BIOS, or could it be related to the fact that you run a newer kernel now?
> some issue in handling the power state of the second video card
Hmm, mtrr and video card are more related; how this can influence power savings, I have no idea. Is there a changeset about the BIOS modifications? Is this an ATI card?
Anyway running the latest binary graphics driver may improve things, Nvidia or ATI.
(In reply to comment #67)
> > and in powernow-k8 not supporting (by now) voltage scaling but only frequency
> > scaling.
> Either powernow-k8 can do both, frequency and voltage scaling, or it does not
> work at all. Why do you think frequency scaling is broken? Can you provide
> dmesg output when loading the powernow-k8 driver?
powernow-k8 works, but I think it doesn't support all of the hardware scaling features: e.g. minimum frequency in Linux is 600MHz while in Windows it is 550MHz (maybe northbridge scaling? maybe something related to voltage? I don't know, I'm no expert, just making some hypotheses)
Here is dmesg output for powernow-k8 module insertion
[ 19.919509] powernow-k8: Found 1 AMD Turion(tm) X2 Ultra Dual-Core Mobile ZM-82 processors (2 cpu cores) (version 2.20.00)
[ 19.919622] powernow-k8: 0 : pstate 0 (2200 MHz)
[ 19.919700] powernow-k8: 1 : pstate 1 (1100 MHz)
[ 19.919737] powernow-k8: 2 : pstate 2 (600 MHz)
> Are you sure the battery lifetime regression comes from the new BIOS, or could
> it be related to the fact that you run a newer kernel now?
There is no regression in battery lifetime: in Linux it has always been about 1hr (with or without MTRRs). I think this is not related to the BIOS or kernel version, but to how some hardware is managed (again, scaling working but at higher frequencies than in Windows, and maybe the second video card not deactivating while on battery)
> > some issue in handling the power state of the second video card
> Hmm, mtrr and video card are more related; how this can influence power
> savings, I have no idea.
This laptop has two video cards: an integrated Ati HD3200, intended to be used when on battery, and a discrete HD3650 (HD3450 for M51Tr), intended to be used on AC. With Linux, at least by now, the integrated card is always used, but I don't know if the HD3650 chip is properly turned off or if it drains power even when not used. This may be a reason for the poor battery life.
> Is there a changeset about the BIOS modifications?
According to http://www.asustreiber.de/downloads/file/1167-m51taas-207 the changelog for bios 207 is:
* Fix the fan don't normal work after s3 resume in the power saving mode.
* Update external VGA VBIOS to BR030816.001 to fix 3D mark will hang with Hynix VRAM.
> Anyway running the latest binary graphics driver may improve things
I have not tested yet after the bios upgrade, but unfortunately recent fglrx releases either crash my Xorg or have other issues preventing me from using them (now I'm using the quite old fglrx 8.7 or the radeon open-source driver, depending on whether I need 3d acceleration or a kernel with working wireless). I'll try them again ASAP, as the bios changelog seems to update the video bios too.
I suggest changing the Linux code to ignore RdMem/WrMem of fixed MTRR registers and not to sync them between CPUs. I've found this old thread http://lkml.org/lkml/2007/4/3/110 as a rationale for introducing the code which is causing the problems described here. From this thread it seems that the problematic system had the SYSCFG[MtrrFixDramModEn] bit set on CPU0 but not on CPU1. Thus the RdMem/WrMem bits were only reported on CPU0. My assumption is that if SYSCFG[MtrrFixDramModEn] is cleared the fixed MTRR contents will not differ on the CPUs. Linux won't try to change the settings on CPU1 and the system boots normally -- hopefully ;-)
Created attachment 20412 [details]
[PATCH] x86: mtrr: don't modify RdDram/WrDram bits of fixed MTRRs
Patch is against Linus' git tree as of today (v2.6.29-rc6-305-g2450cf5).
Paul, can you verify whether this allows your system to boot even with CONFIG_MTRR=y?
Thomas, can you verify that this doesn't break suspend/resume on the Ferrari
model(s) that Bernhard Kaindl mentioned in the old LKML thread?
Thanks!
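The effect of clearing SYSCFG[MtrrFixDramModEn], as proposed for this patch, can be illustrated with a small Python sketch operating on plain integers. This is not the kernel code: the bit position (bit 19 of the AMD SYSCFG MSR) is an assumption not stated in this thread, the helper name `clear_mod_en` is hypothetical, and the sample SYSCFG values are made up for illustration. Only the log message text is taken from the dmesg output quoted in this report.

```python
# Assumption: MtrrFixDramModEn is bit 19 of the AMD SYSCFG MSR.
MTRR_FIX_DRAM_MOD_EN = 1 << 19


def clear_mod_en(cpu, syscfg):
    """Return (new SYSCFG value, log message or None if the bit was already clear)."""
    if syscfg & MTRR_FIX_DRAM_MOD_EN:
        msg = (f"MTRR: CPU {cpu}: SYSCFG[MtrrFixDramModEn] not cleared, "
               "clearing this bit")
        return syscfg & ~MTRR_FIX_DRAM_MOD_EN, msg
    return syscfg, None


if __name__ == "__main__":
    # Hypothetical per-CPU SYSCFG values; real ones would be read from the MSR.
    for cpu, value in enumerate([0x00F80000, 0x00F80000, 0x00700000]):
        new_value, msg = clear_mod_en(cpu, value)
        print(f"CPU {cpu}: 0x{value:08x} -> 0x{new_value:08x}",
              msg or "(bit already clear)")
```

With the bit cleared, the RdMem/WrMem bits are no longer exposed in the fixed MTRRs, which is why the values should then compare equal across CPUs.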
WRT comment #68:
> powernow-k8 works, but I think it doesn't support all of the hardware scaling
> features: e.g. minimum frequency in Linux is 600MHz while in Windows it is
> 550MHz
> Here is dmesg output for powernow-k8 module insertion
> [ 19.919622] powernow-k8: 0 : pstate 0 (2200 MHz)
> [ 19.919700] powernow-k8: 1 : pstate 1 (1100 MHz)
> [ 19.919737] powernow-k8: 2 : pstate 2 (600 MHz)
Oops, this is most probably a bug in the powernow-k8 driver. Here is what I get on a Turion Ultra using x86info
Pstate-0: fid=e, did=0, vid=24 (2200MHz) (current)
Pstate-1: fid=e, did=1, vid=30 (1100MHz)
Pstate-2: fid=e, did=2, vid=3c (550MHz)
On the same system powernow-k8 reports:
powernow-k8: 0 : pstate 0 (2200 MHz)
powernow-k8: 1 : pstate 1 (1100 MHz)
powernow-k8: 2 : pstate 2 (600 MHz)
I'll look at that.
> I have not tested yet after the bios upgrade, but unfortunately recent fglrx
> releases
The recent fglrx release should help a lot concerning battery life.
FYI, the 600MHz value for Pstate2 is from an ACPI table (containing _PSS (Performance Supported States) objects). For each Pstate it also contains the frequency, but the family 11h BKDG says about this value: "CoreFreq (MHz) = The CPU COF specified by MSRC001_00[6B:64][CpuFid] rounded to the nearest 100 Mhz." This is a rounded value, which explains the observed difference. I think, the powernow-k8 driver should determine the frequency from the Pstate MSR directly and not from this ACPI table. (BTW, the tool x86info determines the frequencies directly from the Pstate MSRs.)
> I think, the powernow-k8 driver should determine the frequency from the
> Pstate
> MSR directly and not from this ACPI table.
Hmm, I'd vote to keep taking the ACPI values. If the value is wrong you have a BIOS "bug". One of the basic ideas of ACPI is to stay HW independent. Intel and AMD HW of the last years are both using the same code, which is good. Otherwise, in a few years you end up checking for several CPUIDs and implementing different ways to obtain the values.
On the other hand, the PowerNow! Windows driver seems to read the MSR to get the info (which is really bad!), so it's likely we will see more broken BIOSes in this area. Hmm, we recently enhanced the powernow-k8 driver to take latency values from ACPI tables (exactly the same table) and Mark confirmed that the Windows driver is doing the same. I somehow cannot believe they don't trust the one value(s), but take the others. It would be great if you (maybe Mark knows details) could find out why Windows does not show the ACPI value. If it's reasonable we could make the change, but doing it just because Windows shows other values sounds a bit hasty (maybe this is just a simple monitor tool not able to access ACPI tables, and not the Windows PowerNow! driver?).
(In reply to comment #74)
> > I think, the powernow-k8 driver should determine the frequency from the Pstate
> > MSR directly and not from this ACPI table.
> Hmm, I'd vote to keep taking the ACPI values. If the value is wrong you have a
> BIOS "bug". [...]
> On the other hand, the PowerNow! Windows driver seems to read the MSR to get
> the info (which is really bad!) [...]
> It would be great if you (maybe Mark knows
> details) could find out why Windows does not show the ACPI value. If it's
> reasonable we could make the change, but doing it just because Windows shows
> other values sounds a bit hasty (maybe this is just a simple monitor tool not
> able to access ACPI tables, and not the Windows PowerNow! driver?).
The monitor tool is AMD Power Monitor, so I suppose it knows how to monitor the CPU... ;) I agree that making the change "because Windows shows other values" is nasty, but I think running a processor "out-of-spec" (or "constantly overclocked"), with subsequent thermal, power and noise problems, is somewhat nastier...
Well, the HW is not overclocked or anything like that under Linux. The frequency/voltage settings for each Pstate are obtained from the corresponding Pstate register when the Pstate is changed. It is just that powernow-k8 reports wrong frequencies whenever a frequency is rounded to 100 MHz (e.g. 550 MHz => ACPI reports 600 MHz and thus Linux thinks the CPU clock is 600 MHz instead of 550 MHz). Furthermore this involves that the corresponding frequencies in /sys/devices/system/cpu/cpu0/cpufreq and the "cpu MHz" in /proc/cpuinfo are incorrect. I am not sure what this means for time keeping if the clocksource is TSC and the Pstate0 frequency were affected (say, reporting 2100 MHz instead of 2050 MHz). In any case, I think we should fix this for sake of correct frequency reporting.
Created attachment 20413 [details]
x86: powernow-k8: determine exact CPU frequency for HW Pstates
(I know, this should be attached to and handled in a new bugzilla, but ...)
The frequency values reported by ACPI PSS objects are incorrect in some cases.
AMD Family 10h Processor BKDG says about CoreFreq (i.e. the frequency
as reported in the ACPI table)
"All CoreFreq values must be rounded to the nearest 100 MHz frequency
resulting in a maximum of 50 MHz frequency difference between the
reported CoreFreq and calculated CPU COF."
Patch is against Linus' git as of today (v2.6.29-rc6-305-g2450cf5).
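To make the rounding issue concrete, here is a short Python sketch comparing the frequency derived from the Pstate fields with the value rounded to the nearest 100 MHz. It assumes the family 11h relation COF = 100 MHz * (CpuFid + 8) / 2^CpuDid, which is not spelled out in this thread but reproduces the x86info values quoted earlier (fid=e with did=0, 1, 2). It also assumes that exact halves round up, as the observed 550 -> 600 MHz suggests. The function names are illustrative.

```python
def core_cof_mhz(fid, did):
    """CPU core frequency in MHz from the Pstate fields (assumed family 11h formula)."""
    return (100 * (fid + 8)) >> did


def acpi_core_freq_mhz(cof_mhz):
    """Round to the nearest 100 MHz, halves rounding up (assumption), as in _PSS."""
    return (cof_mhz + 50) // 100 * 100


# fid/did pairs taken from the x86info output quoted in this report.
PSTATES = [(0xE, 0), (0xE, 1), (0xE, 2)]

if __name__ == "__main__":
    for index, (fid, did) in enumerate(PSTATES):
        exact = core_cof_mhz(fid, did)
        reported = acpi_core_freq_mhz(exact)
        print(f"Pstate-{index}: exact {exact} MHz, ACPI CoreFreq {reported} MHz, "
              f"difference {reported - exact} MHz")
```

Only Pstate 2 differs in this case, which matches the 550 MHz versus 600 MHz discrepancy between x86info and powernow-k8.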
> Furthermore this involves that the corresponding frequencies in
> /sys/devices/system/cpu/cpu0/cpufreq and the "cpu MHz" in /proc/cpuinfo are
> incorrect.
> In any case, I think we should fix this for sake of correct frequency
> reporting.
This is really ugly, but I tend to agree.
(In reply to comment #70)
> Created an attachment (id=20412) [details]
> [PATCH] x86: mtrr: don't modify RdDram/WrDram bits of fixed MTRRs
>
> Patch is against Linus' git tree as of today (v2.6.29-rc6-305-g2450cf5).
>
> Paul, can you verify whether this allows to boot your system even with
> CONFIG_MTRR=y.
I can confirm that this patch allows me to boot with the MTRR code, without acpi=ht and without kernel panic. (I get a couple of messages about clearing bits flashing up at the beginning of the boot. It happened so fast I didn't catch the whole message.) Please let me know if I should be trying any other patches. Cheers, Paul
Paul, can you please provide the dmesg when booted with the patch. I'd expect to see the message "MTRR: CPU 0: SYSCFG[MtrrFixDramModEn] not cleared, clearing this bit" at least once. But I'd like to know how often this message shows up on your system. Thanks.
Created attachment 20425 [details]
dmesg output for 2.6.29 kernel with patch 20412
This dmesg output contains the following 3 lines:
[ 0.000000] MTRR: CPU 0: SYSCFG[MtrrFixDramModEn] not cleared, clearing this bit
[ 0.010000] MTRR: CPU 1: SYSCFG[MtrrFixDramModEn] not cleared, clearing this bit
[ 0.010000] MTRR: CPU 2: SYSCFG[MtrrFixDramModEn] not cleared, clearing this bit
Paul
So your BIOS did not clear the bit on any CPU. Furthermore I've found
[ 0.460294] mtrr: your CPUs had inconsistent fixed MTRR settings
[ 0.460294] mtrr: probably your BIOS does not setup all CPUs.
[ 0.460294] mtrr: corrected configuration.
which means that the memory (i.e. caching) type of some fixed memory ranges still differs between CPUs and Linux synced the settings, but this time without touching the RdMem/WrMem bits. I am curious what the fixed MTRRs look like and will send a patch to collect this debug information.
Created attachment 20427 [details]
debug patch to print fixed mtrr settings
Paul, please apply this patch on top of the other one (attachment #20412) and
provide dmesg output after boot. Thanks!
Created attachment 20429 [details]
dmesg output for 2.6.29 kernel with patch 20427
Thanks for that debug data.
This shows that the mainline kernel w/o my patch (attachment #20412 [details])
is (potentially(*)) changing RdMem/WrMem bits on CPUs 1 and 2 from:
[ 0.010000] MTRRfixed(268): 0x1818181818181818
[ 0.010000] MTRRfixed(269): 0x1818181818181818
...
[ 0.010000] MTRRfixed(26e): 0x1818181818181818
[ 0.010000] MTRRfixed(26f): 0x1818181818181818
(i.e. reads and writes to the specified ranges are going to DRAM)
to (see first output for CPU 0):
[ 0.000000] MTRRfixed(268): 0x1010101010101010
[ 0.000000] MTRRfixed(269): 0x1010101010101010
...
[ 0.000000] MTRRfixed(26e): 0x1515151515151515
[ 0.000000] MTRRfixed(26f): 0x1515151515151515
This in turn means that reads go to DRAM but writes go to IO.
We just do not know what the BIOS has performed on CPU0 with
those RdMem/WrMem settings, maybe it is changing that later on.
But as we don't know it we shouldn't touch those bits.
(*) "potentially" because my debug patch did not show RdMem/WrMem bits
when fixed MTRR were set on CPU2 and CPU3.
Furthermore the mainline kernel w/ and w/o my patch
did change the memory type for the following ranges on
CPUs 2 and 3:
[ 0.010000] MTRRfixed(26e): 0x1818181818181818
[ 0.010000] MTRRfixed(26f): 0x1818181818181818
which was type 0 (uncached) (when masking the RdMem/WrMem bits)
to
[ 0.130000] MTRRfixed(26e): 0x0505050505050505
[ 0.130000] MTRRfixed(26f): 0x0505050505050505
i.e. type 5 (write-protect).
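The values in these dumps can be decoded with a few lines of Python. The sketch below assumes the per-byte layout of an AMD fixed-range MTRR (bits 2:0 memory type, bit 3 WrMem, bit 4 RdMem); this layout is not stated in the thread, but it is consistent with the interpretation given above. The helper names are illustrative and `describe` only looks at the lowest byte, which is enough here because all eight bytes of each value are identical.

```python
# Assumed per-byte layout of an AMD fixed-range MTRR (not stated in the thread):
# bits 2:0 = memory type, bit 3 = WrMem, bit 4 = RdMem.
TYPE_MASK = 0x07
WR_MEM = 0x08
RD_MEM = 0x10
TYPE_NAMES = {0: "uncached", 1: "write-combining", 4: "write-through",
              5: "write-protect", 6: "write-back"}


def decode_fixed_mtrr(value):
    """Split a 64-bit fixed MTRR value into eight (type, rd_mem, wr_mem) tuples."""
    fields = []
    for i in range(8):
        byte = (value >> (8 * i)) & 0xFF
        fields.append((byte & TYPE_MASK, bool(byte & RD_MEM), bool(byte & WR_MEM)))
    return fields


def describe(value):
    """Summarise a value based on its lowest byte (all bytes are equal in the dumps)."""
    mem_type, rd, wr = decode_fixed_mtrr(value)[0]
    name = TYPE_NAMES.get(mem_type, f"type {mem_type}")
    return f"{name}, RdMem={int(rd)}, WrMem={int(wr)}"


if __name__ == "__main__":
    for value in (0x1818181818181818, 0x1010101010101010,
                  0x1515151515151515, 0x0505050505050505):
        print(f"0x{value:016x}: {describe(value)}")
```

Under that layout, 0x18 means reads and writes both go to DRAM, 0x10 and 0x15 mean only reads go to DRAM, and 0x05 is plain write-protect with neither bit visible.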
If there are no objections I'll submit
"[PATCH] x86: mtrr: don't modify RdDram/WrDram bits of fixed MTRRs"
for upstream integration.
It prevents my system from going straight into kernel panic without having to disable acpi or compile a custom kernel without MTRR - so I certainly have no objections. (However, I'm not qualified to comment on what effect this patch might have for other people, so you should probably make sure you're satisfied that no one else has any objections before proceeding.)
I can confirm that with Andreas' patch from comment #70, the Ferrari 1000 can still be suspended/resumed successfully. This was what the offending patch did fix -> thumbs up...
Ok, nice to hear. I've just submitted the patch to the x86 maintainers for upstream integration. Thanks Thomas (and all others) for testing!
*** Bug 12352 has been marked as a duplicate of this bug. ***
A patch (equivalent to attachment #20412 [details]) to fix this issue is on its way into the mainline kernel (see http://marc.info/?l=linux-kernel&m=123693623703978). I expect that the fix will be integrated into 2.6.29.1.
Oh, I forgot to thank all of you who helped, very, very much! PS. I know, refreshing an old thread is not compatible with netiquette, but I'm very glad to see a working kernel on my notebook ;)
Just upgraded to Ubuntu Jaunty and got the latest kernel. It's working for me! Thank you.