When booting a 2.6.33rc[567] kernel, it stops and hangs; if I press a key (even the soft power button), it continues. As long as I feed it interrupts, it continues happily.

Bisecting with git found:

73472a46b5b28116b145fb5fc05242c1aa8e1461 x86: Disable HPET MSI on ATI SB700/SB800

Reverting that from 2.6.33rc7 has made it work for me for at least the last week. Also, that is the same chipset I have :)

Hope this helps. If you have a fix you want to test, I'll test it.
Asbjørn: From the bisection point, I assume you have an ATI SB700/SB800 board? Also do the hangs continue to occur after boot, or just at boot time? Could you attach dmesg output as well as "cat /sys/devices/system/clocksource/clocksource0/current_clocksource" output once the system is up? Venkatesh: Any thoughts here? Shouldn't the hpet work ok without msi?
Hmm. This is strange.

Without the patch, the clockevents will be:
- HPET 0, used for the platform timer in legacy mode; it should also be the broadcast timer
- APIC timers, used on each CPU as the percpu clockevent
- HPET2, used as an MSI per-CPU timer and attached to CPU 0. This is meant to be used as a per-CPU timer so that we don't need to use APIC timer broadcast. IIRC, there are no HPET3, HPET4, etc. supported on this platform.

With this patch:
- HPET 0, used for the platform timer in legacy mode; it should also be the broadcast timer
- APIC timers, used on each CPU as the percpu clockevent

So, if the system has a problem, that means the broadcast logic has a problem or HPET 0 also has some problem on this system.

Some data that will be interesting:
- Does the system support deep C-states? Otherwise, things should just work fine with the APIC timer, without ever needing HPET. (grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/* )
- How does /proc/interrupts look with and without this patch, after complete bootup?
- dmesg, as John mentioned, with and without the patch.
- Can you also try the hpet=disable boot option and see whether that makes any difference in the failing case?
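For anyone gathering the sysfs data requested above, the following is a minimal Python sketch that reads the current clocksource and the names of any cpuidle states. The function name `collect_timer_info` and the `sysfs_root` parameter are made up for illustration; the paths are the ones quoted in this report, plus the per-state `name` file, which is assumed to exist when cpuidle is available.

```python
import glob
import os


def collect_timer_info(sysfs_root="/sys"):
    """Return the current clocksource and the cpuidle state names, if any."""
    info = {"clocksource": None, "cpuidle_states": {}}

    clocksource_path = os.path.join(
        sysfs_root,
        "devices/system/clocksource/clocksource0/current_clocksource",
    )
    try:
        with open(clocksource_path) as handle:
            info["clocksource"] = handle.read().strip()
    except OSError:
        pass  # file missing or unreadable: leave the value as None

    # One "name" file per idle state; no matches means no cpuidle directory.
    pattern = os.path.join(
        sysfs_root, "devices/system/cpu/cpu*/cpuidle/state*/name"
    )
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                key = os.path.relpath(path, sysfs_root)
                info["cpuidle_states"][key] = handle.read().strip()
        except OSError:
            continue

    return info


if __name__ == "__main__":
    print(collect_timer_info())
```

An empty `cpuidle_states` dictionary corresponds to the "No such file or directory" result that grep gives on a system without a cpuidle directory.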
First-Bad-Commit : 73472a46b5b28116b145fb5fc05242c1aa8e1461

Bisected to:

commit 73472a46b5b28116b145fb5fc05242c1aa8e1461
Author: Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
Date: Thu Jan 21 11:09:52 2010 -0800

    x86: Disable HPET MSI on ATI SB700/SB800

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Created attachment 25022 [details] dmesg of plain 2.6.33rc8 with problem
Created attachment 25023 [details] dmesg of plain 2.6.33rc8 with hpet=disable and problem
Created attachment 25024 [details] dmesg of 2.6.33rc8 with 73472a46b5b28116b145fb5fc05242c1aa8e1461 reverted, works
It often occurs more than once during boot. When there are lots of things running it seems not to happen, although it almost always happens again during shutdown, where I have to help it by pressing a key or similar.

Using hpet=disable does not help; the exact same thing happens.

Clock sources (/sys/devices/system/clocksource/clocksource0/current_clocksource):
plain 2.6.33rc8: tsc
2.6.33rc8 hpet=disable: acpi_pm
2.6.33rc8 with 73472a.. reverted: tsc
Created attachment 25025 [details] my kernel config
Created attachment 25026 [details] /proc/interrupts in plain 2.6.33rc8 (not working)
Created attachment 25027 [details] /proc/interrupts in plain 2.6.33rc8 with 73472a46b5b28116b145fb5fc05242c1aa8e1461 reverted (working)
Could not find any C-states; it seems I do not have a cpuidle directory within /sys/devices/system/cpu/cpu*/, so:

$ grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/*
grep: /sys/devices/system/cpu/cpu*/cpuidle/state*/*: No such file or directory

Thank you for looking into it :)
Similar symptoms here with the Fedora 2.6.33-rc kernels around rc5 and later; I haven't tried reverting the patch in question yet. The board is ATI SB700/SB800 (Gigabyte GA-MA78GM-S2H).
This may be related to C1E. Can you try the idle=halt boot option on the not-working kernel (without the hpet=disable option)?

Andreas: Have you seen any issues like this? It looks like IRQ0 does not work correctly here, both with HPET and PIT:

  0:        351         11      36077        745   IO-APIC-edge      timer

And the platform needs this

 24:      11040          0          0          0   HPET_MSI-edge      hpet2

to function correctly.
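To check whether IRQ0 is actually advancing, two snapshots of /proc/interrupts can be compared. The sketch below is only an illustration: `parse_interrupts` and `irq_delta` are hypothetical helper names, and the sample text reuses the two rows quoted above under an assumed four-CPU header line.

```python
def parse_interrupts(text):
    """Map each IRQ label (e.g. "0", "24") to its list of per-CPU counts."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        label, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/device name columns
            counts.append(int(field))
        table[label.strip()] = counts
    return table


def irq_delta(before, after, irq):
    """Per-CPU increase of one IRQ between two parsed snapshots."""
    return [b - a for a, b in zip(before[irq], after[irq])]


SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:        351         11      36077        745   IO-APIC-edge      timer
 24:      11040          0          0          0   HPET_MSI-edge      hpet2
"""

if __name__ == "__main__":
    snapshot = parse_interrupts(SAMPLE)
    print(snapshot["0"], snapshot["24"])
```

On a live system, one would read /proc/interrupts twice a few seconds apart and pass both parsed snapshots to `irq_delta`; a delta of all zeros for IRQ 0 while the machine is stalled would be consistent with the timer interrupt not arriving.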
idle=halt does not go very far here. First there is a stall that's broken by a key press or mouse movement, and then it seems to lock up after

"NET: Registered protocol family 1"

which in a normal Fedora boot is usually followed by

pci 0000:01:05.0: Boot video device
Trying to unpack rootfs image as initramfs...
....

FWIW stable 2.6.32.8 now also exhibits this, presumably because the hpet patch was pulled in there. Behavior is the same as 33rc.

2.6.32.7 with idle=halt has the same single stall, but once broken with a key or mouse it goes on without further problems.
Can you please provide output of "lspci -nn -xxxx" from your system. Thanks.
Created attachment 25090 [details] GA-MA78GM-S2H lspci
I've checked the lspci output for HPET-relevant settings. Everything seems ok (e.g. interrupt routing). I've also checked the code, and at the moment it's not clear to me why your system shows these hiccups.

To gather more debug information, it would be nice if you could boot with the apic=debug parameter (to double-check IO-APIC/PIC settings), both in the working and the non-working case. While doing this, you can also add "hpet=verbose" to your command line. Maybe that gives some insight into what's wrong with the latest kernel on your system.

And another question: do you have the latest BIOS for your system installed?
idle=halt makes it stop for me too. I got a GA-MA790FXT-UD6P (BIOS version F7), which is the second to newest.
Created attachment 25098 [details] GA-MA790FXT-UD5P lspci
Created attachment 25099 [details] dmesg of plain 2.6.33rc8 with apic=debug, hpet=verbose and problem on a GA-MA790FXT-UD5P
Created attachment 25100 [details] dmesg of 2.6.33rc8 with apic=debug, hpet=verbose and 7347.. reverted on GA-MA790FXT-UD5P
Created attachment 25101 [details] 2.6.32.8 on GA-MA78GM-S2H with apic=debug and hpet=verbose (problem)
Created attachment 25102 [details] GA-MA78GM-S2H lspci verbose under 2.6.32.8 (problem) I realized that the previous lspci was done under an ok (2.6.32.7) kernel, here is with the problem one.
Created attachment 25103 [details] 2.6.32.7 on the same hardware with apic=debug and hpet=verbose (no problem)
Running with the latest BIOS for this board (rev 1.0, F11)
I am wondering why for both test runs there is this log message:

hpet: hpet_msi_capability_lookup(591):
hpet: hpet_msi_capability_lookup(596):

The respective hpet_print_config() call is

static void hpet_msi_capability_lookup(unsigned int start_timer)
{
	unsigned int id;
	unsigned int num_timers;
	unsigned int num_timers_used = 0;
	int i;

	if (hpet_msi_disable)
		return;

	if (boot_cpu_has(X86_FEATURE_ARAT))
		return;
	id = hpet_readl(HPET_ID);

	num_timers = ((id & HPET_ID_NUMBER) >> HPET_ID_NUMBER_SHIFT);
	num_timers++; /* Value read out starts from 0 */
	hpet_print_config();
	...
}

but that should never be reached if hpet_msi_disable is set.

I guess the respective quirk should be added to arch/x86/kernel/early-quirks.c instead of arch/x86/kernel/quirks.c. Patch to fix this follows asap.
Created attachment 25176 [details] x86, hpet: disable MSI on ATI SBX00
Can you please verify whether this patch on top of commit 73472a46b5b28116b145fb5fc05242c1aa8e1461 or current git (v2.6.33-rc8-189-g9f3a628) fixes this issue on your box? Thanks.
Created attachment 25177 [details] apic=debug hpet=verbose with attachment 25176 [details] applied

Didn't seem to help here. Verbose log of 2.6.33-0.51.rc8.git6.fc14.x86_64 + patch (local rebuild) attached.
Hm, so from what I have gathered, the previously bisected commit only exposes another bug (hidden by enabling hpet). So I started bisecting again, this time with hpet=disable all the way, and found:

commit aa276e1cafb3ce9d01d1e837bcd67e92616013ac
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Mon Jun 9 19:15:00 2008 +0200

    x86, clockevents: add C1E aware idle function

Reverting this from a non-working 2.6.27 also makes it work. Things have changed considerably since then, so I was not able to revert it from the newest kernel.

Maybe the disabling of SBX00 hpet msi should only be done when you actually have floppy support? That would make more people boot at least :P
[Switching to e-mail, please keep bugzilla-daemon in the CC list.]

On Monday 01 March 2010, bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15289
>
> --- Comment #30 from Asbjørn Sannes <kernelbugzilla@sannes.org> 2010-03-01 06:17:06 ---
> hm, so from what I have gathered the previously bisected commit only exposes
> another bug (hidden by enabling hpet). So I started bisecting again, this time
> with hpet=disabled all the way and found:
>
> commit aa276e1cafb3ce9d01d1e837bcd67e92616013ac
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date: Mon Jun 9 19:15:00 2008 +0200
>
> x86, clockevents: add C1E aware idle function
>
> Reverting this from a non-working 2.6.27 makes it work also. Things have
> changed considerably since then so I was not able to revert it from the newest
> kernel.
>
> Maybe that disabling of SBX00 hpet msi, only should be done when you do
> actually have floppy support? Makes more people boot atleast :P

Thomas, it looks like something's missing in our C1E handling. Can you have a look at this bug report, please?

Rafael
Not a recent regression, so dropping from the list.
I agree that the root cause may not be a recent regression, but it certainly is a regression in the sense that fewer people will be able to boot 2.6.33 than 2.6.32 (although more people will be able to use their floppies flawlessly). It is a tradeoff; just my two cents.
On Mon, 1 Mar 2010, Rafael J. Wysocki wrote:
> [Switching to e-mail, please keep bugzilla-daemon in the CC list.]
>
> On Monday 01 March 2010, bugzilla-daemon@bugzilla.kernel.org wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=15289
> >
> > --- Comment #30 from Asbjørn Sannes <kernelbugzilla@sannes.org> 2010-03-01 06:17:06 ---
> > hm, so from what I have gathered the previously bisected commit only exposes
> > another bug (hidden by enabling hpet). So I started bisecting again, this time
> > with hpet=disabled all the way and found:
> >
> > commit aa276e1cafb3ce9d01d1e837bcd67e92616013ac
> > Author: Thomas Gleixner <tglx@linutronix.de>
> > Date: Mon Jun 9 19:15:00 2008 +0200
> >
> > x86, clockevents: add C1E aware idle function
> >
> > Reverting this from a non-working 2.6.27 makes it work also. Things have
> > changed considerably since then so I was not able to revert it from the newest
> > kernel.
> >
> > Maybe that disabling of SBX00 hpet msi, only should be done when you do
> > actually have floppy support? Makes more people boot atleast :P
>
> Thomas, it looks like something's missing in our C1E handling. Can you have
> a look at this bug report, please?

Groan. We have been through that exercise of blaming the above commit and the C1E handling a couple of times now. It never turned out to be the real culprit.

Looking at the various steps Asbjørn took to analyse the problem, it simply boils down to the oldest problem with timers on ATI chipsets:

the irq0 timer interrupt routing is hosed

I have no clue yet why this is not detected by the test logic we have in place for that, but it might be something which gets borked later in the boot process.

Enabling MSI for HPET just papers over the problem, as it uses a different interrupt vector and mechanism. Disabling HPET does not help simply because the PIT uses IRQ0, just like the MSI-disabled HPET.

I need some sleep to come up with a reasonable method to debug that, but maybe someone else has a brilliant idea before I have to twist my brain around it.

Thanks,

tglx
Asbjørn, can you please run the following tests and report the results?

1) Add "maxcpus=1" to the kernel command line
2) Add "nomsi" to the kernel command line

Thanks,

tglx
Looking again at the dmesg output, I am now wondering why the APIC IDs are not unique:

CPUs 0,...,3 have lapic_ids of 0, ..., 3
The IOAPIC has

IO APIC #2......
.... register #00: 00000000
.......    : physical APIC id: 00

I think that each APIC device must have a unique ID (both local APICs and IO APICs). I still have to look into the Linux code to see how it copes with that.
Thomas:
1) Boots
2) No difference
1 + 2 boots ..
I assume booting with maxcpus=2 also works as this would result in a system with 3 unique APIC IDs ...
Created attachment 25326 [details] test patch to set physical APIC ID of IOAPIC(0) to 4 Can you please test this patch. (I assume this is a BIOS bug, because your BIOS should assign unique IDs to all APICs in the system.)
maxcpus=2 and maxcpus=3 both work.
The patch still had the same problem.

The changelog of the BIOS does not say it should fix any APIC issues, but they do have one newer (beta) BIOS: http://www.gigabyte.com.tw/Support/Motherboard/BIOS_Model.aspx?ProductID=3005

Should I give that a chance, or try contacting them for a fix?
> --- Comment #36 from Andreas Herrmann <andreas.herrmann3@amd.com> 2010-03-02 16:44:27 ---
> Looking again on the dmesg output, I am now wondering why APIC IDs are not
> unique:
>
> CPUs 0,...,3 have lapic_ids of 0, ..., 3
> The IOAPIC has
>
> IO APIC #2......
> .... register #00: 00000000
> ....... : physical APIC id: 00

Good catch. Missed that !

> I think that each APIC device must have a unique ID (both local APICs and IO
> APICs).

Yes, we need unique IDs.

Thanks,

tglx
> maxcpus=2 and maxcpus=3 works
> patch still had the same problem

Hmm, unfortunate.

> The changelog of the bios does not say it should fix any apic issues,
> but they do have one newer (beta) bios:
> http://www.gigabyte.com.tw/Support/Motherboard/BIOS_Model.aspx?ProductID=3005

By changelog you mean the short description, right? That's quite sparse.

> Should I give that a chance, or try contacting them for a fix?

Informing them about the APIC IDs, which are reported as

ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 33, address 0xfec00000, GSI 0-23

is always a good idea.

I can't say whether you should install a beta BIOS. In general, newer BIOS versions are supposed to contain bug fixes.

One further test you could do is to boot with idle=mwait. If this works, you could use it as a workaround with current kernels for the time being.

And can you please install the package x86info and run

# for i in `seq 0 3 `; do lsmsr -c $i Int; done
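The overlap between local APIC IDs and the IO-APIC ID can be spotted mechanically from MADT lines like the ones above. The following Python sketch is only an illustration of that check: the function name `find_apic_id_collisions` and the regular expressions are hypothetical and assume the dmesg line format shown in this report.

```python
import re

# Patterns for the MADT lines as they appear in the dmesg excerpts here.
LAPIC_RE = re.compile(
    r"ACPI: LAPIC \(acpi_id\[(0x[0-9a-fA-F]+)\] "
    r"lapic_id\[(0x[0-9a-fA-F]+)\] (enabled|disabled)\)"
)
IOAPIC_RE = re.compile(r"ACPI: IOAPIC \(id\[(0x[0-9a-fA-F]+)\]")


def find_apic_id_collisions(dmesg_text):
    """Return IO-APIC IDs that are also used by an enabled local APIC."""
    lapic_ids = set()
    ioapic_ids = []
    for line in dmesg_text.splitlines():
        match = LAPIC_RE.search(line)
        if match:
            if match.group(3) == "enabled":
                lapic_ids.add(int(match.group(2), 16))
            continue
        match = IOAPIC_RE.search(line)
        if match:
            ioapic_ids.append(int(match.group(1), 16))
    return sorted(i for i in ioapic_ids if i in lapic_ids)


SAMPLE = """\
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
"""

if __name__ == "__main__":
    print(find_apic_id_collisions(SAMPLE))
```

A non-empty result only flags the overlap; whether such an overlap actually matters on a given platform is a separate question that this sketch does not answer.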
Booting with idle=mwait works, and it gives:

# for i in `seq 0 3 `; do lsmsr -c $i Int; done
IntPendingMessage = 0x00000000083400b0
IntPendingMessage = 0x00000000083400b0
IntPendingMessage = 0x00000000083400b0
IntPendingMessage = 0x00000000083400b0

I also found that disabling C1E support in the BIOS works; then:

IntPendingMessage = 0x0000000000000000
IntPendingMessage = 0x0000000000000000
IntPendingMessage = 0x0000000000000000
IntPendingMessage = 0x0000000000000000

I will try out the beta BIOS to see if it helps and report back (either way), and then try to figure out how to contact Gigabyte about their BIOSes, which I suspect is not as easy as getting in touch with kernel developers :P
IMHO, this is a clear indication that the root cause is not the timer interrupt routing but the way your system (BIOS) is handling C1e.

Your system uses "SMI Initiated C1E" (bit 27 is set in the IntPendingMsg MSR) if C1e is enabled in your BIOS. That means that the BIOS provides an SMM handler which has to place the system into the C1e state. I think there must be something wrong with that handler.

Reporting this problem to Gigabyte and hoping to obtain a working BIOS is the only thing you can do. It doesn't make sense to try further debugging of this problem from the OS perspective. HPET MSI and older kernels just seem to hide this C1e problem on your machine.

Besides, you should disable C1e when using the latest kernels (as long as you don't have a BIOS that works properly).

The duplicate APIC IDs are an indication that your BIOS is not 100 percent correct. Maybe Gigabyte did not correctly implement support for quad-core processors and they also forgot to check the state of all 4 cores in their SMM handler for C1e. But that's just a wild guess.

Last thing to note:
The CPU support list of your motherboard (GA-MA790FXT-UD5P) contains only AM3 CPUs. Your CPU is not listed.
It also states that AM2+ Phenom II (45nm) CPUs are not supported. And your CPU is even an AMD Phenom (65nm) AM2+ CPU:

AMD Phenom(tm) 9350e Quad-Core Processor stepping 03

Are you sure that this hardware configuration is really supported?
> Last thing to note:
> The CPU support list of your motherboard (GA-MA790FXT-UD5P) contains only
> AM3 CPUs. Your CPU is not listed.
> It also states that AM2+ Phenom II (45nm) CPUs are not supported. And
> your CPU is even an AMD Phenom (65nm) AM2+ CPU:
> AMD Phenom(tm) 9350e Quad-Core Processor stepping 03
>
> Are your sure that this hardware configuration is really supported?

Some confusion here.

Asbjørn's config: GA-MA790FXT-UD5P (F7 and F8C beta) with
CPU0: AMD Phenom(tm) II X4 955 Processor stepping 02

My (Yanko) config: GA-MA78GM-S2H (F11) with
CPU0: AMD Phenom(tm) 9350e Quad-Core Processor stepping 03
Yanko,

You have a different motherboard but same CPU, same BIOS weirdness (duplicate APIC IDs), similar HPET/APIC configuration (as seen in your lspci output) and same symptoms.
One difference is that Gigabyte's CPU support list for your motherboard mentions your CPU.

Can you please also
(1) try to boot with idle=mwait with the problematic kernel,
(2) check for a BIOS option to toggle C1e support, and
(3) provide lsmsr output(*)?

If all is similar to Asbjørn's observations, I would give you similar recommendations (report the problem to Gigabyte and switch off C1e support in BIOS).

Thanks,

Andreas

(*) Install package x86info and run
# for i in `seq 0 3`; do lsmsr -c $i Int -V 3; done
In reply to comment #44 and #45: Please ignore my comment about unsupported CPU. I was looking at the wrong dmesg when writing this. Sorry. Of course AMD Phenom(tm) II X4 955 Processor stepping 02 is in the CPU support list of Asbjørn's mobo. Just need some more coffee in the morning ;-)
Correcting comment #46:

> You have a different motherboard but same CPU, same BIOS weirdness
> (duplicate APIC IDs), similar HPET/APIC configuration (as seen in your
> lspci-output) and same symptoms.
> One difference is that Gigabyte's CPU support list for your motherboard
> mentions your CPU.

That is wrong. It should be:

You have a different motherboard and CPU, same BIOS weirdness (duplicate APIC IDs), similar HPET/APIC configuration (as seen in your lspci output) and same symptoms.
FYI, currently there are two ways to enter C1e mode.

(1) SMI initiated C1e (which is sometimes problematic, as the SMM handler might do things wrong)
(2) Hardware initiated C1e

(2) is supported on dual-core CPUs and, with family 0x10 revision C3, also with any number of cores. That is the reason why (1) has to be used for Asbjørn's system (CPU revision C2) and for Yanko's system (revision B3).

If we get more reports due to insufficient implementation of (1), it might be an option to clear bit 27 of MSRC001_0055 in c1e_idle to simply avoid usage of SMI initiated C1e.
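To make the bit-27 check concrete, here is a minimal Python sketch that classifies an IntPendingMessage value such as the ones lsmsr printed earlier in this report. Bit 27 as the SMI initiated C1e indicator comes from the comments above; treating bit 28 as the enable bit for hardware initiated C1e is an assumption, and the constant and function names are made up for illustration.

```python
# Bits of MSRC001_0055 (Interrupt Pending Message).
SMI_INITIATED_C1E = 1 << 27  # from this report: "SMI initiated C1E"
HW_INITIATED_C1E = 1 << 28   # assumption: hardware initiated C1e enable bit


def describe_c1e(msr_value):
    """Classify how C1e would be entered for a given MSR value."""
    if msr_value & SMI_INITIATED_C1E:
        return "smi"
    if msr_value & HW_INITIATED_C1E:
        return "hardware"
    return "disabled"


def clear_smi_c1e(msr_value):
    """Return the value with bit 27 cleared (the option floated above)."""
    return msr_value & ~SMI_INITIATED_C1E


if __name__ == "__main__":
    for raw in ("0x00000000083400b0", "0x0000000000000000"):
        print(raw, "->", describe_c1e(int(raw, 16)))
```

With the values reported in this thread, 0x00000000083400b0 has bit 27 set, while the all-zero value read after disabling C1E in the BIOS has neither bit set. This only decodes a value that has already been read; it does not touch the MSR itself.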
Created attachment 25352 [details] lsmsr with idle=mwait Yes, idle=mwait or disabling "AMD C1E Support" in the BIOS work around the problem here. lsmsr output attached. I feel quite incompetent to write a bug report to GigaByte that might be taken seriously. Perhaps there is an internal AMD-GB channel that might help here, or at least I expect there to be.
*** Bug 15476 has been marked as a duplicate of this bug. ***
M79XTUD6.F8c (beta version of the bios) did not work either with C1E enabled.
FYI, I've passed the problem information on to people who are working with Gigabyte. Will see what happens. For the time being you should disable C1E in BIOS. Thanks.
Patch for a similar issue was posted to lkml: https://patchwork.kernel.org/patch/111824/
Please, can all who have observed this issue run a test with C1e enabled in BIOS and use "acpi_skip_timer_override" on the kernel command line? (Kernel version shouldn't matter -- just use a recent kernel >=2.6.32). Thanks! Further info below.

For the record: unfortunately I am still not in contact with BIOS developers from Gigabyte. But at least Gigabyte provided a mainboard for further testing/debugging, so I was able to test on a "GA-MA78LMT-US2H".

My observations are:
- It's no regression from .32 to .33. .32 does not work properly either: during shutdown/reboot a helping key is required to continue operation.
- So far I have not found a kernel where "helping keys" are not required.
- The APIC ID collision does not seem to cause the problem. Activating code (integrated for some other vendor) to check for collisions does not fix the problem. With a slightly modified 2.6.35.4 (32-bit) I get "IOAPIC[0]: apic_id 2 already used, trying 4", but a helping key is still needed during boot/shutdown.
- HPET does not seem to be the problem; it's properly configured.
- The configuration for HPET and timer IRQ routing is correct as far as I can tell.
- Commit e8c534ec068af1a0845aceda373a9bfd2de62030 (x86: Fix keeping track of AMD C1E) does not solve the problem either.

The only workaround that I have at the moment to run all recent kernels when C1e is activated is using acpi_skip_timer_override as a kernel command line parameter. This modifies the routing of the timer interrupt: it will be routed via the PIC and then to IO-APIC pin 0 (instead of being routed directly to IO-APIC pin 2). I don't have an explanation for this though. (Buggy BIOS?)
I've just booted fedora's 2.6.36-0.18.rc3.git1.fc15.x86_64 kernel with C1E enabled on the same Gigabyte board with the same bios as before without adding additional parameters on the command line and it didn't need a helping hand.
Thanks for this info. My system still requires the helping key during boot with vanilla kernel 2.6.36-rc3-00185-gd56557a Using acpi_skip_timer_override fixes the problem. (Somehow I am afraid that it is pure luck that your system works with the fedora kernel.)
acpi_skip_timer_override fixes the problem for me (only tested on 2.6.36rc4). This was on a "GA-MA790FXT-UD5P".
I have the problem with a Gigabyte GA-MA-790X-UD4 with an AMD Phenom II X4 955 when trying to install Fedora 14 Beta (kernel 2.6.35.4). Disabling AMD C1E in the BIOS fixes the problem. Using command line options like nmi_watchdog=panic,ioapic or irqpoll also seems to work around the problem.
Created attachment 35862 [details] default dmesg log with freeze on MA-790X-UD4
Created attachment 35872 [details] dmesg log without freeze on MA-790X-UD4 with nmi_watchdog=panic,ioapic

Using nmi_watchdog changes which timer is used, so this works around the problem. Difference between the default and the nmi_watchdog option:

...
 CPU0: AMD Phenom(tm) II X4 955 Processor stepping 02
+APIC calibration not consistent with PM-Timer: 49ms instead of 100ms
+APIC delta adjusted to PM-Timer: 1255722 (627847)
+APIC timer registered as dummy, due to nmi_watchdog=1!
 calling migration_init+0x0/0x57 @ 1
...
 Switch to broadcast mode on CPU3
-Total of 4 processors activated (25718.88 BogoMIPS).
+Total of 4 processors activated (16084.16 BogoMIPS).
+Testing NMI watchdog ...
+WARNING: CPU#0: NMI appears to be stuck (0->0)!
+Please report this to bugzilla.kernel.org,
+and attach the output of the 'dmesg' command.
+
+WARNING: CPU#1: NMI appears to be stuck (0->0)!
+Please report this to bugzilla.kernel.org,
+and attach the output of the 'dmesg' command.
+
+WARNING: CPU#2: NMI appears to be stuck (0->0)!
+Please report this to bugzilla.kernel.org,
+and attach the output of the 'dmesg' command.
+
+WARNING: CPU#3: NMI appears to be stuck (0->0)!
+Please report this to bugzilla.kernel.org,
+and attach the output of the 'dmesg' command.
 sizeof(vma)=184 bytes
...
+initcall init_ext3_fs+0x0/0x6f returned 0 after 130 usecs
+Clockevents: could not switch to one-shot mode: lapic is not functional.
+Could not switch to high resolution mode on CPU 3
+Clockevents: could not switch to one-shot mode: lapic is not functional.
+Could not switch to high resolution mode on CPU 0
+Clockevents: could not switch to one-shot mode: lapic is not functional.
+Could not switch to high resolution mode on CPU 2
+Clockevents: could not switch to one-shot mode: lapic is not functional.
+Could not switch to high resolution mode on CPU 1
 calling init_ext4_fs+0x0/0xe5 @ 1
Created attachment 35882 [details] dmesg log without freeze on MA-790X-UD4 with irqpoll

With the irqpoll option, it seems to work without freezing. Difference from the default options:

+Kernel command line: root=live:CDLABEL=Fedora-14-Beta-x86_64-Live liveimg rhgb debug initcall_debug irqpoll
+Misrouted IRQ fixup and polling support enabled
+This may significantly impact system performance
Yann, does acpi_skip_timer_override kernel option work around the problem on your system?
(In reply to comment #63)
> Yann,
> does acpi_skip_timer_override kernel option work around the problem
> on your system?

Yes, it does also work. Here is the difference in dmesg:

 IOAPIC[0]: apic_id 2, version 33, address 0xfec00000, GSI 0-23
 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
+ACPI: BIOS IRQ0 pin2 override ignored.
 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
-ACPI: IRQ0 used by override.
-ACPI: IRQ2 used by override.
 ACPI: IRQ9 used by override.
 Using ACPI (MADT) for SMP configuration information
...
 Setting APIC routing to flat
-..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
+..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
+..MP-BIOS bug: 8254 timer not connected to IO-APIC
+...trying to set up timer (IRQ0) through the 8259A ...
+..... (found apic 0 pin 0) ...
+....... works.
 CPU0: AMD Phenom(tm) II X4 955 Processor stepping 02
+APIC calibration not consistent with PM-Timer: 49ms instead of 100ms
+APIC delta adjusted to PM-Timer: 1255712 (627849)
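The routing change is visible in the "..TIMER:" line: pin1 is 2 by default and 0 with acpi_skip_timer_override. For comparing many dmesg logs, a small Python sketch like the following could extract those fields. The function name `parse_timer_route` and the regular expression are hypothetical and assume the line format shown in the diff above.

```python
import re

TIMER_RE = re.compile(
    r"\.\.TIMER: vector=(0x[0-9a-fA-F]+) apic1=(-?\d+) pin1=(-?\d+) "
    r"apic2=(-?\d+) pin2=(-?\d+)"
)


def parse_timer_route(line):
    """Return the timer routing fields of a '..TIMER:' line, or None."""
    match = TIMER_RE.search(line)
    if match is None:
        return None
    vector, apic1, pin1, apic2, pin2 = match.groups()
    return {
        "vector": int(vector, 16),
        "apic1": int(apic1),
        "pin1": int(pin1),
        "apic2": int(apic2),
        "pin2": int(pin2),
    }


if __name__ == "__main__":
    default = "..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1"
    override = "..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1"
    print(parse_timer_route(default)["pin1"])
    print(parse_timer_route(override)["pin1"])
```

Using `search` rather than `match` means the leading "+" or "-" of a diff line, or a dmesg timestamp, does not get in the way.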
An additional data point for you.

I have an Asus M4N78 (BIOS 1001) with an Athlon II X2 250.

Enabling C1E in the BIOS gives desktop lockups when playing back audio, e.g. aplay -vv *.wav. There are no errors in messages after the lockup. Disabling C1E fixes the issue.

I noticed the issue (!) after upgrading to the Ubuntu Maverick kernels 2.6.35-22 and -23.

I didn't see anything about C1E in dmesg. Let me know if there are logs that are of interest.
I'm having the same problem on a Toshiba nb205-n210 netbook. acpi_skip_timer_override seems to work around, at least it boots without the need to press shift all the time.
Having the problem with the 2.6.34 kernel using OpenSUSE 11.3 x86_32 on an HP laptop. The Bug# 579932 with Novell is located at the following URL: https://bugzilla.novell.com/show_bug.cgi?id=579932 A couple workarounds consistently work (allow the system to work without extra user input) including hpet=disable and processor.max_cstate=1 from Grub. Anything I can do to test/help? Thanks.
For the record, this bug is also reported on RedHat's bugzilla as bug# 648837, see https://bugzilla.redhat.com/show_bug.cgi?id=648837
*** Bug 29952 has been marked as a duplicate of this bug. ***
I have the same problem with kernel 2.6.28 in Ubuntu 11.04. My motherboard is a Gigabyte GA-MA790XT-UD4P, BIOS version F8g (latest beta), with an AMD Phenom II X2 550 processor. With C1E disabled, the system boots fine but with C1E enabled it needs the helping key.
Another APIC timer bug on older AMD processors was fixed in kernel 2.6.39-rc6 by Boris Ostrovsky. His commit has the following title:

"x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors"
https://patchwork.kernel.org/patch/746192/

Regression report by Jörg-Volker Peetz, "regression due to x86 AMD ARAT feature":
https://lkml.org/lkml/2011/4/24/20

Fix by Boris Ostrovsky:
https://lkml.org/lkml/2011/4/29/328
(In reply to comment #70)
> I have the same problem with kernel 2.6.28 in Ubuntu 11.04.
>
> My motherboard is a Gigabyte GA-MA790XT-UD4P, BIOS version F8g (latest beta),
> with an AMD Phenom II X2 550 processor. With C1E disabled, the system boots
> fine but with C1E enabled it needs the helping key.

There is an unfortunate typo there. I meant kernel 2.6.38, not 2.6.28.
I have a Gigabyte motherboard with a Phenom II X4 810, per dmidecode:

Base Board Information
        Manufacturer: Gigabyte Technology Co., Ltd.
        Product Name: GA-MA78GPM-UD2H
        Version: x.x
        Serial Number:

It's on the latest non-beta BIOS (F7) and my APIC ID table is the same as Andreas'.

Running Ubuntu, since some mid-Lucid-cycle 2.6.32.x kernel update it has had problems with intermittent inability to boot as described above and with slow suspends. Both were solved by adding "clocksource=hpet" to the kernel command line (after I noticed a few "clocksource TSC unstable" dmesg lines after long pauses).

Beginning with the 2.6.38.x kernels in Natty, it wouldn't even boot reliably with that fix. Disabling C1E solved it (but locked the CPU to full speed). I have now re-enabled C1E and changed my kernel command line to just "acpi_skip_timer_override", and it boots well, suspends/resumes well, and clocks down to 800MHz when idle.

This seems to be a fairly widespread Gigabyte problem. Should it be added to some known-quirks list?
Is this still the case with recent kernels ?
I've just tried enabling C1E support in the BIOS with 3.4.2-4.fc17.x86_64 on the same board, just with a slightly newer BIOS version (12b), and it wouldn't even boot. It hangs somewhere after hpet initialisation...
I also just tried to boot with C1E support enabled, and it still has the same problems on 3.4.2 from kernel.org.
thanks
I have a Gigabyte X58A-UD3R v2.0, running the FH1 (beta) BIOS (I also tried the actual FH release BIOS), that seems to be exhibiting this exact same problem. It boots and runs my old Debian squeeze just fine, but with a new Wheezy install CPUs #1 through #7 do not respond, and then it reboots.

None of hpet=disable, acpi_skip_timer_override or idle=mwait fixes or changes the issue.

The only way I can boot is with maxcpus=1. This is obviously suboptimal.

My APIC IDs seem fine:

[ 0.000000] ACPI: PM-Timer IO Port: 0x408
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x04] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x06] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x01] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x03] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x08] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x09] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0a] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0b] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0c] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0d] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0e] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0f] disabled)
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x05] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x06] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x07] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x08] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x09] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x0a] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x0b] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x0c] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x0d] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x0e] dfl dfl lint[0x1])
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0x0f] dfl dfl lint[0x1])
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000

I've tried booting with the BIOS C1E support set to on, off and auto, with no change. I have also tried with the HPET set to 32 or 64 bits, with no change. I have not yet tried disabling the HPET, but none of this was required with Squeeze.

Is there any other information I could provide which would help?
I am not sure if this is the same as my issue, but it sounds similar. I am using a Gigabyte GA-890FXA-UD5 updated to the latest BIOS, and cannot boot any version of Fedora (kernels 3.15.5-3.15.8 x86_64, or the Fedora 20 Live CD) without either acpi=off or turning off AMD C1E support in the BIOS (I have opted for the latter). With C1E on, it locks up almost immediately, and with quiet boot turned off, the text on the screen at that point does not seem helpful.
*** Bug 151671 has been marked as a duplicate of this bug. ***
I got hit by this issue on GA-MA770-UD3 too, it seems to have become more prominent since kernel 4.1 since I used to not have this issue before then (see my bisect in bug #151671). Disabling C1E or setting it to auto solves the problem (though looks like power management gets worse and the machine is quite noisy). acpi_skip_timer_override outright locks the kernel up, I see no output whatsoever after GRUB 2 tries to boot the kernel. hpet=disable and idle=mwait do not help at all (I have HPET disabled in the BIOS for a different, but perhaps related bug #68331). This is still an issue on kernel 4.7. Also, perhaps this bug report could use a better title: this is not a regression, and it specifically affects AMD 700 boards with C1E enabled.