Bug 18482 - GA-MA69VM-S2 requires pci=nocrs with 2.6.35.4 (HPET)
Summary: GA-MA69VM-S2 requires pci=nocrs with 2.6.35.4 (HPET)
Status: RESOLVED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: Bjorn Helgaas
URL: http://lkml.org/lkml/2010/9/9/92
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-14 14:11 UTC by Bjorn Helgaas
Modified: 2012-08-13 16:32 UTC
CC List: 5 users

See Also:
Kernel Version: 2.6.35.4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg (hang) (34.61 KB, text/plain)
2010-09-14 14:14 UTC, Bjorn Helgaas
dmesg (with pci=nocrs, successful) (121.70 KB, text/plain)
2010-09-14 14:15 UTC, Bjorn Helgaas
Everest log (40.67 KB, application/gzip)
2010-09-14 15:57 UTC, Frédéric L. W. Meunier
hpet debug patch (840 bytes, patch)
2010-09-14 16:26 UTC, Bjorn Helgaas
hpet debug patch (v2) (810 bytes, patch)
2010-09-14 17:56 UTC, Bjorn Helgaas
dmesg from 2.6.35.4 (36.21 KB, text/plain)
2010-09-14 18:15 UTC, Frédéric L. W. Meunier
hpet test patch (2.01 KB, patch)
2010-09-14 23:01 UTC, Bjorn Helgaas
make PCI HPET immovable (2.75 KB, patch)
2010-10-05 16:11 UTC, Bjorn Helgaas

Description Bjorn Helgaas 2010-09-14 14:11:08 UTC
Simon Arlott reports:

> With a Gigabyte GA-MA69VM-S2 690V motherboard, it stalls in
> quirk_usb_early_handoff unless pci=nocrs is used:

...
[  299.177004] pci 0000:00:13.0: pci_apply_final_quirks
[  299.177004] pci 0000:00:13.0: calling quirk_cardbus_legacy+0x0/0x21
[  299.177004] pci 0000:00:13.0: calling quirk_usb_early_handoff+0x0/0x5dd

I'll attach the dmesg logs Simon provided.  From them, it looks like the
HPET appears as a PCI device, in addition to being described in the HPET
table (and possibly in the ACPI namespace):

[    0.000000] ACPI: HPET id: 0x10b9a201 base: 0xfed00000
[    0.454232] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.461348] pci_root PNP0A03:00: host bridge window [io  0x0000-0x0cf7]
[    0.468004] pci_root PNP0A03:00: host bridge window [io  0x0d00-0xffff]
[    0.474004] pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
[    0.481004] pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000dffff]
[    0.489004] pci_root PNP0A03:00: host bridge window [mem 0x80000000-0xfebfffff]
[    1.034049] pci 0000:00:14.0: no compatible bridge window for [mem 0xfed00000-0xfed003ff]
[    1.651998] pci 0000:00:14.0: BAR 1: assigned [mem 0x80000000-0x800003ff]

Since the 00:14.0 BAR looks invalid (it's outside all the host bridge
windows), we moved it to 0x80000000, and that probably broke the HPET
driver, which still thinks it's at 0xfed00000.
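The reasoning above can be checked by hand against the numbers in the dmesg excerpt. The following is only a minimal sketch in Python, not kernel code: the helper names `containing_window` and `covers` are made up for illustration, and the window list contains only the memory windows quoted in this report.

```python
# Memory windows of the PCI0 host bridge, copied from the dmesg excerpt above.
HOST_BRIDGE_MEM_WINDOWS = [
    (0x000A0000, 0x000BFFFF),
    (0x000C0000, 0x000DFFFF),
    (0x80000000, 0xFEBFFFFF),
]

# From "ACPI: HPET id: 0x10b9a201 base: 0xfed00000".
HPET_BASE = 0xFED00000


def containing_window(start, end, windows):
    """Return the first window that fully contains [start, end], or None."""
    for w_start, w_end in windows:
        if w_start <= start and end <= w_end:
            return (w_start, w_end)
    return None


def covers(start, end, addr):
    """True if addr falls inside the inclusive range [start, end]."""
    return start <= addr <= end


bios_bar = (0xFED00000, 0xFED003FF)   # the 00:14.0 BAR as the BIOS left it
moved_bar = (0x80000000, 0x800003FF)  # the same BAR after Linux reassigned it

# The BIOS-assigned range fits in no window, so it looks invalid to the PCI core.
print(containing_window(*bios_bar, HOST_BRIDGE_MEM_WINDOWS))
# The reassigned range fits in a window, but no longer covers the HPET base.
print(containing_window(*moved_bar, HOST_BRIDGE_MEM_WINDOWS))
print(covers(*bios_bar, HPET_BASE), covers(*moved_bar, HPET_BASE))
```

The last line shows the suspected problem: after the move, the address from the HPET table no longer falls inside the BAR.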

If anybody could provide the output of Everest (free trial version at
http://www.lavalys.com/), it would be a great help.
Comment 1 Bjorn Helgaas 2010-09-14 14:14:43 UTC
Created attachment 29982 [details]
dmesg (hang)

Here's the failing console log.
Comment 2 Bjorn Helgaas 2010-09-14 14:15:28 UTC
Created attachment 29992 [details]
dmesg (with pci=nocrs, successful)

Here's the successful boot with "pci=nocrs".
Comment 3 Frédéric L. W. Meunier 2010-09-14 15:57:31 UTC
Well, I've had the same motherboard since 03/2008. It works fine on Windows (I've used Vista and 7). On Linux, I've always had a small problem. If I boot to Windows, then before I reboot to Linux I need to go into the BIOS and change the "USB Mouse" setting from "Yes" to "No" or from "No" to "Yes". Otherwise, I get "ohci_hcd 0000:00:13.0: Unlink after no-IRQ?  Controller is probably using the wrong IRQ." and the mouse won't work at all, nor will the keyboard if it's USB.

Anyway, I don't see any "pci_apply_final_quirks" in my 2.6.35.4 dmesg (but have "# CONFIG_HPET is not set" in my config), but I'm attaching the Everest output in case it's of any use for your problem.
Comment 4 Frédéric L. W. Meunier 2010-09-14 15:57:44 UTC
Created attachment 30012 [details]
Everest log
Comment 5 Bjorn Helgaas 2010-09-14 16:26:16 UTC
Created attachment 30032 [details]
hpet debug patch

Thank you very much, Frédéric!  Windows shows the PNP0103 ACPI device at
0xfed00000, but it shows no resources for the PCI device:

    ACPI
      Hardware ID                                       ACPI\PNP0103
      PnP Device                                        High Precision Event Timer
      IRQ                                               00
      IRQ                                               08
      Memory                                            FED00000-FED003FF

    PCI
      Location Information                              @system32\DRIVERS\pci.sys,#65536;PCI bus %1, device %2, function %3;(0,20,0)
      PCI Device                                        ATI SB600 - SMBus Controller

If it's possible, could you turn on CONFIG_HPET and CONFIG_PNP_DEBUG_MESSAGES,
apply this patch, boot with "pnp.debug pci=nocrs", and attach the dmesg log
from Linux?

I think we need to make Linux ignore that 00:14.0 PCI BAR, either by
fixing the existing quirk or by doing something more generic.

I'm sorry that I don't have an idea about the USB issue.  That sounds like
a topic for a different bug report.
Comment 6 Frédéric L. W. Meunier 2010-09-14 17:26:11 UTC
I already had CONFIG_PNP_DEBUG_MESSAGES enabled. I turned on CONFIG_HPET, applied your patch, booted with pnp.debug pci=nocrs, but it reboots after 2-3 seconds.
Comment 7 Bjorn Helgaas 2010-09-14 17:56:28 UTC
Created attachment 30052 [details]
hpet debug patch (v2)

Oh, I'm sorry, that patch had some junk in it that I didn't intend.
Can you try this one instead?
Comment 8 Frédéric L. W. Meunier 2010-09-14 18:15:58 UTC
Created attachment 30062 [details]
dmesg from 2.6.35.4

dmesg from 2.6.35.4. Yes, it's 2.6.35.4 with no EXTRAVERSION.
Comment 9 Rafael J. Wysocki 2010-09-14 19:30:54 UTC
What's the last known good kernel?
Comment 10 Bjorn Helgaas 2010-09-14 23:01:42 UTC
Created attachment 30092 [details]
hpet test patch

It's interesting that it fails for Simon, but works for Frédéric, even
though you both have the same motherboard.  Maybe there's a BIOS difference.

The HPET spec says "[the HPET] is not implemented as a standard PCI function,"
and "It is not expected that the OS will move the location of these timers
once it is set by the BIOS."

Given that we have examples where the HPET apparently *is* implemented as a
PCI function, and the fact that the HPET table provides no way to associate
an Event Timer Block with a PCI function, I think we might have to just check
every PCI function we discover against what we found in the HPET table so
we can keep from moving it.

This patch is a stab at that.  Can anybody test it?
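
The check described in this comment could be sketched roughly as follows. This is an illustration in Python, not the attached patch: the `Bar` and `PciFunction` types and the `pin_hpet_bars` helper are hypothetical, and the 00:13.0 BAR address is made up for the example. Only the 00:14.0 range and the HPET base come from the log.

```python
from dataclasses import dataclass, field


@dataclass
class Bar:
    start: int
    end: int             # inclusive
    fixed: bool = False  # True means "do not reassign this BAR"


@dataclass
class PciFunction:
    name: str
    bars: list = field(default_factory=list)


def pin_hpet_bars(functions, hpet_base):
    """Mark every BAR that covers hpet_base as fixed; return the owners' names."""
    pinned = []
    for fn in functions:
        for bar in fn.bars:
            if bar.start <= hpet_base <= bar.end:
                bar.fixed = True
                pinned.append(fn.name)
    return pinned


functions = [
    # Hypothetical BAR address, used only to show a function that is left alone.
    PciFunction("0000:00:13.0", [Bar(0xFE02E000, 0xFE02EFFF)]),
    # BAR range reported in the failing dmesg log.
    PciFunction("0000:00:14.0", [Bar(0xFED00000, 0xFED003FF)]),
]

# hpet_base is the address reported by "ACPI: HPET id: ... base: 0xfed00000".
print(pin_hpet_bars(functions, hpet_base=0xFED00000))
```

Matching by address is used here because, as noted above, the HPET table gives no way to name the PCI function directly.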
Comment 11 Simon Arlott 2010-09-20 19:59:25 UTC
(In reply to comment #3)
> Anyway, I don't see any "pci_apply_final_quirks" in my 2.6.35.4 dmesg (but
> have

This was some debugging I added by raising the level of the PCI quirks logging.

(In reply to comment #9)
> What's the last known good kernel?

2.6.33-rc6

(In reply to comment #10)
> It's interesting that it fails for Simon, but works for Frédéric, even
> though you both have the same motherboard.  Maybe there's a BIOS difference.

I did test this without any HCDs compiled in and it still hung; in case it's relevant, I have EHCI disabled.

dmidecode -t 0 -t 2:
# dmidecode 2.10
SMBIOS 2.4 present.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
	Vendor: Award Software International, Inc.
	Version: F8
	Release Date: 03/03/2008
	Address: 0xE0000
	Runtime Size: 128 kB
	ROM Size: 512 kB
	Characteristics:
		...

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
	Manufacturer: Gigabyte Technology Co., Ltd.
	Product Name: GA-MA69VM-S2
	Version: x.x
	Serial Number:
Comment 12 Bjorn Helgaas 2010-09-20 21:04:19 UTC
Frédéric has BIOS version F10C, and his system works fine even without
my patch (well, he does see a USB issue -- bug 18532, but I don't think
that's related to this HPET issue).  So I think something was fixed in
the BIOS between versions F8 and F10C.

Simon, I guess with my patch (attachment 30092 [details]), your system still hangs
during boot ... right?  I assume it still boots with "pci=nocrs"; could
you attach the dmesg log from that boot?  If there's some way to capture
the log from a failing boot (use "ignore_loglevel" to get more info), such
as a serial console, netconsole, or video, that would be very helpful.
Comment 13 Bjorn Helgaas 2010-09-22 19:47:16 UTC
Simon, can you please attach the dmesg log from the test with the patch
(attachment 30092 [details])?  Thanks!
Comment 14 Simon Arlott 2010-09-22 19:52:32 UTC
I still haven't tried it yet; as I stated before I don't intend to reboot any time soon.
Comment 15 Bjorn Helgaas 2010-10-05 16:11:36 UTC
Created attachment 32612 [details]
make PCI HPET immovable

Please test this instead of the previous version.
