Bug 31602

Summary: Dell 1546: Usb doesn't work at all
Product: Drivers
Component: PCI
Reporter: thuban
Assignee: Bjorn Helgaas (bjorn)
Status: CLOSED CODE_FIX
Severity: high
Priority: P1
CC: alan, albcamus, bjorn, degeneracypressure, florian, greg, linux-bugs, Matt_Domsch, stuart_hayes
Hardware: All
OS: Linux
Kernel Version: 2.6.38
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
  2.6.32 dmesg log (working)
  2.6.39 dmesg (broken)
  quirk for AMD MMCONFIG space
  updated quirk
  v3 patch
  my backport to maverick's 2.6.35
  dmesg with 2.6.39.3 kernel

Description thuban 2011-03-21 20:10:39 UTC
I'm running Debian GNU/Linux sid amd64, and with the latest kernel USB doesn't work.
The boot process is very slow while it is testing USB.
Here are the errors:
[    4.220159] ehci_hcd 0000:00:12.2: USB bus 1 deregistered
[    4.236511] ehci_hcd 0000:00:13.2: new USB bus registered, assigned bus number 1
[    4.248317] ehci_hcd 0000:00:13.2: applying AMD SB600/SB700 USB freeze workaround
[    4.272351] ehci_hcd 0000:00:13.2: USB bus 1 deregistered
[    4.292423] ohci_hcd 0000:00:12.0: new USB bus registered, assigned bus number 1
[   12.304254] ohci_hcd 0000:00:12.0: USB HC takeover failed!  (BIOS/SMM bug)
[   12.304361] ohci_hcd 0000:00:12.0: USB bus 1 deregistered
[   12.320396] ohci_hcd 0000:00:12.1: new USB bus registered, assigned bus number 1
[   20.320259] ohci_hcd 0000:00:12.1: USB HC takeover failed!  (BIOS/SMM bug)
[   20.320366] ohci_hcd 0000:00:12.1: USB bus 1 deregistered
[   20.336421] ohci_hcd 0000:00:13.0: new USB bus registered, assigned bus number 1
[   28.352251] ohci_hcd 0000:00:13.0: USB HC takeover failed!  (BIOS/SMM bug)
[   28.352352] ohci_hcd 0000:00:13.0: USB bus 1 deregistered
[   28.368373] ohci_hcd 0000:00:13.1: new USB bus registered, assigned bus number 1
[   36.380253] ohci_hcd 0000:00:13.1: USB HC takeover failed!  (BIOS/SMM bug)
[   36.380358] ohci_hcd 0000:00:13.1: USB bus 1 deregistered
Comment 1 Greg Kroah-Hartman 2011-03-21 20:32:48 UTC
Is there an updated bios for your machine?

Did 2.6.37 work?  If so, can you run 'git bisect' and try to find the offending patch?
Comment 2 thuban 2011-03-21 21:20:23 UTC
In fact, the problem was exactly the same with 2.6.37. I was hoping
2.6.38 would fix it, which is why I am reporting it now.

Comment 3 Greg Kroah-Hartman 2011-03-21 21:33:07 UTC
On Mon, Mar 21, 2011 at 09:20:25PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> In fact, with the 2.6.37, the problem was exactly the same, I hoped a
> correction with 2.6.38, so that's why I report.

Has it ever worked properly?

And again, is there a BIOS update for your hardware?  This really looks
like a hardware problem, as the kernel is saying.
Comment 4 thuban 2011-03-22 06:40:35 UTC
Yes, it worked with the previous kernel version on Debian sid, 2.6.35.
That's why I don't think it is caused by a hardware failure.

I checked for a BIOS update, and it is already up to date. Moreover, I
can use USB without any problem with the 2.6.32 kernel currently
installed on my laptop.

Comment 5 Greg Kroah-Hartman 2011-03-22 23:14:36 UTC
On Tue, Mar 22, 2011 at 06:40:35AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> Yes, it worked with previous version of the kernel on debian sid, so the
> 2.6.35. That's why I don't think it because of a hardware failure.

Can you do a git bisection to find the problem patch?
Comment 6 thuban 2011-03-23 06:42:25 UTC
I suppose I can, but I'll need help, or a link that explains how to
do this :s


Comment 7 Jike Song 2011-03-23 06:59:45 UTC
Please refer to this:

http://kernelnewbies.org/Linux_Kernel_Tester%27s_Guide_Chapter4

Section 4.5
Comment 8 thuban 2011-03-27 07:38:37 UTC
Sorry I'm late; my internet connection wasn't good enough to
download the entire git repository.
So, I ran:
git bisect start
git bisect bad
git bisect good v2.6.32

And the result is : 
Bisecting: 36664 revisions left to test after this (roughly 15 steps)
[cc41f5cede3c63836d1c0958204630b07f5b5ee7] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6

I hope I did what you needed.
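
The "roughly 15 steps" in that output follows from bisection halving the candidate range at each test. A minimal Python sketch of the arithmetic, assuming a plain log2 estimate (the helper name `estimated_bisect_steps` is made up for illustration and is not git's own calculation):

```python
import math


def estimated_bisect_steps(revisions_left: int) -> int:
    """Back-of-envelope count of halving steps, not git's exact formula."""
    if revisions_left <= 1:
        return 0
    return round(math.log2(revisions_left))


# 36664 is the figure git printed in this comment.
steps = estimated_bisect_steps(36664)
print(steps)
```

Each step still needs a build, a boot, and a `git bisect good` or `git bisect bad` verdict, so the three commands above only start the search.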
Comment 9 thuban 2011-04-28 19:07:22 UTC
I tried the latest 2.6.39-rc5, but the problem doesn't seem to be solved. Since there has been no answer to my last comment, I wonder whether the git bisect output I gave was what you were really expecting.

I'm sorry to annoy you with my problem.
Comment 10 thuban 2011-05-03 07:50:35 UTC
Here is the complete git bisect procedure, redone after I found some explanations. Sorry for my mistake.
The last git bisect returned:

7bc5e3f2be32ae6fb0c74cd0f707f986b3a01a26 is the first bad commit
commit 7bc5e3f2be32ae6fb0c74cd0f707f986b3a01a26
Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
Date:   Tue Feb 23 10:24:41 2010 -0700

    x86/PCI: use host bridge _CRS info by default on 2008 and newer machines
    
    The main benefit of using ACPI host bridge window information is that
    we can do better resource allocation in systems with multiple host bridges,
    e.g., http://bugzilla.kernel.org/show_bug.cgi?id=14183
    
    Sometimes we need _CRS information even if we only have one host bridge,
    e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/341681
    
    Most of these systems are relatively new, so this patch turns on
    "pci=use_crs" only on machines with a BIOS date of 2008 or newer.
    
    Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

:040000 040000 525fd3f801565d9563b56b62e2971c978c395b94 3df5ce131f48cc83c34d1d45d0fc4
:040000 040000 ae952c266bf0eedff4fac6064868e1ebab998614 e0a2c7fe581e90cfe3e1491024080
:040000 040000 dac57f16f20f7eb9843dc6727a3911d5fdadb0cb 660e22e248aad862fde1181b4dc77
:040000 040000 0a13c74f2a5a1dea034e12b721cddf3ae2624fdc c7167df363b9e25a8d902ca7907c5

And git bisect log : 
git bisect start
# bad: [1be6a1f89f131e9c3d22f819ec542be9cda8c9e3] Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
git bisect bad 1be6a1f89f131e9c3d22f819ec542be9cda8c9e3
# good: [22763c5cf3690a681551162c15d34d935308c8d7] Linux 2.6.32
git bisect good 22763c5cf3690a681551162c15d34d935308c8d7
# bad: [44d51a029f95d49c5c7ccd7808f81904c20c3abd] Merge branch 'davinci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci
git bisect bad 44d51a029f95d49c5c7ccd7808f81904c20c3abd
# bad: [d89b218b801fd93ea95880f1c7fde348cbcc51c5] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
git bisect bad d89b218b801fd93ea95880f1c7fde348cbcc51c5
# good: [b58454ec25e80fdb84e294758aeb22dd6d5ee6f9] wmi: check find_guid() return value to prevent oops
git bisect good b58454ec25e80fdb84e294758aeb22dd6d5ee6f9
# bad: [b3b1cc3ba62374e71155ba8c09ee481c3c2d923e] USB: musb: move two printk to dev_err
git bisect bad b3b1cc3ba62374e71155ba8c09ee481c3c2d923e
# good: [0d8e1d0637a4636b0c19c94ba633812421544d91] V4L/DVB: cx18: rename cx18-alsa.c
git bisect good 0d8e1d0637a4636b0c19c94ba633812421544d91
# bad: [f1dd6ad599732fc89f36fdd65a2c2cf3c63a8711] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
git bisect bad f1dd6ad599732fc89f36fdd65a2c2cf3c63a8711
# bad: [654451748b779b28077d9058442d0f354251870d] Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
git bisect bad 654451748b779b28077d9058442d0f354251870d
# bad: [37d4008484977f60d5d37499a2670c79b214dd46] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad 37d4008484977f60d5d37499a2670c79b214dd46
# good: [d7930c9ef9cc67044f5ddaac54d06ca22645a012] Merge branch 'next-spi' of git://git.secretlab.ca/git/linux-2.6
git bisect good d7930c9ef9cc67044f5ddaac54d06ca22645a012
# bad: [bb8d41330ce27edb91adb6922d3f8e1a8923f727] x86/PCI: Prevent mmconfig memory corruption
git bisect bad bb8d41330ce27edb91adb6922d3f8e1a8923f727
# good: [cd81e1ea1a4cda94aa5f3e942301cf0da497c262] PCI: reject mmio ranges starting at 0 on pci_bridge read
git bisect good cd81e1ea1a4cda94aa5f3e942301cf0da497c262
# good: [f517709d65beed95f52f021b43e3035b52ef791a] ACPI / PM: Add more run-time wake-up fields
git bisect good f517709d65beed95f52f021b43e3035b52ef791a
# good: [fa27b2d108fa49685129867a8c5b968344d6e197] PCI: split up pci_read_bridge_bases()
git bisect good fa27b2d108fa49685129867a8c5b968344d6e197
# good: [2fe2abf896c1e7a0ee65faaf3ef0ce654848abbd] PCI: augment bus resource table with a list
git bisect good 2fe2abf896c1e7a0ee65faaf3ef0ce654848abbd
# bad: [cbbc0de700e61d0cdc854d435dbc2ef148de0e00] ACPI: Use GPE reference counting to support shared GPEs
git bisect bad cbbc0de700e61d0cdc854d435dbc2ef148de0e00
# bad: [7bc5e3f2be32ae6fb0c74cd0f707f986b3a01a26] x86/PCI: use host bridge _CRS info by default on 2008 and newer machines
git bisect bad 7bc5e3f2be32ae6fb0c74cd0f707f986b3a01a26
Comment 11 Greg Kroah-Hartman 2011-05-03 18:13:17 UTC
On Tue, May 03, 2011 at 07:50:37AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #10 from thuban@singularity.fr  2011-05-03 07:50:35 ---
> Here is the complete git bisect procedure after some explanation I found.
> Sorry
> for my mistake.
> Last git bisect returned : 
> 
> 7bc5e3f2be32ae6fb0c74cd0f707f986b3a01a26 is the first bad commit
> commit 7bc5e3f2be32ae6fb0c74cd0f707f986b3a01a26
> Author: Bjorn Helgaas <bjorn.helgaas@hp.com>
> Date:   Tue Feb 23 10:24:41 2010 -0700

Can you post this on the linux-pci@vger.kernel.org and
linux-usb@vger.kernel.org mailing lists and cc: the developers involved
in this patch so they can comment and hopefully fix the issue?
Comment 12 thuban 2011-05-03 18:34:49 UTC
It's done :)
Comment 13 Bjorn Helgaas 2011-05-26 19:42:20 UTC
I'm sorry I missed this problem.  I just happened to trip over it by accident while looking for something else in bugzilla.

In any event, can you please attach the complete dmesg logs from before and after the commit you identified?
Comment 14 thuban 2011-05-27 07:04:36 UTC
Here are the complete dmesg logs, which are huge. The first is the good one:
http://pastebin.toile-libre.org/?show=635047
The second is from when USB doesn't work:
http://pastebin.toile-libre.org/?show=635046

If you prefer them on another pastebin or in some other form, just tell me.
Comment 15 Bjorn Helgaas 2011-05-27 14:27:33 UTC
Created attachment 59692 [details]
2.6.32 dmesg log (working)

Attaching directly here just for completeness.
Comment 16 Bjorn Helgaas 2011-05-27 14:42:27 UTC
Created attachment 59702 [details]
2.6.39 dmesg (broken)

This looks like a BIOS bug.  Here are the memory apertures of the PCI host bridge, according to the ACPI description:

  pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
  pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff]
  pci_root PNP0A03:00: host bridge window [mem 0xd4000000-0xfebfffff]
  pci_root PNP0A03:00: host bridge window [mem 0xfec10000-0xfecfffff]
  pci_root PNP0A03:00: host bridge window [mem 0xfed00400-0xfedfffff]
  pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff]
  pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdffbff]
  pci_root PNP0A03:00: host bridge window [mem 0x130000000-0x327ffffff]

And the USB device BARs are not in any of those apertures:

  pci 0000:00:12.0: reg 10: [mem 0xffb00000-0xffb00fff]
  pci 0000:00:12.1: reg 10: [mem 0xffb01000-0xffb01fff]
  pci 0000:00:12.2: reg 10: [mem 0xffa80000-0xffa800ff]
  pci 0000:00:13.0: reg 10: [mem 0xffb02000-0xffb02fff]
  pci 0000:00:13.1: reg 10: [mem 0xffb03000-0xffb03fff]
  pci 0000:00:13.2: reg 10: [mem 0xffa80400-0xffa804ff]
  ...
  pci 0000:00:12.0: no compatible bridge window for [mem 0xffb00000-0xffb00fff]
  pci 0000:00:12.1: no compatible bridge window for [mem 0xffb01000-0xffb01fff]
  pci 0000:00:12.2: no compatible bridge window for [mem 0xffa80000-0xffa800ff]
  pci 0000:00:13.0: no compatible bridge window for [mem 0xffb02000-0xffb02fff]
  pci 0000:00:13.1: no compatible bridge window for [mem 0xffb03000-0xffb03fff]
  pci 0000:00:13.2: no compatible bridge window for [mem 0xffa80400-0xffa804ff]

So we reassign them to put them inside an aperture:

  pci 0000:00:12.0: BAR 0: assigned [mem 0xd4000000-0xd4000fff]
  pci 0000:00:12.1: BAR 0: assigned [mem 0xd4001000-0xd4001fff]
  pci 0000:00:13.0: BAR 0: assigned [mem 0xd4002000-0xd4002fff]
  pci 0000:00:13.1: BAR 0: assigned [mem 0xd4003000-0xd4003fff]
  pci 0000:00:12.2: BAR 0: assigned [mem 0xd4004000-0xd40040ff]
  pci 0000:00:13.2: BAR 0: assigned [mem 0xd4004100-0xd40041ff]

That all seems fine and legal so far.  In my experience, Windows does the same thing (although Windows assigns from the top of an aperture down, and Linux goes bottom-up).  If you happen to have Windows available, it'd be interesting to see where it puts the USB devices.

The reason it worked in 2.6.32 was that we ignored the ACPI description of the host bridge, so we didn't reassign the USB devices.  There are other reasons why we need to use that description, so we can't easily revert that change.

I'm 99% sure that Windows will also move the USB devices, and presumably, they do work under Windows, so there must be something else going on -- some other difference between the way Windows handles those devices and the way Linux does.
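
The containment check behind the "no compatible bridge window" messages can be reproduced with a short Python sketch. The ranges are copied from the dmesg excerpts above; the function name `containing_window` is hypothetical and this is not the kernel's implementation:

```python
# Host bridge memory windows reported via ACPI in the 2.6.39 dmesg.
HOST_BRIDGE_WINDOWS = [
    (0x000A0000, 0x000BFFFF),
    (0x000D0000, 0x000DFFFF),
    (0xD4000000, 0xFEBFFFFF),
    (0xFEC10000, 0xFECFFFFF),
    (0xFED00400, 0xFEDFFFFF),
    (0xFEE10000, 0xFF9FFFFF),
    (0xFFC00000, 0xFFDFFBFF),
    (0x130000000, 0x327FFFFFF),
]

# USB controller BAR 0 values as left by the BIOS.
USB_BARS = {
    "0000:00:12.0": (0xFFB00000, 0xFFB00FFF),
    "0000:00:12.1": (0xFFB01000, 0xFFB01FFF),
    "0000:00:12.2": (0xFFA80000, 0xFFA800FF),
    "0000:00:13.0": (0xFFB02000, 0xFFB02FFF),
    "0000:00:13.1": (0xFFB03000, 0xFFB03FFF),
    "0000:00:13.2": (0xFFA80400, 0xFFA804FF),
}


def containing_window(bar, windows):
    """Return the window that fully contains the BAR, or None."""
    start, end = bar
    for w_start, w_end in windows:
        if w_start <= start and end <= w_end:
            return (w_start, w_end)
    return None


# Devices whose BIOS-assigned BAR lies outside every window.
orphans = [dev for dev, bar in USB_BARS.items()
           if containing_window(bar, HOST_BRIDGE_WINDOWS) is None]
print(orphans)
```

All six BARs sit in the gap between the windows ending at 0xff9fffff and starting at 0xffc00000, while a reassigned address such as 0xd4000000 falls inside the [mem 0xd4000000-0xfebfffff] window.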
Comment 17 Bjorn Helgaas 2011-06-15 20:14:20 UTC
You should be able to use "pci=nocrs" as a workaround until we figure out what's wrong.

This is a pretty big problem because new distro releases don't work out of the box.

I see several reports of it on the net:

  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/647043
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/761894 (dupe)
  https://bbs.archlinux.org/viewtopic.php?id=102553

I suspect we'll see many more as newer kernels get deployed.

I don't see what's wrong from the kernel PCI point of view: we moved the USB devices to addresses that appear to be legal.

We've seen several instances of Dell BIOSes leaving the USB devices outside the host bridge windows, which causes Linux (and Windows) to move them.  If there's something Linux is doing wrong when it does this, I'd love to figure it out, so we could fix Linux.  Otherwise, the only alternatives I see are to

  - Blacklist some Dell machines so we use "pci=nocrs" automatically.  This is a hack and means we can't support multi-host bridge machines correctly.
  - Hope for a Dell BIOS update, and force all users to upgrade (yuck).

Is there anybody at Dell who could help figure this out?
Comment 18 Bjorn Helgaas 2011-06-15 22:28:56 UTC
I have a theory.  I suspect the E820 and PNP0C02 descriptions of the MMCONFIG space are incorrect (too small).

From the failing dmesg log (attachment #59702 [details]):

  BIOS-e820: 00000000cfec5400 - 00000000d4000000 (reserved)
  Fam 10h mmconf [d0000000, dfffffff]
  PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xd0000000-0xd3ffffff] (base 0xd0000000)
  PCI: MMCONFIG at [mem 0xd0000000-0xd3ffffff] reserved in E820
  pnp 00:0c: [mem 0xd0000000-0xd3ffffff]
  system 00:0c: Plug and Play ACPI device, IDs PNP0c01 (active)

The E820 table and the PNP0c01 device both show the region [mem 0xd0000000-0xd3ffffff] as being reserved.

However, the "Fam 10h mmconf" message from arch/x86/pci/amd_bus.c makes it look like we're actually configured for MMCONFIG space for the entire [mem 0xd0000000-0xdfffffff] range.

Note that amd_bus.c reads MSRs directly, which is probably reliable in this case.  However, it is not the architected way for the OS to discover this, because it requires CPU/chipset-specific code in the OS.

I can try a Linux quirk to reserve that space as a workaround.

But I thought there were WHQL tests that checked for address space issues like this, and they should have caught this.  I very painfully debugged an HP laptop issue where putting a USB device in a certain spot (that appeared to be available) would lock up the machine.  If we can't rely on the address maps from firmware, we're just going to keep tripping over things like this.
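
The size mismatch can be checked with the standard MMCONFIG (ECAM) layout, in which each bus takes 1 MiB (32 devices x 8 functions x 4 KiB of config space). A minimal Python sketch, with hypothetical helper names, using the base address and bus numbers from the log above:

```python
# One bus = 32 devices x 8 functions x 4 KiB of config space = 1 MiB.
BYTES_PER_BUS = 32 * 8 * 4096


def mmconfig_range(base, first_bus, last_bus):
    """Address range needed for buses first_bus..last_bus (inclusive)."""
    start = base + first_bus * BYTES_PER_BUS
    end = base + (last_bus + 1) * BYTES_PER_BUS - 1
    return start, end


def buses_covered(start, end):
    """Number of whole buses an address range can hold."""
    return (end - start + 1) // BYTES_PER_BUS


# What the firmware advertises: [bus 00-3f] at base 0xd0000000.
advertised = mmconfig_range(0xD0000000, 0x00, 0x3F)
# What the "Fam 10h mmconf" line says the chipset decodes.
decoded_buses = buses_covered(0xD0000000, 0xDFFFFFFF)

print(hex(advertised[0]), hex(advertised[1]), decoded_buses)
```

Under these assumptions the advertised range ends at 0xd3ffffff (64 buses), while the range reported by amd_bus.c is large enough for 256 buses.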
Comment 19 Bjorn Helgaas 2011-06-16 00:00:54 UTC
Created attachment 62152 [details]
quirk for AMD MMCONFIG space

Can you try this patch (based on 3.0-rc3, but should apply to any recent kernel) and attach the dmesg log?
Comment 20 Bjorn Helgaas 2011-06-28 19:30:34 UTC
Ping!  Anybody have time to test this patch?
Comment 21 Bjorn Helgaas 2011-06-29 17:24:26 UTC
Created attachment 63882 [details]
updated quirk

Here's an updated patch.
Comment 22 Bjorn Helgaas 2011-07-01 20:53:23 UTC
Created attachment 64432 [details]
v3 patch

Oops, I botched the previous one.  Thank you, Lisa, for testing it.
Comment 23 dann frazier 2011-07-08 23:15:28 UTC
Created attachment 65022 [details]
my backport to maverick's 2.6.35

Thanks Bjorn. I tried backporting the v3 fix to maverick and building a kernel for Lisa to test, but it didn't seem to fix things. Here's my backport in case you notice something incorrect with it.
Comment 24 Bjorn Helgaas 2011-07-12 19:38:26 UTC
Thanks for testing again, Dann and Lisa.  The backport looks fine to me, so the problem isn't obvious to me.  To avoid wasting your time, I'm going to wait on this until I have access to a machine, or until Dell steps up with some help.  In the meantime, the workaround is to boot with "pci=nocrs".
Comment 25 stuart hayes 2011-07-21 20:49:07 UTC
I just tried the 2.6.39.3 kernel, and it worked on my Dell 1546.  I was able to reproduce the failure with an older kernel (2.6.32-ish), if I used "pci=use_crs".

With 2.6.39.3, it is (re-)assigning memory resources to the USB devices starting at memory address 0x80000000, which seems to work.
Comment 26 Bjorn Helgaas 2011-07-21 21:05:19 UTC
Thank you very much for testing this, Stuart.  Would you mind
attaching the dmesg log from 2.6.39.3 so I can see exactly what
happened?  I fixed several CRS-related bugs, and I'd like to
positively identify this as one of them.

Oh, I forgot to mention that the problem only happens with 3GB of
memory (see the launchpad bug below), which I bet you don't have in
your 1546.  Any chance you could test with more memory?

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/647043/comments/23
Comment 27 stuart hayes 2011-07-21 21:13:12 UTC
Created attachment 66322 [details]
dmesg with 2.6.39.3 kernel

Here's the dmesg with 2.6.39.3.

I have 2GB of memory, and I definitely see the problem when I use "pci=use_crs" on an older kernel (RHEL 6.1 kernel, specifically--2.6.32-131.0.15.el6)... I get the same error messages that are described in this issue, and it works without "pci=use_crs", etc.
Comment 28 Bjorn Helgaas 2011-07-21 22:21:35 UTC
Thanks for the 2.6.39.3 dmesg.

I think the problem you see when using "pci=use_crs" on RHEL6.1 is
actually a different problem that happens to have similar symptoms.
If you attach that dmesg, I can probably figure out exactly which
problem that is.

But from your 2.6.39.3 dmesg, I strongly suspect that you will see a
problem if you put 3GB of memory in the 1546.  The dmesg shows:

  BIOS-e820: 00000000d0000000 - 00000000d4000000 (reserved)
  Fam 10h mmconf [d0000000, dfffffff]
  PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xd0000000-0xd3ffffff] (base 0xd0000000)
  PCI: MMCONFIG at [mem 0xd0000000-0xd3ffffff] reserved in E820
  pci_root PNP0A03:00: host bridge window [mem 0xd4000000-0xfebfffff]

The "Fam 10h" line means the chipset is programmed to claim the entire
[d0000000, dfffffff] area as MMCONFIG space.  But the other lines tell
the OS that only [mem 0xd0000000-0xd3ffffff] is reserved for MMCONFIG.
 And the last line tells the OS that the [mem 0xd4000000-0xfebfffff]
area is available for PCI devices.

But in fact, 0xd4000000 is NOT available for PCI; it's claimed by the
chipset as MMCONFIG.

If you put 3GB of memory in, the host bridge window at [mem
0x80000000-0xcfffffff], which is where 2.6.39.3 puts the USB devices,
will be used for memory instead.  Therefore, we'll put them at
0xd4000000, where they won't work.

That's my theory, anyway :)

The Ubuntu folks did confirm that 1536/1546 USB works fine with 2GB of
memory, but fails with 3GB.
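
The theory in this comment comes down to a range overlap. A minimal Python sketch using the addresses quoted above; the `overlap` helper is illustrative only:

```python
def overlap(a, b):
    """Return the overlapping part of two inclusive ranges, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


CHIPSET_MMCONFIG = (0xD0000000, 0xDFFFFFFF)    # "Fam 10h mmconf" line
HIGH_WINDOW = (0xD4000000, 0xFEBFFFFF)         # host bridge window from _CRS
LOW_WINDOW = (0x80000000, 0xCFFFFFFF)          # window that is RAM with 3GB
USB_BAR_AT_D4 = (0xD4000000, 0xD4000FFF)       # first reassigned USB BAR

unusable = overlap(HIGH_WINDOW, CHIPSET_MMCONFIG)
print([hex(x) for x in unusable])
print(overlap(USB_BAR_AT_D4, CHIPSET_MMCONFIG) is not None)
print(overlap(LOW_WINDOW, CHIPSET_MMCONFIG))
```

If the theory holds, everything from 0xd4000000 through 0xdfffffff is advertised as available to PCI devices but is actually decoded as MMCONFIG, so a BAR placed at the bottom of that window lands in the conflicting region, whereas the low window does not touch it.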
Comment 29 stuart hayes 2011-07-22 20:26:25 UTC
OK, I used a "memmap=" kernel parameter to make sure that 0x80000000-0xd0000000 was reserved even though I only have 2GB memory.

Now I see that the kernel assigns 0xd4000000 range addresses to the USB controllers.  I actually get a kernel panic when the EHCI driver tries to do something, so I boot with "nousb" also and just look at the address that's assigned to the USB controllers.

Looks like your patch didn't work, because the reservation failed.  Here's the relevant part of dmesg:

  ...
  pnp 00:0c: [mem 0x00000000-0x0009efff]
  pnp 00:0c: [mem 0x0009f000-0x0009ffff]
  pnp 00:0c: [mem 0x000c0000-0x000cffff]
  pnp 00:0c: [mem 0x000e0000-0x000fffff]
  pnp 00:0c: [mem 0x00100000-0x7fec53ff]
  pnp 00:0c: [mem 0x7fec5400-0x7fefffff]
  pnp 00:0c: [mem 0x7ff00000-0x7fffffff]
  pnp 00:0c: [mem 0xffe00000-0xffffffff]
  pnp 00:0c: [mem 0xfec00000-0xfec0ffff]
  pnp 00:0c: [mem 0xfee00000-0xfee0ffff]
  pnp 00:0c: [mem 0xfee10000-0xfee103ff]
  pnp 00:0c: [mem 0xfed1c800-0xfed1cfff]
> pnp 00:0c: [mem 0xd0000000-0xd3ffffff]
> pnp 00:0c: [Firmware Bug]: enlarging [mem 0xd0000000-0xd3ffffff] for AMD MMCONFIG area at [mem 0xd0000000-0xdfffffff]
  system 00:0c: [mem 0x00000000-0x0009efff] could not be reserved
  system 00:0c: [mem 0x0009f000-0x0009ffff] could not be reserved
  system 00:0c: [mem 0x000c0000-0x000cffff] has been reserved
  system 00:0c: [mem 0x000e0000-0x000fffff] has been reserved
  system 00:0c: [mem 0x00100000-0x7fec53ff] could not be reserved
  system 00:0c: [mem 0x7fec5400-0x7fefffff] has been reserved
  system 00:0c: [mem 0x7ff00000-0x7fffffff] has been reserved
  system 00:0c: [mem 0xffe00000-0xffffffff] has been reserved
  system 00:0c: [mem 0xfec00000-0xfec0ffff] could not be reserved
  system 00:0c: [mem 0xfee00000-0xfee0ffff] has been reserved
  system 00:0c: [mem 0xfee10000-0xfee103ff] has been reserved
  system 00:0c: [mem 0xfed1c800-0xfed1cfff] could not be reserved
> system 00:0c: [mem 0xd0000000-0xdfffffff] could not be reserved
  system 00:0c: Plug and Play ACPI device, IDs PNP0c01 (active)
  pnp: PnP ACPI: found 13 devices
  ACPI: ACPI bus type pnp unregistered
> pci 0000:00:12.0: BAR 0: assigned [mem 0xd4000000-0xd4000fff]
> pci 0000:00:12.0: BAR 0: set to [mem 0xd4000000-0xd4000fff] (PCI address [0xd4000000-0xd4000fff])
> pci 0000:00:12.1: BAR 0: assigned [mem 0xd4001000-0xd4001fff]
> pci 0000:00:12.1: BAR 0: set to [mem 0xd4001000-0xd4001fff] (PCI address [0xd4001000-0xd4001fff])
> pci 0000:00:13.0: BAR 0: assigned [mem 0xd4002000-0xd4002fff]
> pci 0000:00:13.0: BAR 0: set to [mem 0xd4002000-0xd4002fff] (PCI address [0xd4002000-0xd4002fff])
> pci 0000:00:13.1: BAR 0: assigned [mem 0xd4003000-0xd4003fff]
> pci 0000:00:13.1: BAR 0: set to [mem 0xd4003000-0xd4003fff] (PCI address [0xd4003000-0xd4003fff])
> pci 0000:00:12.2: BAR 0: assigned [mem 0xd4004000-0xd40040ff]
> pci 0000:00:12.2: BAR 0: set to [mem 0xd4004000-0xd40040ff] (PCI address [0xd4004000-0xd40040ff])
> pci 0000:00:13.2: BAR 0: assigned [mem 0xd4004100-0xd40041ff]
> pci 0000:00:13.2: BAR 0: set to [mem 0xd4004100-0xd40041ff] (PCI address [0xd4004100-0xd40041ff])
  pci 0000:02:00.0: BAR 6: assigned [mem 0xf7d00000-0xf7d1ffff pref]
  pci 0000:00:02.0: PCI bridge to [bus 02-02]
  pci 0000:00:02.0:   bridge window [io  0xe000-0xefff]
  pci 0000:00:02.0:   bridge window [mem 0xf7d00000-0xf7efffff]
  pci 0000:00:02.0:   bridge window [mem 0xe0000000-0xefffffff 64bit pref]
  pci 0000:09:00.0: BAR 6: assigned [mem 0xf0020000-0xf003ffff pref]
  pci 0000:00:05.0: PCI bridge to [bus 09-09]
  ...
Comment 30 stuart hayes 2011-07-22 22:04:34 UTC
I think it is failing to reserve 0xd0000000-0xdfffffff because 0xd0000000-0xd3ffffff is already reserved.
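
A toy Python model of that failure, assuming a flat list of claimed ranges in which any overlap makes a new request fail. The real kernel resource tree is hierarchical and more involved; `ToyResourceMap` is purely illustrative:

```python
class ToyResourceMap:
    """Flat list of claimed inclusive ranges; overlapping requests fail."""

    def __init__(self):
        self.claimed = []

    def reserve(self, start, end):
        for s, e in self.claimed:
            if start <= e and s <= end:  # ranges intersect
                return False
        self.claimed.append((start, end))
        return True


rmap = ToyResourceMap()
first = rmap.reserve(0xD0000000, 0xD3FFFFFF)     # region already reserved
enlarged = rmap.reserve(0xD0000000, 0xDFFFFFFF)  # enlarged PNP resource
remainder = rmap.reserve(0xD4000000, 0xDFFFFFFF) # only the uncovered part
print(first, enlarged, remainder)
```

In this simplified model the enlarged request is rejected because it overlaps the existing reservation, while a request for just the part above 0xd3ffffff goes through.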
Comment 31 Bjorn Helgaas 2011-07-22 22:38:23 UTC
You're right.  Curses, foiled again by our hodge-podge system of
reserving things.  Nice trick to use "memmap=", by the way :)  Wish
I'd thought of that.

I'll puzzle this over and figure out something to do.  At least we
have the information we need from the chipset to work around this
problem.
Comment 32 stuart hayes 2011-07-25 16:32:15 UTC
The patch below (applied on top of the "v3 patch") works for me, but there may be problems with it that I'm not noticing.  Instead of enlarging the existing resource when the mmconfig reserved area needs to grow, it adds a new resource (or two) covering the missing part.  With this patch the kernel assigns addresses at 0xf0400000-0xf04041ff to the USB controllers on my system, and USB works.

--- linux-2.6.39.3/drivers/pnp/quirks.c	2011-07-26 00:30:35.747101584 -0400
+++ linux-2.6.39.3_patched/drivers/pnp/quirks.c	2011-07-26 00:40:16.246582297 -0400
@@ -304,6 +304,7 @@ static void quirk_amd_mmconfig_area(stru
 	u64 mmconfig_start, mmconfig_end;
 	struct pnp_resource *pnp_res;
 	struct resource *res;
+	resource_size_t start, end;
 
 	amd_mmconfig_range(&mmconfig_start, &mmconfig_end);
 	if (!mmconfig_end)
@@ -319,10 +320,22 @@ static void quirk_amd_mmconfig_area(stru
 			 "enlarging %pR for AMD MMCONFIG area at [mem %#010llx-%#010llx]\n",
 			 res, (unsigned long long) mmconfig_start,
 			 (unsigned long long) mmconfig_end);
-		if (mmconfig_start < res->start)
-			res->start = mmconfig_start;
-		if (mmconfig_end > res->end)
-			res->end = mmconfig_end;
+
+		if (res->start > mmconfig_start) {
+			start = mmconfig_start;
+			end = res->start - 1;
+			pnp_dbg(&dev->dev, "...adding resource from %lx to %lx\n",(unsigned long)start,(unsigned long)end);
+			pnp_add_mem_resource(dev, start, end, 0);
+		}
+
+		if (res->end < mmconfig_end) {
+			start = res->end + 1;
+			end = mmconfig_end;
+			pnp_dbg(&dev->dev, "...adding resource from %lx to %lx\n",(unsigned long)start,(unsigned long)end);
+			pnp_add_mem_resource(dev, start, end, 0);
+		}
+	
+		break;
 	}
 }
 #endif
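
The range arithmetic in the patch above can be sketched in Python as follows. The function name `extra_ranges` is hypothetical; it mirrors only the two `if` blocks that compute the start and end of the added resources, not the PNP bookkeeping:

```python
def extra_ranges(existing, mmconfig):
    """Ranges of the MMCONFIG area not covered by the existing resource.

    Both arguments are inclusive (start, end) tuples that are assumed
    to overlap, as in the quirk.
    """
    extras = []
    if existing[0] > mmconfig[0]:
        # Part of the MMCONFIG area below the existing resource.
        extras.append((mmconfig[0], existing[0] - 1))
    if existing[1] < mmconfig[1]:
        # Part of the MMCONFIG area above the existing resource.
        extras.append((existing[1] + 1, mmconfig[1]))
    return extras


# Values from the dmesg in comment 29.
added = extra_ranges((0xD0000000, 0xD3FFFFFF), (0xD0000000, 0xDFFFFFFF))
print([(hex(s), hex(e)) for s, e in added])
```

For the values seen on this machine, a single extra range [0xd4000000-0xdfffffff] is produced, which is the area the USB BARs had been assigned into.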
Comment 33 Florian Mickler 2012-01-21 16:35:37 UTC
A patch referencing this bug report has been merged in Linux v3.3-rc1:

commit eb31aae8cb5eb54e234ed2d857ddac868195d911
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Thu Jan 5 14:27:24 2012 -0700

    PNP: work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB