Bug 22132 - USB devices not recognized without pci=nocrs - Vaio VGN-P39VRL
Status: CLOSED CODE_FIX
Product: ACPI
Classification: Unclassified
Component: Config-Hotplug
Hardware: All Linux
Importance: P1 normal
Assigned To: Bjorn Helgaas
Depends on:
Blocks:
Reported: 2010-11-05 12:47 UTC by sergey.khorev
Modified: 2011-07-30 06:54 UTC (History)
CC List: 5 users

See Also:
Kernel Version: 2.6.36
Tree: Mainline
Regression: Yes


Attachments
output of lspci, dmesg etc (77.95 KB, application/x-compressed-tar)
2010-11-05 12:52 UTC, sergey.khorev
dmesg for 2.6.37-rc2 (12.95 KB, application/x-stuffit)
2010-12-01 06:48 UTC, sergey.khorev

Description sergey.khorev 2010-11-05 12:47:13 UTC
Starting with 2.6.34, USB devices are recognized only if pci=nocrs is used. Attached is the output of lspci, dmesg, etc. for various kernel versions.
Comment 1 sergey.khorev 2010-11-05 12:52:35 UTC
Created attachment 36192
output of lspci, dmesg etc
Comment 2 Zhang Rui 2010-11-08 02:11:19 UTC
cc bjorn.
Comment 3 Bjorn Helgaas 2010-11-11 23:33:53 UTC
Sergey, thank you very much for the report and detailed logs.  Some
significant changes in this area appeared in 2.6.37-rc1, and I'm pretty
sure that will work on your system (please test it if you can).
However, I think there's still a problem we need to fix.

Here's what I think is happening.  The BIOS didn't assign space for
the 00:1d.7 USB BAR, so Linux did it.  Without _CRS, we guess that the
[mem 0x80000000-0xdfffffff] region is available for PCI devices, based
on these E820 entries:

  BIOS-e820: 000000007f6f0000 - 0000000080000000 (reserved)
  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
  Allocating PCI resources starting at 80000000 (gap: 80000000:60000000)
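
The "gap" in that last log line can be reproduced with simple arithmetic. The following is only a sketch, assuming that the two quoted E820 entries are the ones that bound the hole; the kernel's real search walks the whole E820 map, and the helper name gaps_between is hypothetical:

```python
# Sketch only: compute the holes between consecutive E820 entries.
# Entries are (start, end_exclusive), as printed in the BIOS-e820 lines.
def gaps_between(entries):
    entries = sorted(entries)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(entries, entries[1:]):
        if next_start > prev_end:
            # Record the hole as (start, size), like the "gap:" log field.
            gaps.append((prev_end, next_start - prev_end))
    return gaps

# The two reserved entries quoted from the boot log.
e820 = [
    (0x7f6f0000, 0x80000000),
    (0xe0000000, 0xf0000000),
]

gaps = gaps_between(e820)
for start, size in gaps:
    print("gap: %08x:%08x -> [mem 0x%08x-0x%08x]"
          % (start, size, start, start + size - 1))
```

The single hole found this way corresponds to the [mem 0x80000000-0xdfffffff] region mentioned above.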

If we look at /proc/iomem, looking for space from low addresses to high,
[mem 0x942c4000-0x942c43ff] is the first available place to put the 00:1d.7
USB device, which works fine:

  7f6f0000-7fffffff : reserved
  80000000-8fffffff : 0000:00:02.0
  90000000-90ffffff : PCI Bus 0000:01
  91000000-91ffffff : PCI Bus 0000:02
  92000000-930fffff : PCI Bus 0000:02
  93100000-941fffff : PCI Bus 0000:01
  94200000-9427ffff : 0000:00:02.0
  94280000-942bffff : 0000:00:02.0
  942c0000-942c3fff : 0000:00:1b.0
  942c4000-942c43ff : 0000:00:1d.7   (** assigned by Linux **)
  e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
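
A minimal sketch of that bottom-up search, assuming a plain first-fit scan and a 0x400-byte BAR aligned to its own size. The function name first_fit is hypothetical, and every resource listed in /proc/iomem is treated as busy, which simplifies what the kernel actually does:

```python
# Sketch only: first-fit, lowest-address-first search inside one window.
# All ranges are inclusive (start, end) pairs.
def first_fit(window, busy, size, align):
    win_start, win_end = window
    candidate = win_start
    for busy_start, busy_end in sorted(busy):
        candidate = (candidate + align - 1) & ~(align - 1)
        if candidate + size - 1 < busy_start:
            break  # the request fits in the hole before this busy range
        candidate = max(candidate, busy_end + 1)
    candidate = (candidate + align - 1) & ~(align - 1)
    if candidate + size - 1 > win_end:
        return None
    return candidate, candidate + size - 1

# Regions already claimed in the /proc/iomem listing above.
busy = [
    (0x80000000, 0x8fffffff), (0x90000000, 0x90ffffff),
    (0x91000000, 0x91ffffff), (0x92000000, 0x930fffff),
    (0x93100000, 0x941fffff), (0x94200000, 0x9427ffff),
    (0x94280000, 0x942bffff), (0x942c0000, 0x942c3fff),
]

# Guessed window taken from the E820 gap.
result = first_fit((0x80000000, 0xdfffffff), busy, size=0x400, align=0x400)
print("[mem 0x%08x-0x%08x]" % result)
```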

When Linux pays attention to the host bridge _CRS, as it does by default
starting in 2.6.34, we no longer just guess at what's available.  Instead,
we use the _CRS information:

  pci_root PNP0A08:00: host bridge window [mem 0x7f800000-0x7fffffff]
  pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xfebfffff]

Note that the entire first window ([mem 0x7f800000-0x7fffffff]) is
"reserved" according to the E820 table.  I think it's a video framebuffer:

  uvesafb: framebuffer at 0x7f800000, mapped to 0xf8100000, using 7872k ...

I suppose this region is connected with the 00:02.0 video device (though it
doesn't appear in any BARs), so it makes some sense that the _CRS tells us
it's routed to PCI bus 00.

Now, here's the rub: we do try to pay attention to the fact that E820
tells us the area is reserved, but we don't do it correctly.  Here's what
it looks like in /proc/iomem:

  7f6f0000-7fffffff : reserved
    7f800000-7fffffff : PCI Bus 0000:00
  80000000-febfffff : PCI Bus 0000:00

When we allocate space for the 00:1d.7 USB device, we look for available
space *below* the PCI Bus 0000:00 resources.  We never look *up* at the
parents of those resources, so the 7f6f0000-7fffffff reserved area doesn't
affect the allocation at all.  The result is that we assign some of that
framebuffer space to the USB device, which doesn't work because accesses
go to the framebuffer instead of to the USB device:

  pci 0000:00:1d.7: BAR 0: assigned [mem 0x7f800000-0x7f8003ff]
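
The effect can be illustrated with a toy resource tree. This is only a sketch under the assumptions described above; the Resource class and allocate_in function are hypothetical stand-ins, not the kernel's data structures:

```python
# Sketch only: an allocator that inspects the children of a window
# but never the window's parents.
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    start: int
    end: int  # inclusive
    children: list = field(default_factory=list)

def allocate_in(window, size, align):
    candidate = window.start
    for child in sorted(window.children, key=lambda r: r.start):
        candidate = (candidate + align - 1) & ~(align - 1)
        if candidate + size - 1 < child.start:
            break
        candidate = max(candidate, child.end + 1)
    candidate = (candidate + align - 1) & ~(align - 1)
    if candidate + size - 1 > window.end:
        return None
    return candidate, candidate + size - 1

# The first host bridge window is nested inside the E820 "reserved" region.
bus_window = Resource("PCI Bus 0000:00", 0x7f800000, 0x7fffffff)
reserved = Resource("reserved", 0x7f6f0000, 0x7fffffff, [bus_window])

# The window has no children, so the very first address looks free.
bar = allocate_in(bus_window, size=0x400, align=0x400)
# Only a look upward at the parent reveals the conflict.
conflict = reserved.start <= bar[0] and bar[1] <= reserved.end
print("assigned [mem 0x%08x-0x%08x], inside reserved: %s"
      % (bar[0], bar[1], conflict))
```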

I think 2.6.37-rc1 will work on your machine because we'll allocate from
the top down instead of from the bottom up, so we'll probably get a
region like [mem 0xfebffc00-0xfebfffff].
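
The arithmetic behind that guess, as a sketch: it assumes that nothing else occupies the top of the second host bridge window and that the BAR is 0x400 bytes, aligned to its size. The helper name top_down is hypothetical:

```python
# Sketch only: place a request at the highest aligned address in a window.
# The window is an inclusive (start, end) range assumed to be free at the top.
def top_down(win_start, win_end, size, align):
    start = (win_end - size + 1) & ~(align - 1)
    if start < win_start:
        return None
    return start, start + size - 1

region = top_down(0x80000000, 0xfebfffff, size=0x400, align=0x400)
print("[mem 0x%08x-0x%08x]" % region)
```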

But of course, we still haven't fixed the underlying problem of ignoring
the E820 reservations.
Comment 4 sergey.khorev 2010-11-12 18:58:23 UTC
Thank you Bjorn for your comprehensive explanation. Indeed 2.6.37-rc1 worked fine.
Comment 5 Zhang Rui 2010-11-15 02:29:38 UTC
The bug reported by Sergey has been fixed in the 2.6.37-rc1 kernel.

There is another bug, though:
(In reply to comment #3)
> But of course, we still haven't fixed the underlying problem of ignoring
> the E820 reservations.

But that should go in a separate bug report, if needed.
Comment 6 Bjorn Helgaas 2010-11-15 15:54:57 UTC
I disagree that Sergey's bug has been fixed.  It just happens that the
2.6.37-rc1 changes cover it up, but this is a perfect example of a
problem caused by Linux ignoring E820 reservations.  So I don't think
there's any point in opening another bug report.  When we fix the E820
problem, we'll need a way to test those changes, and Sergey's machine
is a good candidate.
Comment 7 Len Brown 2010-11-16 02:53:06 UTC
In summary, 2.6.34 - 2.6.36 fail unless pci=nocrs is used,
and 2.6.37 works, but not because of a deliberate fix, right?

Will other machines fail like this one?
If we make pci=nocrs the default for 2.6.34-36,
will that be a net gain or loss?
Comment 8 Bjorn Helgaas 2010-11-16 15:56:47 UTC
(In reply to comment #7)
> In summary, 2.6.34 - 2.6.36 fail unless pci=nocrs is used,
> and 2.6.37 works, but not because of a deliberate fix, right?

Exactly.

> Will other machines fail like this one?
> If we make pci=nocrs the default for 2.6.34-36,
> will that be a net gain or loss?

Good question.  "pci=nocrs" would break a few things (PCI hotplug and
option ROM mapping for virtualized guests) on machines with multiple
host bridges.  And it would make us miss useful bug reports like this
one.

Backporting all the fixes to -stable seems a bit much.  A little patch to
make "pci=nocrs" the default seems more reasonable for -stable.

On the other hand, we don't really have all that many bug reports from
.34-36, and if we use "pci=nocrs" there, we'll effectively extend the
bug-finding period even longer than it already has been.

But I guess making .34-36-stable be as stable and bug-free as possible is
the whole point, and making "pci=nocrs" the default would probably be a win.
Comment 9 Bjorn Helgaas 2010-11-30 17:36:52 UTC
Sergey, could you attach the dmesg log from 2.6.37-rc1 (or later),
please?

I'd like to see what resources are being used by ACPI devices.
We've been assuming that the correct fix here is to pay attention
to E820 reservations.  However, there's some evidence that Windows
uses ACPI resources but doesn't rely on E820 "reserved" areas.
Comment 10 sergey.khorev 2010-12-01 06:48:01 UTC
Created attachment 38722
dmesg for 2.6.37-rc2

Bjorn,

Please find dmesg from 2.6.37-rc2 in the attachment.
Comment 11 Bjorn Helgaas 2011-05-27 16:14:31 UTC
I think this is fixed in 2.6.37, right?  Can we close this issue?
Comment 12 sergey.khorev 2011-06-18 16:41:25 UTC
The issue seems to be fixed.

