Bug 72201 - [bisected] e501 "agp: Support 64-bit APBASE" agp fails without iommu=remap=2
Summary: [bisected] e501 "agp: Support 64-bit APBASE" agp fails without iommu=remap=2
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL: https://lkml.kernel.org/r/CAGG0vUgp9b...
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-16 15:41 UTC by Jouni Mettälä
Modified: 2014-03-25 16:29 UTC

See Also:
Kernel Version: 3.14-rc1
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
lspci-vv (24.70 KB, application/octet-stream)
2014-03-16 15:41 UTC, Jouni Mettälä
dmesg-nonworking (43.46 KB, text/plain)
2014-03-16 15:42 UTC, Jouni Mettälä
dmesg-working (44.22 KB, text/plain)
2014-03-16 15:46 UTC, Jouni Mettälä
dmesg from patched kernel (43.72 KB, text/plain)
2014-03-18 11:17 UTC, Jouni Mettälä
Remove GART from resource map (3.02 KB, patch)
2014-03-18 20:42 UTC, Bjorn Helgaas

Description Jouni Mettälä 2014-03-16 15:41:11 UTC
Created attachment 129611
lspci-vv

I can't start an X session without the iommu=remap=2 kernel parameter.

Kernel bisect led to commit e501b3d87f003dfad8fcbd0f55ae17ea52495a56
agp: Support 64-bit APBASE

Reverting this fixes it for me.
Comment 1 Jouni Mettälä 2014-03-16 15:42:31 UTC
Created attachment 129621
dmesg-nonworking
Comment 2 Jouni Mettälä 2014-03-16 15:46:09 UTC
Created attachment 129631
dmesg-working

This is with e501 reverted.
Comment 3 Jouni Mettälä 2014-03-18 11:17:31 UTC
Created attachment 129971
dmesg from patched kernel
Comment 4 Bjorn Helgaas 2014-03-18 20:42:12 UTC
Created attachment 130001
Remove GART from resource map

This is the patch Jouni tested in comment #3.

This is a regression that appeared in v3.14-rc1, when we merged
e501b3d87f00 [1].  The relevant part of that change is this:

    --- a/drivers/char/agp/amd64-agp.c
    +++ b/drivers/char/agp/amd64-agp.c
    @@ -295,9 +294,7 @@ static int fix_northbridge(struct pci_dev *nb, struct pci_dev *agp, u16 cap)

    -       pci_read_config_dword(agp, 0x10, &aper_low);
    -       pci_read_config_dword(agp, 0x14, &aper_hi);
    -       aper = (aper_low & ~((1<<22)-1)) | ((u64)aper_hi << 32);
    +       aper = pci_bus_address(agp, AGP_APERTURE_BAR);

Here's what I think is happening: Previously, we read the GART
aperture base directly from the BAR.  After e501b3d87f00, we use
pci_bus_address() to convert the aperture *resource* (which the PCI
core has previously read from the BAR and converted to a CPU address)
back into a bus address.
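
To make the old path concrete, here is a minimal userspace sketch of the
removed computation. It is not kernel code: aper_from_raw_bar(),
APER_LOW_MASK and aper_example() are hypothetical names, and the raw
value 0xa0000008 is an assumed example (base 0xa0000000 plus the
prefetchable flag in bit 3), not something taken from the attached logs.

```c
#include <assert.h>
#include <stdint.h>

/* The removed code cleared bits 21:0 of the low dword: ~((1<<22)-1). */
#define APER_LOW_MASK (~((UINT32_C(1) << 22) - 1))

/*
 * Userspace model of the old fix_northbridge() arithmetic: combine the
 * two raw config dwords (offsets 0x10 and 0x14) into a 64-bit address.
 */
static inline uint64_t aper_from_raw_bar(uint32_t aper_low, uint32_t aper_hi)
{
    return (uint64_t)(aper_low & APER_LOW_MASK) | ((uint64_t)aper_hi << 32);
}

/* Assumed example value: the flag bits are masked off, the base survives. */
static inline void aper_example(void)
{
    assert(aper_from_raw_bar(UINT32_C(0xa0000008), 0) == UINT64_C(0xa0000000));
}
```

The point of the sketch is that this path depends only on what the
hardware register holds, so it is unaffected by whatever the PCI core
has decided about the corresponding resource.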

Normally both ways would give the same result, but here we had this:

    Node 0: aperture @ a0000000 size 256 MB
    pci 0000:00:04.0: reg 0x10: [mem 0xa0000000-0xafffffff pref]
    pci 0000:00:04.0: address space collision: [mem 0xa0000000-0xafffffff pref] conflicts with GART [mem 0xa0000000-0xafffffff]

The "Node 0" line is where we inserted the "GART [mem 0xa0000000-
0xafffffff]" resource in gart_iommu_hole_init().  Then we enumerated
the northbridge, which had a BAR containing 0xa0000000.  When we tried
to claim that BAR, it conflicted with the "GART" resource, and we set
r->start = 0 (in pcibios_allocate_dev_resources()).  So when we
finally got to fix_northbridge(), the aperture resource was set to
zero, not 0xa0000000.
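
The sequence above can be modelled with a toy in userspace. This is
only a sketch under stated assumptions: struct toy_res, toy_overlaps()
and toy_claim() are hypothetical stand-ins, not the kernel's struct
resource or its claim path, and keeping the size while moving the start
to zero is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a resource: an inclusive [start, end] range. */
struct toy_res {
    uint64_t start;
    uint64_t end;
};

/* Two inclusive ranges overlap if each one starts before the other ends. */
static inline bool toy_overlaps(const struct toy_res *a, const struct toy_res *b)
{
    return a->start <= b->end && b->start <= a->end;
}

/*
 * Model of a claim attempt: if the BAR range conflicts with an already
 * inserted range, move it to start at 0 (size kept) and report failure.
 */
static inline bool toy_claim(struct toy_res *bar, const struct toy_res *busy)
{
    assert(bar->start <= bar->end);
    if (!toy_overlaps(bar, busy))
        return true;
    bar->end -= bar->start;
    bar->start = 0;
    return false;
}
```

With the "GART" range and the BAR both at 0xa0000000-0xafffffff, the
claim fails and the modelled resource starts at 0, which is the value a
later lookup of the resource would see.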

Note that we complained about the collision even before e501b3d87f00.
The only difference is that we used to re-read the BAR, where we still
got 0xa0000000 even though the PCI core thought the resource was
invalid and had set it to zero.

I think we should stop inserting the GART resource directly in
iomem_resource in gart_iommu_hole_init().  That should avoid the
collision and leave the BAR resource valid, which means
pci_bus_address() should work.

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e501b3d87f00
Comment 5 Bjorn Helgaas 2014-03-25 16:29:40 UTC
Resolved by http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=707d4eefbdb31f8e588277157056b0ce637d6c68, which appeared in v3.14-rc8.
