Bug 72201

Summary: [bisected] e501 "agp: Support 64-bit APBASE" agp fails without iommu=remap=2
Product: Drivers Reporter: Jouni Mettälä (jtmettala)
Component: PCI Assignee: drivers_pci (drivers_pci)
Status: RESOLVED CODE_FIX    
Severity: normal CC: bjorn, jtmettala
Priority: P1    
Hardware: All   
OS: Linux   
URL: https://lkml.kernel.org/r/CAGG0vUgp9bc=xxp4T672sfm+Y-JQWmvXtdnQonGcHQUZ4WENcg@mail.gmail.com
Kernel Version: 3.14-rc1 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: lspci-vv
dmesg-nonworking
dmesg-working
dmesg from patched kernel
Remove GART from resource map

Description Jouni Mettälä 2014-03-16 15:41:11 UTC
Created attachment 129611 [details]
lspci-vv

I can't start an X session without the iommu=remap=2 kernel parameter.

Kernel bisect led to commit e501b3d87f003dfad8fcbd0f55ae17ea52495a56
agp: Support 64-bit APBASE

Reverting this fixes it for me.
Comment 1 Jouni Mettälä 2014-03-16 15:42:31 UTC
Created attachment 129621 [details]
dmesg-nonworking
Comment 2 Jouni Mettälä 2014-03-16 15:46:09 UTC
Created attachment 129631 [details]
dmesg-working

This is with e501 reverted.
Comment 3 Jouni Mettälä 2014-03-18 11:17:31 UTC
Created attachment 129971 [details]
dmesg from patched kernel
Comment 4 Bjorn Helgaas 2014-03-18 20:42:12 UTC
Created attachment 130001 [details]
Remove GART from resource map

This is the patch Jouni tested in comment #3.

This is a regression that appeared in v3.14-rc1, when we merged
e501b3d87f00 [1].  The relevant part of that change is this:

    --- a/drivers/char/agp/amd64-agp.c
    +++ b/drivers/char/agp/amd64-agp.c
    @@ -295,9 +294,7 @@ static int fix_northbridge(struct pci_dev *nb, struct pci_dev *agp, u16 cap)

    -       pci_read_config_dword(agp, 0x10, &aper_low);
    -       pci_read_config_dword(agp, 0x14, &aper_hi);
    -       aper = (aper_low & ~((1<<22)-1)) | ((u64)aper_hi << 32);
    +       aper = pci_bus_address(agp, AGP_APERTURE_BAR);

Here's what I think is happening: Previously, we read the GART
aperture base directly from the BAR.  After e501b3d87f00, we use
pci_bus_address() to convert the aperture *resource* (which the PCI
core has previously read from the BAR and converted to a CPU address)
back into a bus address.
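
For reference, the old direct-read computation can be tried in user space. This is only a sketch of the expression removed by the diff above: the helper name aperture_from_bar() is made up for illustration, and the two arguments stand in for the dwords that pci_read_config_dword() returned from offsets 0x10 and 0x14.

```c
#include <stdint.h>

/*
 * Hypothetical user-space helper mirroring the expression removed by
 * e501b3d87f00: clear the low 22 bits of the low dword (flag bits and
 * anything below the 4 MB boundary), then place the high dword in
 * bits 63:32.
 */
static uint64_t aperture_from_bar(uint32_t aper_low, uint32_t aper_hi)
{
    return (uint64_t)(aper_low & ~((1u << 22) - 1)) |
           ((uint64_t)aper_hi << 32);
}
```

With an assumed raw register value of 0xa0000008 in the low dword and 0 in the high dword, this yields 0xa0000000, which matches the aperture base in the logs below. The point is that this path never consults the PCI core's resource, so it is unaffected by whatever the core decided about that BAR.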

Normally both ways would give the same result, but here we had this:

    Node 0: aperture @ a0000000 size 256 MB
    pci 0000:00:04.0: reg 0x10: [mem 0xa0000000-0xafffffff pref]
    pci 0000:00:04.0: address space collision: [mem 0xa0000000-0xafffffff pref] conflicts with GART [mem 0xa0000000-0xafffffff]

The "Node 0" line is where we inserted the "GART [mem 0xa0000000-
0xafffffff]" resource in gart_iommu_hole_init().  Then we enumerated
the northbridge, which had a BAR containing 0xa0000000.  When we tried
to claim that BAR, it conflicted with the "GART" resource, and we set
r->start = 0 (in pcibios_allocate_dev_resources()).  So when we
finally got to fix_northbridge(), the aperture resource was set to
zero, not 0xa0000000.
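
The sequence above can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel code: struct range, claim_bar() and model_bus_address() are hypothetical stand-ins, the model assumes bus and CPU addresses are identical, and it assumes the failed claim keeps the resource size while moving the start to zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, loosely modelled on struct resource. */
struct range {
    uint64_t start;
    uint64_t end;
};

static bool overlaps(struct range a, struct range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/*
 * Hypothetical stand-in for claiming a BAR against the ranges already
 * in the resource map.  On a conflict the BAR resource is rebased to
 * start at 0 (size preserved, an assumption of this model) and the
 * claim fails.
 */
static bool claim_bar(struct range *bar, const struct range *busy, int nbusy)
{
    for (int i = 0; i < nbusy; i++) {
        if (overlaps(*bar, busy[i])) {
            bar->end -= bar->start;
            bar->start = 0;
            return false;
        }
    }
    return true;
}

/* Stand-in for pci_bus_address(), assuming no bus/CPU address offset. */
static uint64_t model_bus_address(const struct range *bar)
{
    return bar->start;
}
```

With a busy entry of 0xa0000000-0xafffffff (the "GART" range) and a BAR covering the same range, claim_bar() fails and model_bus_address() returns 0 instead of 0xa0000000. With an empty busy list (nbusy == 0), the claim succeeds and the start stays at 0xa0000000.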

Note that we complained about the collision even before e501b3d87f00.
The only difference is that we used to re-read the BAR, where we still
got 0xa0000000 even though the PCI core thought the resource was
invalid and had set it to zero.

I think we should stop inserting the GART resource directly in
iomem_resource in gart_iommu_hole_init().  That should avoid the
collision and leave the BAR resource valid, which means
pci_bus_address() should work.

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e501b3d87f00
Comment 5 Bjorn Helgaas 2014-03-25 16:29:40 UTC
Resolved by http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=707d4eefbdb31f8e588277157056b0ce637d6c68, which appeared in v3.14-rc8.