Bug 60111

Summary: NULL pointer deref in ASPM alloc_pcie_link_state() on non-compliant topology
Product: Drivers Reporter: Bjorn Helgaas (bjorn)
Component: PCIAssignee: drivers_pci (drivers_pci)
Status: REOPENED ---    
Severity: normal CC: fancer.lancer
Priority: P1    
Hardware: All   
OS: Linux   
URL: https://lkml.kernel.org/r/1371668174-32115-1-git-send-email-rkrcmar@redhat.com
Kernel Version: 3.10 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg with aspm bug
lspci -vv with pcie_aspm=off
patch to deal with this topology
boot log with issue
log of successful boot
lspci -vvnnt; lspci -vv; logs
Patch to fix the issue

Description Bjorn Helgaas 2013-06-25 16:50:14 UTC
Reported by Radim Krčmář <rkrcmar@redhat.com>:

PCIe switch upstream port can be connected directly to the PCIe root bus
in QEMU; ASPM does not expect this topology and dereferences NULL pointer
when initializing.

I have not confirmed this can happen on real hardware, but it is presented
as a feature in QEMU, so there is no reason to panic if we can recover.

The dereference happens with topology defined by
  -M q35 -device x3130-upstream,bus=pcie.0,id=upstream \
  -device xio3130-downstream,bus=upstream,id=downstream,chassis=1
where on line drivers/pci/pcie/aspm.c:530 (alloc_pcie_link_state+13):
                parent = pdev->bus->parent->self->link_state;
"pdev->bus->parent->self == NULL", because "pdev->bus->parent" has no
"->parent", hence no "->self".

Even though discouraged by QEMU documentation, one can set up even
topology without the upstream port
  -M q35 -device xio3130-downstream,bus=pcie.0,id=downstream,chassis=1
so "pdev->bus->parent == NULL", because "pdev->bus" is the root bus.
The patch checks for this too, because I do not like *NULL.

Right now, PCIe switch has to connect to the root port
  -M q35 -device ioh3420,bus=pcie.0,id=root.0 \
  -device x3130-upstream,bus=root.0,id=upstream \
  -device xio3130-downstream,bus=upstream,id=downstream,chassis=1
Comment 1 Radim Krčmář 2013-06-25 20:46:07 UTC
Created attachment 106031 [details]
dmesg with aspm bug
Comment 2 Radim Krčmář 2013-06-25 20:46:43 UTC
Created attachment 106041 [details]
lspci -vv with pcie_aspm=off
Comment 3 Bjorn Helgaas 2013-06-25 21:44:40 UTC
After opening this bz, I found this Red Hat report of the same issue:
  https://bugzilla.redhat.com/show_bug.cgi?id=972381

Reproduced with v3.10-rc1 with the following qemu invocation:

/usr/local/bin/qemu-system-x86_64 -M q35 -enable-kvm -m 512 \
  -device x3130-upstream,bus=pcie.0,id=upstream \
  -device xio3130-downstream,bus=upstream,id=downstream,chassis=1 \
  -drive file=ubuntu.img,if=none,id=mydisk \
  -device ide-drive,drive=mydisk,bus=ide.0 \
  -drive file=scratch.img,id=disk1 \
  -device virtio-blk-pci,bus=downstream,id=virtio-disk1,drive=disk1 \
  -nographic -kernel ~/linux/arch/x86/boot/bzImage \
  -append "console=ttyS0,115200n8 root=/dev/sda1 ignore_loglevel"
Comment 4 Bjorn Helgaas 2013-08-28 21:57:01 UTC
Created attachment 107346 [details]
patch to deal with this topology

This patch *would* avoid the issue, but it's a fair bit of code and
comments that future code readers would have to wade through, and the
only benefit is for topologies that are completely invalid and will
never be seen in real hardware.  Therefore, I'm inclined to do nothing
right now.

I think that we could also avoid this issue by reworking our ASPM
support, and if we were to do anything, that's what I would prefer.
For example, consider the following (legal and common) topology:

    00:1c.0 Root Port (bridge leading to [bus 01-03])
    01:00.0 Upstream Port (bridge leading to [bus 02-03])
    02:00.0 Downstream Port (bridge leading to [bus 03])

When we initialize ASPM link state for 02:00.0, we look all the way up
to the link state for 00:1c.0 (the
"pdev->bus->parent->self->link_state" in alloc_pcie_link_state()).  I
think that's crazy.  ASPM is fundamentally a feature of a *link*, and
the only link directly related to 02:00.0 is the one leading
*downstream* to device 03:00.0.

I know there are latency questions where we do care about upstream
links, and there's commit 46bbdfa44c, which is what originally added
this "pdev->bus->parent->self->link_state" expression.  It suggests
that we need to check the whole hierarchy up to the root port.  But
there's not enough information in the changelog to really explain it.

So I strongly support rework that would simplify ASPM management
overall.   I just don't want to add point fixes that only handle
special cases, especially when they don't occur in the real world.
Comment 5 Bjorn Helgaas 2013-08-28 22:01:31 UTC
I'm closing this as invalid because the simulated machine where the problem occurs has an invalid PCIe topology (an Upstream Port with no Downstream Port or Root Port above it).  As far as I know, there is no valid topology, e.g.,
a real hardware machine in the field, that would cause this failure.
Comment 6 Bjorn Helgaas 2016-10-06 13:09:13 UTC
Reopening because Serge Semin <fancer.lancer@gmail.com> reported two systems that hit this issue: http://lkml.kernel.org/r/1475746455-20665-1-git-send-email-fancer.lancer@gmail.com

Serge, can you please attach complete dmesg logs and "lspci -vv" output for these systems?
Comment 7 fancer 2016-10-06 14:21:46 UTC
Created attachment 240981 [details]
boot log with issue
Comment 8 fancer 2016-10-06 14:22:59 UTC
Created attachment 240991 [details]
log of successful boot

Boot log when issue is fixed by the provided patch
Comment 9 fancer 2016-10-06 14:24:51 UTC
Created attachment 241001 [details]
lspci -vvnnt; lspci -vv; logs

Log of lspci utility output on the system with fixed issue. As you can see ASPM is enabled everywhere except 1:00.0 Upstream port.
Comment 10 fancer 2016-10-06 14:25:26 UTC
Created attachment 241011 [details]
Patch to fix the issue
Comment 11 fancer 2016-10-06 15:00:40 UTC
I don't know what the previous patch did to fix the problem, but mine just checks whether the parent bus of having secondary link current device is not root. So we have two cases:
1) Traditional PCIe bus topology:
               |RC
         ---------------- bus 0 (root bus - root complex)
          |[RP]
          |
          | bus 1 (leading to a first Upstream/Downstream device)
          |
          |US0
   --------------- bus 2 (device virtual bus)
    |[DS2]

In accordance with the algorithm it creates "struct pcie_link_state" for RP and DS2. So when we get "pdev->bus->parent->self->link_state" for DS2, we retrieve RP's link state structure.

2) Root-port-less PCIe bus topology:
          |RC+RP (Host-PCIe bridges is merged with root port)
          |
          | bus 1 (root bus)
          |
          |US0
   --------------- bus 2 (device virtual bus)
    |[DS2]

First of all algorithm doesn't create link state structure for root port since there is no one. So when algorithm tries to enumerate [DS2] and create link state structure, it calls "pdev->bus->parent->self->link_state". Since "RC+RP" is not ordinary PCI device, but Host-PCI bridge, bus 1 turns out to be root bus without upstream device, so "pdev->bus->parent->self"-pointer of the bus is NULL. That's how NULL dereference happens.

So the fix is easy. Just check whether the parent bus isn't root bus.
Comment 12 fancer 2016-10-07 08:32:49 UTC
Of course, such method has a drawback. When we apply the attached patch, bus 1 turns out to be considered as root bus. It means, that PCIe link of the bus can't change state, so ASPM service is not enabled for US0 port. As you can see ASPM is disabled for device 1:00.0 of the attached "lspci -vv" log. As a consequence if PCIe bus has nothing, but US0 device directly connected to the merged RC+RP, ASPM will be de-facto disabled.
Comment 13 Bjorn Helgaas 2017-01-27 23:41:05 UTC
(In reply to Radim Krčmář from comment #2)
> Created attachment 106041 [details]
> lspci -vv with pcie_aspm=off

Notes from dmesg (comment #1) and lspci (comment #2):

  Linux version 3.10.0-0.rc6.git0.2.fc20.x86_64 (mockbuild@buildvm-02.phx2.fedoraproject.org) (gcc version 4.8.1 20130612 (Red Hat 4.8.1-2) (GCC) ) #1 SMP Mon Jun 17 15:02:16 UTC 2013
  PCI host bridge to bus 0000:00
  00:03.0 Upstream Port         bridge to [bus 01-02]
  01:00.0 Downstream Port       bridge to [bus 02]
  02:00.0 virtio SCSI (non-PCIe)
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
  IP: [<ffffffff813218f1>] pcie_aspm_init_link_state+0x701/0x7d0

No ASPM possible anywhere.  01:00.0 should be the upstream end of a 
Link where ASPM could potentially be used, but the downstream device
(02:00.0) is not a PCIe device.  We should fix ASPM to handle this gracefully
instead of crashing.
Comment 14 Bjorn Helgaas 2017-01-27 23:43:38 UTC
(In reply to fancer from comment #9)
> Created attachment 241001 [details]
> lspci -vvnnt; lspci -vv; logs

Notes from dmesg (comment #7) and lspci (comment #9):

  Linux version 3.19.12-cnccu (builder@baikal.int) (gcc version 5.2.0 (crosstool-NG ) ) #0 SMP Wed Sep 7 11:04:07 MSK 2016
  PCI host bridge to bus 0000:01
  01:00.0 Upstream Port         bridge to [bus 02-06]
  02:02.0 Downstream Port       bridge to [bus 03-05]
  02:04.0 Downstream Port       bridge to [bus 06]
  03:00.0 Upstream Port         bridge to [bus 04-05]
  03:00.1 Endpoint
  04:00.0 Downstream Port       bridge to [bus 05]
  06:00.0 Legacy Endpoint
  CPU 0 Unable to handle kernel paging request at virtual address 00000060, epc == 80316d78, ra == 80316d28
      
ASPM is possible on:
  - 02:02.0 (link to 03:00.0, 03:00.1; both PCIe devices)
  - 02:04.0 (link to 06:00.0)

ASPM is not possible on 04:00.0 (PCIe downstream port but no device on the other end).

Again, we should fix ASPM to handle this gracefully instead of crashing.  I don't have a proposal yet.