Bug 60865 - Excessive console output from acpiphp
Summary: Excessive console output from acpiphp
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-06 15:32 UTC by Alex Williamson
Modified: 2013-09-15 04:36 UTC (History)
2 users (show)

See Also:
Kernel Version: 3.11.0+
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg output with Rafael's patches + s/dbg/pr_info/ in hotplug_event() (25.82 KB, application/gzip)
2013-09-06 15:32 UTC, Alex Williamson
Details
lspci -vv (7.00 KB, application/gzip)
2013-09-06 15:33 UTC, Alex Williamson
Details
dmesg output with Rafael's patches + s/dbg/pr_info/ in hotplug_event() (uncompressed) (150.46 KB, text/plain)
2013-09-06 15:45 UTC, Bjorn Helgaas
Details
lspci -vv (uncompressed) (100.98 KB, text/plain)
2013-09-06 15:46 UTC, Bjorn Helgaas
Details
acpidump (236.21 KB, text/plain)
2013-09-07 18:55 UTC, Alex Williamson
Details
ACPIPHP: Avoid bus scans for spurious device check notifications (1.83 KB, patch)
2013-09-07 21:12 UTC, Rafael J. Wysocki
Details | Diff

Description Alex Williamson 2013-09-06 15:32:47 UTC
Created attachment 107458 [details]
dmesg output with Rafael's patches + s/dbg/pr_info/ in hotplug_event()

Recent changes to acpiphp result in significantly more console output.  On a Lenovo S20 workstation:

CONFIG_HOTPLUG_PCI_ACPI=y
$ dmesg | wc
  5697  49935 384368
$ dmesg | tail --lines=1
[   53.137123] Ebtables v2.0 registered

# CONFIG_HOTPLUG_PCI_ACPI is not set
$ dmesg | wc
 1053  9176 71652
$dmesg | tail --lines=1
[   28.917220] Ebtables v2.0 registered

This is partially resolved by Rafael's patches:

https://lkml.org/lkml/2013/9/6/197

This gets us down to:

$ dmesg | wc
   2169   19047  152710
$ dmesg | tail --lines=1
[   39.980918] Ebtables v2.0 registered

But we're still getting an extra 1100 lines of output, slowing down the boot by about 10s with acpiphp enabled.

The extra output is almost entirely from repeated bus scans caused by device checks on the root ports:

$ dmesg | colrm 1 15 | sort | uniq -c | sort -nr | head --lines=25
     73 pci 0000:00:1f.0: BAR 13: [io  0x1000-0x107f] has bogus alignment
     73 pci 0000:00:1e.0: PCI bridge to [bus 06]
     64 pci 0000:00:1e.0:   bridge window [mem 0x81100000-0x812fffff 64bit pref]
     64 pci 0000:00:1e.0:   bridge window [mem 0x80f00000-0x810fffff]
     64 pci 0000:00:1e.0:   bridge window [io  0x7000-0x7fff]
     38 pci 0000:00:1c.4: PCI bridge to [bus 05]
     38 pci 0000:00:1c.4:   bridge window [mem 0xf4f00000-0xf4ffffff]
     38 pci 0000:00:1c.0: PCI bridge to [bus 04]
     38 pci 0000:00:07.0: PCI bridge to [bus 03]
     38 pci 0000:00:07.0:   bridge window [mem 0xf2000000-0xf40fffff]
     38 pci 0000:00:07.0:   bridge window [mem 0xe0000000-0xf1ffffff 64bit pref]
     38 pci 0000:00:07.0:   bridge window [io  0x4000-0x4fff]
     38 pci 0000:00:03.0: PCI bridge to [bus 02]
     38 pci 0000:00:03.0:   bridge window [mem 0xf4e00000-0xf4efffff]
     38 pci 0000:00:03.0:   bridge window [mem 0xd0000000-0xdfffffff 64bit pref]
     38 pci 0000:00:03.0:   bridge window [io  0x3000-0x3fff]
     38 pci 0000:00:01.0: PCI bridge to [bus 01]
     38 pci 0000:00:01.0:   bridge window [mem 0xf4100000-0xf4bfffff]
     38 pci 0000:00:01.0:   bridge window [io  0x2000-0x2fff]
     37 pci 0000:00:1c.4:   bridge window [mem 0x80c00000-0x80dfffff 64bit pref]
     37 pci 0000:00:1c.4:   bridge window [io  0x6000-0x6fff]
     37 pci 0000:00:1c.0:   bridge window [mem 0x80a00000-0x80bfffff 64bit pref]
     37 pci 0000:00:1c.0:   bridge window [mem 0x80800000-0x809fffff]
     37 pci 0000:00:1c.0:   bridge window [io  0x5000-0x5fff]
     36 pci 0000:00:01.0:   bridge window [mem 0x80000000-0x807fffff 64bit pref]

Converting 'dbg' to 'pr_info' in hotplug_event(), we see this:

$ dmesg | grep "check notify"
[    1.633228] hotplug_event: Device check notify on \_SB_.PCI0.PEX2
[    2.472004] hotplug_event: Device check notify on \_SB_.PCI0.PEX3
[    2.477288] hotplug_event: Device check notify on \_SB_.PCI0.PEX4
[    2.482571] hotplug_event: Device check notify on \_SB_.PCI0.PEX5
[    2.482579] hotplug_event: Device check notify on \_SB_.PCI0.PEX6
[    8.204953] hotplug_event: Device check notify on \_SB_.PCI0.PEX2
[    8.209632] hotplug_event: Device check notify on \_SB_.PCI0.PEX3
[    8.214272] hotplug_event: Device check notify on \_SB_.PCI0.PEX4
[    8.218894] hotplug_event: Device check notify on \_SB_.PCI0.PEX5
[    8.218901] hotplug_event: Device check notify on \_SB_.PCI0.PEX6
Comment 1 Alex Williamson 2013-09-06 15:33:29 UTC
Created attachment 107459 [details]
lspci -vv
Comment 2 Bjorn Helgaas 2013-09-06 15:45:18 UTC
Created attachment 107460 [details]
dmesg output with Rafael's patches + s/dbg/pr_info/ in hotplug_event() (uncompressed)

Uncompressed for easy viewing.
Comment 3 Bjorn Helgaas 2013-09-06 15:46:53 UTC
Created attachment 107461 [details]
lspci -vv (uncompressed)
Comment 4 Rafael J. Wysocki 2013-09-07 00:28:14 UTC
Alex, can you please attach the output from acpidump on the affected machine?
Comment 5 Rafael J. Wysocki 2013-09-07 00:56:27 UTC
From dmesg in comment #2 it looks like we go to the host bridge and scan the entire PCI hierarchy every time, which is suspicious.  If the PEXn objects really are ports, I'd expect that we'd scan only the branch below each of them on a device notify.  For example, this:

Device check notify on \_SB_.PCI0.PEX2

should trigger the scan below PEX2, *unless* it is not regarded as a bridge.
Comment 6 Alex Williamson 2013-09-07 18:55:32 UTC
Created attachment 107471 [details]
acpidump
Comment 7 Rafael J. Wysocki 2013-09-07 20:56:03 UTC
OK, so now it's clear what happens.

PEX2, PEX3, PEX4 and PEX6 are ACPI device objects that have no corresponding PCI devices in the system, so when a device check notify for one of them occurs, we go to the parent bus (which is a root bus) and rescan from there.

PEX5 actually corresponds to a PCIe port, so the notify for that object doesn't generate the excess output.

Clearly, we should not blindly go for rescanning the parent if the notify is for an object without an existing physical device corresponding to it.
Comment 8 Rafael J. Wysocki 2013-09-07 21:12:17 UTC
Created attachment 107491 [details]
ACPIPHP: Avoid bus scans for spurious device check notifications

Alex, can you please try this patch?

It's purpose is to skip the parent bus scan if we don't find anything new in the given slot after a device check notify.

And by the way, I don't think there are too many systems like this out there and I do think that it's pretty damn broken (it is not OK to send device check notifications for devices that aren't and have never been physically present in my opinion).
Comment 10 Alex Williamson 2013-09-09 16:31:41 UTC
(In reply to Rafael J. Wysocki from comment #8)
> Created attachment 107491 [details]
> ACPIPHP: Avoid bus scans for spurious device check notifications
> 
> Alex, can you please try this patch?

Works well, new stats below.  Handful more lines, mostly from still having the s/dbg/pr_info/ additional output.  Timing is well within the noise factor for my system.  Not nearly as many repeated lines in dmesg.  Thanks!

$ dmesg | wc
   1066    9271   72456
$ dmesg | tail --lines=1
[   29.185420] Ebtables v2.0 registered
$ dmesg | colrm 1 15 | sort | uniq -c | sort -nr | head --lines=25
      7 SMP alternatives: lockdep: fixing up alternatives
      7 pci 0000:01:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit]
      5 pci 0000:01:00.0: reg 0x190: [mem 0x00000000-0x00003fff 64bit]
      5 ACPI Warning: 
      5  (20130725/utaddress-251)
      4 pci 0000:01:00.1: reg 0x184: [mem 0x00000000-0x00003fff 64bit]
      3 pci 0000:01:00.1: reg 0x190: [mem 0x00000000-0x00003fff 64bit]
      3 ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
      2 reg 3, base: 12GB, range: 2GB, type WB
      2 reg 2, base: 8GB, range: 4GB, type WB
      2 pci 0000:00:1c.4: PCI bridge to [bus 05]
      2 pci 0000:00:1c.4:   bridge window [mem 0xf4f00000-0xf4ffffff]
      2 pci 0000:00:1c.0: PCI bridge to [bus 04]
      2 pci 0000:00:07.0: PCI bridge to [bus 03]
      2 pci 0000:00:07.0:   bridge window [mem 0xf2000000-0xf40fffff]
      2 pci 0000:00:07.0:   bridge window [mem 0xe0000000-0xf1ffffff 64bit pref]
      2 pci 0000:00:07.0:   bridge window [io  0x4000-0x4fff]
      2 pci 0000:00:03.0: PCI bridge to [bus 02]
      2 pci 0000:00:03.0:   bridge window [mem 0xf4e00000-0xf4efffff]
      2 pci 0000:00:03.0:   bridge window [mem 0xd0000000-0xdfffffff 64bit pref]
      2 pci 0000:00:03.0:   bridge window [io  0x3000-0x3fff]
      2 pci 0000:00:01.0: PCI bridge to [bus 01]
      2 pci 0000:00:01.0:   bridge window [mem 0xf4100000-0xf4bfffff]
      2 pci 0000:00:01.0:   bridge window [io  0x2000-0x2fff]
      2 No AGP bridge found
Comment 11 Rafael J. Wysocki 2013-09-09 20:45:41 UTC
OK, let's mark it as resolved, then.  Thanks for testing!
Comment 12 Rafael J. Wysocki 2013-09-15 04:36:00 UTC
Fixed by commit a47d8c8 ACPI / hotplug / PCI: Avoid parent bus rescans on spurious device checks.

Note You need to log in before you can comment on or make changes to this bug.