Bug 73141 - Can't assign bridge window after resume
Summary: Can't assign bridge window after resume
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL: http://lkml.kernel.org/r/CAOULuOY5ery...
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-29 16:08 UTC by Bjorn Helgaas
Modified: 2015-08-20 08:56 UTC (History)
2 users (show)

See Also:
Kernel Version: v3.14-rc8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (daily Ubuntu build of v3.14-rc) (71.40 KB, text/plain)
2014-03-29 16:12 UTC, Bjorn Helgaas
Details
lspci -vvxxxx (96.65 KB, text/plain)
2014-03-29 16:12 UTC, Bjorn Helgaas
Details
acipdump with VTd enabled (396.13 KB, text/plain)
2014-03-29 23:12 UTC, paragw
Details
ACPI namespace disassembly (710.93 KB, text/plain)
2014-04-08 21:53 UTC, Bjorn Helgaas
Details
bus check events and backtrace where warning occurs (4.93 KB, text/plain)
2014-04-15 17:14 UTC, Bjorn Helgaas
Details

Description Bjorn Helgaas 2014-03-29 16:08:11 UTC
Parag Warudkar reported (see URL above for original report):

No issues with fresh boot, however after resume from suspend I see
these messages -

[11548.934625] pci_bus 0000:03: Allocating resources
[11548.934645] pci 0000:02:00.0: bridge window [io  0x1000-0x0fff] to
[bus 03] add_size 1000
[11548.934648] pci 0000:02:00.0: bridge window [mem
0x00100000-0x000fffff 64bit pref] to [bus 03] add_size 200000
[11548.934650] pci 0000:02:00.0: bridge window [mem
0x00100000-0x000fffff] to [bus 03] add_size 200000
[11548.934653] pci 0000:02:00.0: res[14]=[mem 0x00100000-0x000fffff]
get_res_add_size add_size 200000
[11548.934655] pci 0000:02:00.0: res[15]=[mem 0x00100000-0x000fffff
64bit pref] get_res_add_size add_size 200000
[11548.934656] pci 0000:02:00.0: res[13]=[io  0x1000-0x0fff]
get_res_add_size add_size 1000
[11548.934659] pci 0000:02:00.0: BAR 14: can't assign mem (size 0x200000)
[11548.934661] pci 0000:02:00.0: BAR 15: can't assign mem pref (size 0x200000)
[11548.934662] pci 0000:02:00.0: BAR 13: can't assign io (size 0x1000)
[11548.934664] pci 0000:02:00.0: BAR 14: can't assign mem (size 0x200000)
[11548.934666] pci 0000:02:00.0: BAR 15: can't assign mem pref (size 0x200000)
[11548.934667] pci 0000:02:00.0: BAR 13: can't assign io (size 0x1000)
[11548.934669] pci 0000:02:00.0: PCI bridge to [bus 03]
[11548.934697] pci 0000:02:00.0: no hotplug settings from platform
[11548.934698] pci 0000:02:00.0: using default PCI settings

The device in question is -

02:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8893
(rev 52) (prog-if 01 [Subtractive decode])
    Physical Slot: 5
    Flags: bus master, fast devsel, latency 0
    Bus: primary=02, secondary=03, subordinate=03, sec-latency=64
    Capabilities: [90] Power Management version 2
    Capabilities: [a0] Subsystem: Hewlett-Packard Company Device 1905

After resume the sec-latency is set to 128 instead of 64.

This seems to be VTd related - as if I disable VTd in BIOS the message
doesn't appear.

I saw an older commit which to me looked liked it fixed a similar issue -

PCI : Calculate right add_size

    commit a4ac9fea016fc5c09227eb479bd35e34978323a4 upstream.

    During debug of one SRIOV enabled hotplug device, we found found that
    add_size is not passed properly.

I am running Ubuntu daily mainline build from couple days ago - I
think it should include the above commit.

Let me know if more info is needed - I don't see any issues because of
these but that may be because I am not using VTd for anything.
Comment 1 Bjorn Helgaas 2014-03-29 16:12:05 UTC
Created attachment 130971 [details]
dmesg (daily Ubuntu build of v3.14-rc)
Comment 2 Bjorn Helgaas 2014-03-29 16:12:56 UTC
Created attachment 130981 [details]
lspci -vvxxxx
Comment 3 paragw 2014-03-29 23:12:52 UTC
Created attachment 131011 [details]
acipdump with VTd enabled
Comment 4 paragw 2014-03-30 00:03:33 UTC
My mistake on the VTd making a difference part - VTd does not make a difference. So only one acpidump is attached. Also this message doesn't appear on 3.11 Ubuntu LTS kernel. So this is possibly a regression (unless Ubuntu LTS kernel has some PCI related patches that mainline 3.11 doesn't).
Comment 5 Bjorn Helgaas 2014-04-08 21:53:15 UTC
Created attachment 131731 [details]
ACPI namespace disassembly

Disassembly of the acpidump from comment #3.
Comment 6 Bjorn Helgaas 2014-04-15 17:14:24 UTC
Created attachment 132381 [details]
bus check events and backtrace where warning occurs

Parag collected this backtrace, which also notes the Bus Check notifications received.

The "Bus check in hotplug_event()" message is printed before we call acpiphp_check_bridge(), so it looks like this is related to RP04.
Comment 7 paragw 2014-05-07 21:40:00 UTC
Not sure how or if this is related but recent kernels trigger the below WARN_ON on fresh boot which seems to be related to pcie bridge.  

 > if (pci_pcie_type(pdev) != PCI_EXP_TYPE_PCI_BRIDGE) { // this one triggers

[    0.475907] WARNING: CPU: 2 PID: 1 at drivers/pci/search.c:46 pci_find_upstream_pcie_bridge+0xb6/0xd0()
[    0.475908] Modules linked in:
[    0.475910] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc4+ #12
[    0.475911] Hardware name: Hewlett-Packard HP Z230 Tower Workstation/1905, BIOS L51 v01.18 01/23/2014
[    0.475912]  0000000000000009 ffff880445989d28 ffffffff816804a8 0000000000000000
[    0.475914]  ffff880445989d60 ffffffff810482cd ffff88044519a000 ffff8804451a0000
[    0.475915]  ffff8804451a0098 0000000000000000 0000000000000000 ffff880445989d70
[    0.475916] Call Trace:
[    0.475921]  [<ffffffff816804a8>] dump_stack+0x45/0x56
[    0.475924]  [<ffffffff810482cd>] warn_slowpath_common+0x7d/0xa0
[    0.475926]  [<ffffffff810483aa>] warn_slowpath_null+0x1a/0x20
[    0.475927]  [<ffffffff8135bd16>] pci_find_upstream_pcie_bridge+0xb6/0xd0
[    0.475931]  [<ffffffff8157128e>] intel_iommu_add_device+0x3e/0x220
[    0.475933]  [<ffffffff81566f00>] ? bus_set_iommu+0x50/0x50
[    0.475934]  [<ffffffff81566f2a>] add_iommu_group+0x2a/0x50
[    0.475937]  [<ffffffff81422c43>] bus_for_each_dev+0x63/0xa0
[    0.475939]  [<ffffffff81566ef8>] bus_set_iommu+0x48/0x50
[    0.475941]  [<ffffffff81d691e3>] intel_iommu_init+0x547/0x590
[    0.475944]  [<ffffffff81d223ee>] ? memblock_find_dma_reserve+0x120/0x120
[    0.475946]  [<ffffffff81d22400>] pci_iommu_init+0x12/0x3c
[    0.475949]  [<ffffffff81000332>] do_one_initcall+0xf2/0x1a0
[    0.475952]  [<ffffffff81067d45>] ? parse_args+0x215/0x3e0
[    0.475954]  [<ffffffff81d1c029>] kernel_init_freeable+0x16c/0x1f1
[    0.475955]  [<ffffffff81d1b83d>] ? do_early_param+0x88/0x88
[    0.475957]  [<ffffffff81672470>] ? rest_init+0x80/0x80
[    0.475958]  [<ffffffff8167247e>] kernel_init+0xe/0xf0
[    0.475960]  [<ffffffff8168ffac>] ret_from_fork+0x7c/0xb0
[    0.475962]  [<ffffffff81672470>] ? rest_init+0x80/0x80
[    0.475965] ---[ end trace 86136cb63c22d87f ]---
Comment 8 Bjorn Helgaas 2014-05-07 22:08:42 UTC
I think this WARN_ON is a different problem.  Alex Williamson <alex.williamson@redhat.com> is working on a series of patches that should fix it.  Here's a pointer (I think he might have updated it a little bit since then):

  http://lkml.kernel.org/r/20140501160128.17512.23609.stgit@bling.home
Comment 9 Andrzej Szombierski 2015-08-20 08:56:05 UTC
I'm experiencing random oops-es and userspace crashes after a suspend-resume cycle. I'm commenting on this bug because in dmesg I'm seeing the same BAR-related error messages after each suspend-resume cycle:

[   36.324051] PM: Finishing wakeup.
[   36.324104] r8169 0000:02:00.0: no hotplug settings from platform
[   36.324213] pci_bus 0000:04: Allocating resources
[   36.324339] pci 0000:03:00.0: bridge window [io  0x1000-0x0fff] to [bus 04] add_size 1000
[   36.324343] pci 0000:03:00.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 04] add_size 200000
[   36.324344] pci 0000:03:00.0: bridge window [mem 0x00100000-0x000fffff] to [bus 04] add_size 200000
[   36.324347] pci 0000:03:00.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000
[   36.324349] pci 0000:03:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[   36.324350] pci 0000:03:00.0: res[13]=[io  0x1000-0x0fff] get_res_add_size add_size 1000
[   36.324353] pci 0000:03:00.0: BAR 14: can't assign mem (size 0x200000)
[   36.324825] pci 0000:03:00.0: BAR 15: can't assign mem pref (size 0x200000)
[   36.324829] pci 0000:03:00.0: BAR 13: can't assign io (size 0x1000)
[   36.324832] pci 0000:03:00.0: BAR 14: can't assign mem (size 0x200000)
[   36.324833] pci 0000:03:00.0: BAR 15: can't assign mem pref (size 0x200000)
[   36.324835] pci 0000:03:00.0: BAR 13: can't assign io (size 0x1000)

I'm seeing the same symptoms on two machines (with basically the same hardware). An example oops is as follows:
 BUG: Bad page state in process khugepaged  pfn:33e5f
 page:ffffea0000cf97c0 count:67108864 mapcount:0 mapping:          (null) index:0x0
 page flags: 0x1ffff0000000000()
 Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core snd_usb_audio videodev snd_usbmidi_lib usbmon joydev hid_microsoft pl2303 usbserial snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep hid_generic snd_pcm snd_page_alloc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_seq_midi snd_seq_midi_event kvm bnep crct10dif_pclmul crc32_pclmul rfcomm ghash_clmulni_intel snd_rawmidi usbhid aesni_intel usb_storage hid bluetooth aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_seq serio_raw snd_seq_device lpc_ich snd_timer i915 parport_pc snd ppdev mac_hid video drm_kms_helper mei_me mei lp parport shpchp soundcore drm i2c_algo_bit binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache nls_iso8859_1 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 psmouse ahci libahci r8169 raid0 mii multipath linear
 CPU: 0 PID: 51 Comm: khugepaged Not tainted 3.13.0-61-generic #100-Ubuntu
 Hardware name: Gigabyte Technology Co., Ltd. H81M-HD3/H81M-HD3, BIOS FA 12/01/2014
  0000000000000fee ffff880405135b68 ffffffff81723700 ffffea0000cf97c0
  ffff880405135b80 ffffffff8171df66 0000000000000002 ffff880405135c60
  ffffffff81158cfa ffff88041f5f7e08 0000000200000000 0000000100000000
 Call Trace:
  [<ffffffff81723700>] dump_stack+0x45/0x56
  [<ffffffff8171df66>] bad_page.part.61+0xcf/0xe8
  [<ffffffff81158cfa>] get_page_from_freelist+0x85a/0x930
  [<ffffffff81158f54>] __alloc_pages_nodemask+0x184/0xb80
  [<ffffffff8115d010>] ? release_pages+0x80/0x210
  [<ffffffff811ab1cf>] khugepaged_scan_mm_slot+0x3bf/0xc90
  [<ffffffff811abcef>] khugepaged+0x24f/0x460
  [<ffffffff810ab390>] ? prepare_to_wait_event+0x100/0x100
  [<ffffffff811abaa0>] ? khugepaged_scan_mm_slot+0xc90/0xc90
  [<ffffffff8108b7d2>] kthread+0xd2/0xf0
  [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0
  [<ffffffff81734228>] ret_from_fork+0x58/0x90
  [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0

In one of the machines, everything was working fine, untilthe motherboard was replaced - with the same model, but newer bios:

old: Gigabyte Technology Co., Ltd. H81M-HD3/H81M-HD3, BIOS F3 01/20/2014 (works ok)
new: Gigabyte Technology Co., Ltd. H81M-HD3/H81M-HD3, BIOS FA 12/01/2014 (shows the above errors, and crashes randomly after suspend-resume)

I can see in the logs for the older motherboard that the BAR's were assigned correctly:
Aug 10 19:52:02 andrzej kernel: [32251.585921] pci 0000:03:00.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000
Aug 10 19:52:02 andrzej kernel: [32251.585922] pci 0000:03:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
Aug 10 19:52:02 andrzej kernel: [32251.585923] pci 0000:03:00.0: res[13]=[io  0x1000-0x0fff] get_res_add_size add_size 1000
Aug 10 19:52:02 andrzej kernel: [32251.585926] pci 0000:03:00.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff]
Aug 10 19:52:02 andrzej kernel: [32251.585928] pci 0000:03:00.0: BAR 15: assigned [mem 0xdfc00000-0xdfdfffff 64bit pref]
Aug 10 19:52:02 andrzej kernel: [32251.585931] pci 0000:03:00.0: BAR 13: assigned [io  0x2000-0x2fff]

One more random note - since I don't really understand all this PCI and BAR stuff, I made some semi-random tests. 
1. Calling "echo 1 > /sys/bus/pci/rescan" immediately after booting makes some dmesg noise:
[   23.759389] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[   23.759394] pci 0000:03:00.0: PCI bridge to [bus 04]

2. Calling the same command after a suspend-resume cycle, results in the following:
[   43.307132] pci 0000:03:00.0: bridge window [io  0x1000-0x0fff] to [bus 04] add_size 1000
[   43.307135] pci 0000:03:00.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 04] add_size 200000
[   43.307137] pci 0000:03:00.0: bridge window [mem 0x00100000-0x000fffff] to [bus 04] add_size 200000
[   43.307142] pci 0000:03:00.0: res[13]=[io  0x1000-0x0fff] get_res_add_size add_size 1000
[   43.307144] pcieport 0000:00:1c.3: bridge window [io  0x1000-0x0fff] to [bus 03-04] add_size 1000
[   43.307145] pci 0000:03:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[   43.307146] pcieport 0000:00:1c.3: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 03-04] add_size 200000
[   43.307147] pci 0000:03:00.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000
[   43.307148] pcieport 0000:00:1c.3: bridge window [mem 0x00100000-0x000fffff] to [bus 03-04] add_size 200000
[   43.307155] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[   43.307157] pcieport 0000:00:1c.3: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000
[   43.307158] pcieport 0000:00:1c.3: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[   43.307159] pcieport 0000:00:1c.3: res[13]=[io  0x1000-0x0fff] get_res_add_size add_size 1000
[   43.307162] pcieport 0000:00:1c.3: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff]
[   43.307164] pcieport 0000:00:1c.3: BAR 15: assigned [mem 0xdfc00000-0xdfdfffff 64bit pref]
[   43.307167] pcieport 0000:00:1c.3: BAR 13: assigned [io  0x2000-0x2fff]
[   43.307169] pci 0000:03:00.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000
[   43.307170] pci 0000:03:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000
[   43.307171] pci 0000:03:00.0: res[13]=[io  0x1000-0x0fff] get_res_add_size add_size 1000
>>> [   43.307173] pci 0000:03:00.0: BAR 14: assigned [mem
>>> 0xdfa00000-0xdfbfffff]
>>> [   43.307174] pci 0000:03:00.0: BAR 15: assigned [mem
>>> 0xdfc00000-0xdfdfffff 64bit pref]
>>> [   43.307175] pci 0000:03:00.0: BAR 13: assigned [io  0x2000-0x2fff]
[   43.307176] pci 0000:03:00.0: PCI bridge to [bus 04]
[   43.307179] pci 0000:03:00.0:   bridge window [io  0x2000-0x2fff]
[   43.307186] pci 0000:03:00.0:   bridge window [mem 0xdfa00000-0xdfbfffff]
[   43.307191] pci 0000:03:00.0:   bridge window [mem 0xdfc00000-0xdfdfffff 64bit pref]

3. Subsequent suspend-resume cycles work fine without mentioning BAR 13-15 errors. I don't have any conclusive results regarding system stability.

Note You need to log in before you can comment on or make changes to this bug.