Parag Warudkar reported (see URL above for original report): No issues with fresh boot, however after resume from suspend I see these messages - [11548.934625] pci_bus 0000:03: Allocating resources [11548.934645] pci 0000:02:00.0: bridge window [io 0x1000-0x0fff] to [bus 03] add_size 1000 [11548.934648] pci 0000:02:00.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 03] add_size 200000 [11548.934650] pci 0000:02:00.0: bridge window [mem 0x00100000-0x000fffff] to [bus 03] add_size 200000 [11548.934653] pci 0000:02:00.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000 [11548.934655] pci 0000:02:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000 [11548.934656] pci 0000:02:00.0: res[13]=[io 0x1000-0x0fff] get_res_add_size add_size 1000 [11548.934659] pci 0000:02:00.0: BAR 14: can't assign mem (size 0x200000) [11548.934661] pci 0000:02:00.0: BAR 15: can't assign mem pref (size 0x200000) [11548.934662] pci 0000:02:00.0: BAR 13: can't assign io (size 0x1000) [11548.934664] pci 0000:02:00.0: BAR 14: can't assign mem (size 0x200000) [11548.934666] pci 0000:02:00.0: BAR 15: can't assign mem pref (size 0x200000) [11548.934667] pci 0000:02:00.0: BAR 13: can't assign io (size 0x1000) [11548.934669] pci 0000:02:00.0: PCI bridge to [bus 03] [11548.934697] pci 0000:02:00.0: no hotplug settings from platform [11548.934698] pci 0000:02:00.0: using default PCI settings The device in question is - 02:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8893 (rev 52) (prog-if 01 [Subtractive decode]) Physical Slot: 5 Flags: bus master, fast devsel, latency 0 Bus: primary=02, secondary=03, subordinate=03, sec-latency=64 Capabilities: [90] Power Management version 2 Capabilities: [a0] Subsystem: Hewlett-Packard Company Device 1905 After resume the sec-latency is set to 128 instead of 64. This seems to be VTd related - as if I disable VTd in BIOS the message doesn't appear. I saw an older commit which to me looked liked it fixed a similar issue - PCI : Calculate right add_size commit a4ac9fea016fc5c09227eb479bd35e34978323a4 upstream. During debug of one SRIOV enabled hotplug device, we found found that add_size is not passed properly. I am running Ubuntu daily mainline build from couple days ago - I think it should include the above commit. Let me know if more info is needed - I don't see any issues because of these but that may be because I am not using VTd for anything.
Created attachment 130971 [details] dmesg (daily Ubuntu build of v3.14-rc)
Created attachment 130981 [details] lspci -vvxxxx
Created attachment 131011 [details] acipdump with VTd enabled
My mistake on the VTd making a difference part - VTd does not make a difference. So only one acpidump is attached. Also this message doesn't appear on 3.11 Ubuntu LTS kernel. So this is possibly a regression (unless Ubuntu LTS kernel has some PCI related patches that mainline 3.11 doesn't).
Created attachment 131731 [details] ACPI namespace disassembly Disassembly of the acpidump from comment #3.
Created attachment 132381 [details] bus check events and backtrace where warning occurs Parag collected this backtrace, which also notes the Bus Check notifications received. The "Bus check in hotplug_event()" message is printed before we call acpiphp_check_bridge(), so it looks like this is related to RP04.
Not sure how or if this is related but recent kernels trigger the below WARN_ON on fresh boot which seems to be related to pcie bridge. > if (pci_pcie_type(pdev) != PCI_EXP_TYPE_PCI_BRIDGE) { // this one triggers [ 0.475907] WARNING: CPU: 2 PID: 1 at drivers/pci/search.c:46 pci_find_upstream_pcie_bridge+0xb6/0xd0() [ 0.475908] Modules linked in: [ 0.475910] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc4+ #12 [ 0.475911] Hardware name: Hewlett-Packard HP Z230 Tower Workstation/1905, BIOS L51 v01.18 01/23/2014 [ 0.475912] 0000000000000009 ffff880445989d28 ffffffff816804a8 0000000000000000 [ 0.475914] ffff880445989d60 ffffffff810482cd ffff88044519a000 ffff8804451a0000 [ 0.475915] ffff8804451a0098 0000000000000000 0000000000000000 ffff880445989d70 [ 0.475916] Call Trace: [ 0.475921] [<ffffffff816804a8>] dump_stack+0x45/0x56 [ 0.475924] [<ffffffff810482cd>] warn_slowpath_common+0x7d/0xa0 [ 0.475926] [<ffffffff810483aa>] warn_slowpath_null+0x1a/0x20 [ 0.475927] [<ffffffff8135bd16>] pci_find_upstream_pcie_bridge+0xb6/0xd0 [ 0.475931] [<ffffffff8157128e>] intel_iommu_add_device+0x3e/0x220 [ 0.475933] [<ffffffff81566f00>] ? bus_set_iommu+0x50/0x50 [ 0.475934] [<ffffffff81566f2a>] add_iommu_group+0x2a/0x50 [ 0.475937] [<ffffffff81422c43>] bus_for_each_dev+0x63/0xa0 [ 0.475939] [<ffffffff81566ef8>] bus_set_iommu+0x48/0x50 [ 0.475941] [<ffffffff81d691e3>] intel_iommu_init+0x547/0x590 [ 0.475944] [<ffffffff81d223ee>] ? memblock_find_dma_reserve+0x120/0x120 [ 0.475946] [<ffffffff81d22400>] pci_iommu_init+0x12/0x3c [ 0.475949] [<ffffffff81000332>] do_one_initcall+0xf2/0x1a0 [ 0.475952] [<ffffffff81067d45>] ? parse_args+0x215/0x3e0 [ 0.475954] [<ffffffff81d1c029>] kernel_init_freeable+0x16c/0x1f1 [ 0.475955] [<ffffffff81d1b83d>] ? do_early_param+0x88/0x88 [ 0.475957] [<ffffffff81672470>] ? rest_init+0x80/0x80 [ 0.475958] [<ffffffff8167247e>] kernel_init+0xe/0xf0 [ 0.475960] [<ffffffff8168ffac>] ret_from_fork+0x7c/0xb0 [ 0.475962] [<ffffffff81672470>] ? rest_init+0x80/0x80 [ 0.475965] ---[ end trace 86136cb63c22d87f ]---
I think this WARN_ON is a different problem. Alex Williamson <alex.williamson@redhat.com> is working on a series of patches that should fix it. Here's a pointer (I think he might have updated it a little bit since then): http://lkml.kernel.org/r/20140501160128.17512.23609.stgit@bling.home
I'm experiencing random oops-es and userspace crashes after a suspend-resume cycle. I'm commenting on this bug because in dmesg I'm seeing the same BAR-related error messages after each suspend-resume cycle: [ 36.324051] PM: Finishing wakeup. [ 36.324104] r8169 0000:02:00.0: no hotplug settings from platform [ 36.324213] pci_bus 0000:04: Allocating resources [ 36.324339] pci 0000:03:00.0: bridge window [io 0x1000-0x0fff] to [bus 04] add_size 1000 [ 36.324343] pci 0000:03:00.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 04] add_size 200000 [ 36.324344] pci 0000:03:00.0: bridge window [mem 0x00100000-0x000fffff] to [bus 04] add_size 200000 [ 36.324347] pci 0000:03:00.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000 [ 36.324349] pci 0000:03:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000 [ 36.324350] pci 0000:03:00.0: res[13]=[io 0x1000-0x0fff] get_res_add_size add_size 1000 [ 36.324353] pci 0000:03:00.0: BAR 14: can't assign mem (size 0x200000) [ 36.324825] pci 0000:03:00.0: BAR 15: can't assign mem pref (size 0x200000) [ 36.324829] pci 0000:03:00.0: BAR 13: can't assign io (size 0x1000) [ 36.324832] pci 0000:03:00.0: BAR 14: can't assign mem (size 0x200000) [ 36.324833] pci 0000:03:00.0: BAR 15: can't assign mem pref (size 0x200000) [ 36.324835] pci 0000:03:00.0: BAR 13: can't assign io (size 0x1000) I'm seeing the same symptoms on two machines (with basically the same hardware). An example oops is as follows: BUG: Bad page state in process khugepaged pfn:33e5f page:ffffea0000cf97c0 count:67108864 mapcount:0 mapping: (null) index:0x0 page flags: 0x1ffff0000000000() Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core snd_usb_audio videodev snd_usbmidi_lib usbmon joydev hid_microsoft pl2303 usbserial snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep hid_generic snd_pcm snd_page_alloc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_seq_midi snd_seq_midi_event kvm bnep crct10dif_pclmul crc32_pclmul rfcomm ghash_clmulni_intel snd_rawmidi usbhid aesni_intel usb_storage hid bluetooth aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_seq serio_raw snd_seq_device lpc_ich snd_timer i915 parport_pc snd ppdev mac_hid video drm_kms_helper mei_me mei lp parport shpchp soundcore drm i2c_algo_bit binfmt_misc nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache nls_iso8859_1 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 psmouse ahci libahci r8169 raid0 mii multipath linear CPU: 0 PID: 51 Comm: khugepaged Not tainted 3.13.0-61-generic #100-Ubuntu Hardware name: Gigabyte Technology Co., Ltd. H81M-HD3/H81M-HD3, BIOS FA 12/01/2014 0000000000000fee ffff880405135b68 ffffffff81723700 ffffea0000cf97c0 ffff880405135b80 ffffffff8171df66 0000000000000002 ffff880405135c60 ffffffff81158cfa ffff88041f5f7e08 0000000200000000 0000000100000000 Call Trace: [<ffffffff81723700>] dump_stack+0x45/0x56 [<ffffffff8171df66>] bad_page.part.61+0xcf/0xe8 [<ffffffff81158cfa>] get_page_from_freelist+0x85a/0x930 [<ffffffff81158f54>] __alloc_pages_nodemask+0x184/0xb80 [<ffffffff8115d010>] ? release_pages+0x80/0x210 [<ffffffff811ab1cf>] khugepaged_scan_mm_slot+0x3bf/0xc90 [<ffffffff811abcef>] khugepaged+0x24f/0x460 [<ffffffff810ab390>] ? prepare_to_wait_event+0x100/0x100 [<ffffffff811abaa0>] ? khugepaged_scan_mm_slot+0xc90/0xc90 [<ffffffff8108b7d2>] kthread+0xd2/0xf0 [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0 [<ffffffff81734228>] ret_from_fork+0x58/0x90 [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0 In one of the machines, everything was working fine, untilthe motherboard was replaced - with the same model, but newer bios: old: Gigabyte Technology Co., Ltd. H81M-HD3/H81M-HD3, BIOS F3 01/20/2014 (works ok) new: Gigabyte Technology Co., Ltd. H81M-HD3/H81M-HD3, BIOS FA 12/01/2014 (shows the above errors, and crashes randomly after suspend-resume) I can see in the logs for the older motherboard that the BAR's were assigned correctly: Aug 10 19:52:02 andrzej kernel: [32251.585921] pci 0000:03:00.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000 Aug 10 19:52:02 andrzej kernel: [32251.585922] pci 0000:03:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000 Aug 10 19:52:02 andrzej kernel: [32251.585923] pci 0000:03:00.0: res[13]=[io 0x1000-0x0fff] get_res_add_size add_size 1000 Aug 10 19:52:02 andrzej kernel: [32251.585926] pci 0000:03:00.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff] Aug 10 19:52:02 andrzej kernel: [32251.585928] pci 0000:03:00.0: BAR 15: assigned [mem 0xdfc00000-0xdfdfffff 64bit pref] Aug 10 19:52:02 andrzej kernel: [32251.585931] pci 0000:03:00.0: BAR 13: assigned [io 0x2000-0x2fff] One more random note - since I don't really understand all this PCI and BAR stuff, I made some semi-random tests. 1. Calling "echo 1 > /sys/bus/pci/rescan" immediately after booting makes some dmesg noise: [ 23.759389] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 23.759394] pci 0000:03:00.0: PCI bridge to [bus 04] 2. Calling the same command after a suspend-resume cycle, results in the following: [ 43.307132] pci 0000:03:00.0: bridge window [io 0x1000-0x0fff] to [bus 04] add_size 1000 [ 43.307135] pci 0000:03:00.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 04] add_size 200000 [ 43.307137] pci 0000:03:00.0: bridge window [mem 0x00100000-0x000fffff] to [bus 04] add_size 200000 [ 43.307142] pci 0000:03:00.0: res[13]=[io 0x1000-0x0fff] get_res_add_size add_size 1000 [ 43.307144] pcieport 0000:00:1c.3: bridge window [io 0x1000-0x0fff] to [bus 03-04] add_size 1000 [ 43.307145] pci 0000:03:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000 [ 43.307146] pcieport 0000:00:1c.3: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 03-04] add_size 200000 [ 43.307147] pci 0000:03:00.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000 [ 43.307148] pcieport 0000:00:1c.3: bridge window [mem 0x00100000-0x000fffff] to [bus 03-04] add_size 200000 [ 43.307155] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 43.307157] pcieport 0000:00:1c.3: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000 [ 43.307158] pcieport 0000:00:1c.3: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000 [ 43.307159] pcieport 0000:00:1c.3: res[13]=[io 0x1000-0x0fff] get_res_add_size add_size 1000 [ 43.307162] pcieport 0000:00:1c.3: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff] [ 43.307164] pcieport 0000:00:1c.3: BAR 15: assigned [mem 0xdfc00000-0xdfdfffff 64bit pref] [ 43.307167] pcieport 0000:00:1c.3: BAR 13: assigned [io 0x2000-0x2fff] [ 43.307169] pci 0000:03:00.0: res[14]=[mem 0x00100000-0x000fffff] get_res_add_size add_size 200000 [ 43.307170] pci 0000:03:00.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000 [ 43.307171] pci 0000:03:00.0: res[13]=[io 0x1000-0x0fff] get_res_add_size add_size 1000 >>> [ 43.307173] pci 0000:03:00.0: BAR 14: assigned [mem >>> 0xdfa00000-0xdfbfffff] >>> [ 43.307174] pci 0000:03:00.0: BAR 15: assigned [mem >>> 0xdfc00000-0xdfdfffff 64bit pref] >>> [ 43.307175] pci 0000:03:00.0: BAR 13: assigned [io 0x2000-0x2fff] [ 43.307176] pci 0000:03:00.0: PCI bridge to [bus 04] [ 43.307179] pci 0000:03:00.0: bridge window [io 0x2000-0x2fff] [ 43.307186] pci 0000:03:00.0: bridge window [mem 0xdfa00000-0xdfbfffff] [ 43.307191] pci 0000:03:00.0: bridge window [mem 0xdfc00000-0xdfdfffff 64bit pref] 3. Subsequent suspend-resume cycles work fine without mentioning BAR 13-15 errors. I don't have any conclusive results regarding system stability.