Bug 81841 - amd-iommu: kernel BUG & lockup after shutting down KVM guest using PCI passthrough/PCIe bridge
Status: RESOLVED CODE_FIX
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: All Linux
Importance: P1 normal
Assignee: virtualization_kvm
Reported: 2014-08-07 14:45 UTC by Marti Raudsepp
Modified: 2014-10-10 15:58 UTC
Kernel Version: 3.16.0 (originally Ubuntu 3.13.0-32-generic)
Regression: No

Attachments
crash_netconsole.txt (18.20 KB, text/plain)
2014-08-07 14:46 UTC, Marti Raudsepp
startup_dmesg.txt (70.23 KB, text/plain)
2014-08-07 14:47 UTC, Marti Raudsepp
dmidecode.txt (9.09 KB, text/plain)
2014-08-07 14:47 UTC, Marti Raudsepp
Possible fix as a patch against v3.13 (1.48 KB, patch)
2014-08-12 09:36 UTC, Joerg Roedel

Description Marti Raudsepp 2014-08-07 14:45:40 UTC
I have a Windows XP virtual machine in libvirt and I'm trying to use PCI passthrough to give it access to a legacy Dialogic ISDN card (0000:01:05.0). Since it's an old PCI device, it sits behind a PCIe-to-PCI bridge (0000:00:14.4). With some manual tinkering, the virtual machine starts up and passthrough works fine, but when I stop or shut down the virtual machine, I immediately get an oops in dmesg and, after some time, the whole machine freezes.

I'm using an ASRock FM2A88X Extreme6+ motherboard with an AMD A10-7850K processor. I tried both the latest BIOS version 2.90 and the beta version L3.16.

The same symptoms have also been reported before:
* 3.2.0: http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/85138
* 3.0.6: https://www.mail-archive.com/kvm@vger.kernel.org/msg64854.html
* 2.6.37-rc6: http://marc.info/?l=kvm&m=129867567106942 - slightly different traceback

In order for the VM to start up successfully, I need to run the following commands manually to bind the PCI bridge to pci-stub and then unbind it again:

modprobe pci-stub
echo '1022 780f' > /sys/bus/pci/drivers/pci-stub/new_id
echo 0000:00:14.4 > /sys/bus/pci/drivers/pci-stub/bind
echo 0000:00:14.4 > /sys/bus/pci/drivers/pci-stub/unbind
echo '1022 780f' > /sys/bus/pci/drivers/pci-stub/remove_id

(If I don't do this, I get the kernel message:
    pci-stub 0000:01:05.0: kvm assign device failed ret -16)
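
The bind/unbind sequence above can be wrapped in a small helper that reads the vendor and device IDs from sysfs instead of hard-coding them. This is only a sketch under stated assumptions: `stub_cycle` and the `SYSFS_ROOT` override are hypothetical names introduced here (the override exists so the function can be dry-run against a fake directory tree), `modprobe pci-stub` must still be run first, and on a real host it needs root.

```shell
# Sketch: briefly bind a PCI device to pci-stub, then release it again.
# stub_cycle and SYSFS_ROOT are illustrative names, not part of any tool.
stub_cycle() {
    addr=$1                               # e.g. 0000:00:14.4
    root=${SYSFS_ROOT:-/sys}
    dev=$root/bus/pci/devices/$addr
    drv=$root/bus/pci/drivers/pci-stub

    # sysfs reports the IDs as 0x1022 / 0x780f; new_id expects "1022 780f".
    vendor=$(cat "$dev/vendor") || return 1
    device=$(cat "$dev/device") || return 1
    ids="${vendor#0x} ${device#0x}"

    echo "$ids" > "$drv/new_id"
    # Assumption: writing new_id may already have bound the device, in which
    # case this explicit bind is expected to fail, so its result is ignored.
    echo "$addr" > "$drv/bind" 2>/dev/null || true
    echo "$addr" > "$drv/unbind"
    echo "$ids" > "$drv/remove_id"
}
```

On the system described here the call would be `stub_cycle 0000:00:14.4`.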

lspci -vt
-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1422
           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1423
           +-01.0  Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7 200 Series]
           +-01.1  Advanced Micro Devices, Inc. [AMD/ATI] Device 1308
           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-10.0  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
           +-10.1  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
           +-11.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
           +-12.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
           +-12.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
           +-13.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
           +-13.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.1  Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
           +-14.2  Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-14.4-[01]----05.0  Dialogic Corporation PRI
           +-14.5  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
           +-15.0-[02]--
           +-15.2-[03]----00.0  ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
           +-15.3-[04]----00.0  Qualcomm Atheros QCA8171 Gigabit Ethernet
           +-18.0  Advanced Micro Devices, Inc. [AMD] Device 141a
           +-18.1  Advanced Micro Devices, Inc. [AMD] Device 141b
           +-18.2  Advanced Micro Devices, Inc. [AMD] Device 141c
           +-18.3  Advanced Micro Devices, Inc. [AMD] Device 141d
           +-18.4  Advanced Micro Devices, Inc. [AMD] Device 141e
           \-18.5  Advanced Micro Devices, Inc. [AMD] Device 141f

After shutting down the VM, I get lots of oops messages; these were captured via netconsole.

[ 1949.942276] ------------[ cut here ]------------
[ 1949.942311] kernel BUG at /build/buildd/linux-3.13.0/drivers/iommu/amd_iommu.c:2382!
[ 1949.942342] invalid opcode: 0000 [#1] SMP
[ 1949.942359] Modules linked in: pci_stub ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables nct6775 hwmon_vid snd_hda_codec_realtek netconsole kvm_amd snd_timer drm_kms_helper snd drm soundcore mac_hid i2c_algo_bit
[ 1949.942716] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 1949.942745] task: ffff8804284497f0 ti: ffff8800361a2000 task.ti: ffff8800361a2000
[ 1949.942767] RIP: 0010:[<ffffffff815eec8d>]  [<ffffffff815eec8d>] __detach_device+0xad/0xb0
[ 1949.942798] RSP: 0018:ffff8800361a3d00  EFLAGS: 00010046
[ 1949.942814] RAX: 0000000000000000 RBX: ffff880427210660 RCX: ffff8800361a3cb0
[ 1949.942834] RDX: dead000000100100 RSI: 0000000000000086 RDI: ffff880427210660
[ 1949.942855] RBP: ffff8800361a3d20 R08: 0000000000000046 R09: ffff8804299b5240
[ 1949.942875] R10: ffff88043ebf2f60 R11: 000ffffffffff000 R12: 0000000000000000
[ 1949.942895] R13: ffff880428596e10 R14: ffff880036019a80 R15: ffff880427210660
[ 1949.942916] FS:  00007fca76222980(0000) GS:ffff88043ec00000(0000) knlGS:0000000000000000
[ 1949.942939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1949.942956] CR2: 00007fca65bc5e74 CR3: 0000000001c0e000 CR4: 00000000000407f0
[ 1949.942978] DR0: 0000000000000003 DR1: 00000000000000b0 DR2: 0000000000000001
[ 1949.942998] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1949.943018] Stack:
[ 1949.943025]  dead000000100100 ffff880428596e00 ffff880428596e10 ffff880036019a80
[ 1949.943055]  ffff8800361a3d60 ffffffff815eed2e 0000000000000202 ffff880036019a80
[ 1949.943082]  ffff880036019a80 ffff880427cb0008 ffff88042752a300 ffff880424f3ad48
[ 1949.943110] Call Trace:
[ 1949.943123]  [<ffffffff815eed2e>] amd_iommu_domain_destroy+0x9e/0x160
[ 1949.943144]  [<ffffffff815eb3bb>] iommu_domain_free+0x1b/0x30
[ 1949.943176]  [<ffffffffa044fa63>] kvm_iommu_unmap_guest+0x53/0x60 [kvm]
[ 1949.943205]  [<ffffffffa0460299>] kvm_arch_destroy_vm+0x39/0x1f0 [kvm]
[ 1949.943227]  [<ffffffff810c696d>] ? synchronize_srcu+0x1d/0x20
[ 1949.943250]  [<ffffffffa04486ee>] kvm_put_kvm+0x10e/0x1c0 [kvm]
[ 1949.943273]  [<ffffffffa04487d8>] kvm_vcpu_release+0x18/0x20 [kvm]
[ 1949.943293]  [<ffffffff811be6d4>] __fput+0xe4/0x260
[ 1949.943309]  [<ffffffff811be89e>] ____fput+0xe/0x10
[ 1949.943326]  [<ffffffff81088174>] task_work_run+0xc4/0xe0
[ 1949.943344]  [<ffffffff81069c18>] do_exit+0x2b8/0xa50
[ 1949.943362]  [<ffffffff8109a7f0>] ? wake_up_state+0x10/0x20
[ 1949.943380]  [<ffffffff81077b5e>] ? signal_wake_up_state+0x1e/0x30
[ 1949.943400]  [<ffffffff81078ed2>] ? zap_other_threads+0x82/0xa0
[ 1949.943418]  [<ffffffff8106a42f>] do_group_exit+0x3f/0xa0
[ 1949.943435]  [<ffffffff8106a4a4>] SyS_exit_group+0x14/0x20
[ 1949.943455]  [<ffffffff8172c87f>] tracesys+0xe1/0xe6
[ 1949.943470] Code: fe ff ff eb b8 66 0f 1f 84 00 00 00 00 00 48 8b 35 39 f4 9d 00 49 39 f4
[ 1949.947837] ---[ end trace e6893b1ed79451c3 ]---
[ 1949.947853] Fixing recursive fault but reboot is needed!
[ 1950.189137] usb 10-1: reset high-speed USB device number 2 using xhci_hcd
[ 1950.240587] xhci_hcd 0000:03:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8804249eee00
[ 1950.240666] xhci_hcd 0000:03:00.0: xHCI xhci_drop_endpoint called with disabled ep ffff8804249eee40
[ 2045.294007] ------------[ cut here ]------------
[ 2045.294067] WARNING: CPU: 3 PID: 1083 at /build/buildd/linux-3.13.0/kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xd0()
[ 2045.294142] Watchdog detected hard LOCKUP on cpu 3
[ 2045.294176] Modules linked in: pci_stub ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables nct6775 hwmon_vid snd_hda_codec_realtek netconsole kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd configfs serio_raw edac_core edac_mce_amd k10temp snd_hda_codec_hdmi radeon snd_hda_intel snd_hda_codec snd_hwdep video snd_pcm ttm snd_page_alloc i2c_piix4 snd_timer drm_kms_helper snd drm soundcore mac_hid i2c_algo_bit lp parport usb_storage pata_acpi hid_generic usbhid hid alx psmouse mdio ahci pata_atiixp libahci
[ 2045.299884] CPU: 3 PID: 1083 Comm: irqbalance Tainted: G      D      3.13.0-32-generic #57-Ubuntu
[ 2045.302472] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 2045.305088]  0000000000000009 ffff88043ed86c38 ffffffff8171bcb4 ffff88043ed86c80
[ 2045.307720]  ffff88043ed86c70 ffffffff810676cd ffff8804295c4000 0000000000000000
[ 2045.310342]  ffff88043ed86d88 0000000000000000 ffff88043ed86ef8 ffff88043ed86cd0
[ 2045.312921] Call Trace:
[ 2045.315411]  <NMI>  [<ffffffff8171bcb4>] dump_stack+0x45/0x56
[ 2045.317892]  [<ffffffff810676cd>] warn_slowpath_common+0x7d/0xa0
[ 2045.320309]  [<ffffffff8106773c>] warn_slowpath_fmt+0x4c/0x50
[ 2045.322665]  [<ffffffff8110d590>] ? restart_watchdog_hrtimer+0x50/0x50
[ 2045.324974]  [<ffffffff8110d62c>] watchdog_overflow_callback+0x9c/0xd0
[ 2045.327255]  [<ffffffff81144dae>] __perf_event_overflow+0x8e/0x240
[ 2045.329506]  [<ffffffff811458c4>] perf_event_overflow+0x14/0x20
[ 2045.331741]  [<ffffffff81029414>] x86_pmu_handle_irq+0x144/0x190
[ 2045.333961]  [<ffffffff81725b2b>] perf_event_nmi_handler+0x2b/0x50
[ 2045.336168]  [<ffffffff81725348>] nmi_handle.isra.3+0x88/0x180
[ 2045.338374]  [<ffffffff81725510>] do_nmi+0xd0/0x340
[ 2045.340574]  [<ffffffff817247b1>] end_repeat_nmi+0x1e/0x2e
[ 2045.342771]  [<ffffffff815f0450>] ? compose_msi_msg+0x90/0x90
[ 2045.344967]  [<ffffffff8136e9eb>] ? __write_lock_failed+0xb/0x20
[ 2045.347144]  [<ffffffff8136e9eb>] ? __write_lock_failed+0xb/0x20
[ 2045.349290]  [<ffffffff8136e9eb>] ? __write_lock_failed+0xb/0x20
[ 2045.351409]  <<EOE>>  [<ffffffff81723bb8>] _raw_write_lock_irqsave+0x28/0x30
[ 2045.353560]  [<ffffffff815efddc>] get_irq_table+0x2c/0x370
[ 2045.355714]  [<ffffffff8115c95e>] ? lru_cache_add+0xe/0x10
[ 2045.357852]  [<ffffffff81183638>] ? page_add_new_anon_rmap+0xd8/0x170
[ 2045.359983]  [<ffffffff815f04a8>] set_affinity+0x58/0x180
[ 2045.362109]  [<ffffffff815f9e75>] set_remapped_irq_affinity+0x25/0x40
[ 2045.364238]  [<ffffffff810c084c>] irq_do_set_affinity+0x1c/0x70
[ 2045.366350]  [<ffffffff810c0a38>] irq_set_affinity_locked+0xb8/0xf0
[ 2045.368454]  [<ffffffff810c0ab6>] __irq_set_affinity+0x46/0x70
[ 2045.370563]  [<ffffffff810c51f5>] write_irq_affinity.isra.6+0xd5/0x100
[ 2045.372663]  [<ffffffff810c5259>] irq_affinity_proc_write+0x19/0x20
[ 2045.374763]  [<ffffffff81222c4d>] proc_reg_write+0x3d/0x80
[ 2045.376854]  [<ffffffff811bcb64>] vfs_write+0xb4/0x1f0
[ 2045.378941]  [<ffffffff811bd599>] SyS_write+0x49/0xa0
[ 2045.381020]  [<ffffffff8172c87f>] tracesys+0xe1/0xe6
[ 2045.383092] ---[ end trace e6893b1ed79451c4 ]---
[ 2045.385173] perf samples too long (712292 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 2045.387310] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 93.305 msecs
[ 2060.824098] INFO: rcu_sched detected stalls on CPUs/tasks: { 3} (detected by 1, t=15002 jiffies, g=5056, c=5055, q=0)
[ 2060.826366] sending NMI to all CPUs:
[ 2060.828573] NMI backtrace for cpu 1
[ 2060.828577] perf samples too long (706733 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[ 2060.833588] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D W    3.13.0-32-generic #57-Ubuntu
[ 2060.836158] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 2060.838776] task: ffff8804295147d0 ti: ffff880429520000 task.ti: ffff880429520000
[ 2060.841361] RIP: 0010:[<ffffffff8136d5c2>]  [<ffffffff8136d5c2>] __const_udelay+0x12/0x30
[ 2060.843929] RSP: 0018:ffff88043ec83df0  EFLAGS: 00000046
[ 2060.846469] RAX: 0000000001062560 RBX: 0000000000002710 RCX: 0000000000000004
[ 2060.849029] RDX: 0000000000e16bb4 RSI: 0000000000000100 RDI: 0000000000418958
[ 2060.851586] RBP: ffff88043ec83e08 R08: 0000000000000082 R09: 00000000000004ba
[ 2060.854135] R10: ffff880036b8a000 R11: 00000003937e3000 R12: ffffffff81c4e1c0
[ 2060.856674] R13: ffffffff81d137a0 R14: ffffffff81c4e1c0 R15: 0000000000000001
[ 2060.859194] FS:  00007f8d6a2ac840(0000) GS:ffff88043ec80000(0000) knlGS:0000000000000000
[ 2060.861752] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2060.864281] CR2: 00007f43403ce000 CR3: 0000000424883000 CR4: 00000000000407e0
[ 2060.866791] Stack:
[ 2060.869242]  ffff88043ec83e08 ffffffff81044c7f ffff88043ec8e800 ffff88043ec83e60
[ 2060.871711]  ffffffff810cac21 ffffffff81c4e1c0 ffffffff00000003 0000000000000000
[ 2060.874122]  0000000000000001 ffff8804295147d0 0000000000000000 0000000000000001
[ 2060.876470] Call Trace:
[ 2060.878724]  <IRQ> 
[ 2060.878744]  [<ffffffff81044c7f>] ? arch_trigger_all_cpu_backtrace+0x8f/0xb0
[ 2060.883075]  [<ffffffff810cac21>] rcu_check_callbacks+0x631/0x650
[ 2060.885203]  [<ffffffff81076227>] update_process_times+0x47/0x70
[ 2060.887296]  [<ffffffff810d5cf5>] tick_sched_handle.isra.17+0x25/0x60
[ 2060.889379]  [<ffffffff810d5d71>] tick_sched_timer+0x41/0x60
[ 2060.891449]  [<ffffffff8108e5e7>] __run_hrtimer+0x77/0x1d0
[ 2060.893499]  [<ffffffff810d5d30>] ? tick_sched_handle.isra.17+0x60/0x60
[ 2060.895553]  [<ffffffff8108edaf>] hrtimer_interrupt+0xef/0x230
[ 2060.897596]  [<ffffffff81043097>] local_apic_timer_interrupt+0x37/0x60
[ 2060.899655]  [<ffffffff8172ea3f>] smp_apic_timer_interrupt+0x3f/0x60
[ 2060.901714]  [<ffffffff8172d3dd>] apic_timer_interrupt+0x6d/0x80
[ 2060.903768]  <EOI> 
[ 2060.903787]  [<ffffffff815ceaff>] ? cpuidle_enter_state+0x4f/0xc0
[ 2060.907889]  [<ffffffff815cec29>] cpuidle_idle_call+0xb9/0x1f0
[ 2060.909958]  [<ffffffff8101cebe>] arch_cpu_idle+0xe/0x30
[ 2060.912022]  [<ffffffff810bec65>] cpu_startup_entry+0xc5/0x290
[ 2060.914089]  [<ffffffff81040fd8>] start_secondary+0x218/0x2c0
[ 2060.916154] Code: 89 e5 ff 15 b9 07 92 00 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d 04 bd 00 00 00 00 65 48 8b 14 25 60 8d 0c 12 48 c1 e2 06 48 89 e5 48 29 ca f7 e2 48 8d 7a 01 ff 
[ 2060.920807] NMI backtrace for cpu 2
[ 2060.923013] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D W    3.13.0-32-generic #57-Ubuntu
[ 2060.925257] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 2060.927566] task: ffff880429515fc0 ti: ffff880429522000 task.ti: ffff880429522000
[ 2060.929877] RIP: 0010:[<ffffffff8141666e>]  [<ffffffff8141666e>] acpi_idle_do_entry+0x21/0x2b
[ 2060.932215] RSP: 0018:ffff880429523e28  EFLAGS: 00000093
[ 2060.934532] RAX: 000001dfd2d31300 RBX: ffff8804297548a8 RCX: 000000000000c710
[ 2060.936878] RDX: 0000000000001771 RSI: ffff88043ed00000 RDI: ffff8804297548a8
[ 2060.939233] RBP: ffff880429523e28 R08: ffff88043ed112d4 R09: 0000000000000018
[ 2060.941591] R10: 0000000000052971 R11: 000000000005f98e R12: ffff880429754800
[ 2060.943958] R13: 0000000000000002 R14: 0000000000000002 R15: ffffffff81c96ea8
[ 2060.946314] FS:  00007f43403bf880(0000) GS:ffff88043ed00000(0000) knlGS:0000000000000000
[ 2060.948692] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2060.951083] CR2: 00007fe029130f64 CR3: 00000004263d9000 CR4: 00000000000407e0
[ 2060.953495] Stack:
[ 2060.955883]  ffff880429523e50 ffffffff814166f3 ffff880424c28000 ffffffff81c96de0
[ 2060.958357]  000001e034cff146 ffff880429523e88 ffffffff815ceaf0 ffff880429523f38
[ 2060.960854]  ffff880424c28000 0000000000000002 0000000000000002 ffffffff81c96de0
[ 2060.963368] Call Trace:
[ 2060.965849]  [<ffffffff814166f3>] acpi_idle_enter_simple+0x7b/0x99
[ 2060.968336]  [<ffffffff815ceaf0>] cpuidle_enter_state+0x40/0xc0
[ 2060.970793]  [<ffffffff815cec29>] cpuidle_idle_call+0xb9/0x1f0
[ 2060.973208]  [<ffffffff8101cebe>] arch_cpu_idle+0xe/0x30
[ 2060.975582]  [<ffffffff810bec65>] cpu_startup_entry+0xc5/0x290
[ 2060.977965]  [<ffffffff81040fd8>] start_secondary+0x218/0x2c0
[ 2060.980317] Code: ff fa 66 66 90 66 66 90 5d c3 8a 47 08 55 48 89 e5 3c 01 75 07 e8 b3 94 c2 ff eb 17 3c 02 75 07 e8 ba ff ff ff eb 0c 8b 57 04 ec <48> 8b 15 cf a7 ba 00 ed 5d c3 66 66 66 66 90 55 48 63 c2 48 8d 
[ 2060.985600] NMI backtrace for cpu 0
[ 2060.985606] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 157.026 msecs
[ 2060.989906] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D W    3.13.0-32-generic #57-Ubuntu
[ 2060.992076] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 2060.994272] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 2060.996487] RIP: 0010:[<ffffffff8141666e>]  [<ffffffff8141666e>] acpi_idle_do_entry+0x21/0x2b
[ 2060.998740] RSP: 0018:ffffffff81c01e38  EFLAGS: 00000093
[ 2061.000974] RAX: 000001dfca387700 RBX: ffff8804297540a8 RCX: 000000000000c710
[ 2061.003228] RDX: 0000000000001771 RSI: ffff88043ec00000 RDI: ffff8804297540a8
[ 2061.005471] RBP: ffffffff81c01e38 R08: ffff88043ec112d0 R09: 0000000000000018
[ 2061.007703] R10: 0000000000030cfc R11: 00000000000ce668 R12: ffff880429754000
[ 2061.009933] R13: 0000000000000002 R14: 0000000000000002 R15: ffffffff81c96ea8
[ 2061.012148] FS:  00007f6bd1547740(0000) GS:ffff88043ec00000(0000) knlGS:0000000000000000
[ 2061.014369] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2061.016586] CR2: 00007f43403ce000 CR3: 0000000426388000 CR4: 00000000000407f0
[ 2061.018821] DR0: 0000000000000003 DR1: 00000000000000b0 DR2: 0000000000000001
[ 2061.021040] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2061.023264] Stack:
[ 2061.025459]  ffffffff81c01e60 ffffffff814166f3 ffff880425310200 ffffffff81c96de0
[ 2061.027719]  000001e02c3342df ffffffff81c01e98 ffffffff815ceaf0 ffffffffffffffff
[ 2061.029977]  ffff880425310200 0000000000000002 0000000000000000 ffffffff81c96de0
[ 2061.032246] Call Trace:
[ 2061.034488]  [<ffffffff814166f3>] acpi_idle_enter_simple+0x7b/0x99
[ 2061.036763]  [<ffffffff815ceaf0>] cpuidle_enter_state+0x40/0xc0
[ 2061.039035]  [<ffffffff815cec29>] cpuidle_idle_call+0xb9/0x1f0
[ 2061.041301]  [<ffffffff8101cebe>] arch_cpu_idle+0xe/0x30
[ 2061.043556]  [<ffffffff810bec65>] cpu_startup_entry+0xc5/0x290
[ 2061.045812]  [<ffffffff8170a187>] rest_init+0x77/0x80
[ 2061.048062]  [<ffffffff81d35f70>] start_kernel+0x438/0x443
[ 2061.050303]  [<ffffffff81d35941>] ? repair_env_string+0x5c/0x5c
[ 2061.052543]  [<ffffffff81d35120>] ? early_idt_handlers+0x120/0x120
[ 2061.054782]  [<ffffffff81d355ee>] x86_64_start_reservations+0x2a/0x2c
[ 2061.057023]  [<ffffffff81d35733>] x86_64_start_kernel+0x143/0x152
[ 2061.059256] Code: ff fa 66 66 90 66 66 90 5d c3 8a 47 08 55 48 89 e5 3c 01 75 07 e8 b3 94 c2 ff eb 17 3c 02 75 07 e8 ba ff ff ff eb 0c 8b 57 <48> 8b 15 cf a7 ba 00 ed 5d c3 66 66 66 66 90 55 48 63 c2 48 8d 
[ 2061.064162] NMI backtrace for cpu 3
[ 2061.064167] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 235.588 msecs
[ 2061.068651] CPU: 3 PID: 1083 Comm: irqbalance Tainted: G      D W    3.13.0-32-generic #57-Ubuntu
[ 2061.070912] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[ 2061.073181] task: ffff880427595fc0 ti: ffff88042772a000 task.ti: ffff88042772a000
[ 2061.075433] RIP: 0010:[<ffffffff8136e9ed>]  [<ffffffff8136e9ed>] __write_lock_failed+0xd/0x20
[ 2061.077704] RSP: 0018:ffff88042772bd30  EFLAGS: 00000087
[ 2061.079944] RAX: 0000000000000086 RBX: ffff8804295ecc60 RCX: ffffffff815f0450
[ 2061.082193] RDX: 0000000000000086 RSI: 0000000000000000 RDI: ffffffff81cd6cf0
[ 2061.084434] RBP: ffff88042772bd30 R08: 0000000000000286 R09: 0000000000000004
[ 2061.086675] R10: 000000000000000e R11: 0000000000000001 R12: 0000000000000080
[ 2061.088903] R13: 0000000000000080 R14: 00000000ffffffea R15: 0000000000000000
[ 2061.091127] FS:  00007f6340128780(0000) GS:ffff88043ed80000(0000) knlGS:0000000000000000
[ 2061.093353] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2061.095565] CR2: 00007f634012d000 CR3: 00000004284d2000 CR4: 00000000000407e0
[ 2061.097784] Stack:
[ 2061.099975]  ffff88042772bd40 ffffffff81723bb8 ffff88042772bda8 ffffffff815efddc
[ 2061.102216]  ffff88042772bd60 ffffffff8115c95e ffff88042772bd88 ffffffff81183638
[ 2061.104441]  00007f634012d000 ffff880428b4a968 ffff8804295ecc60 ffff88042772be88
[ 2061.106657] Call Trace:
[ 2061.108834]  [<ffffffff81723bb8>] _raw_write_lock_irqsave+0x28/0x30
[ 2061.111030]  [<ffffffff815efddc>] get_irq_table+0x2c/0x370
[ 2061.113224]  [<ffffffff8115c95e>] ? lru_cache_add+0xe/0x10
[ 2061.115420]  [<ffffffff81183638>] ? page_add_new_anon_rmap+0xd8/0x170
[ 2061.117621]  [<ffffffff815f04a8>] set_affinity+0x58/0x180
[ 2061.119821]  [<ffffffff815f9e75>] set_remapped_irq_affinity+0x25/0x40
[ 2061.122027]  [<ffffffff810c084c>] irq_do_set_affinity+0x1c/0x70
[ 2061.124187]  [<ffffffff810c0a38>] irq_set_affinity_locked+0xb8/0xf0
[ 2061.126295]  [<ffffffff810c0ab6>] __irq_set_affinity+0x46/0x70
[ 2061.128396]  [<ffffffff810c51f5>] write_irq_affinity.isra.6+0xd5/0x100
[ 2061.130475]  [<ffffffff810c5259>] irq_affinity_proc_write+0x19/0x20
[ 2061.132527]  [<ffffffff81222c4d>] proc_reg_write+0x3d/0x80
[ 2061.134549]  [<ffffffff811bcb64>] vfs_write+0xb4/0x1f0
[ 2061.136535]  [<ffffffff811bd599>] SyS_write+0x49/0xa0
[ 2061.138492]  [<ffffffff8172c87f>] tracesys+0xe1/0xe6
[ 2061.140423] Code: 01 31 c0 66 66 90 c3 b8 f2 ff ff ff 66 66 90 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 f0 81 07 f3 90 <81> 3f 00 00 10 00 75 f6 f0 81 2f 00 00 10 00 75 e6 5d c3 55 48 
[ 2061.144744] INFO: NMI handler (arch_trigger_all_cpu_backtrace_handler) took too long to run: 316.161 msecs
[ 2067.038359] perf samples too long (701229 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
[ 2088.782705] perf samples too long (695767 > 20000), lowering kernel.perf_event_max_sample_rate to 6250
[ 2110.527053] perf samples too long (690349 > 40000), lowering kernel.perf_event_max_sample_rate to 3250
[ 2132.271401] perf samples too long (684971 > 76923), lowering kernel.perf_event_max_sample_rate to 1750
[ 2154.015748] perf samples too long (679634 > 142857), lowering kernel.perf_event_max_sample_rate to 1000
Comment 1 Marti Raudsepp 2014-08-07 14:46:26 UTC
Created attachment 145421 [details]
crash_netconsole.txt
Comment 2 Marti Raudsepp 2014-08-07 14:47:12 UTC
Created attachment 145431 [details]
startup_dmesg.txt
Comment 3 Marti Raudsepp 2014-08-07 14:47:25 UTC
Created attachment 145441 [details]
dmidecode.txt
Comment 4 Marti Raudsepp 2014-08-07 15:30:08 UTC
Also occurs with freshly built mainline kernel version 3.16.0.

[   87.327457] ------------[ cut here ]------------
[   87.327488] kernel BUG at drivers/iommu/amd_iommu.c:2382!
[   87.327505] invalid opcode: 0000 [#1] SMP 
[   87.327526] Modules linked in: pci_stub(E) ipt_MASQUERADE(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E) nf_conntrack(E) ipt_REJECT(E) xt_CHECKSUM(E) iptable_mangle(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) ebtable_nat(E) ebtables(E) x_tables(E) nct6775(E) hwmon_vid(E) radeon(E) kvm_amd(E) kvm(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) i2c_algo_bit(E) crct10dif_pclmul(E) drm_kms_helper(E) crc32_pclmul(E) ghash_clmulni_intel(E) snd_hwdep(E) aesni_intel(E) snd_pcm(E) ttm(E) aes_x86_64(E) glue_helper(E) netconsole(E) drm(E) lrw(E) snd_timer(E) configfs(E) snd(E) gf128mul(E) ablk_helper(E) cryptd(E) soundcore(E) lp(E) serio_raw(E) k10temp(E) i2c_piix4(E) mac_hid(E) video(E) parport(E) usb_storage(E) pata_acpi(E) hid_generic(E) usbhid(E) hid(E) alx(E) psmouse(E) mdio(E) pata_atiixp(E) ahci(E) libahci(E)
[   87.327963] CPU: 0 PID: 1452 Comm: qemu-system-x86 Tainted: G            E 3.16.0 #1
[   87.327986] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X Extreme6+, BIOS L3.16 04/16/2014
[   87.328016] task: ffff880427a18000 ti: ffff880421280000 task.ti: ffff880421280000
[   87.328039] RIP: 0010:[<ffffffff816059dd>]  [<ffffffff816059dd>] __detach_device+0xad/0xb0
[   87.328071] RSP: 0018:ffff880421283b38  EFLAGS: 00010046
[   87.328088] RAX: 0000000000000000 RBX: ffff8804286e5240 RCX: ffff880421283ae0
[   87.328110] RDX: dead000000100100 RSI: 0000000000000086 RDI: ffff8804286e5240
[   87.328132] RBP: ffff880421283b58 R08: 0000000000000046 R09: ffff8804299b8900
[   87.328154] R10: ffff880000000000 R11: 000ffffffffff000 R12: 0000000000000000
[   87.328175] R13: ffff88042127a610 R14: ffff88042744c040 R15: ffff8804286e5240
[   87.328197] FS:  00007f1d03857700(0000) GS:ffff88043ec00000(0000) knlGS:0000000000000000
[   87.328221] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   87.328239] CR2: 00007f1d03dc63a0 CR3: 0000000001c13000 CR4: 00000000000407f0
[   87.328260] Stack:
[   87.328268]  dead000000100100 ffff88042127a600 ffff88042127a610 ffff88042744c040
[   87.328299]  ffff880421283b98 ffffffff81605a7e 0000000000000202 ffff88042744c040
[   87.328333]  ffff88042744c040 ffff880420a3c008 ffff8804242b0a80 ffff88007786dfd8
[   87.328365] Call Trace:
[   87.328378]  [<ffffffff81605a7e>] amd_iommu_domain_destroy+0x9e/0x160
[   87.328400]  [<ffffffff816022db>] iommu_domain_free+0x1b/0x30
[   87.328432]  [<ffffffffa03628a3>] kvm_iommu_unmap_guest+0x53/0x60 [kvm]
[   87.328461]  [<ffffffffa0373059>] kvm_arch_destroy_vm+0x39/0x1f0 [kvm]
[   87.328484]  [<ffffffff810cfebd>] ? synchronize_srcu+0x1d/0x20
[   87.328509]  [<ffffffffa035b26e>] kvm_put_kvm+0x10e/0x220 [kvm]
[   87.328535]  [<ffffffffa035b3b8>] kvm_vcpu_release+0x18/0x20 [kvm]
[   87.328556]  [<ffffffff811d0a04>] __fput+0xe4/0x220
[   87.328573]  [<ffffffff811d0b8e>] ____fput+0xe/0x10
[   87.328591]  [<ffffffff8108cd74>] task_work_run+0xc4/0xe0
[   87.328609]  [<ffffffff8106ef18>] do_exit+0x2b8/0xa60
[   87.328627]  [<ffffffff8106f73f>] do_group_exit+0x3f/0xa0
[   87.328645]  [<ffffffff8107f100>] get_signal_to_deliver+0x1d0/0x6f0
[   87.328668]  [<ffffffff81012548>] do_signal+0x48/0x9d0
[   87.328687]  [<ffffffff8111d1bc>] ? acct_account_cputime+0x1c/0x20
[   87.328708]  [<ffffffff810a372b>] ? account_user_time+0x8b/0xa0
[   87.329791]  [<ffffffff810a3cf4>] ? vtime_account_user+0x54/0x60
[   87.330869]  [<ffffffff81012f39>] do_notify_resume+0x69/0xb0
[   87.331950]  [<ffffffff8172b32a>] int_signal+0x12/0x17
[   87.333016] Code: fe ff ff eb b8 66 0f 1f 84 00 00 00 00 00 48 8b 35 69 b0 9a 00 49 39 f4 74 c1 48 89 df e8 8c fd ff ff 5b 41 5c 41 5d 41 5e 5d c3 <0f> 0b 90 66 66 66 66 90 55 48 89 e5 41 57 41 56 49 89 fe 41 55 
[   87.335373] RIP  [<ffffffff816059dd>] __detach_device+0xad/0xb0
[   87.336475]  RSP <ffff880421283b38>
[   87.337562] ---[ end trace bee5733468f37c81 ]---
Comment 5 Alex Williamson 2014-08-07 16:25:53 UTC
What if you use vfio-pci instead of pci-assign?  The BUG happens when the kernel tries to detach a device from the domain, but the device doesn't actually belong to a domain.  VFIO likely already avoids this because the bridge and device will both be in the same IOMMU group and therefore attached to the same domain.
Comment 6 Marti Raudsepp 2014-08-07 17:56:37 UTC
(In reply to Alex Williamson from comment #5)
> What if you use vfio-pci instead of pci-assign?

I run into the dreaded error:
  vfio: error, group 9 is not viable, please ensure all devices within the
  iommu_group are bound to their vfio bus driver

There are some proposed workarounds on the web, like passing vfio_iommu_type1.allow_unsafe_interrupts=1 or pci=realloc, but these seem to change nothing for me.

So I tried adding all the PCI devices in the IOMMU group as passthrough devices (including IDE, SMBus, audio and OHCI controllers). But then QEMU's SeaBIOS gets so confused it can no longer find a hard drive to boot off.

But you're right. At least I can stop the non-functional virtual machine now, so I've got that going for me, which is nice.
Comment 7 Alex Williamson 2014-08-07 18:11:42 UTC
(In reply to Marti Raudsepp from comment #6)
> (In reply to Alex Williamson from comment #5)
> > What if you use vfio-pci instead of pci-assign?
> 
> I run into the dreaded error:
>   vfio: error, group 9 is not viable, please ensure all devices within the
>   iommu_group are bound to their vfio bus driver
> 
> There are some proposed workarounds on the web, like passing
> vfio_iommu_type1.allow_unsafe_interrupts=1 or pci=realloc, but these seem to
> change nothing for me.

None of these remotely address the issue.  If you're running at least 3.12 there are quirks for the following AMD southbridge components:

 * 1002:4385 SBx00 SMBus Controller
 * 1002:439c SB7x0/SB8x0/SB9x0 IDE Controller
 * 1002:4383 SBx00 Azalia (Intel HDA)
 * 1002:439d SB7x0/SB8x0/SB9x0 LPC host controller
 * 1002:4384 SBx00 PCI to PCI Bridge
 * 1002:4399 SB7x0/SB8x0/SB9x0 USB OHCI2 Controller

If your bridge does not match these, then AMD will need to confirm whether isolation is provided between your devices.  There is an ACS override patch floating around which allows assuming device isolation, but this is generally a bad idea, can introduce obscure bugs, and will not be merged upstream.
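
Checking a device against the quirk list above amounts to a plain ID lookup. The sketch below is illustrative only: `QUIRKED_IDS` and `has_quirk` are hypothetical names, and the IDs are copied from this comment rather than read from the kernel source.

```shell
# Vendor:device IDs copied from the quirk list in this comment.
QUIRKED_IDS="1002:4385 1002:439c 1002:4383 1002:439d 1002:4384 1002:4399"

# has_quirk VENDOR:DEVICE -> exit status 0 if the ID is in the list above.
has_quirk() {
    # Normalise to lower-case hex so 1002:439C and 1002:439c compare equal.
    id=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    for q in $QUIRKED_IDS; do
        [ "$id" = "$q" ] && return 0
    done
    return 1
}
```

`lspci -n` prints the numeric vendor:device pair for each function; a value taken from there can be passed in directly, e.g. `has_quirk 1002:4384 && echo "quirk present"`.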

> So I tried adding all the PCI devices in the IOMMU group as passthrough
> devices (including IDE, SMBus, audio and OHCI controllers). But then QEMU's
> SeaBIOS gets so confused it can no longer find a hard drive to boot off.

Note that it's not required to assign all the devices; they simply need to be detached from host drivers (i.e. bound to pci-stub or vfio-pci).
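
To see which devices are still holding a group back, the group membership and current driver of each device can be read from sysfs. This is a rough sketch, not a reimplementation of the kernel's viability check: `group_report` and `SYSFS_ROOT` are hypothetical names, and treating a device with no driver at all as acceptable is an assumption of this sketch.

```shell
# group_report ADDR: print "<addr> <driver> <ok|BLOCKS>" for every device
# that shares an IOMMU group with ADDR. Illustrative helper only.
group_report() {
    addr=$1
    root=${SYSFS_ROOT:-/sys}
    group=$root/bus/pci/devices/$addr/iommu_group
    if [ ! -d "$group/devices" ]; then
        echo "no IOMMU group found for $addr" >&2
        return 1
    fi
    for d in "$group"/devices/*; do
        name=${d##*/}
        if [ -e "$d/driver" ]; then
            # The driver entry is a symlink; its last component is the name.
            drv=$(basename "$(readlink "$d/driver")")
        else
            drv=none
        fi
        case $drv in
            vfio-pci|pci-stub|none) state=ok ;;   # assumption: unbound is fine
            *) state=BLOCKS ;;                    # still owned by a host driver
        esac
        echo "$name $drv $state"
    done
}
```

Running `group_report 0000:01:05.0` on the host would list every member of the ISDN card's group; any line marked `BLOCKS` names a device that still needs to be moved to pci-stub or vfio-pci.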
Comment 8 Marti Raudsepp 2014-08-07 18:53:07 UTC
(In reply to Alex Williamson from comment #7)
> > There are some proposed workarounds on the web
> None of these remotely address the issue.

I see. This page claims so: http://www.ovirt.org/Features/hostdev_passthrough

> there are quirks for the following AMD southbridge components

Nope, mine are 1022:780b, 1022:780c, 1022:780d, 1022:780e, 1022:780f, 1022:7809

> If your bridge does not match these, then AMD will need to confirm whether
> isolation is provided between your devices.

How would I go about confirming that? What are the chances that they care, and provide accurate information to a random person?

> There is an ACS override patch

I already ran across it... https://bugzilla.redhat.com/show_bug.cgi?id=1113399
Would I be any worse off using this, compared to the old kvm pci-assign method?

> Note that it's not required to assign all the devices, they simply need to
> be detached from host drivers (ie. bound to pci-stub or vfio-pci).

Thanks, I will give it a shot tomorrow.
Comment 9 Alex Williamson 2014-08-07 19:47:57 UTC
(In reply to Marti Raudsepp from comment #8)
> (In reply to Alex Williamson from comment #7)
> > > There are some proposed workarounds on the web
> > None of these remotely address the issue.
> 
> I see. This page claims so: http://www.ovirt.org/Features/hostdev_passthrough

Sorry, it's wrong.

> > there are quirks for the following AMD southbridge components
> 
> Nope, mine are 1022:780b, 1022:780c, 1022:780d, 1022:780e, 1022:780f,
> 1022:7809
> 
> > If your bridge does not match these, then AMD will need to confirm whether
> > isolation is provided between your devices.
> 
> How would I go about confirming that? What are the chances that they care,
> and provide accurate information to a random person?

AMD would need to confirm it.  IOMMU groups are based on hardware advertised isolation via the PCIe ACS capability.  Without this, or a device specific quirk to take its place, IOMMU groups must assume that peer-to-peer between functions of a multi-function device is possible and therefore that the devices are not isolated.  Chances are that this new chipset in your system is taking the exact same ASICs that were deemed not to do peer-to-peer on previous chipsets, but we need that confirmation from AMD.  Alex Deucher (see MAINTAINERS) may have contacts available that can make that statement.
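
To make "hardware advertised isolation" a bit more concrete, here is a minimal sketch of how one could look for the ACS extended capability in a raw dump of a device's PCI config space. The function name `has_acs` is made up for this example. It assumes the standard PCIe extended capability layout (list starts at offset 0x100, capability ID in bits 15:0 of each header, next pointer in bits 31:20) and the ACS capability ID 0x000D. It only detects that the capability is present; it does not look at which ACS control bits are supported or enabled.

```python
import struct

PCI_EXT_CAP_START = 0x100    # extended capabilities follow the first 256 bytes
PCI_EXT_CAP_ID_ACS = 0x000D  # Access Control Services


def has_acs(config):
    """Walk the PCIe extended capability list and report whether ACS is present."""
    offset = PCI_EXT_CAP_START
    seen = set()  # guards against a malformed, looping capability list
    while (offset >= PCI_EXT_CAP_START
           and offset + 4 <= len(config)
           and offset not in seen):
        seen.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFFFFFF):
            break  # empty list or unreadable config space
        if header & 0xFFFF == PCI_EXT_CAP_ID_ACS:
            return True
        offset = (header >> 20) & 0xFFC  # next pointer, dword aligned
    return False
```

On a live system the input would come from something like `open("/sys/bus/pci/devices/0000:00:14.4/config", "rb").read()`, run as root, since unprivileged reads of that file are truncated. A conventional PCI function only has 256 bytes of config space, so the function returns False for it.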

> > There is an ACS override patch
> 
> I already ran across it...
> https://bugzilla.redhat.com/show_bug.cgi?id=1113399
> Would I be any worse off using this, compared to the old kvm pci-assign
> method?

I think the path forward is to get confirmation from AMD that these functions are isolated from each other and add quirks to the kernel.  Then you won't have the device dependencies in vfio-pci.  The override patch allows you to do that with just a kernel boot parameter.  There's no guarantee that pci-assign will ever be fixed since it's being phased out.
Comment 10 Joel Schopp 2014-08-08 02:48:14 UTC
(In reply to Alex Williamson from comment #9)
> (In reply to Marti Raudsepp from comment #8)
> > (In reply to Alex Williamson from comment #7)
> > > > There are some proposed workarounds on the web
> > > None of these remotely address the issue.
> > 
> > I see. This page claims so:
> http://www.ovirt.org/Features/hostdev_passthrough
> 
> Sorry, it's wrong.
> 
> > > there are quirks for the following AMD southbridge components
> > 
> > Nope, mine are 1022:780b, 1022:780c, 1022:780d, 1022:780e, 1022:780f,
> > 1022:7809
> > 
> > > If your bridge does not match these, then AMD will need to confirm
> whether
> > > isolation is provided between your devices.
> > 
> > How would I go about confirming that? What are the chances that they care,
> > and provide accurate information to a random person?


Are you suggesting we'd provide inaccurate information to a random person?


> 
> AMD would need to confirm it.  IOMMU groups are based on hardware advertised
> isolation via the PCIe ACS capability.  Without this, or a device specific
> quirk to take its place, IOMMU groups must assume that peer-to-peer between
> functions of a multi-function device is possible and therefore that the
> devices are not isolated.  Chances are that this new chipset in your system
> is taking the exact same ASICs that were deemed not to do peer-to-peer on
> previous chipsets, but we need that confirmation from AMD.  Alex Deucher
> (see MAINTAINERS) may have contacts available that can make that statement.

I don't have an answer for you offhand.  Let me do some digging and get you an answer.

> 
> > > There is an ACS override patch
> > 
> > I already ran across it...
> > https://bugzilla.redhat.com/show_bug.cgi?id=1113399
> > Would I be any worse off using this, compared to the old kvm pci-assign
> > method?
> 
> I think the path forward is to get confirmation from AMD that these functions
> are isolated from each other and add quirks to the kernel.  Then you won't
> have the device dependencies in vfio-pci.  The override patch allows you to
> do that with just a kernel boot parameter.  There's no guarantee that
> pci-assign will ever be fixed since it's being phased out.
Comment 11 Marti Raudsepp 2014-08-08 10:19:01 UTC
(In reply to Joel Schopp from comment #10)
> > How would I go about confirming that? What are the chances that they care,
> > and provide accurate information to a random person?
> 
> Are you suggesting we'd provide inaccurate information to a random person?

Yes, that's my experience with the "customer support" for desktop hardware.

Of course, cutting support out of the equation and asking engineers directly is likely to give better results; that didn't occur to me at first.
Comment 12 Joerg Roedel 2014-08-12 09:36:23 UTC
Created attachment 146311
Possible fix as a patch against v3.13

Hi Marti,

Can you please test this patch? I think it should fix the issue.

Thanks, Joerg
Comment 13 Marti Raudsepp 2014-08-12 10:30:30 UTC
(In reply to Joerg Roedel from comment #12)
> Thanks, Joerg

Indeed. Thanks, Joerg. And thanks everyone else too, you have been very helpful!

I didn't have v3.13 sources handy, but I applied the patch from attachment 146311 to 3.16.0 and it fixes the problem. (I verified that unpatched 3.16.0 also crashes.)

I can start & shut down the VM multiple times without crashing the host and PCI passthrough works as expected.

Feel free to add
    Tested-by: Marti Raudsepp <marti@juffo.org>

(In reply to Alex Williamson from comment #7)
> Note that it's not required to assign all the devices, they simply need to
> be detached from host drivers (ie. bound to pci-stub or vfio-pci).

This approach also works; I think I will go this route for the production setup. Seems that we don't actually need any of the devices in the same IOMMU group.

(In reply to Joel Schopp from comment #10)
> > AMD would need to confirm it.
>
> I don't have an answer for you offhand.  Let me do some digging and get you
> an answer.

I am sorry if I sounded frustrated or arrogant earlier. Any update on this?
Comment 14 Joerg Roedel 2014-08-12 10:42:14 UTC
Hi Marti,

> Indeed. Thanks, Joerg. And thanks everyone else too, you have been very
> helpful!
> 
> I didn't have v3.13 sources handy, but I applied the attachment 146311

> I can start & shut down the VM multiple times without crashing the host and
> PCI passthrough works as expected.
> 
> Feel free to add
>     Tested-by: Marti Raudsepp <marti@juffo.org>

Thanks for testing the fix. I will send it upstream once the merge window is over and -rc1 is released. I also added a stable tag so it gets backported.

Joerg
Comment 15 Joel Schopp 2014-08-12 14:53:30 UTC
> (In reply to Joel Schopp from comment #10)
> > > AMD would need to confirm it.
> >
> > I don't have an answer for you offhand.  Let me do some digging and get you
> > an answer.
> 
> I am sorry if I sounded frustrated or arrogant earlier. Any update on this?

It's not clear to me which devices were being put in the same group.  Here are some of my notes on your lspci output:

lspci -vt
-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1422
           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1423
           +-01.0  Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7 200 Series]
           +-01.1  Advanced Micro Devices, Inc. [AMD/ATI] Device 1308
           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1424
           +-10.0  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
           +-10.1  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller

These XHCI controllers are isolated from the other devices. I would need some more detail on which variant you are running to determine whether they are isolated from each other; they probably aren't.

           +-11.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
The SATA controller is isolated from the other devices.

           +-12.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
           +-12.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
This pair of OHCI/EHCI controllers are together isolated from the other devices

           +-13.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
           +-13.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
This pair of OHCI/EHCI controllers are together isolated from the other devices

           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.1  Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
           +-14.2  Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
I do not think the SMBus/IDE/Azalia/LPC are isolated from each other, but they are isolated from the other devices I have identified.


           +-14.4-[01]----05.0  Dialogic Corporation PRI
The legacy PCI should be isolated from the other devices identified.  Not sure what is going on here.

           +-14.5  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
This OHCI Controller should also be isolated from the other devices.

           +-15.0-[02]--
           +-15.2-[03]----00.0  ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller
Is this in a PCI-e slot or otherwise attached to the PCI-e?

           +-15.3-[04]----00.0  Qualcomm Atheros QCA8171 Gigabit Ethernet
Is this in a PCI-e slot or otherwise attached to the PCI-e?

           +-18.0  Advanced Micro Devices, Inc. [AMD] Device 141a
           +-18.1  Advanced Micro Devices, Inc. [AMD] Device 141b
           +-18.2  Advanced Micro Devices, Inc. [AMD] Device 141c
           +-18.3  Advanced Micro Devices, Inc. [AMD] Device 141d
           +-18.4  Advanced Micro Devices, Inc. [AMD] Device 141e
           \-18.5  Advanced Micro Devices, Inc. [AMD] Device 141f
Comment 16 Alex Williamson 2014-08-12 15:09:44 UTC
(In reply to Joel Schopp from comment #15)
> > (In reply to Joel Schopp from comment #10)
> > > > AMD would need to confirm it.
> > >
> > > I don't have an answer for you offhand.  Let me do some digging and get
> you
> > > an answer.
> > 
> > I am sorry if I sounded frustrated or arrogant earlier. Any update on this?
> 
> It's not clear to me which devices were being put in the same group.  Here's
> some of my notes on your lspci output

Marti, the output of 'find /sys/kernel/iommu_groups' would be useful here.  I'll try to help based on what I think is happening...

> lspci -vt
> -[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1422
>            +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1423
>            +-01.0  Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7
> 200 Series]
>            +-01.1  Advanced Micro Devices, Inc. [AMD/ATI] Device 1308
>            +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1424
>            +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1424
>            +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1424
>            +-10.0  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
>            +-10.1  Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller
> 
> These xhci controllers are isolated from the other devices, I would
> need some more detail on which variant you are running to determine if they
> are isolated from each other, they probably aren't.

10.0 & 10.1 will typically be grouped together due to lack of ACS.  This is usually not a problem.

>            +-11.0  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller
> [AHCI mode]
> The sata controller is isolated from the other devices

Yep, and it's a single function device so IOMMU groups should be ok.

>            +-12.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
>            +-12.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
> This pair of OHCI/EHCI controllers are together isolated from the other
> devices

Yep, same as above.

>            +-13.0  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
>            +-13.2  Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller
> This pair of OHCI/EHCI controllers are together isolated from the other
> devices

Yep

>            +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
>            +-14.1  Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
>            +-14.2  Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller
>            +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
> I do not think the SMBus/IDE/Azalia/LPC are isolated from each other, but
> they are isolated from the other devices I have identified.
> 
> 
>            +-14.4-[01]----05.0  Dialogic Corporation PRI
> The legacy PCI should be isolated from the other devices identified.  Not
> sure what is going on here.
> 
>            +-14.5  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
> This OHCI Controller should also be isolated from the other devices.

All of the above will be grouped together; this is the problem.  Since none of these functions support ACS, IOMMU groups assume that peer-to-peer between functions is possible.  If 14.4 and 14.5 are truly isolated from the rest of the package then we should have quirks to support that.  This whole block is an update of the quirk already shown in comment 7.

>            +-15.0-[02]--
>            +-15.2-[03]----00.0  ASMedia Technology Inc. ASM1042 SuperSpeed
> USB Host Controller
> Is this in a PCI-e slot or otherwise attached to the PCI-e?
> 
>            +-15.3-[04]----00.0  Qualcomm Atheros QCA8171 Gigabit Ethernet
> Is this in a PCI-e slot or otherwise attached to the PCI-e?

I would guess 15.x are all PCIe root ports, hopefully with ACS support.
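
The grouping behaviour described above for the 14.x block can be illustrated with a deliberately simplified toy model. This is not the kernel's algorithm: `toy_groups`, the `isolated` set and the `SOUTHBRIDGE` table are invented for this example, and the only rules modelled are "functions of one slot share a group unless something vouches for their isolation" and "a device behind a bridge joins the bridge's group".

```python
from collections import defaultdict

# (device, parent bridge) pairs for the 14.x block discussed above.
SOUTHBRIDGE = [
    ("00:14.0", None), ("00:14.1", None), ("00:14.2", None),
    ("00:14.3", None), ("00:14.4", None), ("00:14.5", None),
    ("01:05.0", "00:14.4"),  # Dialogic card behind the legacy PCI bridge
]


def toy_groups(devices, isolated=frozenset()):
    """Group devices under the simplified rules from the lead-in.

    `isolated` stands in for "has ACS or a quirk"; parents must be listed
    before the devices behind them.
    """
    group_of = {}
    for bdf, parent in devices:
        if parent is not None:
            group_of[bdf] = group_of[parent]    # follow the bridge
        elif bdf in isolated:
            group_of[bdf] = bdf                 # a group of its own
        else:
            group_of[bdf] = bdf.split(".")[0]   # whole slot, e.g. "00:14"
    groups = defaultdict(list)
    for bdf, key in group_of.items():
        groups[key].append(bdf)
    return sorted(groups.values())
```

With no isolation information, `toy_groups(SOUTHBRIDGE)` puts all seven devices in one group. With `isolated={"00:14.4", "00:14.5"}`, the bridge and the card behind it form their own group, 14.5 stands alone, and 14.0 through 14.3 stay together, which is the outcome a quirk would be meant to produce.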
Comment 17 Marti Raudsepp 2014-08-12 15:20:34 UTC
It's an ASRock FM2A88X Extreme6+ motherboard with the AMD A88X (Bolton-D4) chipset.

There are 12 IOMMU groups on the system. The problematic group for me is number 9 because the legacy PCI bridge (14.4) gets mixed in with other southbridge devices (all 14.*).

/sys/kernel/iommu_groups/0/devices:
0000:00:00.0 -> ../../../../devices/pci0000:00/0000:00:00.0

/sys/kernel/iommu_groups/1/devices:
0000:00:01.0 -> ../../../../devices/pci0000:00/0000:00:01.0
0000:00:01.1 -> ../../../../devices/pci0000:00/0000:00:01.1

/sys/kernel/iommu_groups/2/devices:
0000:00:02.0 -> ../../../../devices/pci0000:00/0000:00:02.0

/sys/kernel/iommu_groups/3/devices:
0000:00:03.0 -> ../../../../devices/pci0000:00/0000:00:03.0

/sys/kernel/iommu_groups/4/devices:
0000:00:04.0 -> ../../../../devices/pci0000:00/0000:00:04.0

/sys/kernel/iommu_groups/5/devices:
0000:00:10.0 -> ../../../../devices/pci0000:00/0000:00:10.0
0000:00:10.1 -> ../../../../devices/pci0000:00/0000:00:10.1

/sys/kernel/iommu_groups/6/devices:
0000:00:11.0 -> ../../../../devices/pci0000:00/0000:00:11.0

/sys/kernel/iommu_groups/7/devices:
0000:00:12.0 -> ../../../../devices/pci0000:00/0000:00:12.0
0000:00:12.2 -> ../../../../devices/pci0000:00/0000:00:12.2

/sys/kernel/iommu_groups/8/devices:
0000:00:13.0 -> ../../../../devices/pci0000:00/0000:00:13.0
0000:00:13.2 -> ../../../../devices/pci0000:00/0000:00:13.2

/sys/kernel/iommu_groups/9/devices:
0000:00:14.0 -> ../../../../devices/pci0000:00/0000:00:14.0
0000:00:14.1 -> ../../../../devices/pci0000:00/0000:00:14.1
0000:00:14.2 -> ../../../../devices/pci0000:00/0000:00:14.2
0000:00:14.3 -> ../../../../devices/pci0000:00/0000:00:14.3
0000:00:14.4 -> ../../../../devices/pci0000:00/0000:00:14.4
0000:00:14.5 -> ../../../../devices/pci0000:00/0000:00:14.5
0000:01:05.0 -> ../../../../devices/pci0000:00/0000:00:14.4/0000:01:05.0

    [When I plug in a card to the other legacy PCI slot, it also appears here as
     pci0000:00/0000:00:14.4/0000:01:06.0]

/sys/kernel/iommu_groups/10/devices:
0000:00:15.0 -> ../../../../devices/pci0000:00/0000:00:15.0
0000:00:15.2 -> ../../../../devices/pci0000:00/0000:00:15.2
0000:00:15.3 -> ../../../../devices/pci0000:00/0000:00:15.3
0000:03:00.0 -> ../../../../devices/pci0000:00/0000:00:15.2/0000:03:00.0
0000:04:00.0 -> ../../../../devices/pci0000:00/0000:00:15.3/0000:04:00.0

/sys/kernel/iommu_groups/11/devices:
0000:00:18.0 -> ../../../../devices/pci0000:00/0000:00:18.0
0000:00:18.1 -> ../../../../devices/pci0000:00/0000:00:18.1
0000:00:18.2 -> ../../../../devices/pci0000:00/0000:00:18.2
0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
0000:00:18.4 -> ../../../../devices/pci0000:00/0000:00:18.4
0000:00:18.5 -> ../../../../devices/pci0000:00/0000:00:18.5
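
For anyone who wants to work with a listing like the one above, here is a small sketch that parses it into a group-to-devices map and reports what shares a group with a given device. The function names `parse_groups` and `group_mates` are made up for this example, and the parser assumes exactly the text layout shown above (a "/sys/kernel/iommu_groups/<N>/devices:" header followed by "<address> -> <path>" lines); anything else is skipped.

```python
import re

GROUP_RE = re.compile(r"^/sys/kernel/iommu_groups/(\d+)/devices:$")
DEVICE_RE = re.compile(r"^([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) -> ")


def parse_groups(listing):
    """Turn the listing text into {group number: [device addresses]}."""
    groups = {}
    current = None
    for line in listing.splitlines():
        line = line.strip()
        header = GROUP_RE.match(line)
        if header:
            current = int(header.group(1))
            groups[current] = []
            continue
        device = DEVICE_RE.match(line)
        if device and current is not None:
            groups[current].append(device.group(1))
    return groups


def group_mates(groups, bdf):
    """Return (group number, other members) for a device, or (None, [])."""
    for number, members in groups.items():
        if bdf in members:
            return number, [d for d in members if d != bdf]
    return None, []
```

Fed the listing above, `group_mates(groups, "0000:01:05.0")` would return group 9 together with the six 14.* functions.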

(In reply to Joel Schopp from comment #15)
> It's not clear to me which devices were being put in the same group.  Here's
> some of my notes on your lspci output

Other than the 14.* devices everything seems to be as you describe.

>            +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
>            +-14.1  Advanced Micro Devices, Inc. [AMD] FCH IDE Controller
>            +-14.2  Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller
>            +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
> I do not think the SMBus/IDE/Azalia/LPC are isolated from each other, but
> they are isolated from the other devices I have identified.

Ok, that's not a problem.

>            +-14.4-[01]----05.0  Dialogic Corporation PRI
> The legacy PCI should be isolated from the other devices identified.  Not
> sure what is going on here.

Yep, currently shares group 9.

>            +-14.5  Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller
> This OHCI Controller should also be isolated from the other devices.

Also shares group 9.

>            +-15.0-[02]--
>            +-15.2-[03]----00.0  ASMedia Technology Inc. ASM1042 SuperSpeed
> USB Host Controller
> Is this in a PCI-e slot or otherwise attached to the PCI-e?

Nope, this is integrated on the motherboard. The only used PCI slot is the Dialogic card.

>            +-15.3-[04]----00.0  Qualcomm Atheros QCA8171 Gigabit Ethernet
> Is this in a PCI-e slot or otherwise attached to the PCI-e?

Integrated Ethernet.
Comment 18 Joerg Roedel 2014-09-01 09:30:50 UTC
The fix is now upstream and part of Linux v3.17-rc2.
Comment 19 Marti Raudsepp 2014-09-09 07:39:09 UTC
(In reply to Joel Schopp from comment #15)
> It's not clear to me which devices were being put in the same group.

Hi Joel, any updates on this? I posted my IOMMU groups in comment #17 in case you missed it.
Comment 20 Joel Schopp 2014-09-09 14:58:28 UTC
What updates are you looking for?  Joerg's fix is now upstream.
Comment 21 Marti Raudsepp 2014-09-09 15:25:51 UTC
(In reply to Joel Schopp from comment #20)
> What updates are you looking for?  Joerg's fix is now upstream.

Yes, but there's still the issue with southbridge component isolation. You requested more information from me in comment #15 that I provided in comment #17.

For background see comment #9 from Alex Williamson:
> AMD would need to confirm it.  IOMMU groups are based on hardware advertised
> isolation via the PCIe ACS capability.  Without this, or a device specific
> quirk to take its place, IOMMU groups must assume that peer-to-peer between
> functions of a multi-function device is possible and therefore that the
> devices are not isolated. [...]

> I think the path forward is to get confirmation from AMD that these functions
> are isolated from each other and add quirks to the kernel.  Then you won't
> have the device dependencies in vfio-pci.  The override patch allows you to
> do that with just a kernel boot parameter.  There's no guarantee that
> pci-assign will ever be fixed since it's being phased out.
Comment 22 Marti Raudsepp 2014-10-02 13:30:16 UTC
Since I did not get further confirmation from Mr. Schopp, I decided to push it and submit a patch: https://lkml.org/lkml/2014/10/2/223

The phrases "Not sure what is going on here" and "should also be isolated" in comment #15 don't inspire much confidence, but I have not managed to obtain more concrete statements.
Comment 23 Marti Raudsepp 2014-10-10 15:58:05 UTC
Closing bug, ACS patch merged to mainline Linux:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=3587e625fe24a2d1cd1891fc660c3313151a368c

Thanks Joerg.
