Bug 96571 - PCI shutdown causes spurious irq from virtio devices after disabling MSI-X, which could cause soft lockup
Summary: PCI shutdown causes spurious irq from virtio devices after disabling MSI-X, w...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-13 09:24 UTC by Fam Zheng
Modified: 2016-08-19 18:29 UTC (History)
2 users (show)

See Also:
Kernel Version: v4.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Pci Fix Incorrect Msi/Msix Disabling if Shutdown Routine is Not Implemented (1.30 KB, patch)
2016-08-19 18:29 UTC, [account disabled by the administrator]
Details | Diff

Description Fam Zheng 2015-04-13 09:24:02 UTC
Ulrich Obergfell initially noticed this bug in RHEL 7 kernel, this issue applies to upstream as well.

The problem occurs during shutdown of a Linux virtual machine that has an active virtio device. A spurious IRQ is generated. If noirqdebug is set, the IRQ will cause soft lockup, which will prevent the guest from shutting down, or trigger panic if softlockup_panic=1.

Below is what happens in this case:

Before shutting down, the device driver operates with MSI-X enabled, so the device notifies us about any events (I/O completion, etc).

To prepare for shutdown, pci core code will disable MSI/MSI-X unconditionally (drivers/pci/pci-driver.c:pci_device_shutdown):

        if (drv && drv->shutdown)
                drv->shutdown(pci_dev);
        pci_msi_shutdown(pci_dev);
        pci_msix_shutdown(pci_dev);

where drv is the "virtio-pci" driver but drv->shutdown is NULL.

In other words, after this point, the device still runs, just that it sees MSI-X disabled. For a virtio-pci device, this means all further notification will be delivered with IRQ (automatic fallback).

However the "virtio-pci" driver doesn't register any IRQ handler because it is expecting MSI-X.

Now, the device has a new event for driver so an IRQ is asserted, but the IRQ is not handled, and the driver is not notified about the event. Without spurious interrupts detection, this can cause a soft lockup.
Comment 1 Fam Zheng 2015-05-20 03:12:42 UTC
I don't have an upstream kernel reproducing now, here is the RHEL 7 call trace reported by Ulrich Obergfell (extracted from the private RHEL 7 bug):

[   15.174115] dracut Warning: Killing all remaining processes
dracut Warning: Killing all remaining processes
[   15.223114] dracut Warning: Unmounted /oldroot.
[   15.244208] dracut: Disassembling device-mapper devices
Powering off.
[   16.046521] irq 11: nobody cared (try booting with the "irqpoll" option)
[   16.047005] CPU: 0 PID: 1820 Comm: poweroff Not tainted 3.10.0-121.el7.x86_64 #1
[   16.047005] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[   16.047005]  ffff88003e005b00 00000000cbb0f224 ffff88003fc03cd8 ffffffff815ec0ba
[   16.047005]  ffff88003fc03d00 ffffffff810f96e2 ffff88003e005b00 000000000000000b
[   16.047005]  0000000000000000 ffff88003fc03d40 ffffffff810f9b02 00000000cbb0f224
[   16.047005] Call Trace:
[   16.047005]  <IRQ>  [<ffffffff815ec0ba>] dump_stack+0x19/0x1b
[   16.047005]  [<ffffffff810f96e2>] __report_bad_irq+0x32/0xd0
[   16.047005]  [<ffffffff810f9b02>] note_interrupt+0x132/0x1f0
[   16.047005]  [<ffffffff810f7221>] handle_irq_event_percpu+0xe1/0x1e0
[   16.047005]  [<ffffffff810f735d>] handle_irq_event+0x3d/0x60
[   16.047005]  [<ffffffff810fa7ba>] handle_fasteoi_irq+0x5a/0x100
[   16.047005]  [<ffffffff81014c3f>] handle_irq+0xbf/0x150
[   16.047005]  [<ffffffff81067247>] ? irq_enter+0x17/0xa0
[   16.047005]  [<ffffffff815fea4f>] do_IRQ+0x4f/0xf0
[   16.047005]  [<ffffffff815f3bad>] common_interrupt+0x6d/0x6d
[   16.047005]  [<ffffffff8106fe8e>] ? update_process_times+0x6e/0x80
[   16.047005]  [<ffffffff81066ff0>] ? __do_softirq+0xa0/0x290
[   16.047005]  [<ffffffff81066f70>] ? __do_softirq+0x20/0x290
[   16.047005]  [<ffffffff810bd9a4>] ? tick_program_event+0x24/0x30
[   16.047005]  [<ffffffff815fe15c>] call_softirq+0x1c/0x30
[   16.047005]  [<ffffffff81014d25>] do_softirq+0x55/0x90
[   16.047005]  [<ffffffff810673e5>] irq_exit+0x115/0x120
[   16.047005]  [<ffffffff815feb35>] smp_apic_timer_interrupt+0x45/0x60
[   16.047005]  [<ffffffff815fd49d>] apic_timer_interrupt+0x6d/0x80
[   16.047005]  <EOI>  [<ffffffff815f367b>] ? _raw_spin_unlock_irqrestore+0x1b/0x40
[   16.047005]  [<ffffffff812e068f>] pci_bus_read_config_word+0x9f/0xb0
[   16.047005]  [<ffffffff812e205c>] pci_read_config_word+0x1c/0x20
[   16.047005]  [<ffffffff813038b0>] msix_set_enable.constprop.8+0x30/0x70
Comment 2 Fam Zheng 2015-05-21 06:15:43 UTC
The reproducing condition is a bit rigorous, we can only see the backtrace when the returning of kernel_power_off is slowed down enough by the continuous spurious IRQ handling, otherwise the power will go off and we lose the console output. Below is an artificial approach derived from Ulrich's on RHEL 7, which reproduces the issue with current master:

1. Boot guest with 1) virtio-scsi-pci, with MSI-X enabled; 2) qxl vga, which uses IRQ 10.

The full command line is:

qemu-system-x86_64  \
  -enable-kvm  \
  -qmp unix:/tmp/qsh/OWHZHW8K/qmp.sock,server,nowait  \
  -monitor 
  -m 1024  \
  -netdev user,id=virtio-nat-0,hostfwd=:0.0.0.0:10022-:22  \
  -device virtio-net-pci,id=virtio-net-pci-virtio-nat-0,netdev=virtio-nat-0  \
  -device virtio-scsi-pci,id=virtio-scsi-bus  \
  -serial file:/tmp/qsh/OWHZHW8K/serial.out  \
  -vga qxl  \
  -kernel arch/x86/boot/bzImage  \
  -append "root=/dev/sda1 console=tty0 console=ttyS0 quiet loglevel=5 do_shutdown"  \
  -display none  \
  -drive file=../1199155/reboot.qcow2,if=none,id=reboot  \
  -device scsi-disk,drive=reboot  \
  -drive file=/tmp/test.img,if=none,id=d0  \
  -device scsi-disk,drive=d0

Login to guest to check that virtio-scsi is using MSI-X, and has also been assigned the fallback IRQ with the same number as QXL, who is actively using IRQ 10:

# lspci -v

...

00:02.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04) (prog-if 00 [VGA controller])
        Subsystem: Red Hat, Inc QEMU Virtual Machine
        Flags: fast devsel, IRQ 10
        Memory at f4000000 (32-bit, non-prefetchable) [size=64M]
        Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
        Memory at fc020000 (32-bit, non-prefetchable) [size=8K]
        I/O ports at c040 [size=32]
        Expansion ROM at fc000000 [disabled] [size=64K]
        Kernel driver in use: qxl

...

00:04.0 SCSI storage controller: Red Hat, Inc Virtio SCSI
        Subsystem: Red Hat, Inc Device 0008
        Flags: bus master, fast devsel, latency 0, IRQ 10
        I/O ports at c000 [size=64]
        Memory at fc023000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [40] MSI-X: Enable+ Count=4 Masked-
        Kernel driver in use: virtio-pci

# cat /proc/interrupts
           CPU0
  0:        130   IO-APIC-edge      timer
  1:         10   IO-APIC-edge      i8042
  4:         95   IO-APIC-edge      serial
  8:          0   IO-APIC-edge      rtc0
  9:          0   IO-APIC-fasteoi   acpi
 10:        209   IO-APIC-fasteoi   qxl
...

2. At that point, start a loop to resize the attached disk from host side, in order to generate many interrupts:

while :; do
    if [[ "$size" = "2048" ]]; then
        size=1024
    else
        size=2048
    fi
    echo block_resize d0 $size &>/dev/null
    sleep 0.1
done | $hmp_monitor_file

3. Shutdown the guest.


Repeat steps 1-3 until you see the backtrace below. If the timing of the resize event is just between disabling MSI-X and powering off, the spurious IRQ will be detected by note_interrupt:

[    4.131467] irq 10: nobody cared (try booting with the "irqpoll" option)
[    4.131469] CPU: 0 PID: 1527 Comm: agetty Tainted: G        W       4.1.0-rc4+ #88
[    4.131470] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[    4.131471]  ffff88003e02de8c ffff88003fc03ce0 ffffffff81918264 0000000000003232
[    4.131472]  ffff88003e02de00 ffff88003fc03d10 ffffffff8109ce31 0000000000000000
[    4.131473]  ffff88003e02de00 0000000000000000 000000000000000a ffff88003fc03d50
[    4.131473] Call Trace:
[    4.131478]  <IRQ>  [<ffffffff81918264>] dump_stack+0x45/0x57
[    4.131481]  [<ffffffff8109ce31>] __report_bad_irq+0x31/0xd0
[    4.131482]  [<ffffffff8109d1e7>] note_interrupt+0x277/0x2c0
[    4.131483]  [<ffffffff8109aa1b>] handle_irq_event_percpu+0x14b/0x1b0
[    4.131484]  [<ffffffff8109aab5>] handle_irq_event+0x35/0x60
[    4.131485]  [<ffffffff8109d684>] handle_fasteoi_irq+0x74/0x120
[    4.131487]  [<ffffffff81004ce4>] handle_irq+0x74/0x140
[    4.131488]  [<ffffffff8100460a>] do_IRQ+0x4a/0xe0
[    4.131490]  [<ffffffff8192182b>] common_interrupt+0x6b/0x6b
[    4.131492]  [<ffffffff810af2f2>] ? ktime_get+0x32/0xb0
[    4.131494]  [<ffffffff81055989>] ? __do_softirq+0x79/0x270
[    4.131495]  [<ffffffff8105592b>] ? __do_softirq+0x1b/0x270
[    4.131497]  [<ffffffff81055cb5>] irq_exit+0x85/0x90
[    4.131498]  [<ffffffff81038901>] smp_apic_timer_interrupt+0x41/0x50
[    4.131499]  [<ffffffff81921afb>] apic_timer_interrupt+0x6b/0x70
[    4.131501]  <EOI>  [<ffffffff81098e20>] ? console_unlock+0x2d0/0x430
[    4.131503]  [<ffffffff8142fd97>] con_flush_chars+0x37/0x40
[    4.131505]  [<ffffffff8141e0f1>] n_tty_write+0x241/0x4e0
[    4.131507]  [<ffffffff81089e80>] ? __wake_up_sync+0x10/0x10
[    4.131508]  [<ffffffff8141a344>] tty_write+0x174/0x300
[    4.131509]  [<ffffffff8141deb0>] ? process_echoes+0x80/0x80
[    4.131511]  [<ffffffff81164e73>] __vfs_write+0x23/0xe0
[    4.131514]  [<ffffffff8131e2ce>] ? security_file_permission+0x1e/0xa0
[    4.131515]  [<ffffffff811652de>] ? rw_verify_area+0x4e/0x100
[    4.131516]  [<ffffffff81165574>] vfs_write+0xa4/0x1b0
[    4.131517]  [<ffffffff81166311>] SyS_write+0x41/0xb0
[    4.131519]  [<ffffffff81011426>] ? syscall_trace_leave+0x96/0x100
[    4.131520]  [<ffffffff81920d57>] system_call_fastpath+0x12/0x6a
[    4.131521] handlers:
[    4.131523] [<ffffffff8154df80>] qxl_irq_handler
[    4.131524] Disabling IRQ #10
Comment 3 [account disabled by the administrator] 2016-08-19 18:28:38 UTC
See if the below patch based on your expectation of the issue helps fix your IRQ softup issue.
Comment 4 [account disabled by the administrator] 2016-08-19 18:29:19 UTC
Created attachment 229441 [details]
Pci Fix Incorrect Msi/Msix Disabling if Shutdown Routine is Not Implemented

Note You need to log in before you can comment on or make changes to this bug.