Bug 187351

Summary: unhandled irq stack trace during shutdown
Product: Platform Specific/Hardware Reporter: Prarit Bhargava (prarit)
Component: x86-64Assignee: platform_x86_64 (platform_x86_64)
Status: NEW ---    
Severity: normal CC: bjorn, myron.stowe, rui.y.wang
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.9.0 Subsystem:
Regression: No Bisected commit-id:
Attachments: Boot log for unhandled irq

Description Prarit Bhargava 2016-11-09 19:31:07 UTC
Created attachment 244081 [details]
Boot log for unhandled irq

The following stack trace for an unhandled irq occurs on system shutdown:

irq 16: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-0.rc4.git0.1.el7_UNSUPPORTED.x86_64 #1
Hardware name: Hewlett-Packard HP Z820 Workstation/158B, BIOS J63 v03.90 06/01/2016
 ffff88041f803eb0 ffffffff81337765 ffff8804186dbc00 ffff8804186dbc9c
 ffff88041f803ed8 ffffffff810db935 ffff8804186dbc00 0000000000000000
 0000000000000010 ffff88041f803f10 ffffffff810dbc8f ffff8804186dbc00
Call Trace:
 <IRQ>  [<ffffffff81337765>] dump_stack+0x63/0x8e
 [<ffffffff810db935>] __report_bad_irq+0x35/0xd0
 [<ffffffff810dbc8f>] note_interrupt+0x20f/0x260
 [<ffffffff810d8ff5>] handle_irq_event_percpu+0x45/0x60
 [<ffffffff810d903c>] handle_irq_event+0x2c/0x50
 [<ffffffff810dc9ba>] handle_fasteoi_irq+0x8a/0x150
 [<ffffffff8102ee0b>] handle_irq+0xab/0x130
 [<ffffffff8109de6a>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff8103631d>] ? __exit_idle+0x2d/0x30
 [<ffffffff817158ad>] do_IRQ+0x4d/0xd0
 [<ffffffff81713902>] common_interrupt+0x82/0x82
 <EOI>  [<ffffffff815e34d1>] ? cpuidle_enter_state+0xc1/0x280
 [<ffffffff815e34c4>] ? cpuidle_enter_state+0xb4/0x280
 [<ffffffff815e36c7>] cpuidle_enter+0x17/0x20
 [<ffffffff810c1813>] call_cpuidle+0x23/0x40
 [<ffffffff810c1a79>] cpu_startup_entry+0x149/0x240
 [<ffffffff81705cd7>] rest_init+0x77/0x80
 [<ffffffff81d8a147>] start_kernel+0x495/0x4a2
 [<ffffffff81d89aa0>] ? set_init_arg+0x55/0x55
 [<ffffffff81d89120>] ? early_idt_handler_array+0x120/0x120
 [<ffffffff81d895d6>] x86_64_start_reservations+0x2a/0x2c
 [<ffffffff81d89715>] x86_64_start_kernel+0x13d/0x14c
handlers:
[<ffffffff8155ace0>] usb_hcd_irq
Disabling IRQ #16

P.
Comment 2 Prarit Bhargava 2016-11-09 19:39:30 UTC
Bjorn, I have attached a full boot file from an affected system as part of the bugzilla Description.  If this is not sufficient to debug, please let me know.

P.
Comment 3 Prarit Bhargava 2016-11-09 19:46:05 UTC
Some debug information:

The issue is that pci_device_shutdown() is called on each PCI device, and does

        if (drv && drv->shutdown)
                drv->shutdown(pci_dev);
        pci_msi_shutdown(pci_dev);
        pci_msix_shutdown(pci_dev);

The pci_msi_shutdown() and pci_msix_shutdown() functions both call
pci_intx_for_msi() which enables the INTx interrupt asynchronously of the
driver.

The problem is that the driver may not have a shutdown function and the
device remains active.  The driver continues to operate the PCI device and the
device interrupts to generate INTx.  The driver, however, has not registered a
handler for INTx and the interrupt line remains set which leads to an unhandled
IRQ warning.

This can be debugged by adding a 3 second delay in pci_device_shutdown()

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 1ccce1cd6aca..74776ca35e0c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -20,6 +20,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/suspend.h>
 #include <linux/kexec.h>
+#include <linux/delay.h>
 #include "pci.h"
 
 struct pci_dynid {
@@ -461,10 +462,19 @@ static void pci_device_shutdown(struct device *dev)
 
 	pm_runtime_resume(dev);
 
+	/*
+	 * Mika, on the 4S the device that had the problem was the
+	 * GT218 [GeForce 210] Video card running the nouveau driver.
+	 * The nouveau driver does not use INTX at all IIRC.
+	 */
+	printk(KERN_EMERG "Calling shutdown for device %s\n", dev_name(dev));
 	if (drv && drv->shutdown)
 		drv->shutdown(pci_dev);
+
 	pci_msi_shutdown(pci_dev);
 	pci_msix_shutdown(pci_dev);
+	mdelay(3000); /* wait for unhandled irq */
+	printk(KERN_EMERG "Completed shutdown for device %s\n", dev_name(dev));
 
 	/*
 	 * If this is a kexec reboot, turn off Bus Master bit on the


and is highly reproducible on systems that have NVIDIA cards in them.

P.
Comment 4 Rui Wang 2016-11-30 13:49:40 UTC
Enabling INTx may be needed for every PCI device because system shutdown is passing control to unknown OS or firmware. Here's a patch trying to fix it from a different perspective.

From 8f6d102e0a1ebd906f113532ec805fa395d12a3e Mon Sep 17 00:00:00 2001
From: Rui Wang <rui.y.wang@intel.com>
Date: Fri, 25 Nov 2016 03:26:07 -0500
Subject: [PATCH] pci/msi: Setup INTx before pci_intx_for_msi()

Migrate MSI actions to INTx if the driver doesn't have shutdown()
to quiesce the hardware, in order to prevent IRQ storm during system
shutdown.

Signed-off-by: Rui Wang <rui.y.wang@intel.com>
---
 drivers/pci/msi.c |   35 +++++++++++++++++++++++++++++++++++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 98f1222..1be5645 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -886,6 +886,39 @@ int pci_msi_vec_count(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pci_msi_vec_count);

+void pci_msi_to_intx(struct pci_dev *dev)
+{
+       struct irq_desc *desc;
+       struct msi_desc *entry;
+       struct irqaction *act;
+       unsigned int intx_irq;
+       struct pci_driver *drv = dev->driver;
+
+       /*
+        * If it has shutdown() it's supposed to quiesce the hw.
+        */
+       if (drv && drv->shutdown)
+               return;
+
+       /* Only the first entry since the device can only assert 1 INTx */
+       entry = first_pci_msi_entry(dev);
+       if (!entry)
+               return;
+       intx_irq = entry->msi_attrib.default_irq;
+       if (entry->irq) {
+               desc = irq_to_desc(entry->irq);
+               act = desc->action;
+               while (act) {
+                       irq_set_affinity_hint(entry->irq, NULL);
+                       /* Remove & reuse the action without freeing it */
+                       remove_irq(entry->irq, act);
+                       act->flags |= IRQF_SHARED;
+                       setup_irq(intx_irq, act);
+                       act = act->next;
+               }
+       }
+}
+
 void pci_msi_shutdown(struct pci_dev *dev)
 {
        struct msi_desc *desc;
@@ -898,6 +931,7 @@ void pci_msi_shutdown(struct pci_dev *dev)
        desc = first_pci_msi_entry(dev);

        pci_msi_set_enable(dev, 0);
+       pci_msi_to_intx(dev);
        pci_intx_for_msi(dev, 1);
        dev->msi_enabled = 0;

@@ -1006,6 +1040,7 @@ void pci_msix_shutdown(struct pci_dev *dev)
        }

        pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
+       pci_msi_to_intx(dev);
        pci_intx_for_msi(dev, 1);
        dev->msix_enabled = 0;
        pcibios_alloc_irq(dev);
--
1.7.5.4
Comment 5 Prarit Bhargava 2016-11-30 13:59:48 UTC
(In reply to Rui Wang from comment #4)
> Enabling INTx may be needed for every PCI device because system shutdown is
> passing control to unknown OS or firmware. Here's a patch trying to fix it
> from a different perspective.
> 

Thanks Rui.  A few questions.

1.  Do you know of any systems that have problems like this?  My argument is that prior to d52877c7b1af ("pci/irq: let pci_device_shutdown to call pci_msi_shutdown v2") there were no reports of faulty OS or firmware switching that I can find.

2.  Shouldn't the reset (at least on systems within the past 10 years) cover this situation?

P.
Comment 6 Rui Wang 2016-11-30 15:06:57 UTC
(In reply to Prarit Bhargava from comment #5)
> 1.  Do you know of any systems that have problems like this?  My argument is
> that prior to d52877c7b1af ("pci/irq: let pci_device_shutdown to call
> pci_msi_shutdown v2") there were no reports of faulty OS or firmware
> switching that I can find.
> 

No I don't. I was just theorizing.

> 2.  Shouldn't the reset (at least on systems within the past 10 years) cover
> this situation?
> 

kexec doesn't do reset. As for firmware, I think the firmware can choose not to do a reset immediately, but to continue executing for a while. e.g. sense the temperature to decide if the fan needs to continue turning. And it is not limited to PC but can be anything (automobile, drone...)