Bug 101301

Summary: The tulip driver fails to initialize in a Linux VM running on Hyper-V.
Product: ACPI
Reporter: Nick Meier (nmeier)
Component: Config-Interrupts
Assignee: Jiang Liu (jiang.liu)
Status: RESOLVED CODE_FIX
Severity: normal
CC: aaron.lu, jiang.liu, mike, rjw, rjw
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.18.0+
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Output from acpidump utility

Description Nick Meier 2015-07-09 18:08:33 UTC
Build and install Linux-next. After rebooting the VM, the following messages are logged in syslog when trying to load the tulip driver:

  tulip: Linux Tulip drivers version 1.1.15 (Feb 27, 2007)
  tulip: 0000:00:0a.0: PCI INT A: failed to register GSI
  tulip: Cannot enable tulip board #0, aborting
  tulip: probe of 0000:00:0a.0 failed with error -16

Errors occur in 3.19.0 kernel
Works in 3.17 kernel.

Debugging notes:

The tulip driver's initialization function, tulip_init_one(), calls pci_enable_device(). That call eventually reaches alloc_isa_irq_from_domain(), which returns -EBUSY (-16).

The message "tulip: 0000:00:0a.0: PCI INT A: failed to register GSI" is
logged by acpi_pci_irq_enable().

The flow of calls to where the error is returned is:

acpi_pci_irq_enable()
  acpi_register_gsi()
    __acpi_register_gsi() is a function pointer -> acpi_register_gsi_ioapic()
    acpi_register_gsi_ioapic()
      mp_map_gsi_to_irq()
        mp_map_pin_to_irq()
          alloc_isa_irq_from_domain()
            Returning -EBUSY when mp_check_pin_attr() returns false.

            mp_check_pin_attr()
              data->trigger  = 1
              data->polarity = 0
              info->ioapic_trigger  = 1
              info->ioapic_polarity = 1
        mp_check_pin_attr() returns false because the two polarity values do not match.

acpi_register_gsi_ioapic() is setting trigger and polarity to:
    trigger = 1
    polarity = 1
mp_check_pin_attr() is calling irq_get_chip_data(irq) which is returning a mp_chip_data structure with
    trigger  = 1
    polarity = 0
Since the two polarity values do not match, mp_check_pin_attr() returns false.
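
The comparison described above can be modelled in a few lines of standalone C. This is only a sketch under stated assumptions, not the kernel source: `struct pin_attr`, `pin_attr_matches()` and `request_shared_pin()` are hypothetical names, and the real mp_check_pin_attr() may handle cases that this model leaves out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the trigger/polarity pair
 * that is tracked for an IOAPIC pin. */
struct pin_attr {
    int trigger;
    int polarity;
};

/* True only if both the trigger and the polarity agree. */
static bool pin_attr_matches(const struct pin_attr *programmed,
                             const struct pin_attr *requested)
{
    return programmed->trigger == requested->trigger &&
           programmed->polarity == requested->polarity;
}

/* Returns the IRQ number on success, or -EBUSY when the pin is already
 * programmed with attributes that differ from the requested ones. */
static int request_shared_pin(int irq,
                              const struct pin_attr *programmed,
                              const struct pin_attr *requested)
{
    if (!pin_attr_matches(programmed, requested))
        return -EBUSY;
    return irq;
}
```

With the values from this report (programmed trigger 1 / polarity 0, requested trigger 1 / polarity 1), the sketch returns -EBUSY, which is the -16 seen in the probe failure.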

Using the commits to arch/x86/kernel/acpi/boot.c as reference, the commit where the issue appears is:
    cd68f6bd53cf89d1d5ed889b8aaf65e9c3574a079
    x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI
Comment 1 Aaron Lu 2015-07-14 06:42:47 UTC
Assign to Jiang.
Comment 2 Jiang Liu 2015-07-14 07:37:05 UTC
Hi Nick,
According to the bug report, it seems that the tulip PCI device shares the same GSI with the ACPI SCI interrupt, but the two don't agree on the polarity setting. So could you please help to check:
1) whether the tulip PCI device shares the same GSI with ACPI SCI
2) whether it helps to tune the "acpi_sci" boot parameter
And please also help to provide an ACPI dump from the machine, so I could check related ACPI configuration.
Thanks!
Gerry
Comment 3 Nick Meier 2015-07-14 16:32:57 UTC
Created attachment 182651 [details]
Output from acpidump utility

Output from the acpidump utility.
Comment 4 Nick Meier 2015-07-14 16:34:14 UTC
1. Researching if Tulip GSI is shared with ACPI SCI.
2. The boot option acpi_sci=low works! Tulip driver successfully loaded.

Attached is the output from acpidump when VM was booted without an acpi_sci option.

thanks,

-Nick
Comment 5 Jiang Liu 2015-07-15 03:11:13 UTC
According to the ACPI dump file posted by nmeier at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072

The ACPI MADT table includes an Interrupt Source Override entry for the ACPI SCI:
[236h 0566  1]                Subtable Type : 02 <Interrupt Source Override>
[237h 0567  1]                       Length : 0A
[238h 0568  1]                          Bus : 00
[239h 0569  1]                       Source : 09
[23Ah 0570  4]                    Interrupt : 00000009
[23Eh 0574  2]        Flags (decoded below) : 000D
                                   Polarity : 1
                               Trigger Mode : 3
That means the ACPI SCI interrupt (Interrupt : 00000009) works in
level-triggered (Trigger Mode : 3), active-high (Polarity : 1) mode.
For more information, please refer to Table 5-50 in ACPI spec 5.a.
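
As a cross-check of the decoded values, the flags word can be split by hand: bits 1:0 carry the polarity and bits 3:2 carry the trigger mode. The helpers below are a hypothetical standalone sketch (the function names are not from the kernel or from acpidump).

```c
#include <stdint.h>

/* Bits 1:0 of the MPS INTI flags: 0 = conforms to bus, 1 = active high,
 * 2 = reserved, 3 = active low. */
static unsigned int inti_polarity(uint16_t flags)
{
    return flags & 0x3u;
}

/* Bits 3:2 of the MPS INTI flags: 0 = conforms to bus, 1 = edge,
 * 2 = reserved, 3 = level. */
static unsigned int inti_trigger(uint16_t flags)
{
    return (flags >> 2) & 0x3u;
}

/* Human-readable name for a polarity field value. */
static const char *inti_polarity_name(unsigned int polarity)
{
    static const char *const names[] = {
        "conforms to bus", "active high", "reserved", "active low"
    };
    return names[polarity & 0x3u];
}
```

For the value in the dump, 0x000D is binary 1101, so `inti_polarity(0x000D)` is 1 (active high) and `inti_trigger(0x000D)` is 3 (level).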

And in DSDT table, we have _PRT method to define PCI interrupts, which eventually goes to:
        Name (PRSA, ResourceTemplate ()
        {
            IRQ (Level, ActiveLow, Shared, )
                {3,4,5,7,9,10,11,12,14,15}
        })
        Name (PRSB, ResourceTemplate ()
        {
            IRQ (Level, ActiveLow, Shared, )
                {3,4,5,7,9,10,11,12,14,15}
        })
        Name (PRSC, ResourceTemplate ()
        {
            IRQ (Level, ActiveLow, Shared, )
                {3,4,5,7,9,10,11,12,14,15}
        })
        Name (PRSD, ResourceTemplate ()
        {
            IRQ (Level, ActiveLow, Shared, )
                {3,4,5,7,9,10,11,12,14,15}
        })
which means IRQ9 can also be used for a PCI interrupt, but in
Level, ActiveLow mode. This conflicts with the ACPI SCI definition.
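
The conflict can be stated as a small check: a link device conflicts with the SCI if its list of possible IRQs contains the SCI's IRQ and the polarities differ. The sketch below is a hypothetical standalone model (`struct link_resource` and `link_conflicts_with_sci()` are made-up names); it compares polarity only, on the assumption that both sides are level-triggered as in this dump.

```c
#include <stdbool.h>
#include <stddef.h>

enum irq_polarity { ACTIVE_HIGH, ACTIVE_LOW };

/* Hypothetical model of one interrupt link resource template:
 * the IRQs it may be routed to and the polarity it declares. */
struct link_resource {
    const int *irqs;
    size_t nr_irqs;
    enum irq_polarity polarity;
};

/* True if the link may be routed to the SCI's IRQ with a different polarity. */
static bool link_conflicts_with_sci(const struct link_resource *link,
                                    int sci_irq,
                                    enum irq_polarity sci_polarity)
{
    for (size_t i = 0; i < link->nr_irqs; i++) {
        if (link->irqs[i] == sci_irq && link->polarity != sci_polarity)
            return true;
    }
    return false;
}
```

Fed with the IRQ list {3,4,5,7,9,10,11,12,14,15} as ACTIVE_LOW and an SCI on IRQ 9 as ACTIVE_HIGH, the check reports a conflict; with the SCI treated as ACTIVE_LOW it does not.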

So it's an ACPI BIOS bug. There may be several ways out here:
1) Change the ACPI SCI IRQ working mode to level, low.
2) Exclude IRQ9 from Interrupt Link Devices A-D.
3) Use acpi_sci=low to override the BIOS configuration.
4) Revert the patch (the commit identified in the description).
Thanks!
Gerry
Comment 6 Nick Meier 2015-07-16 14:46:29 UTC
A solution which is transparent to users is preferred.  Asking users to modify their system (e.g. add a kernel boot option) would be least desirable.
Comment 7 Jiang Liu 2015-07-16 15:15:54 UTC
Hi Nick,
   Given the large number of Hyper-V deployments, it seems the only acceptable solution is to revert commit cd68f6bd53cf. I can't think of another solution that is transparent to users.

Hi Rafael,
   Could you please advise on the normal way to deal with such a regression?

Thanks!
Gerry
Comment 8 Nick Meier 2015-07-22 17:36:31 UTC
Any decisions/recommendations on how to proceed?
Comment 9 Jiang Liu 2015-07-27 02:44:52 UTC
Hi Nick,
   We may use the following patch to work around the interrupt polarity issue, but I need help identifying the affected systems by DMI information. Could you please point out the exact items from "dmidecode" that we could use to identify the affected systems?
Thanks!
Gerry

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index e49ee24da85e..ea332074a50a 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1308,6 +1308,13 @@ static int __init dmi_ignore_irq0_timer_override(const struct dmi_system_id *d)
        return 0;
 }

+static int __init acpi_force_hyperv_sci_attr(const struct dmi_system_id *d)
+{
+       acpi_sci_flags = ACPI_MADT_POLARITY_ACTIVE_LOW |
+               (acpi_sci_flags & ~ACPI_MADT_POLARITY_MASK);
+       return 0;
+}
+
 /*
  * ACPI offers an alternative platform interface model that removes
  * ACPI hardware requirements for platforms that do not implement
@@ -1458,6 +1465,14 @@ static struct dmi_system_id __initdata acpi_dmi_table_late[] = {
                     DMI_MATCH(DMI_PRODUCT_NAME, "AMILO PRO V2030"),
                     },
         },
+       {
+        .callback = acpi_force_hyperv_sci_attr,
+        //.ident = "HyperV",
+        .matches = {
+                    //DMI_MATCH(DMI_SYS_VENDOR, ""),
+                    //DMI_MATCH(DMI_PRODUCT_NAME, ""),
+                    },
+        },
        {}
 };
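
The bit manipulation in the proposed callback can be tried in isolation. The sketch below is a hypothetical standalone version: the two macros are local copies of what I understand the ACPICA values to be (polarity mask 3, active low 3), and `force_active_low()` is a made-up name, not a kernel function.

```c
#include <stdint.h>

/* Local copies for a standalone build; assumed to mirror
 * ACPI_MADT_POLARITY_MASK and ACPI_MADT_POLARITY_ACTIVE_LOW. */
#define MADT_POLARITY_MASK        0x3u
#define MADT_POLARITY_ACTIVE_LOW  0x3u

/* Keep every bit outside the polarity field and set the polarity
 * field to active low, as the proposed callback does. */
static uint16_t force_active_low(uint16_t sci_flags)
{
    return (uint16_t)(MADT_POLARITY_ACTIVE_LOW |
                      (sci_flags & ~MADT_POLARITY_MASK));
}
```

Applied to a flags word of 0x000D (level, active high), this yields 0x000F (level, active low); applied to 0 it yields 3, leaving the trigger field untouched.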
Comment 10 Nick Meier 2015-08-06 19:10:56 UTC
Hi Gerry,

The following information was collected via dmidecode on a number of our VMs.

DMI Type: 1
Manufacturer: Microsoft Corporation
Product Name: Virtual Machine

-Nick
Comment 11 Nick Meier 2015-08-06 20:31:02 UTC
Applied the above proposed patch with the DMI values substituted.  The tulip driver loaded, and an address was assigned via DHCP.

e.g.
       {
        .callback = acpi_force_hyperv_sci_attr,
        .ident = "HyperV",
        .matches = {
                    DMI_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
                    DMI_MATCH(DMI_PRODUCT_NAME, "Virtual Machine"),
                    },
        },
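
For readers unfamiliar with DMI quirk tables, the entry above fires only when every listed field matches. To my understanding DMI_MATCH compares by substring rather than by exact string, so the userspace sketch below uses strstr(); the struct and function names are hypothetical and this is not the kernel's matching code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the two DMI strings used by the quirk entry. */
struct dmi_strings {
    const char *sys_vendor;
    const char *product_name;
};

/* True if both fields contain the strings reported by dmidecode
 * on the affected VMs (substring match, all fields must match). */
static bool matches_hyperv_quirk(const struct dmi_strings *dmi)
{
    return dmi->sys_vendor != NULL && dmi->product_name != NULL &&
           strstr(dmi->sys_vendor, "Microsoft Corporation") != NULL &&
           strstr(dmi->product_name, "Virtual Machine") != NULL;
}
```

Because the comparison is by substring, any product name that merely contains "Virtual Machine" would also match, which is worth keeping in mind when judging how precisely this entry identifies affected systems.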

-Nick
Comment 12 Aaron Lu 2015-08-24 05:21:12 UTC
What's the plan for the acpi_force_hyperv_sci_attr patch? Is it submitted/merged?
Comment 13 Jiang Liu 2015-08-24 06:00:41 UTC
Rafael and Thomas have concerns about that solution, so we are trying to find another one. A new patch has been sent for testing.
Comment 14 Aaron Lu 2015-08-24 06:04:23 UTC
Thanks for the update, and is there a link for the new patch?
Comment 15 Jiang Liu 2015-08-24 06:08:14 UTC
Link for V3 https://patchwork.kernel.org/patch/7049421/
Comment 16 Aaron Lu 2015-08-24 06:20:20 UTC
Marking it as resolved since the patch has been sent out; will close it once it's merged. If something is wrong, please re-open the bug.
Comment 17 Nick Meier 2015-08-25 15:36:46 UTC
Tested the v3 patch - it works in a Hyper-V environment.  The Tulip driver loads and the NIC was successfully assigned an IP address via DHCP.