Bug 94661

Summary: \_SB_.PCI0:_OSC invalid UUID
Product: Drivers
Reporter: Bjorn Helgaas (bjorn)
Component: PCI
Assignee: drivers_pci (drivers_pci)
Status: NEW
Severity: normal
CC: bugzilla, deprez.maarten, gabriele.mzt, luto
Priority: P1
Hardware: All
OS: Linux
Kernel Version: v3.19
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  _OSC debug info without CONFIG_HOTPLUG_PCI_ACPI
  _OSC debug info with CONFIG_HOTPLUG_PCI_ACPI
  _OSC debug patch
  Dell XPS13 9333 - dmesg
  Dell XPS13 9333 - dmesg with patch
  debug patch for https://bugzilla.kernel.org/show_bug.cgi?id=94661
  Winbook TW100 4.3.0-0.rc0.git11.2.fc22.i686.txt

Description Bjorn Helgaas 2015-03-10 15:33:57 UTC
We have many reports of errors like this:

  acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
  \_SB_.PCI0:_OSC invalid UUID
  _OSC request data:1 1f 0 
  acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM

The "invalid UUID" error might be a BIOS bug, or it might be a kernel bug, or maybe we just need a more informative message.  We supply the correct UUID, and we aren't changing it, so the message is not literally true (unless it's getting corrupted somewhere, which I don't think is likely).
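
For anyone comparing their own logs against the one above, here is a minimal sketch of how the three words on the "_OSC request data" line decode. It assumes the support-bit values from the kernel's include/linux/acpi.h (OSC_PCI_*_SUPPORT) and the word order query/support/control; the function name decode_osc_request is made up for this illustration.

```python
# Support-field bits, assumed to match OSC_PCI_*_SUPPORT in the kernel's
# include/linux/acpi.h. The labels mirror what the kernel prints in
# "_OSC: OS supports [...]".
OSC_QUERY_ENABLE = 0x1
SUPPORT_BITS = [
    (0x01, "ExtendedConfig"),
    (0x02, "ASPM"),
    (0x04, "ClockPM"),
    (0x08, "Segments"),
    (0x10, "MSI"),
]


def decode_osc_request(line):
    """Decode the hex words of an '_OSC request data:' log line.

    Hypothetical helper: assumes the words are, in order, the query
    dword, the support dword, and the control dword.
    """
    words = [int(w, 16) for w in line.split(":", 1)[1].split()]
    query, support, control = words
    return {
        "query_only": bool(query & OSC_QUERY_ENABLE),
        "support": [name for bit, name in SUPPORT_BITS if support & bit],
        "control": control,
    }


result = decode_osc_request("_OSC request data:1 1f 0 ")
print(result)
```

Under those assumptions, the request in the log is a query that advertises all five support bits and asks for no control bits, which agrees with the "OS supports" line printed just before the error.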

I think we need to do *something* to stop the flood of bug reports.

Here are some of the bug reports that mention this error (most of these reports are not specifically *about* this error, but they might contain clues about what's going wrong):

https://bugzilla.kernel.org/show_bug.cgi?id=17792 unknown
https://bugzilla.kernel.org/show_bug.cgi?id=34192 Dell XPS L502X
https://bugzilla.kernel.org/show_bug.cgi?id=35532 Acer Aspire 7750G
https://bugzilla.kernel.org/show_bug.cgi?id=36932 Sony VGN-NS130FE/VAIO, Toshiba Satellite L670, Dell XPS L501X, Lenovo 20078
https://bugzilla.kernel.org/show_bug.cgi?id=42604 Dell XPS 15Z
https://bugzilla.kernel.org/show_bug.cgi?id=43229 Asus P8P67-EVO
https://bugzilla.kernel.org/show_bug.cgi?id=54951 Avatar AVIU-145A2, aka Intel IC4I/IC4I
https://bugzilla.kernel.org/show_bug.cgi?id=54981 Fujitsu Lifebook AH531 (includes analysis that might be useful)
https://bugzilla.kernel.org/show_bug.cgi?id=59311 Acer AO756/Mimic
https://bugzilla.kernel.org/show_bug.cgi?id=70891 HP Pavilion 15 Notebook
https://bugzilla.kernel.org/show_bug.cgi?id=73241 Acer Aspire V3-571G (SDHCI)
https://bugzilla.kernel.org/show_bug.cgi?id=85931 Acer Aspire E5-511
https://bugzilla.kernel.org/show_bug.cgi?id=87491 Onda v975w Baytrail tablet
https://bugzilla.kernel.org/show_bug.cgi?id=90351 Samsung 530U3C
https://bugzilla.kernel.org/show_bug.cgi?id=93561 Dell XPS L502X (SDHCI)

https://bugzilla.redhat.com/show_bug.cgi?id=649181 Toshiba Qosmio X500
https://bugzilla.redhat.com/show_bug.cgi?id=699156 Lenovo IdeaPad Y560p
https://bugzilla.redhat.com/show_bug.cgi?id=833597 Gateway LT40
https://bugzilla.redhat.com/show_bug.cgi?id=929217 Lenovo HuronRiver Platform/Emerald Lake
https://bugzilla.redhat.com/show_bug.cgi?id=969550 Acer Aspire 5742
https://bugzilla.redhat.com/show_bug.cgi?id=1039396 Toshiba dynabook R734

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/926016 Sony VPCCW1S1E/VAIO
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1093308 HP Pavilion dv6
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1213572 Samsung 530U3C
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1213575 Samsung 530U3C
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1219669 Dell Latitude 3330, Dell OptiPlex 3011
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1334192 Dell Optiplex 3011
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1335367 Samsung 530U3C
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1337083 HP 240 G2 Notebook
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1341023 Samsung 530U3C
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1346760 HP 200
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349738 Samsung 530U3C
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349740 Samsung 530U3C, Acer Aspire E5-511, Dell Inspiron 17R, Dell XPS L502X
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1355593 Samsung 530U3C
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1404173 Dell Inspiron 7537

https://bugs.launchpad.net/ubuntu/+source/linux-lts-raring/+bug/1263863 HP Pavilion 14
https://bugs.launchpad.net/ubuntu/+source/linux-lts-saucy/+bug/1279169 Dell Latitude 3330
https://bugs.launchpad.net/ubuntu/+source/linux-lts-saucy/+bug/1287539 Dell Inspiron 7737
https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1367084 HP InsydeH2O
https://bugs.launchpad.net/ubuntu/+source/linux-lts-trusty/+bug/1413849 Dell Inspiron 5420

https://bugs.launchpad.net/fwts/+bug/1255681 Cisco UCSC-C220-M3S

I found these (and many more) reports with Google queries like this:

  "_OSC invalid UUID" site:bugs.launchpad.net
Comment 1 Maarten Deprez 2015-03-16 19:36:32 UTC
Created attachment 170811
_OSC debug info without CONFIG_HOTPLUG_PCI_ACPI

As requested in bug #93561, boot log with _OSC debug patch applied without CONFIG_HOTPLUG_PCI_ACPI set...
Comment 2 Maarten Deprez 2015-03-16 19:37:33 UTC
Created attachment 170821
_OSC debug info with CONFIG_HOTPLUG_PCI_ACPI

... and with CONFIG_HOTPLUG_PCI_ACPI set.
Comment 3 Bjorn Helgaas 2015-07-08 15:59:58 UTC
Created attachment 182181
_OSC debug patch

If you see messages like:

  \_SB_.PCI0:_OSC invalid UUID
  _OSC request data:1 1f 0

in your dmesg log, please attach the dmesg log here, then apply this debug patch and also attach the new dmesg log.

This patch is based on v4.2-rc1, but should work on older kernels, too.
Comment 4 Gabriele Mazzotta 2015-07-08 19:28:01 UTC
Created attachment 182201
Dell XPS13 9333 - dmesg

Dell XPS13 9333, dmesg without the patch applied.
Comment 5 Gabriele Mazzotta 2015-07-08 19:28:42 UTC
Created attachment 182211
Dell XPS13 9333 - dmesg with patch

Dell XPS13 9333, dmesg with the patch applied.
Comment 6 Bastien Nocera 2015-09-10 11:11:44 UTC
Created attachment 187251
debug patch for https://bugzilla.kernel.org/show_bug.cgi?id=94661
Comment 7 Bastien Nocera 2015-09-11 17:33:05 UTC
Created attachment 187431
Winbook TW100 4.3.0-0.rc0.git11.2.fc22.i686.txt
Comment 8 Andy Lutomirski 2015-11-18 18:37:14 UTC
I'm pretty sure this is a DSDT bug combined with poor handling on our part.  With improved debugging (and I'll send the patch out to the list shortly), I see:

[    0.236336] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
[    0.236342] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[    0.236390] \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid UUID
[    0.236391] _OSC request data: 1 1f 0
[    0.236395] acpi PNP0A08:00: _OSC failed (AE_ERROR); disabling ASPM
[    0.237093] PCI host bridge to bus 0000:00

This appears to match both the spec and my DSDT's expectation.  *However*, my DSDT (Dell XPS 13 9350) actually checks:

If (((Arg0 == GUID) && NEXP)) { success; } else { fail (and return "invalid UUID"); }
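
To make the consequence of that branch concrete, here is a rough model of it in Python. This is a sketch under assumptions, not the actual ASL: firmware_osc_status is a hypothetical stand-in for the DSDT method, NEXP is treated as a plain integer flag, and the status bit value is the one the kernel uses for OSC_INVALID_UUID_ERROR in include/linux/acpi.h.

```python
import uuid

# Status bit the OS interprets as "unrecognized UUID"
# (assumed to match OSC_INVALID_UUID_ERROR in include/linux/acpi.h).
OSC_INVALID_UUID_ERROR = 0x04

# The PCI host bridge _OSC UUID shown in the debug output above.
PCI_OSC_UUID = uuid.UUID("33DB4D5B-1FF7-401C-9657-7441C03DD766")


def firmware_osc_status(arg0, nexp):
    """Hypothetical model of the DSDT branch quoted above.

    Success requires BOTH a matching UUID and a nonzero NEXP; every
    other case falls into the single "invalid UUID" failure path.
    """
    if arg0 == PCI_OSC_UUID and nexp:
        return 0
    return OSC_INVALID_UUID_ERROR


# Correct UUID, but the firmware flag is clear: reported as "invalid UUID".
status_flag_clear = firmware_osc_status(PCI_OSC_UUID, 0)

# Genuinely wrong UUID: reported identically.
status_bad_uuid = firmware_osc_status(uuid.UUID(int=0), 1)

# Correct UUID with the flag set: success.
status_ok = firmware_osc_status(PCI_OSC_UUID, 1)

print(hex(status_flag_clear), hex(status_bad_uuid), hex(status_ok))
```

If the model is right, the OS cannot tell "wrong UUID" apart from "right UUID, NEXP clear" by looking at the returned status alone, which would explain why the message appears even though the UUID we pass is correct.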

Can we *please* include the evaluate-any-ACPI-method patches upstream so I can just read NEXP directly from the command line?  Pretty please?  It's okay if it's behind a debug option.

Anyway, we should probably respond by just disabling ASPM and such for the part of the hierarchy behind the bridge that failed the request -- from my reading of the spec, this error should not be considered a global problem.

Meanwhile, if anyone has a Dell contact, we should consider asking them to change their error code.  I'm going to try to figure out what NEXP is.
Comment 9 Andy Lutomirski 2015-11-18 18:40:10 UTC
See also bug 36932.  This is a longstanding issue on Dell laptops, apparently.  We may want to add a quirk if we can figure out what's going on.
Comment 10 Andy Lutomirski 2015-11-18 18:52:26 UTC
Uh, WTF?

OperationRegion (GNVS, SystemMemory, 0x37718000, 0x05F5)

vs.

BIOS-e820: [mem 0x0000000000100000-0x000000007829dfff] usable

Unless I'm missing something, this is dangerously wrong.  /proc/iomem does *not* have a reservation for this opregion.  This could easily cause data corruption if ACPI writes to GNVS, and it could cause screwups (maybe like the one here) when ACPI reads it.
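
The overlap can be checked with simple arithmetic. The sketch below uses only the two ranges quoted above; the helper name range_contains is made up for this illustration, and it looks only at this one e820 entry, not the full memory map.

```python
def range_contains(outer_start, outer_end, start, length):
    """Return True if [start, start + length - 1] lies inside
    the inclusive range [outer_start, outer_end]."""
    end = start + length - 1
    return outer_start <= start and end <= outer_end


# OperationRegion (GNVS, SystemMemory, 0x37718000, 0x05F5)
GNVS_BASE = 0x37718000
GNVS_LEN = 0x05F5

# BIOS-e820: [mem 0x0000000000100000-0x000000007829dfff] usable
USABLE_START = 0x0000000000100000
USABLE_END = 0x000000007829DFFF

gnvs_in_usable_ram = range_contains(USABLE_START, USABLE_END,
                                    GNVS_BASE, GNVS_LEN)
print("GNVS inside 'usable' e820 range:", gnvs_in_usable_ram)
```

With these numbers the whole GNVS region falls inside the range that e820 reports as usable, so nothing in this map entry tells the kernel to keep ordinary allocations away from it.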

Why aren't we throwing a giant warning at boot here?

I can't tell yet whether this is a GRUB bug (why does anyone still use GRUB?), a Linux bug, or a firmware bug.
Comment 11 Bjorn Helgaas 2015-11-18 18:56:49 UTC
(In reply to Andy Lutomirski from comment #8)
> I'm pretty sure this is a DSDT bug combined with poor handling on our part. 

> my DSDT (Dell XPS 13 9350) actually checks:
> 
> If (((Arg0 == GUID) && NEXP)) { success; } else { fail (and return "invalid
> UUID"); }

I'm not a firmware guy, but I think NEXP is a "native PCIe support" flag in a global ACPI memory region ("GNVS"), called "NPCE" here:
http://review.coreboot.org/gitweb?p=coreboot.git;a=blob;f=src/southbridge/intel/bd82x6x/acpi/globalnvs.asl

Many BIOSes from Dell, Fujitsu, HP, Intel, Lenovo, Apple, Panasonic, etc., have a similar flag.  It might be just a bug that got copied everywhere.  We claim we're "disabling ASPM", but I think that really means "Linux won't touch ASPM configuration".  So if the BIOS enabled ASPM, we'll leave it enabled, and ASPM "works".  But I don't think we'll enable ASPM on hot-added devices.  Maybe nobody tested that part.