Bug 197207 - Module level code - ACPI Error: [_SB_.PCI0.XHC_.RHUB.HS11] Namespace lookup failure, AE_NOT_FOUND - Clevo N350DW
Summary: Module level code - ACPI Error: [_SB_.PCI0.XHC_.RHUB.HS11] Namespace lookup ...
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core
Hardware: Intel Linux
Importance: P1 normal
Assignee: acpi_acpica-core@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-12 13:32 UTC by Tim Mohlmann
Modified: 2017-12-18 03:20 UTC
CC List: 5 users

See Also:
Kernel Version: 4.14.0-rc4
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg log (67.00 KB, text/plain)
2017-10-12 13:32 UTC, Tim Mohlmann
Output of acpidump (559.41 KB, text/plain)
2017-10-12 13:34 UTC, Tim Mohlmann
acpidump -c off output (557.66 KB, text/plain)
2017-10-13 10:21 UTC, Tim Mohlmann
dmesg log after patch (115.46 KB, text/plain)
2017-10-19 04:20 UTC, Tim Mohlmann

Description Tim Mohlmann 2017-10-12 13:32:34 UTC
Created attachment 258803
dmesg log

I have various ACPI-related issues on my Clevo N350DW. These issues are present at least in kernel versions 4.10 through 4.14.0-rc4.
* Battery info missing
* AC adapter status missing
* Some hotkeys not working
* ACPI errors in dmesg
* IRQ errors for smBus device

I'm stuck in a bit of a chicken-and-egg problem here: I can't determine whether the ACPI info is missing because of the SMBus IRQ error, or whether the SMBus IRQ error is a result of the ACPI problems.

Research already done:
* BIOS and EC firmware are updated to latest version available
* Enabled PCI debugging option in kernel config
* Tried a large number of different kernel configs and differently patched kernels (Gentoo, Ubuntu), all with the same results
* Poked around with kernel parameters: noapic, acpi=off, acpi_osi. IRQ errors remain, although with different numbers.
* Enabled the EC read/write access driver as built-in, but /sys/kernel/debug/ec does not exist
* Inspected the dmesg log; the highlights are below and the full log is attached

[    0.006464] ACPI Error: [_SB_.PCI0.XHC_.RHUB.HS11] Namespace lookup failure, AE_NOT_FOUND (20170728/dswload-210)
[    0.006470] ACPI Exception: AE_NOT_FOUND, During name lookup/catalog (20170728/psobject-252)
[    0.006650] ACPI Exception: AE_NOT_FOUND, [DSDT] table load failed (20170728/tbxfload-198)
[    0.006658] ACPI Error: [\_SB_.PCI0.SAT0] Namespace lookup failure, AE_NOT_FOUND (20170728/dswload-210)
[    0.006661] ACPI Exception: AE_NOT_FOUND, During name lookup/catalog (20170728/psobject-252)
[    0.006665] ACPI Exception: AE_NOT_FOUND, (SSDT:SataTabl) while loading table (20170728/tbxfload-228)
[    0.006784] ACPI Error: [\_SB_.PCI0.PEG0] Namespace lookup failure, AE_NOT_FOUND (20170728/dswload-210)
[    0.006787] ACPI Exception: AE_NOT_FOUND, During name lookup/catalog (20170728/psobject-252)
[    0.006795] ACPI Exception: AE_NOT_FOUND, (SSDT: SaSsdt ) while loading table (20170728/tbxfload-228)
[    0.006826] ACPI Error: [\_PR_.CPU0] Namespace lookup failure, AE_NOT_FOUND (20170728/dswload-210)
[    0.006829] ACPI Exception: AE_NOT_FOUND, During name lookup/catalog (20170728/psobject-252)
[    0.006834] ACPI Exception: AE_NOT_FOUND, (SSDT: CpuSsdt) while loading table (20170728/tbxfload-228)
[    0.006837] ACPI Error: 4 table load failures, 1 successful (20170728/tbxfload-246)
....
[    1.551726] i801_smbus 0000:00:1f.4: runtime IRQ mapping not provided by arch
[    1.551731] i801_smbus 0000:00:1f.4: can't derive routing for PCI INT A
[    1.551732] i801_smbus 0000:00:1f.4: PCI INT A: not connected
[    1.551746] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[    1.551758] i801_smbus 0000:00:1f.4: An interrupt is pending!
[    1.551762] i801_smbus 0000:00:1f.4: Failed to allocate irq -2147483648: -107
.....
The runtime IRQ message appears all over dmesg:
[    0.576972] skl_uncore 0000:00:00.0: runtime IRQ mapping not provided by arch
[    0.582825] pcieport 0000:00:1c.0: runtime IRQ mapping not provided by arch
[    0.582894] pcieport 0000:00:1c.3: runtime IRQ mapping not provided by arch
[    0.582953] pcieport 0000:00:1c.5: runtime IRQ mapping not provided by arch
[    0.583066] pcieport 0000:00:1d.0: runtime IRQ mapping not provided by arch
[    0.863785] i915 0000:00:02.0: runtime IRQ mapping not provided by arch
[    0.949107] nvme 0000:04:00.0: runtime IRQ mapping not provided by arch
[    0.949114] mei_me 0000:00:16.0: runtime IRQ mapping not provided by arch
[    0.964866] ahci 0000:00:17.0: runtime IRQ mapping not provided by arch
[    0.983655] xhci_hcd 0000:00:14.0: runtime IRQ mapping not provided by arch
[    1.551726] i801_smbus 0000:00:1f.4: runtime IRQ mapping not provided by arch
[    4.629213] rtsx_pci 0000:02:00.0: runtime IRQ mapping not provided by arch
[    4.634056] e1000e 0000:00:1f.6: runtime IRQ mapping not provided by arch
[    4.634705] iwlwifi 0000:01:00.0: runtime IRQ mapping not provided by arch
[    4.908006] snd_hda_intel 0000:00:1f.3: runtime IRQ mapping not provided by arch
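The table load failures in the dmesg excerpt above come in pairs: a failed name lookup, then the table whose load it aborted. A minimal Python sketch that extracts those pairs, assuming the messages keep exactly the format shown here; `summarize_load_failures` and `SAMPLE_LOG` (a trimmed copy of the excerpt) are hypothetical names, not part of any kernel or ACPICA tool:

```python
import re

# Hypothetical helper (not part of any kernel tool): pairs each ACPICA
# "Namespace lookup failure" with the table-load failure reported after it.
# Assumes the exact message format shown in the dmesg excerpt.
LOOKUP_RE = re.compile(r"ACPI Error: \[([^\]]+)\] Namespace lookup failure")
DSDT_RE = re.compile(r"\[DSDT\] table load failed")
SSDT_RE = re.compile(r"\(SSDT:\s*([^)]*?)\s*\) while loading table")


def summarize_load_failures(log: str) -> list[tuple[str, str]]:
    """Return (failed namespace path, affected table) pairs."""
    pairs = []
    pending = None  # last path that failed to resolve
    for line in log.splitlines():
        lookup = LOOKUP_RE.search(line)
        if lookup:
            pending = lookup.group(1)
            continue
        ssdt = SSDT_RE.search(line)
        if DSDT_RE.search(line):
            table = "DSDT"
        elif ssdt:
            table = "SSDT " + ssdt.group(1)
        else:
            continue
        if pending is not None:
            pairs.append((pending, table))
            pending = None
    return pairs


# Trimmed copy of the excerpt (timestamps and psobject lines dropped).
SAMPLE_LOG = r"""
ACPI Error: [_SB_.PCI0.XHC_.RHUB.HS11] Namespace lookup failure, AE_NOT_FOUND (20170728/dswload-210)
ACPI Exception: AE_NOT_FOUND, [DSDT] table load failed (20170728/tbxfload-198)
ACPI Error: [\_SB_.PCI0.SAT0] Namespace lookup failure, AE_NOT_FOUND (20170728/dswload-210)
ACPI Exception: AE_NOT_FOUND, (SSDT:SataTabl) while loading table (20170728/tbxfload-228)
ACPI Error: [\_SB_.PCI0.PEG0] Namespace lookup failure, AE_NOT_FOUND (20170728/dswload-210)
ACPI Exception: AE_NOT_FOUND, (SSDT: SaSsdt ) while loading table (20170728/tbxfload-228)
ACPI Error: [\_PR_.CPU0] Namespace lookup failure, AE_NOT_FOUND (20170728/dswload-210)
ACPI Exception: AE_NOT_FOUND, (SSDT: CpuSsdt) while loading table (20170728/tbxfload-228)
"""

if __name__ == "__main__":
    for path, table in summarize_load_failures(SAMPLE_LOG):
        print(f"{table:14} {path}")
```

Run as a script, it prints one line per failed table, which makes it easy to compare logs across kernels or BIOS versions.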
Comment 1 Tim Mohlmann 2017-10-12 13:34:10 UTC
Created attachment 258805
Output of acpidump
Comment 2 Lv Zheng 2017-10-13 01:32:16 UTC
I suppose you've uploaded the output of "acpidump -c off", as I can see the tables that failed to load there.
If not, please refresh the acpidump with "acpidump -c off" output.

Thanks in advance.
Comment 3 Lv Zheng 2017-10-13 01:34:37 UTC
> These issues are present at least in kernel versions 4.10 through 4.14.0-rc4.

Can you also help confirm whether it is a regression,
and let us know the last good kernel?

Thanks
Lv
Comment 4 Tim Mohlmann 2017-10-13 10:21:55 UTC
Created attachment 258817
acpidump -c off output

It appears I have two versions of acpidump installed (in /usr/sbin and /usr/bin). The one used first is an old version from the pmtools package (2011), which doesn't support "-c off".

Attached is the output of "acpidump -c off" from the iasl package.
Comment 5 Tim Mohlmann 2017-10-13 11:26:21 UTC
(In reply to Lv Zheng from comment #3)
> > These issues are present at least in kernel versions 4.10 through 4.14.0-rc4.
> 
> Can you also help confirm whether it is a regression,
> and let us know the last good kernel?
> 
> Thanks
> Lv

Available to me in Gentoo Portage (sys-kernel/vanilla-sources) are versions 4.4.91 and 4.9.54.

4.4.91 and earlier cannot fully boot my system due to an NVMe bug, which was fixed somewhere in 4.5. I just tried to compile and boot the 4.4 kernel to look for the same errors, but the dmesg log on the screen is flooded by the panic caused by NVMe.

Looking back in my mail archives, I had some other functionality issues up to and including 4.8. Let's say that 4.10 was the first version where things started to work somewhat "normally" (I skipped 4.9 at the time).

I compiled and booted 4.9.54. It behaves the same as versions >=4.10.

If you want me to manually download intermediate versions, let me know. But I will need a bit more time to do all that.
Comment 6 Tim Mohlmann 2017-10-13 23:05:53 UTC
I've booted 4.4.91 with a busybox initramfs and inspected dmesg. The same table load errors exist. The ACPI error list is a bit longer on 4.4.91, but I couldn't mount my root device to dump it. I guess that answers your question regarding a regression.

The SMBus error is also there, but it complains about IRQ 255: 22 instead of -2147483648: -107.
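Both failure values can be decoded. A minimal Python sketch, assuming that the kernel's `IRQ_NOTCONNECTED` marker is `1U << 31` and using the Linux errno numbers 22 (`EINVAL`) and 107 (`ENOTCONN`): -2147483648 is that marker printed as a signed `int`, so on 4.14 the SMBus controller had no IRQ routed at all (consistent with "PCI INT A: not connected"), whereas on 4.4.91 the request for IRQ 255 was rejected with `EINVAL`. The helper names are hypothetical:

```python
import struct

# Assumption: recent kernels mark an unroutable PCI interrupt with
# IRQ_NOTCONNECTED == (1U << 31); the driver prints it with "%d".
IRQ_NOTCONNECTED = 1 << 31

# Linux errno numbers for the two failures reported in this bug.
LINUX_ERRNO = {22: "EINVAL", 107: "ENOTCONN"}


def as_signed_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a C signed int."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


def decode_irq_failure(irq: int, ret: int) -> str:
    """Describe an 'irq <irq>: <ret>' pair from the i801_smbus message.

    Kernel functions return negative errno values; abs() also accepts
    the unsigned form quoted for 4.4.91 ("255: 22").
    """
    if irq == as_signed_int32(IRQ_NOTCONNECTED):
        irq_desc = "IRQ_NOTCONNECTED"
    else:
        irq_desc = f"IRQ {irq}"
    code = abs(ret)
    return f"{irq_desc}, -{LINUX_ERRNO.get(code, str(code))}"


if __name__ == "__main__":
    print(decode_irq_failure(-2147483648, -107))  # 4.14.0-rc4
    print(decode_irq_failure(255, 22))            # 4.4.91, as quoted
```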
Comment 7 Lv Zheng 2017-10-17 07:23:25 UTC
Looks like a known issue due to:

    If (LEqual (PCHV (), SPTH))
    {
        Scope (_SB.PCI0.XHC.RHUB)
        {
            Device (HS11)

Please check whether the following commit fixes the problem:
https://patchwork.kernel.org/patch/9347349/

We are about to land it upstream now.

Linking this bug to:
https://bugs.acpica.org/show_bug.cgi?id=963

Thanks
Lv
Comment 8 Tim Mohlmann 2017-10-19 04:19:05 UTC
The proposed patch works for me. Great!

Verified that it fixes all of the issues below:
(In reply to Tim Mohlmann from comment #0)
> Created attachment 258803
> dmesg log
> 
> I have various ACPI-related issues on my Clevo N350DW. These issues are present
> at least in kernel versions 4.10 through 4.14.0-rc4.
> * Battery info missing
> * AC adapter status missing
> * Some hotkeys not working
> * ACPI errors in dmesg
> * IRQ errors for smBus device
> 

So functionality is okay. However, dmesg is showing some new ACPI exceptions. I will attach the new dmesg. If that is a separate issue and you need a new bug report, let me know.

Again, for me personally everything is fine like this. Many thanks for your help!
Comment 9 Tim Mohlmann 2017-10-19 04:20:11 UTC
Created attachment 260285
dmesg log after patch
Comment 10 Lv Zheng 2017-10-19 04:50:27 UTC
> dmesg is showing some new ACPI exceptions.

[    0.513702] ACPI Exception: Could not find/resolve named package element: LNKA (20170728/dspkginit-381)

It's just caused by a wrong message level,
introduced by the following commit:
  Commit: a62a7117d91ca83d319566cbe16039f4e9f413c2
  Subject: ACPICA: Implement deferred resolution of reference package
           elements

The commit was meant to fix a compliance issue in handling name strings inside packages.
Upstream ACPICA originally mistook this non-compliance for forward-reference support in the Windows AML interpreter and tried to solve it with non-deferred object resolution.

But Windows actually never resolves in-package name strings to object references (in fact, the AML spec defines no object reference type; that is internal to the implementation); it leaves them as strings.

The commit adds deferred resolution for Linux so that fixing this problem won't affect Linux drivers that already depend on the old behavior.

However, the commit reports unresolved (hence forward-referenced) name strings as exceptions.

The following proves that it still works as old Linux drivers expect:
[    0.559190] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.559227] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 *10 11 12 14 15)
[    0.559261] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.559294] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.559328] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.559361] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.559394] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 10 *11 12 14 15)
[    0.559428] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 10 *11 12 14 15)

So please ignore these exceptions.

However, we can ask upstream ACPICA to turn this message into a debug message rather than an exception.
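The two resolution strategies described above can be contrasted in a toy Python model. `NameString`, `Namespace`, `resolve_immediately` and `resolve_deferred` are hypothetical names, not ACPICA functions; the fallback of keeping an unresolved name as a plain string is an assumption modeled on the remark that Windows leaves such elements as strings, and the logged text only imitates the dmesg message:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NameString:
    """A name string stored as a package element (toy model, not ACPICA)."""
    name: str


@dataclass
class Namespace:
    objects: dict = field(default_factory=dict)
    messages: list = field(default_factory=list)


def resolve_immediately(ns: Namespace, package: list) -> list:
    """Resolve each name string when the package is built; a forward
    reference is a hard AE_NOT_FOUND error."""
    resolved = []
    for element in package:
        if isinstance(element, NameString):
            if element.name not in ns.objects:
                raise LookupError(f"AE_NOT_FOUND: {element.name}")
            resolved.append(ns.objects[element.name])
        else:
            resolved.append(element)
    return resolved


def resolve_deferred(ns: Namespace, package: list) -> list:
    """Resolve name strings later (e.g. after the table is loaded).
    A name that is still missing is logged and kept as a plain string."""
    resolved = []
    for element in package:
        if isinstance(element, NameString):
            if element.name in ns.objects:
                resolved.append(ns.objects[element.name])
            else:
                ns.messages.append(
                    f"Could not find/resolve named package element: {element.name}")
                resolved.append(element.name)
        else:
            resolved.append(element)
    return resolved


if __name__ == "__main__":
    # A _PRT-style entry that names LNKA before LNKA is declared.
    prt_entry = [0x0001FFFF, 0, NameString("LNKA"), 0]
    ns = Namespace()
    try:
        resolve_immediately(ns, prt_entry)
    except LookupError as err:
        print("immediate:", err)
    ns.objects["LNKA"] = "Device (LNKA)"  # declared later in the table
    print("deferred:", resolve_deferred(ns, prt_entry))
```

The forward reference only fails when the lookup happens too early; once `LNKA` exists, the deferred path resolves it, which matches the later "PCI Interrupt Link [LNKA]" lines.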
Comment 11 Lv Zheng 2017-10-19 04:51:51 UTC
> If that is a separate issue and you need a new bug report, let me know.

No need to file a new bug; it's not a blocking issue for Linux.
However, you can help by raising this in the ACPICA Bugzilla.
Comment 12 Tim Mohlmann 2017-10-19 20:01:21 UTC
(In reply to Lv Zheng from comment #11)
> > If that is a separate issue and you need a new bug report, let me know.
> 
> No need to file a new bug; it's not a blocking issue for Linux.
> However, you can help by raising this in the ACPICA Bugzilla.

Okay, I will do so when I have time over the weekend.

(In reply to Lv Zheng from comment #7)
> Please check whether the following commit fixes the problem:
> https://patchwork.kernel.org/patch/9347349/
> 
> We are about to land it upstream now.

Just out of curiosity, in which kernel version do you expect this change to land?
Comment 13 Lv Zheng 2017-10-20 00:04:57 UTC
I expect it to be merged in 1-2 release cycles.
Comment 14 Erik Kaneda 2017-10-27 17:33:08 UTC
Tim, I've been looking at your original error and it stated this:

[    0.006464] ACPI Error: [_SB_.PCI0.XHC_.RHUB.HS11] Namespace lookup failure, AE_NOT_FOUND (20170728/dswload-210)
[    0.006470] ACPI Exception: AE_NOT_FOUND, During name lookup/catalog (20170728/psobject-252)

This is a bit concerning. It indicates that HS11 does not exist. According to the DSDT, HS11 should be declared as a device. A device object can be associated with many Scope blocks; a Scope is used to add methods or other named objects within the device's namespace. I grepped your ACPI tables for HS11 and found code that looked like this:

    If (LEqual (PCHV (), SPTH))
    {
        Scope (_SB.PCI0.XHC.RHUB)
        {
            Device (HS11)
            {
                ...
This code says that if (LEqual (PCHV (), SPTH)) is true, the HS11 device is added within the scope of _SB.PCI0.XHC.RHUB. It is very common to add devices conditionally, so this is OK. However, at the bottom of the DSDT, I see something like this:
    Scope (_SB.PCI0.XHC.RHUB.HS11)
    {
         Device (CAM0)

This says that within the scope of _SB.PCI0.XHC.RHUB.HS11, we create a device called CAM0. However, a Scope needs to be associated with an existing named object, which in this case should be the HS11 device. Because HS11 is declared conditionally, it is possible that HS11, or other devices declared within _SB.PCI0.XHC.RHUB, were not added to the namespace. We are looking at a few other issues right now, but something tells me that this may be a firmware bug or something related to the BIOS configuration.
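The reasoning above can be checked with a toy Python model (not ACPICA): `If` runs its body only when the condition holds, `Device` creates a node, and `Scope` requires an existing node. The statement encoding, `BASE`, and the `pch_is_spth` flag are hypothetical stand-ins for the DSDT excerpts quoted above. The model also pads each name segment to four characters, which is why the ASL path `_SB.PCI0.XHC.RHUB.HS11` appears in dmesg as `_SB_.PCI0.XHC_.RHUB.HS11`:

```python
class AmlNotFound(Exception):
    """Stands in for ACPICA's AE_NOT_FOUND status (toy model)."""


def normalize(path: str) -> str:
    """Pad each name segment to 4 characters with '_', the way ACPICA
    prints paths: '_SB.PCI0.XHC' -> '_SB_.PCI0.XHC_'. The root prefix
    backslash is dropped for simplicity."""
    return ".".join(seg.ljust(4, "_") for seg in path.lstrip("\\").split("."))


def load(statements, ns, scope=""):
    """Execute ('If', cond, body), ('Scope', target, body) and
    ('Device', name, body) statements in table order."""
    for kind, arg, body in statements:
        if kind == "If":
            if arg:
                load(body, ns, scope)
        elif kind == "Scope":
            target = normalize(arg)
            if target not in ns:
                raise AmlNotFound(f"[{target}] Namespace lookup failure, AE_NOT_FOUND")
            load(body, ns, target)
        elif kind == "Device":
            path = normalize(f"{scope}.{arg}" if scope else arg)
            ns.add(path)
            load(body, ns, path)
    return ns


def dsdt(pch_is_spth: bool):
    """Hypothetical reduction of the DSDT excerpts; pch_is_spth stands
    for the result of LEqual (PCHV (), SPTH)."""
    return [
        ("If", pch_is_spth, [
            ("Scope", "_SB.PCI0.XHC.RHUB", [("Device", "HS11", [])]),
        ]),
        ("Scope", "_SB.PCI0.XHC.RHUB.HS11", [("Device", "CAM0", [])]),
    ]


# Objects assumed to exist before this part of the table runs.
BASE = {normalize(p) for p in ("_SB", "_SB.PCI0", "_SB.PCI0.XHC", "_SB.PCI0.XHC.RHUB")}

if __name__ == "__main__":
    print(sorted(load(dsdt(True), set(BASE))))
    try:
        load(dsdt(False), set(BASE))
    except AmlNotFound as err:
        print("condition false:", err)
```

With the condition false, the `Scope` fails with the same text as the first line of the original dmesg log; with it true, `CAM0` is created under `HS11`.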
Comment 15 Tim Mohlmann 2017-10-28 01:57:01 UTC
Maybe a very noob question from my side: do the device abbreviations map to anything real? If I read your explanation from the bottom up, there is CAM0 (webcam?), in the XHC domain (= connected to USB?), which in turn is in the PCI0 domain (= PCI). That is exactly the case in my laptop.

There is also a kill switch on the keyboard that is supposed to cut off the webcam at the BIOS level. Could that be the condition?

I remember reading in this system's brochure that it is a security/privacy feature. I have never actually used the webcam, so I don't know if and how this works exactly.

If it makes sense to you, let me know and I can play around with the kill switch and BIOS settings to see if anything changes.
Comment 16 Lv Zheng 2017-10-30 03:02:28 UTC
Hi, Erik

That's a commonly seen pattern in ASL.

If the "If (LEqual (PCHV (), SPTH))" condition is not met, then "Scope (_SB.PCI0.XHC.RHUB.HS11)" becomes a fatal error with the Windows interpreter.
So I guess the condition is actually always met.
Thus the table never fails to load on Windows. On Linux, however, it fails because of the wrong order of executing module-level code (MLC): Linux interprets the non-MLC block ("Scope (_SB.PCI0.XHC.RHUB.HS11)") before the MLC block ("If (LEqual (PCHV (), SPTH))").
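The ordering problem is independent of the condition itself. A toy Python model (not ACPICA) makes the difference visible: with the condition assumed true, executing statements in table order creates `HS11` before the `Scope` needs it, while running top-level `If` blocks after the rest of the table, the order described above for Linux, makes the same table fail. `TABLE`, `run` and `load_table` are hypothetical names:

```python
class AmlNotFound(Exception):
    """Stands in for ACPICA's AE_NOT_FOUND status (toy model)."""


# Hypothetical reduction of the table: the If block is module-level code
# (MLC); its condition, LEqual (PCHV (), SPTH), is assumed to be true.
TABLE = [
    ("If", True, [("Device", "_SB.PCI0.XHC.RHUB.HS11", [])]),
    ("Scope", "_SB.PCI0.XHC.RHUB.HS11", [("Device", "_SB.PCI0.XHC.RHUB.HS11.CAM0", [])]),
]


def run(statement, ns):
    """Execute one statement; Device takes a full path in this model."""
    kind, arg, body = statement
    if kind == "Device":
        ns.add(arg)
    elif kind == "Scope" and arg not in ns:
        raise AmlNotFound(f"[{arg}] Namespace lookup failure, AE_NOT_FOUND")
    elif kind == "If" and not arg:
        return
    for inner in body:
        run(inner, ns)


def load_table(table, defer_mlc: bool):
    """Load in table order, or run all top-level If blocks last."""
    ns = {"_SB.PCI0.XHC.RHUB"}  # assumed to be defined earlier
    if defer_mlc:
        ordered = [s for s in table if s[0] != "If"] + [s for s in table if s[0] == "If"]
    else:
        ordered = list(table)
    for statement in ordered:
        run(statement, ns)
    return ns


if __name__ == "__main__":
    print(sorted(load_table(TABLE, defer_mlc=False)))
    try:
        load_table(TABLE, defer_mlc=True)
    except AmlNotFound as err:
        print("deferred MLC:", err)
```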
Comment 17 Robert Moore 2017-11-02 17:42:57 UTC
All of these are seen to fail on Windows, with a blue screen.

/* Blue-screens on windows
            If (1)
            {
                Device (DEVD) {}
            }
            Scope (DEVD)
            {
                Method (MTHD) {Return (0x1234)}
            }
*/

/* Blue-screens on windows

            Name (INTD, 0)
            If (INTD)
            {
                Device (DEVD) {}
            }
            Scope (\_SB_.ABBU.DEVD)
            {
                Method (MTHD) {Return (0x1234)}
            }
*/

/* Blue-screens on windows
            Name (INTD, 1)
            If (INTD)
            {
                Device (DEVD) {}
            }
            Scope (\_SB_.ABBU.DEVD)
            {
                Method (MTHD) {Return (0x1234)}
            }
*/
Comment 18 Robert Moore 2017-11-02 17:51:52 UTC
By "blue-screen", I mean at table load time: Windows will not boot.
Comment 19 Robert Moore 2017-11-02 20:56:11 UTC
#17 could be an issue with the ABBU utility, however.
Comment 20 Robert Moore 2017-11-02 21:54:33 UTC
It looks like Windows *requires* a _HID under a device, else blue-screen.
Comment 21 Erik Kaneda 2017-11-02 21:57:11 UTC
For clarification, do all examples in comment #17 work if there is a _HID under the device?
Comment 22 Lv Zheng 2017-11-03 04:34:08 UTC
To Bob

Can we do this with ABBU now?
I usually use QEMU to confirm such table-load-time behavior.
QEMU allows injecting user tables, and we can see Debug output on the QEMU console if it is correctly linked to the serial console using QEMU's serial console options.

Thanks
Lv
Comment 23 Bioshi 2017-11-22 17:30:39 UTC
Hi!

I solved the problem by tweaking my BIOS using this link:
http://s472165864.onlinehome.fr/anyware/index.php?dir=drivers/Clevo/N350DW/bios/

Everything is working!!
Comment 24 Zhang Rui 2017-12-18 03:20:10 UTC
Closing this bug, as it cannot be reproduced after the BIOS upgrade.
