Bug 104791
Summary: | ACPI errors on Lenovo C50-30 (AE_AML_INFINITE_LOOP, argument type mismatch) | ||
---|---|---|---|
Product: | Drivers | Reporter: | James Ettle (james) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | aaron.lu, peter |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.1.6 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | acpidump output, dmesg output |
Created attachment 188001 [details]
dmesg output
Still present with 4.1.7. An additional observation: this method is invoked whenever I switch between a VT and X, with a corresponding spike in CPU usage and a new report in the kernel buffer. Is there some way that AML methods that misbehave like this could be blacklisted? I presume it's not really doing what it ought to anyway...

For the time being, I can avoid this by disabling nouveau.

Just to waffle on a bit more... The stuck AML method contains a loop of the form

    while(true) {
        T_0 = Arg2
        if(T_0 == 0) {
            // do some stuff
            return Local0
        } else {
            if(T_0 == 0x1A) {
                // do some more stuff
                return Local0
            } else {
                if(T_0 == 0x10)
                    return GOBT(Arg3)
            }
        }
        // no break here
    }

There's no "break" at the end of the while, as there are in other while(true) loops in this DSDT -- is this a common AML trick to get a local context? (In that case, if a loop's not needed, why not an if(true)?) None of the "do stuff" bits appear to modify T_0 or Arg2, as far as I understand them, so whether the loop exits depends only on Arg2's value at the start. I'm guessing under Windows this method is invoked at the behest of the Nvidia driver, which sets Arg2 to match one of the exit conditions.

I wonder if the nouveau people should have a look at this one, too...

Moving to GPU for nouveau people to take a look.

The problem is that nouveau_switcheroo_optimus_dsm() unconditionally calls function 0x1B ("NOUVEAU_DSM_OPTIMUS_FLAGS") without checking that this function is actually supported. Lenovo's firmware runs into an infinite loop in this case instead of returning an error code.

Proposed patch: https://lkml.kernel.org/r/1463244575-3515-1-git-send-email-peter@lekensteyn.nl

(In reply to Peter Wu from comment #6)
> Proposed patch:
> https://lkml.kernel.org/r/1463244575-3515-1-git-send-email-peter@lekensteyn.nl

Sorry I didn't immediately spot your response. (Bad Google spam filter.)
I can't get the above link to work; please clarify and I'll try to remember how to build a patched kernel (it's been a while...). Thanks.

Apparently the mail is not found on that list, try this one: https://lists.freedesktop.org/archives/nouveau/2016-May/025039.html

This patch does stop the infinite loop on my machine. (Although now with or without it nouveau seems to cause delays starting GNOME 3, but that's another bug report.)

(In reply to James from comment #9)
> This patch does stop the infinite loop on my machine.

Cool, can I add your Tested-by: James <...@googlemail.com> to the patch?

> (Although now with or without it nouveau seems to cause delays starting
> GNOME 3, but that's another bug report.)

If it is a second or so, that is likely the card switching on or off. With or without this patch you can disable the power saving by setting the module option nouveau.runpm=0 (kernel cmdline) or "options nouveau runpm=0" (modprobe.conf snippet). Do you still experience a delay after that?

(In reply to Peter Wu from comment #10)
> (In reply to James from comment #9)
> > This patch does stop the infinite loop on my machine.
>
> Cool, can I add your Tested-by: James <...@googlemail.com> to the patch?

Yes you may. Please note I tested this patch against the current Fedora 23 kernel, 4.4.9-300.fc23.x86_64.

> > (Although now with or without it nouveau seems to cause delays starting
> > GNOME 3, but that's another bug report.)
>
> If it is a second or so, that is likely the card switching on or off. With
> or without this patch you can disable the power saving by setting the module
> option nouveau.runpm=0 (kernel cmdline) or "options nouveau runpm=0"
> (modprobe.conf snippet). Do you still experience a delay after that?

It was more like 10 seconds. However the delay did disappear after using runpm=0. (I generally don't use the nVidia chip on this machine even under Windows, it's an 820A on 4 PCIe lanes. I've not seen it show a performance advantage over the built-in Intel HD5500, I don't know why Lenovo bothered with it.)

I've not seen this for a while now, using the Nvidia 820A on this machine seems to be working OK. Closing, tested with kernel-5.6.16-300.fc32.x86_64.
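
The diagnosis in the comments above can be illustrated with a small simulation. This is only a sketch, not the firmware's AML or the actual nouveau patch: the function numbers 0, 0x10, 0x1A and 0x1B come from the thread, while the names `nvop`, `call_if_supported`, `LOOP_LIMIT`, the returned placeholder values, and the assumption that function 0 reports a bitmask of supported functions (the usual _DSM convention) are hypothetical.

```python
# Simplified model of the firmware/driver interaction described in this report.
# Function numbers 0, 0x10, 0x1A and 0x1B are taken from the thread; all names,
# return values and the support bitmask below are hypothetical.

LOOP_LIMIT = 1000  # stand-in for the AML interpreter's loop guard


class AmlInfiniteLoop(Exception):
    """Raised when the modelled method exceeds the loop guard."""


def nvop(func):
    """Model of the stuck loop: it only returns for functions 0, 0x1A and 0x10."""
    iterations = 0
    while True:
        if func == 0:
            # Assumed convention: function 0 returns a bitmask of supported functions.
            return (1 << 0) | (1 << 0x10) | (1 << 0x1A)
        elif func == 0x1A:
            return "caps"  # placeholder for the real return buffer
        elif func == 0x10:
            return "gobt"  # placeholder for GOBT(Arg3)
        # No break here: any other function number keeps spinning.
        iterations += 1
        if iterations > LOOP_LIMIT:
            raise AmlInfiniteLoop(f"function {func:#x} never returned")


def call_if_supported(func):
    """Query function 0 first and skip functions the firmware does not advertise."""
    supported = nvop(0)
    if not (supported >> func) & 1:
        return None  # unsupported: do not call into the firmware at all
    return nvop(func)
```

Calling `nvop(0x1B)` directly reproduces the runaway loop (cut short here by the guard), whereas `call_if_supported(0x1B)` returns `None` without entering it, which is the behaviour the proposed patch aims for.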
Created attachment 187991 [details]
acpidump output

I'm seeing a handful of occasional ACPI errors on a Lenovo C50-30. Currently using kernel 4.1.6-201.fc22.x86_64, but they've always been there to my knowledge:

[ 1.154595] ACPI Warning: \_SB_.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95)
[ 1.154710] ACPI Warning: \_SB_.PCI0.RP05.PEGN._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95)
[ 1.155068] ACPI Warning: \_SB_.PCI0.RP05.PEGN._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95)
[ 1.167601] ACPI Exception: AE_NOT_FOUND, Evaluating _DOD (20150410/video-1290)
[ 6.721508] ACPI Warning: \_SB_.PCI0.RP05.PEGN._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95)
[ 7.284445] ACPI Error: Method parse/execution failed [\_SB_.PCI0.RP05.PEGN.NVOP] (Node ffff8802260a7be0), AE_AML_INFINITE_LOOP (20150410/psparse-536)
[ 7.284455] ACPI Error: Method parse/execution failed [\_SB_.PCI0.RP05.PEGN._DSM] (Node ffff8802260a7d98), AE_AML_INFINITE_LOOP (20150410/psparse-536)
[ 7.284463] ACPI: \_SB_.PCI0.RP05.PEGN: failed to evaluate _DSM (0x3021)
[ 7.284465] ACPI: \_SB_.PCI0.RP05.PEGN: failed to evaluate _DSM

Full dmesg and acpidump attached. I think PCI0.RP05 is something to do with the Nvidia chip in an Optimus set-up. I'm not using this GPU. I've not encountered any stuck kacpid or other functional problems, but would appreciate knowing whether this is a Linux bug or a broken BIOS that needs reporting.
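
To see how often these method failures recur (for example after each switch between a VT and X), a short script can tally them from saved dmesg output. This is a minimal sketch; the regular expression is an assumption based only on the two "Method parse/execution failed" lines quoted in this report, and `count_method_failures` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern assumed from the "Method parse/execution failed" lines quoted in this
# report; other kernel versions may word the message differently.
FAILED_METHOD = re.compile(
    r"ACPI Error: Method parse/execution failed "
    r"\[(?P<method>[^\]]+)\].*?(?P<code>AE_[A-Z_]+)"
)


def count_method_failures(lines):
    """Return a Counter keyed by (method path, ACPI exception code)."""
    counts = Counter()
    for line in lines:
        match = FAILED_METHOD.search(line)
        if match:
            counts[(match.group("method"), match.group("code"))] += 1
    return counts


# Sample lines copied from the dmesg excerpt in this report.
sample = [
    r"[ 6.721508] ACPI Warning: \_SB_.PCI0.RP05.PEGN._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20150410/nsarguments-95)",
    r"[ 7.284445] ACPI Error: Method parse/execution failed [\_SB_.PCI0.RP05.PEGN.NVOP] (Node ffff8802260a7be0), AE_AML_INFINITE_LOOP (20150410/psparse-536)",
    r"[ 7.284455] ACPI Error: Method parse/execution failed [\_SB_.PCI0.RP05.PEGN._DSM] (Node ffff8802260a7d98), AE_AML_INFINITE_LOOP (20150410/psparse-536)",
]

failures = count_method_failures(sample)
for (method, code), n in sorted(failures.items()):
    print(f"{method}: {code} x{n}")
```

The same function accepts any iterable of lines, so it could be fed an open file containing the attached dmesg output.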