Created attachment 210811 [details] dmesg Commit e64c952efb8e0c15ae82cec8e455ab4910690ef1 disables runpm when "laptops don't provide an ACPI method to control dGPU power". Up to now, runpm worked fine for me, but this commit disabled it also for me. There are some ACPI errors that maybe have something to do with it: [ 3.245994] vga_switcheroo: enabled [ 3.246040] ATPX version 1, functions 0x00000033 [ 3.278214] ACPI Error: Field [TMPB] at 294912 exceeds Buffer [ROM1] size 262144 (bits) (20160108/dsopcode-236) [ 3.278218] ACPI Error: Method parse/execution failed [\_SB.PCI0.GFX0.ATRM] (Node ffff88080e8daa50), AE_AML_BUFFER_LIMIT (20160108/psparse-542) [ 3.278224] failed to evaluate ATRM got AE_AML_BUFFER_LIMIT Full 4.6-rc1 dmesg is attached. Meanwhile I changed this function: bool radeon_has_atpx_dgpu_power_cntl(void) { return true; //radeon_atpx_priv.atpx.functions.power_cntl; } and runpm works again. The uvd and vce errors seen in the dmesg also vanished after this change and a reboot.
Created attachment 212671 [details] possible fix 1/2
Created attachment 212681 [details] possible fix 2/2
It does work for my laptop, so thanks for that. I guess if it was possible to test at runtime for it you'd have done it, so we can only hope that there are no other laptops with this problem..
Chris, can you attach your acpidump (part of the iasl package)?
Created attachment 223931 [details] acpidump This?
Yes, that is what I was looking for. [ 3.278214] ACPI Error: Field [TMPB] at 294912 exceeds Buffer [ROM1] size 262144 (bits) (20160108/dsopcode-236) [ 3.278218] ACPI Error: Method parse/execution failed [\_SB.PCI0.GFX0.ATRM] (Node ffff88080e8daa50), AE_AML_BUFFER_LIMIT (20160108/psparse-542) [ 3.278224] failed to evaluate ATRM got AE_AML_BUFFER_LIMIT This is harmless, apparently the radeon driver tries to fetch blocks of 4K until no more results are available. Your videorom is only 32 KiB, so fetching more than that will fail. ATPX Supported Functions is 0x33 if \SGMD == 0x02 and 0xBF otherwise. Do you have a BIOS option that allows you to select between hybrid graphics and discrete mode?
No, the stock BIOS of the Clevo P170EM has *very* limited settings and I believe the radeon GPU has no physical connectors connected at all.
Created attachment 224001 [details] possible patch (force integers into bools) Yay, I found documentation for the SGMD (SG Mode) field: 0 - None 1 - Switchable Graphics (Muxed) 2 - Switchable Graphics (Muxless) 3 - Discrete only (Related: SGFL stands for Switchable Graphics Feature List.) By the way, the functions mask 0x33 still covers ATPX_POWER_CONTROL_SUPPORTED ((1<<1) == 2). Something else must be broken. Based on the dmesg, we can see that: if (radeon_modeset == 1) { DRM_INFO("radeon kernel modesetting enabled.\n"); // <-- called driver = &kms_driver; // ... radeon_register_atpx_handler(); // called, see dmesg: // "[ 3.245994] vga_switcheroo: enabled" // radeon_atpx_init is called after the previous printk. // radeon_atpx_verify_interface is called and returns 0, see dmesg: // "[ 3.246040] ATPX version 1, functions 0x00000033" // now radeon_atpx_parse_functions(..., 0x33) is called which: // f->power_cntl = mask & ATPX_POWER_CONTROL_SUPPORTED; // = 0x33 & (1 << 1) = 2 (which is a truth value) } /* let modprobe override vga console setting */ return drm_pci_init(driver, pdriver); // driver.load (radeon_driver_load_kms) -> // radeon_device_init() -> // radeon_has_atpx_dgpu_power_cntl() -> // return radeon_atpx_priv.atpx.functions.power_cntl // 2 per above Maybe there is a compiler bug... can you try the attached patch?
For that to work on stable v4.6 you have to revert dfc4f59d901b and bfaddd9fc8ac. My .config and compiler generates the same (inlined) code for both sources: // (radeon.ko) mask & (1 << 1) => (mask >> 1) & 1 ed828: d1 ea shr edx,1 ed82a: 83 e2 01 and edx,0x1 ed82d: 88 15 00 00 00 00 mov BYTE PTR [rip+0x0],dl
Well, it does not help. I wanted to get you some more info, so I inserted some printks: in radeon_atpx_handler.c static void radeon_atpx_parse_functions(struct radeon_atpx_functions *f, u32 mask) { printk(KERN_ERR "ATPX mask: %u\n", mask); printk(KERN_ERR "ATPX power control supported: %u\n", ATPX_POWER_CONTROL_SUPPORTED); printk(KERN_ERR "mask & ATPX_POWER_CONTROL_SUPPORTED: %u\n", mask & ATPX_POWER_CONTROL_SUPPORTED); printk(KERN_ERR "!!(mask & ATPX_POWER_CONTROL_SUPPORTED): %u\n", !!(mask & ATPX_POWER_CONTROL_SUPPORTED)); in radeon_device.c printk(KERN_ERR "rdev->flags: %u\n", rdev->flags); printk(KERN_ERR "rdev->flags & RADEON_IS_PX: %u\n", rdev->flags & RADEON_IS_PX); printk(KERN_ERR "radeon_has_atpx_dgpu_power_cntl(): %u\n", radeon_has_atpx_dgpu_power_cntl()); if ((rdev->flags & RADEON_IS_PX) && radeon_has_atpx_dgpu_power_cntl()) runtime = true; Then I booted and got this: [ 6.235984] rdev->flags: 39911477 [ 6.238221] rdev->flags & RADEON_IS_PX: 33554432 [ 6.239947] radeon_has_atpx_dgpu_power_cntl(): 0 [ 6.241572] vga_switcheroo: enabled [ 6.241657] ATPX version 1, functions 0x00000033 [ 6.241659] ATPX mask: 51 [ 6.243153] ATPX power control supported: 2 [ 6.244508] mask & ATPX_POWER_CONTROL_SUPPORTED: 2 [ 6.245649] !!(mask & ATPX_POWER_CONTROL_SUPPORTED): 1 Uhm... so, does that mean the ATPX BIOS version has been fine in the first place and that radeon_has_atpx_dgpu_power_cntl() returns 0, just because the check whether to enable runpm is done BEFORE the atpx stuff is even parsed? I didn't want to trace everything back to the inits, so I just put another radeon_atpx_init(); call just before the runpm check, removed static from the function and added a forward declaration int radeon_atpx_init(void); ... radeon_atpx_init(); printk(KERN_ERR "rdev->flags: %u\n", rdev->flags); printk(KERN_ERR "rdev->flags & RADEON_IS_PX: %u\n", rdev->flags & RADEON_IS_PX); printk(KERN_ERR "radeon_has_atpx_dgpu_power_cntl(): %u\n", radeon_has_atpx_dgpu_power_cntl()); if ((rdev->flags & RADEON_IS_PX) && radeon_has_atpx_dgpu_power_cntl()) { runtime = true; printk(KERN_ERR "runpm is enabled. Yay\n"); } And I got [ 6.223693] ATPX mask: 51 [ 6.224823] ATPX power control supported: 2 [ 6.225937] mask & ATPX_POWER_CONTROL_SUPPORTED: 2 [ 6.227140] !!(mask & ATPX_POWER_CONTROL_SUPPORTED): 1 [ 6.228198] rdev->flags: 39911477 [ 6.229274] rdev->flags & RADEON_IS_PX: 33554432 [ 6.230181] radeon_has_atpx_dgpu_power_cntl(): 1 [ 6.231039] runpm is enabled. Yay [ 6.231889] vga_switcheroo: enabled [ 6.231956] ATPX version 1, functions 0x00000033 [ 6.231957] ATPX mask: 51 [ 6.232267] usbcore: registered new interface driver btusb [ 6.232932] ATPX power control supported: 2 [ 6.233849] mask & ATPX_POWER_CONTROL_SUPPORTED: 2 [ 6.234717] !!(mask & ATPX_POWER_CONTROL_SUPPORTED): 1 YES! IT DOES! Damn! Thank you so much for asking a follow up question so I can stop telling people that the BIOS on several laptops is broken and looking like an idiot. On the other side: It probably would have been a good idea to verify that there IS hardware where it actually works as intended before declaring the BIOS to be at fault.
Can you put a dump_stack() in the atpx_init function and radeon_has_atpx_dgpu_power_cntl functions such that we can see how it is called?
First is the one radeon_atpx_init(); call that I added manually to radeon_device_init in order to make it work: [ 6.083965] CPU: 5 PID: 295 Comm: systemd-udevd Not tainted 4.6.0-mainline #1 [ 6.083966] Hardware name: CLEVO P170EM/P170EM, BIOS 4.6.5 08/22/2012 [ 6.083968] 0000000000000286 00000000ab74fc8d ffff88007f8e7960 ffffffff8131c6e8 [ 6.083970] ffff88080a2c4000 ffff88080a2c5b80 ffff88007f8e79a0 ffffffffa093dc15 [ 6.083972] ffffffff81455aba ffff88080a2c4000 00000000ab74fc8d ffff88080a2c4000 [ 6.083974] Call Trace: [ 6.083978] [<ffffffff8131c6e8>] dump_stack+0x76/0x9e [ 6.084014] [<ffffffffa093dc15>] radeon_atpx_init+0x35/0x240 [radeon] [ 6.084017] [<ffffffff81455aba>] ? vga_client_register+0x8a/0xa0 [ 6.084039] [<ffffffffa08451a5>] radeon_device_init+0x875/0xcd0 [radeon] [ 6.084060] [<ffffffffa0847900>] radeon_driver_load_kms+0xb0/0x230 [radeon] [ 6.084066] [<ffffffffa000a34a>] drm_dev_register+0xba/0xd0 [drm] [ 6.084070] [<ffffffffa000c9b1>] drm_get_pci_dev+0xe1/0x1f0 [drm] [ 6.084090] [<ffffffffa08434c3>] radeon_pci_probe+0xc3/0xe0 [radeon] [ 6.084092] [<ffffffff81366d74>] local_pci_probe+0x54/0xb0 [ 6.084094] [<ffffffff81366ca3>] ? pci_match_device+0xf3/0x120 [ 6.084095] [<ffffffff813680e8>] pci_device_probe+0x118/0x170 [ 6.084098] [<ffffffff814613ef>] driver_probe_device+0x23f/0x450 [ 6.084100] [<ffffffff814616e0>] __driver_attach+0xe0/0x100 [ 6.084102] [<ffffffff81461600>] ? driver_probe_device+0x450/0x450 [ 6.084103] [<ffffffff8145ea1b>] bus_for_each_dev+0x7b/0xc0 [ 6.084105] [<ffffffff814609b1>] driver_attach+0x31/0x40 [ 6.084107] [<ffffffff81460366>] bus_add_driver+0x1d6/0x2a0 [ 6.084109] [<ffffffff814621f3>] driver_register+0x73/0xf0 [ 6.084110] [<ffffffff8136643f>] __pci_register_driver+0x5f/0x70 [ 6.084115] [<ffffffffa000cbaf>] drm_pci_init+0xef/0x120 [drm] [ 6.084117] [<ffffffff8145884d>] ? vga_switcheroo_register_handler+0x7d/0xa0 [ 6.084119] [<ffffffffa09b5000>] ? 0xffffffffa09b5000 [ 6.084140] [<ffffffffa09b50ab>] radeon_init+0xab/0xc0 [radeon] [ 6.084142] [<ffffffff81002152>] do_one_initcall+0xc2/0x200 [ 6.084144] [<ffffffff811d6e08>] ? __vunmap+0x98/0xe0 [ 6.084146] [<ffffffff811d6ee1>] ? vfree+0x41/0x80 [ 6.084149] [<ffffffff8118a881>] do_init_module+0x72/0x1f1 [ 6.084151] [<ffffffff81119cf2>] load_module+0x2192/0x29c0 [ 6.084153] [<ffffffff81116c50>] ? symbol_put_addr+0x60/0x60 [ 6.084155] [<ffffffff811c2c8b>] ? __pte_alloc_kernel+0xbb/0x110 [ 6.084158] [<ffffffff8111a682>] SyS_init_module+0x162/0x1a0 [ 6.084160] [<ffffffff81612ab2>] entry_SYSCALL_64_fastpath+0x1a/0xa4 Next there's a call that I did in printk() which I removed here because it's the same as the next: Next is radeon_device_init calling radeon_has_atpx_dgpu_power_cntl() and the result would be used to disable runpm without the atpx init before: [ 6.094760] CPU: 1 PID: 295 Comm: systemd-udevd Not tainted 4.6.0-mainline #1 [ 6.094762] Hardware name: CLEVO P170EM/P170EM, BIOS 4.6.5 08/22/2012 [ 6.094763] 0000000000000286 00000000ab74fc8d ffff88007f8e7990 ffffffff8131c6e8 [ 6.094766] ffff88080a2c4000 ffff88080a2c5b80 ffff88007f8e79a0 ffffffffa093de71 [ 6.094767] ffff88007f8e79e0 ffffffffa0845349 ffff880000000000 ffff880809dc5000 [ 6.094770] Call Trace: [ 6.094774] [<ffffffff8131c6e8>] dump_stack+0x76/0x9e [ 6.094808] [<ffffffffa093de71>] radeon_has_atpx_dgpu_power_cntl+0x21/0x30 [radeon] [ 6.094829] [<ffffffffa0845349>] radeon_device_init+0xa19/0xcd0 [radeon] [ 6.094849] [<ffffffffa0847900>] radeon_driver_load_kms+0xb0/0x230 [radeon] [ 6.094855] [<ffffffffa000a34a>] drm_dev_register+0xba/0xd0 [drm] [ 6.094860] [<ffffffffa000c9b1>] drm_get_pci_dev+0xe1/0x1f0 [drm] [ 6.094880] [<ffffffffa08434c3>] radeon_pci_probe+0xc3/0xe0 [radeon] [ 6.094882] [<ffffffff81366d74>] local_pci_probe+0x54/0xb0 [ 6.094884] [<ffffffff81366ca3>] ? pci_match_device+0xf3/0x120 [ 6.094885] [<ffffffff813680e8>] pci_device_probe+0x118/0x170 [ 6.094888] [<ffffffff814613ef>] driver_probe_device+0x23f/0x450 [ 6.094890] [<ffffffff814616e0>] __driver_attach+0xe0/0x100 [ 6.094892] [<ffffffff81461600>] ? driver_probe_device+0x450/0x450 [ 6.094893] [<ffffffff8145ea1b>] bus_for_each_dev+0x7b/0xc0 [ 6.094895] [<ffffffff814609b1>] driver_attach+0x31/0x40 [ 6.094897] [<ffffffff81460366>] bus_add_driver+0x1d6/0x2a0 [ 6.094899] [<ffffffff814621f3>] driver_register+0x73/0xf0 [ 6.094900] [<ffffffff8136643f>] __pci_register_driver+0x5f/0x70 [ 6.094905] [<ffffffffa000cbaf>] drm_pci_init+0xef/0x120 [drm] [ 6.094907] [<ffffffff8145884d>] ? vga_switcheroo_register_handler+0x7d/0xa0 [ 6.094909] [<ffffffffa09b5000>] ? 0xffffffffa09b5000 [ 6.094929] [<ffffffffa09b50ab>] radeon_init+0xab/0xc0 [radeon] [ 6.094931] [<ffffffff81002152>] do_one_initcall+0xc2/0x200 [ 6.094934] [<ffffffff811d6e08>] ? __vunmap+0x98/0xe0 [ 6.094936] [<ffffffff811d6ee1>] ? vfree+0x41/0x80 [ 6.094938] [<ffffffff8118a881>] do_init_module+0x72/0x1f1 [ 6.094941] [<ffffffff81119cf2>] load_module+0x2192/0x29c0 [ 6.094943] [<ffffffff81116c50>] ? symbol_put_addr+0x60/0x60 [ 6.094944] [<ffffffff811c2c8b>] ? __pte_alloc_kernel+0xbb/0x110 [ 6.094947] [<ffffffff8111a682>] SyS_init_module+0x162/0x1a0 [ 6.094951] [<ffffffff81612ab2>] entry_SYSCALL_64_fastpath+0x1a/0xa4 And finally here is where the kernel inits the atpx stuff by default: [ 6.096398] CPU: 1 PID: 295 Comm: systemd-udevd Not tainted 4.6.0-mainline #1 [ 6.096400] Hardware name: CLEVO P170EM/P170EM, BIOS 4.6.5 08/22/2012 [ 6.096401] 0000000000000286 00000000ab74fc8d ffff88007f8e78e0 ffffffff8131c6e8 [ 6.096406] ffff880809001700 0000000000000001 ffff88007f8e7920 ffffffffa093dc15 [ 6.096408] ffff88007f8e7900 00000000ab74fc8d 00000000ab74fc8d ffff880809001700 [ 6.096411] Call Trace: [ 6.096416] [<ffffffff8131c6e8>] dump_stack+0x76/0x9e [ 6.096461] [<ffffffffa093dc15>] radeon_atpx_init+0x35/0x240 [radeon] [ 6.096466] [<ffffffff814ea83c>] vga_switcheroo_enable+0x2e/0x11e [ 6.096470] [<ffffffff8145896d>] register_client+0xfd/0x110 [ 6.096473] [<ffffffff814589fb>] vga_switcheroo_register_client+0x4b/0x60 [ 6.096511] [<ffffffffa0845372>] radeon_device_init+0xa42/0xcd0 [radeon] [ 6.096549] [<ffffffffa0847900>] radeon_driver_load_kms+0xb0/0x230 [radeon] [ 6.096558] [<ffffffffa000a34a>] drm_dev_register+0xba/0xd0 [drm] [ 6.096566] [<ffffffffa000c9b1>] drm_get_pci_dev+0xe1/0x1f0 [drm] [ 6.096603] [<ffffffffa08434c3>] radeon_pci_probe+0xc3/0xe0 [radeon] [ 6.096606] [<ffffffff81366d74>] local_pci_probe+0x54/0xb0 [ 6.096609] [<ffffffff81366ca3>] ? pci_match_device+0xf3/0x120 [ 6.096612] [<ffffffff813680e8>] pci_device_probe+0x118/0x170 [ 6.096615] [<ffffffff814613ef>] driver_probe_device+0x23f/0x450 [ 6.096618] [<ffffffff814616e0>] __driver_attach+0xe0/0x100 [ 6.096621] [<ffffffff81461600>] ? driver_probe_device+0x450/0x450 [ 6.096624] [<ffffffff8145ea1b>] bus_for_each_dev+0x7b/0xc0 [ 6.096627] [<ffffffff814609b1>] driver_attach+0x31/0x40 [ 6.096630] [<ffffffff81460366>] bus_add_driver+0x1d6/0x2a0 [ 6.096633] [<ffffffff814621f3>] driver_register+0x73/0xf0 [ 6.096636] [<ffffffff8136643f>] __pci_register_driver+0x5f/0x70 [ 6.096644] [<ffffffffa000cbaf>] drm_pci_init+0xef/0x120 [drm] [ 6.096647] [<ffffffff8145884d>] ? vga_switcheroo_register_handler+0x7d/0xa0 [ 6.096650] [<ffffffffa09b5000>] ? 0xffffffffa09b5000 [ 6.096686] [<ffffffffa09b50ab>] radeon_init+0xab/0xc0 [radeon] [ 6.096689] [<ffffffff81002152>] do_one_initcall+0xc2/0x200 [ 6.096693] [<ffffffff811d6e08>] ? __vunmap+0x98/0xe0 [ 6.096696] [<ffffffff811d6ee1>] ? vfree+0x41/0x80 [ 6.096700] [<ffffffff8118a881>] do_init_module+0x72/0x1f1 [ 6.096703] [<ffffffff81119cf2>] load_module+0x2192/0x29c0 [ 6.096706] [<ffffffff81116c50>] ? symbol_put_addr+0x60/0x60 [ 6.096708] [<ffffffff811c2c8b>] ? __pte_alloc_kernel+0xbb/0x110 [ 6.096713] [<ffffffff8111a682>] SyS_init_module+0x162/0x1a0 [ 6.096719] [<ffffffff81612ab2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
This seems like a circular dependency. if ((rdev->flags & RADEON_IS_PX) && radeon_has_atpx_dgpu_power_cntl()) runtime = true; vga_switcheroo_register_client(rdev->pdev, &radeon_switcheroo_ops, runtime); vga_switcheroo_register_client needs to know if runpm is enabled, but the kernel only knows whether to enable runpm after vga_switcheroo_register_client has caused radeon_atpx_init to be executed.
Ah... vga_switcheroo_ready() actually checks for clients.. makes sense. As documented[1], the init handler is called when there is a dependence on vga[_switcheroo] clients. Radeon however does not seem to need this, so the initialization of functions is better done elsewhere. radeon_atpx_detect sounds like an appropriate location. [1]: https://www.kernel.org/doc/htmldocs/gpu/API-struct-vga-switcheroo-handler.html
Bug can be closed, a fix is in upstream (v4.8-rc1) commit 69ee9742f945cda8bd0081961770cd2e3192a77a Author: Alex Deucher <alexander.deucher@amd.com> Date: Wed Jul 27 14:56:12 2016 -0400 drm/radeon: init atpx at switcheroo register time v2 Chris, can you validate this?