Bug 115321 - radeon runpm falsely disabled on Clevo P170EM
Summary: radeon runpm falsely disabled on Clevo P170EM
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel) (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-27 10:17 UTC by Christoph Haag
Modified: 2016-08-24 14:07 UTC (History)
4 users (show)

See Also:
Kernel Version: 4.6-rc1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (72.75 KB, application/octet-stream)
2016-03-27 10:17 UTC, Christoph Haag
Details
possible fix 1/2 (2.96 KB, patch)
2016-04-14 03:26 UTC, Alex Deucher
Details | Diff
possible fix 2/2 (1.06 KB, patch)
2016-04-14 03:26 UTC, Alex Deucher
Details | Diff
acpidump (241.64 KB, text/plain)
2016-07-15 14:19 UTC, Christoph Haag
Details
possible patch (force integers into bools) (1.56 KB, patch)
2016-07-15 19:30 UTC, Peter Wu
Details | Diff

Description Christoph Haag 2016-03-27 10:17:50 UTC
Created attachment 210811 [details]
dmesg

Commit e64c952efb8e0c15ae82cec8e455ab4910690ef1 disables runpm when "laptops don't provide an ACPI method to control dGPU power".

Up to now, runpm worked fine for me, but this commit disabled it also for me. There are some ACPI errors that maybe have something to do with it:

[    3.245994] vga_switcheroo: enabled
[    3.246040] ATPX version 1, functions 0x00000033
[    3.278214] ACPI Error: Field [TMPB] at 294912 exceeds Buffer [ROM1] size 262144 (bits) (20160108/dsopcode-236)
[    3.278218] ACPI Error: Method parse/execution failed [\_SB.PCI0.GFX0.ATRM] (Node ffff88080e8daa50), AE_AML_BUFFER_LIMIT (20160108/psparse-542)
[    3.278224] failed to evaluate ATRM got AE_AML_BUFFER_LIMIT

Full 4.6-rc1 dmesg is attached.

Meanwhile I changed this function:

bool radeon_has_atpx_dgpu_power_cntl(void) {
	return true; //radeon_atpx_priv.atpx.functions.power_cntl;
}

and runpm works again. The uvd and vce errors seen in the dmesg also vanished after this change and a reboot.
Comment 1 Alex Deucher 2016-04-14 03:26:00 UTC
Created attachment 212671 [details]
possible fix 1/2
Comment 2 Alex Deucher 2016-04-14 03:26:19 UTC
Created attachment 212681 [details]
possible fix 2/2
Comment 3 Christoph Haag 2016-04-14 15:27:49 UTC
It does work for my laptop, so thanks for that. I guess if it was possible to test at runtime for it you'd have done it, so we can only hope that there are no other laptops with this problem..
Comment 4 Peter Wu 2016-07-15 13:57:16 UTC
Chris, can you attach your acpidump (part of the iasl package)?
Comment 5 Christoph Haag 2016-07-15 14:19:43 UTC
Created attachment 223931 [details]
acpidump

This?
Comment 6 Peter Wu 2016-07-15 15:21:34 UTC
Yes, that is what I was looking for.

[    3.278214] ACPI Error: Field [TMPB] at 294912 exceeds Buffer [ROM1] size 262144 (bits) (20160108/dsopcode-236)
[    3.278218] ACPI Error: Method parse/execution failed [\_SB.PCI0.GFX0.ATRM] (Node ffff88080e8daa50), AE_AML_BUFFER_LIMIT (20160108/psparse-542)
[    3.278224] failed to evaluate ATRM got AE_AML_BUFFER_LIMIT

This is harmless, apparently the radeon driver tries to fetch blocks of 4K until no more results are available. Your videorom is only 32 KiB, so fetching more than that will fail.


ATPX Supported Functions is 0x33 if \SGMD == 0x02 and 0xBF otherwise. Do you have a BIOS option that allows you to select between hybrid graphics and discrete mode?
Comment 7 Christoph Haag 2016-07-15 15:26:17 UTC
No, the stock BIOS of the Clevo P170EM has *very* limited settings and I believe the radeon GPU has no physical connectors connected at all.
Comment 8 Peter Wu 2016-07-15 19:30:51 UTC
Created attachment 224001 [details]
possible patch (force integers into bools)

Yay, I found documentation for the SGMD (SG Mode) field:
0 - None
1 - Switchable Graphics (Muxed)
2 - Switchable Graphics (Muxless)
3 - Discrete only

(Related: SGFL stands for Switchable Graphics Feature List.)

By the way, the functions mask 0x33 still covers ATPX_POWER_CONTROL_SUPPORTED ((1<<1) == 2). Something else must be broken.

Based on the dmesg, we can see that:

    if (radeon_modeset == 1) {
        DRM_INFO("radeon kernel modesetting enabled.\n"); // <-- called
        driver = &kms_driver;
        // ...
        radeon_register_atpx_handler(); // called, see dmesg:
        // "[    3.245994] vga_switcheroo: enabled"
        // radeon_atpx_init is called after the previous printk.
        // radeon_atpx_verify_interface is called and returns 0, see dmesg:
        // "[    3.246040] ATPX version 1, functions 0x00000033"
        // now radeon_atpx_parse_functions(..., 0x33) is called which:
	// f->power_cntl = mask & ATPX_POWER_CONTROL_SUPPORTED;
        // = 0x33 & (1 << 1) = 2 (which is a truth value)
    }

    /* let modprobe override vga console setting */
    return drm_pci_init(driver, pdriver);
    // driver.load (radeon_driver_load_kms) ->
    //   radeon_device_init() ->
    //     radeon_has_atpx_dgpu_power_cntl() ->
    //       return radeon_atpx_priv.atpx.functions.power_cntl // 2 per above

Maybe there is a compiler bug... can you try the attached patch?
Comment 9 Peter Wu 2016-07-15 19:42:55 UTC
For that to work on stable v4.6 you have to revert dfc4f59d901b and bfaddd9fc8ac. My .config and compiler generates the same (inlined) code for both sources:

// (radeon.ko) mask & (1 << 1) => (mask >> 1) & 1
   ed828:       d1 ea                   shr    edx,1
   ed82a:       83 e2 01                and    edx,0x1
   ed82d:       88 15 00 00 00 00       mov    BYTE PTR [rip+0x0],dl
Comment 10 Christoph Haag 2016-07-16 13:20:57 UTC
Well, it does not help. I wanted to get you some more info, so I inserted some printks:

in radeon_atpx_handler.c
static void radeon_atpx_parse_functions(struct radeon_atpx_functions *f, u32 mask)
{
        printk(KERN_ERR "ATPX mask: %u\n", mask);
        printk(KERN_ERR "ATPX power control supported: %u\n", ATPX_POWER_CONTROL_SUPPORTED);
        printk(KERN_ERR "mask & ATPX_POWER_CONTROL_SUPPORTED: %u\n", mask & ATPX_POWER_CONTROL_SUPPORTED);
        printk(KERN_ERR "!!(mask & ATPX_POWER_CONTROL_SUPPORTED): %u\n", !!(mask & ATPX_POWER_CONTROL_SUPPORTED));

in radeon_device.c
        printk(KERN_ERR "rdev->flags: %u\n", rdev->flags);
        printk(KERN_ERR "rdev->flags & RADEON_IS_PX: %u\n", rdev->flags & RADEON_IS_PX);
        printk(KERN_ERR "radeon_has_atpx_dgpu_power_cntl(): %u\n", radeon_has_atpx_dgpu_power_cntl());
	if ((rdev->flags & RADEON_IS_PX) && radeon_has_atpx_dgpu_power_cntl())
		runtime = true;

Then I booted and got this:

[    6.235984] rdev->flags: 39911477
[    6.238221] rdev->flags & RADEON_IS_PX: 33554432
[    6.239947] radeon_has_atpx_dgpu_power_cntl(): 0
[    6.241572] vga_switcheroo: enabled
[    6.241657] ATPX version 1, functions 0x00000033
[    6.241659] ATPX mask: 51
[    6.243153] ATPX power control supported: 2
[    6.244508] mask & ATPX_POWER_CONTROL_SUPPORTED: 2
[    6.245649] !!(mask & ATPX_POWER_CONTROL_SUPPORTED): 1

Uhm... so, does that mean the ATPX BIOS version has been fine in the first place and that radeon_has_atpx_dgpu_power_cntl() returns 0, just because the check whether to enable runpm is done BEFORE the atpx stuff is even parsed?

I didn't want to trace everything back to the inits, so I just put another radeon_atpx_init(); call just before the runpm check, removed static from the function and added a forward declaration
int radeon_atpx_init(void);
...
        radeon_atpx_init();
        printk(KERN_ERR "rdev->flags: %u\n", rdev->flags);
        printk(KERN_ERR "rdev->flags & RADEON_IS_PX: %u\n", rdev->flags & RADEON_IS_PX);
        printk(KERN_ERR "radeon_has_atpx_dgpu_power_cntl(): %u\n", radeon_has_atpx_dgpu_power_cntl());
	if ((rdev->flags & RADEON_IS_PX) && radeon_has_atpx_dgpu_power_cntl()) {
		runtime = true;
                printk(KERN_ERR "runpm is enabled. Yay\n");
        }

And I got
[    6.223693] ATPX mask: 51
[    6.224823] ATPX power control supported: 2
[    6.225937] mask & ATPX_POWER_CONTROL_SUPPORTED: 2
[    6.227140] !!(mask & ATPX_POWER_CONTROL_SUPPORTED): 1
[    6.228198] rdev->flags: 39911477
[    6.229274] rdev->flags & RADEON_IS_PX: 33554432
[    6.230181] radeon_has_atpx_dgpu_power_cntl(): 1
[    6.231039] runpm is enabled. Yay
[    6.231889] vga_switcheroo: enabled
[    6.231956] ATPX version 1, functions 0x00000033
[    6.231957] ATPX mask: 51
[    6.232267] usbcore: registered new interface driver btusb
[    6.232932] ATPX power control supported: 2
[    6.233849] mask & ATPX_POWER_CONTROL_SUPPORTED: 2
[    6.234717] !!(mask & ATPX_POWER_CONTROL_SUPPORTED): 1

YES! IT DOES!

Damn!
Thank you so much for asking a follow up question so I can stop telling people that the BIOS on several laptops is broken and looking like an idiot.

On the other side: It probably would have been a good idea to verify that there IS hardware where it actually works as intended before declaring the BIOS to be at fault.
Comment 11 Peter Wu 2016-07-16 13:39:29 UTC
Can you put a dump_stack() in the atpx_init function and radeon_has_atpx_dgpu_power_cntl functions such that we can see how it is called?
Comment 12 Christoph Haag 2016-07-16 13:55:18 UTC
First is the one radeon_atpx_init(); call that I added manually to radeon_device_init in order to make it work:
[    6.083965] CPU: 5 PID: 295 Comm: systemd-udevd Not tainted 4.6.0-mainline #1
[    6.083966] Hardware name: CLEVO                             P170EM/P170EM, BIOS 4.6.5 08/22/2012
[    6.083968]  0000000000000286 00000000ab74fc8d ffff88007f8e7960 ffffffff8131c6e8
[    6.083970]  ffff88080a2c4000 ffff88080a2c5b80 ffff88007f8e79a0 ffffffffa093dc15
[    6.083972]  ffffffff81455aba ffff88080a2c4000 00000000ab74fc8d ffff88080a2c4000
[    6.083974] Call Trace:
[    6.083978]  [<ffffffff8131c6e8>] dump_stack+0x76/0x9e
[    6.084014]  [<ffffffffa093dc15>] radeon_atpx_init+0x35/0x240 [radeon]
[    6.084017]  [<ffffffff81455aba>] ? vga_client_register+0x8a/0xa0
[    6.084039]  [<ffffffffa08451a5>] radeon_device_init+0x875/0xcd0 [radeon]
[    6.084060]  [<ffffffffa0847900>] radeon_driver_load_kms+0xb0/0x230 [radeon]
[    6.084066]  [<ffffffffa000a34a>] drm_dev_register+0xba/0xd0 [drm]
[    6.084070]  [<ffffffffa000c9b1>] drm_get_pci_dev+0xe1/0x1f0 [drm]
[    6.084090]  [<ffffffffa08434c3>] radeon_pci_probe+0xc3/0xe0 [radeon]
[    6.084092]  [<ffffffff81366d74>] local_pci_probe+0x54/0xb0
[    6.084094]  [<ffffffff81366ca3>] ? pci_match_device+0xf3/0x120
[    6.084095]  [<ffffffff813680e8>] pci_device_probe+0x118/0x170
[    6.084098]  [<ffffffff814613ef>] driver_probe_device+0x23f/0x450
[    6.084100]  [<ffffffff814616e0>] __driver_attach+0xe0/0x100
[    6.084102]  [<ffffffff81461600>] ? driver_probe_device+0x450/0x450
[    6.084103]  [<ffffffff8145ea1b>] bus_for_each_dev+0x7b/0xc0
[    6.084105]  [<ffffffff814609b1>] driver_attach+0x31/0x40
[    6.084107]  [<ffffffff81460366>] bus_add_driver+0x1d6/0x2a0
[    6.084109]  [<ffffffff814621f3>] driver_register+0x73/0xf0
[    6.084110]  [<ffffffff8136643f>] __pci_register_driver+0x5f/0x70
[    6.084115]  [<ffffffffa000cbaf>] drm_pci_init+0xef/0x120 [drm]
[    6.084117]  [<ffffffff8145884d>] ? vga_switcheroo_register_handler+0x7d/0xa0
[    6.084119]  [<ffffffffa09b5000>] ? 0xffffffffa09b5000
[    6.084140]  [<ffffffffa09b50ab>] radeon_init+0xab/0xc0 [radeon]
[    6.084142]  [<ffffffff81002152>] do_one_initcall+0xc2/0x200
[    6.084144]  [<ffffffff811d6e08>] ? __vunmap+0x98/0xe0
[    6.084146]  [<ffffffff811d6ee1>] ? vfree+0x41/0x80
[    6.084149]  [<ffffffff8118a881>] do_init_module+0x72/0x1f1
[    6.084151]  [<ffffffff81119cf2>] load_module+0x2192/0x29c0
[    6.084153]  [<ffffffff81116c50>] ? symbol_put_addr+0x60/0x60
[    6.084155]  [<ffffffff811c2c8b>] ? __pte_alloc_kernel+0xbb/0x110
[    6.084158]  [<ffffffff8111a682>] SyS_init_module+0x162/0x1a0
[    6.084160]  [<ffffffff81612ab2>] entry_SYSCALL_64_fastpath+0x1a/0xa4

Next there's a call that I did in printk() which I removed here because it's the same as the next:

Next is radeon_device_init calling radeon_has_atpx_dgpu_power_cntl() and the result would be used to disable runpm without the atpx init before:
[    6.094760] CPU: 1 PID: 295 Comm: systemd-udevd Not tainted 4.6.0-mainline #1
[    6.094762] Hardware name: CLEVO                             P170EM/P170EM, BIOS 4.6.5 08/22/2012
[    6.094763]  0000000000000286 00000000ab74fc8d ffff88007f8e7990 ffffffff8131c6e8
[    6.094766]  ffff88080a2c4000 ffff88080a2c5b80 ffff88007f8e79a0 ffffffffa093de71
[    6.094767]  ffff88007f8e79e0 ffffffffa0845349 ffff880000000000 ffff880809dc5000
[    6.094770] Call Trace:
[    6.094774]  [<ffffffff8131c6e8>] dump_stack+0x76/0x9e
[    6.094808]  [<ffffffffa093de71>] radeon_has_atpx_dgpu_power_cntl+0x21/0x30 [radeon]
[    6.094829]  [<ffffffffa0845349>] radeon_device_init+0xa19/0xcd0 [radeon]
[    6.094849]  [<ffffffffa0847900>] radeon_driver_load_kms+0xb0/0x230 [radeon]
[    6.094855]  [<ffffffffa000a34a>] drm_dev_register+0xba/0xd0 [drm]
[    6.094860]  [<ffffffffa000c9b1>] drm_get_pci_dev+0xe1/0x1f0 [drm]
[    6.094880]  [<ffffffffa08434c3>] radeon_pci_probe+0xc3/0xe0 [radeon]
[    6.094882]  [<ffffffff81366d74>] local_pci_probe+0x54/0xb0
[    6.094884]  [<ffffffff81366ca3>] ? pci_match_device+0xf3/0x120
[    6.094885]  [<ffffffff813680e8>] pci_device_probe+0x118/0x170
[    6.094888]  [<ffffffff814613ef>] driver_probe_device+0x23f/0x450
[    6.094890]  [<ffffffff814616e0>] __driver_attach+0xe0/0x100
[    6.094892]  [<ffffffff81461600>] ? driver_probe_device+0x450/0x450
[    6.094893]  [<ffffffff8145ea1b>] bus_for_each_dev+0x7b/0xc0
[    6.094895]  [<ffffffff814609b1>] driver_attach+0x31/0x40
[    6.094897]  [<ffffffff81460366>] bus_add_driver+0x1d6/0x2a0
[    6.094899]  [<ffffffff814621f3>] driver_register+0x73/0xf0
[    6.094900]  [<ffffffff8136643f>] __pci_register_driver+0x5f/0x70
[    6.094905]  [<ffffffffa000cbaf>] drm_pci_init+0xef/0x120 [drm]
[    6.094907]  [<ffffffff8145884d>] ? vga_switcheroo_register_handler+0x7d/0xa0
[    6.094909]  [<ffffffffa09b5000>] ? 0xffffffffa09b5000
[    6.094929]  [<ffffffffa09b50ab>] radeon_init+0xab/0xc0 [radeon]
[    6.094931]  [<ffffffff81002152>] do_one_initcall+0xc2/0x200
[    6.094934]  [<ffffffff811d6e08>] ? __vunmap+0x98/0xe0
[    6.094936]  [<ffffffff811d6ee1>] ? vfree+0x41/0x80
[    6.094938]  [<ffffffff8118a881>] do_init_module+0x72/0x1f1
[    6.094941]  [<ffffffff81119cf2>] load_module+0x2192/0x29c0
[    6.094943]  [<ffffffff81116c50>] ? symbol_put_addr+0x60/0x60
[    6.094944]  [<ffffffff811c2c8b>] ? __pte_alloc_kernel+0xbb/0x110
[    6.094947]  [<ffffffff8111a682>] SyS_init_module+0x162/0x1a0
[    6.094951]  [<ffffffff81612ab2>] entry_SYSCALL_64_fastpath+0x1a/0xa4

And finally here is where the kernel inits the atpx stuff by default:
[    6.096398] CPU: 1 PID: 295 Comm: systemd-udevd Not tainted 4.6.0-mainline #1
[    6.096400] Hardware name: CLEVO                             P170EM/P170EM, BIOS 4.6.5 08/22/2012
[    6.096401]  0000000000000286 00000000ab74fc8d ffff88007f8e78e0 ffffffff8131c6e8
[    6.096406]  ffff880809001700 0000000000000001 ffff88007f8e7920 ffffffffa093dc15
[    6.096408]  ffff88007f8e7900 00000000ab74fc8d 00000000ab74fc8d ffff880809001700
[    6.096411] Call Trace:
[    6.096416]  [<ffffffff8131c6e8>] dump_stack+0x76/0x9e
[    6.096461]  [<ffffffffa093dc15>] radeon_atpx_init+0x35/0x240 [radeon]
[    6.096466]  [<ffffffff814ea83c>] vga_switcheroo_enable+0x2e/0x11e
[    6.096470]  [<ffffffff8145896d>] register_client+0xfd/0x110
[    6.096473]  [<ffffffff814589fb>] vga_switcheroo_register_client+0x4b/0x60
[    6.096511]  [<ffffffffa0845372>] radeon_device_init+0xa42/0xcd0 [radeon]
[    6.096549]  [<ffffffffa0847900>] radeon_driver_load_kms+0xb0/0x230 [radeon]
[    6.096558]  [<ffffffffa000a34a>] drm_dev_register+0xba/0xd0 [drm]
[    6.096566]  [<ffffffffa000c9b1>] drm_get_pci_dev+0xe1/0x1f0 [drm]
[    6.096603]  [<ffffffffa08434c3>] radeon_pci_probe+0xc3/0xe0 [radeon]
[    6.096606]  [<ffffffff81366d74>] local_pci_probe+0x54/0xb0
[    6.096609]  [<ffffffff81366ca3>] ? pci_match_device+0xf3/0x120
[    6.096612]  [<ffffffff813680e8>] pci_device_probe+0x118/0x170
[    6.096615]  [<ffffffff814613ef>] driver_probe_device+0x23f/0x450
[    6.096618]  [<ffffffff814616e0>] __driver_attach+0xe0/0x100
[    6.096621]  [<ffffffff81461600>] ? driver_probe_device+0x450/0x450
[    6.096624]  [<ffffffff8145ea1b>] bus_for_each_dev+0x7b/0xc0
[    6.096627]  [<ffffffff814609b1>] driver_attach+0x31/0x40
[    6.096630]  [<ffffffff81460366>] bus_add_driver+0x1d6/0x2a0
[    6.096633]  [<ffffffff814621f3>] driver_register+0x73/0xf0
[    6.096636]  [<ffffffff8136643f>] __pci_register_driver+0x5f/0x70
[    6.096644]  [<ffffffffa000cbaf>] drm_pci_init+0xef/0x120 [drm]
[    6.096647]  [<ffffffff8145884d>] ? vga_switcheroo_register_handler+0x7d/0xa0
[    6.096650]  [<ffffffffa09b5000>] ? 0xffffffffa09b5000
[    6.096686]  [<ffffffffa09b50ab>] radeon_init+0xab/0xc0 [radeon]
[    6.096689]  [<ffffffff81002152>] do_one_initcall+0xc2/0x200
[    6.096693]  [<ffffffff811d6e08>] ? __vunmap+0x98/0xe0
[    6.096696]  [<ffffffff811d6ee1>] ? vfree+0x41/0x80
[    6.096700]  [<ffffffff8118a881>] do_init_module+0x72/0x1f1
[    6.096703]  [<ffffffff81119cf2>] load_module+0x2192/0x29c0
[    6.096706]  [<ffffffff81116c50>] ? symbol_put_addr+0x60/0x60
[    6.096708]  [<ffffffff811c2c8b>] ? __pte_alloc_kernel+0xbb/0x110
[    6.096713]  [<ffffffff8111a682>] SyS_init_module+0x162/0x1a0
[    6.096719]  [<ffffffff81612ab2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
Comment 13 Christoph Haag 2016-07-16 13:59:57 UTC
This seems like a circular dependency.

	if ((rdev->flags & RADEON_IS_PX) && radeon_has_atpx_dgpu_power_cntl())
		runtime = true;

	vga_switcheroo_register_client(rdev->pdev, &radeon_switcheroo_ops, runtime);

vga_switcheroo_register_client needs to know if runpm is enabled, but the kernel only knows whether to enable runpm after vga_switcheroo_register_client has caused radeon_atpx_init to be executed.
Comment 14 Peter Wu 2016-07-16 14:17:27 UTC
Ah... vga_switcheroo_ready() actually checks for clients.. makes sense.

As documented[1], the init handler is called when there is a dependence on vga[_switcheroo] clients. Radeon however does not seem to need this, so the initialization of functions is better done elsewhere. radeon_atpx_detect sounds like an appropriate location.

 [1]: https://www.kernel.org/doc/htmldocs/gpu/API-struct-vga-switcheroo-handler.html
Comment 15 Peter Wu 2016-08-24 14:07:19 UTC
Bug can be closed, a fix is in upstream (v4.8-rc1)

commit 69ee9742f945cda8bd0081961770cd2e3192a77a
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Jul 27 14:56:12 2016 -0400

    drm/radeon: init atpx at switcheroo register time v2

Chris, can you validate this?

Note You need to log in before you can comment on or make changes to this bug.