Bug 217947

Summary: WARNING at drivers/acpi/platform_profile.c:74 in platform_profile_show()
Product: Drivers Reporter: Jiri Slaby (jirislaby)
Component: Platform_x86Assignee: drivers_platform_x86 (drivers_platform_x86)
Status: NEW ---    
Severity: normal CC: mpearson-lenovo
Priority: P3    
Hardware: All   
OS: Linux   
See Also: https://bugzilla.suse.com/show_bug.cgi?id=1215602
Kernel Version: Subsystem:
Regression: No Bisected commit-id:
Attachments: DSDT

Description Jiri Slaby 2023-09-25 09:12:13 UTC
Downstream report:
https://bugzilla.suse.com/show_bug.cgi?id=1215602

LKML link:
https://lore.kernel.org/all/047d3c51-0a9e-4c3e-beef-625a7aa4f3c3@kernel.org/

according to logs, since 6.3 (up to 6.5.4 now), I repeatedly see:
> WARNING: CPU: 14 PID: 962 at drivers/acpi/platform_profile.c:74
> platform_profile_show+0xb1/0x100 [platform_profile]
> Modules linked in: ccm michael_mic ...
> CPU: 14 PID: 962 Comm: power-profiles- Kdump: loaded Not tainted
> 6.5.4-6-default #1 openSUSE Tumbleweed (unreleased)
> dd37106c593be78644bb80e3c1534d801bf4cb36
> Hardware name: LENOVO 21CRS0K83K/21CRS0K83K, BIOS R22ET60W (1.30 ) 02/09/2023
> RIP: 0010:platform_profile_show+0xb1/0x100 [platform_profile]
> Code: d0 a8 ...
> RSP: 0018:ffff9c1ac0b97db0 EFLAGS: 00010296
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000008fc35be0
> RDX: 0000000000000000 RSI: ffff9c1ac0b97db4 RDI: ffffffffc0a8b0a0
> RBP: ffff8955ca540000 R08: ffff895b9f1ed180 R09: ffff895559ea1bc0
> R10: 00000000031a400e R11: 000000000003f680 R12: ffff895b9f1ed180
> R13: ffff9c1ac0b97e50 R14: 0000000000000001 R15: ffff9c1ac0b97ee8
> FS:  00007f71b0e71900(0000) GS:ffff895b9f100000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe402ea3400 CR3: 000000012004c000 CR4: 0000000000750ee0
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  sysfs_kf_seq_show+0xab/0x100
>  seq_read_iter+0x123/0x480
>  vfs_read+0x1b8/0x300

It's:
WARN_ON((profile < 0) || (profile >= ARRAY_SIZE(profile_names))))

So I put there one more print:
dev_warn(dev, "profile=%d profile_get=%ps\n",
         profile, cur_profile->profile_get);

and I see:
: profile=-1883022368 profile_get=dytc_profile_get [thinkpad_acpi]
: profile=-1510173440 profile_get=dytc_profile_get [thinkpad_acpi]
: profile=-1510173440 profile_get=dytc_profile_get [thinkpad_acpi]
: profile=-966231712 profile_get=dytc_profile_get [thinkpad_acpi]
: profile=-1578420592 profile_get=dytc_profile_get [thinkpad_acpi]
: profile=-1578420592 profile_get=dytc_profile_get [thinkpad_acpi]
: profile=-1578420592 profile_get=dytc_profile_get [thinkpad_acpi]

I wonder about dev passed to dytc_profile_get() having empty name (nothing before colon above)? Is that expected?

Ah, convert_dytc_to_profile()'s retval is not checked in dytc_profile_refresh(). Adding:
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -10418,7 +10418,14 @@ static void dytc_profile_refresh(void)
                return;

        perfmode = (output >> DYTC_GET_MODE_BIT) & 0xF;
-       convert_dytc_to_profile(funcmode, perfmode, &profile);
+       err = convert_dytc_to_profile(funcmode, perfmode, &profile);
+       if (err) {
+               pr_warn("%s: mmc=%u psc=%u mmc_get=%u funcmode=%d output=0x%x perfmode=%d\n",
+                          __func__, !!(dytc_capabilities & BIT(DYTC_FC_MMC)),
+                          !!(dytc_capabilities & BIT(DYTC_FC_PSC)),
+                          dytc_mmc_get_available, funcmode, output, perfmode);
+               return;
+       }
        if (profile != dytc_current_profile) {
                dytc_current_profile = profile;
                platform_profile_notify();

fixes the warning, of course.

The output is:
dytc_profile_refresh: mmc=0 psc=1 mmc_get=0 funcmode=0 output=0x1f001 perfmode=15

So using psc mode, it retrieves DYTC (0x1f001) as follows:
0x1.... -> VSTD=1 -> STD mode
0x.f... -> CICM=0xf, DYTC_GET_MODE_BIT(12) -> dunno what it is in STD
0x..0.. -> CICF=0x0, DYTC_GET_FUNCTION_BIT(8) -> DYTC_FUNCTION_STD
0x....1 -> GOOD

But convert_dytc_to_profile() doesn't handle this at all. Do I have a newer DYTC interface? Or a broken one?
                Case (0x00)
                {
                    Local1 = 0x0100
                    Local1 |= 0x80000000
                    Local1 |= 0x00
                    Local1 |= 0x01
                }

I.e. version 8.0, it seems.

My DYTC for CMD_GET looks like:
   Case (0x02)
   {
       Local5 = VSTD /* \VSTD */
       Local5 |= (VCQL << 0x01)
       Local5 |= (VSTP << 0x04)
       Local5 |= (VADM << 0x07)
       Local5 |= (VTMS << 0x09)
       Local5 |= (VDLS << 0x0A)
       Local5 |= (VMSC << 0x0C)
       Local5 |= (VPSC << 0x0D)
       Local1 = (CICF << 0x08)
       If ((CICF == 0x03))
       {
           CICM = SMYH /* \SMYH */
       }
       ElseIf ((CICF == 0x0B))
       {
           CICM = SMMC /* \SMMC */
       }
       ElseIf ((CICF == 0x0D))
       {
           CICM = SPSC /* \SPSC */
       }
       ElseIf ((CICF == 0x0F))
       {
           CICM = SAMT /* \SAMT */
       }
       ElseIf ((CICF == 0x07))
       {
           CICM = SADM /* \SADM */
       }
       Else
       {
           CICM = 0x0F
       }

       Local1 |= (CICM << 0x0C)
       Local1 |= (Local5 << 0x10)
       Local1 |= 0x01
   }
Comment 1 Jiri Slaby 2023-09-25 09:12:31 UTC
Created attachment 305147 [details]
DSDT
Comment 2 Mark Pearson 2023-09-25 12:54:13 UTC
Hi,

Can I confirm this is on a T14s G3 AMD platform?

Thanks for the detailed report - very helpful and very much appreciated. I'll need to reproduce this but I don't have one of these platforms myself so will need to ask a colleague for help.

It looks like this line is causing problems on this platform:
funcmode = (output >> DYTC_GET_FUNCTION_BIT) & 0xF;

Normally that should return PSC mode (in this case - unless you have AMT mode, which you don't on that gen platform). 
But it's saying it is in standard mode instead....which is peculiar. The FW is doing something non-standard to other platforms.

Do you still get the warning when:
 - you're in performance or low-power modes?
 - you're in lap mode?

As a quick test could you try adding:

 /* Check if we are PSC mode, or have AMT enabled */
 funcmode = (output >> DYTC_GET_FUNCTION_BIT) & 0xF;
+if (funcmode == DYTC_FUNCTION_STD)
+  funcmode = DYTC_FUNCTION_PSC;

To see if it behaves correctly please?

I'll have to check with the FW team and find out why this change in behaviour...I suspect I need to do a kernel patch to better handle this AMT case though.

Mark
Comment 3 Jiri Slaby 2023-09-26 06:57:50 UTC
(In reply to Mark Pearson from comment #2)
> Can I confirm this is on a T14s G3 AMD platform?

Yes.

> It looks like this line is causing problems on this platform:
> funcmode = (output >> DYTC_GET_FUNCTION_BIT) & 0xF;
> 
> Normally that should return PSC mode (in this case - unless you have AMT
> mode, which you don't on that gen platform). 
> But it's saying it is in standard mode instead....which is peculiar. The FW
> is doing something non-standard to other platforms.

Interestingly, it happens only initially and during suspend. Other than that, it's in PSC mode likely.

> Do you still get the warning when:
>  - you're in performance or low-power modes?

I am on always in low-power mode. This notebook is crap, it's noisy and very hot in balanced or performance modes even when idling.

>  - you're in lap mode?

What is that?

> As a quick test could you try adding:
> 
>  /* Check if we are PSC mode, or have AMT enabled */
>  funcmode = (output >> DYTC_GET_FUNCTION_BIT) & 0xF;
> +if (funcmode == DYTC_FUNCTION_STD)
> +  funcmode = DYTC_FUNCTION_PSC;
> 
> To see if it behaves correctly please?

Will try.
Comment 4 Jiri Slaby 2023-09-26 10:32:09 UTC
(In reply to Jiri Slaby from comment #3)
> > As a quick test could you try adding:
> > 
> >  /* Check if we are PSC mode, or have AMT enabled */
> >  funcmode = (output >> DYTC_GET_FUNCTION_BIT) & 0xF;
> > +if (funcmode == DYTC_FUNCTION_STD)
> > +  funcmode = DYTC_FUNCTION_PSC;
> > 
> > To see if it behaves correctly please?
> 
> Will try.

It only changes the error to EINVAL. 0xf is not handled as a correct state in PSC.

BTW the first (bad) log:
thinkpad_acpi: dytc_profile_refresh: err=-22 mmc=0 psc=1 mmc_get=0 funcmode=13 output=0x1f001 perfmode=15

subsequent refreshes (good):
thinkpad_acpi: dytc_profile_refresh: err=0 mmc=0 psc=1 mmc_get=0 funcmode=13 output=0x20013d01 perfmode=3
Comment 5 Mark Pearson 2023-09-27 00:06:00 UTC
Ah yeah - need to add:
case DYTC_MODE_MMC_BALANCE:
underneath 
case DYTC_MODE_PSC_BALANCE:
in function convert_dytc_to_profile
(same as is done for the MMC block above)

I think that will cure that error.

Still tracking down HW here :(
Mark
Comment 6 Jiri Slaby 2023-09-27 06:34:04 UTC
(In reply to Mark Pearson from comment #5)
> Ah yeah - need to add:
> case DYTC_MODE_MMC_BALANCE:
> underneath 
> case DYTC_MODE_PSC_BALANCE:
> in function convert_dytc_to_profile
> (same as is done for the MMC block above)
> 
> I think that will cure that error.

Of course:
dytc_profile_refresh: err=0 mmc=0 psc=1 mmc_get=0 funcmode=13 output=0x1f001 perfmode=15
Comment 7 Mark Pearson 2023-09-27 17:09:26 UTC
OK - so (just to be sure) with those changes it is working correctly?
Mark
Comment 8 Jiri Slaby 2023-09-28 06:10:40 UTC
(In reply to Mark Pearson from comment #7)
> OK - so (just to be sure) with those changes it is working correctly?

It appears so.
Comment 9 Jiri Slaby 2023-10-26 08:59:01 UTC
Gentle ping: any updates on this?
Comment 10 Mark Pearson 2023-10-26 17:21:41 UTC
Thanks for the ping - I had completely forgotten about this one.

I'll check on the status. My notes (internal ticket LO-2242) say we're waiting on the FW team for a fix, they did do one version but it didn't work. I've no idea why it's taking so long.

Mark