Bug 200989

Summary: ACPI / LPSS Braswell Lenovo 11e won't boot as of LPSS commit in 4.17.12 (with patch)
Product: ACPI Reporter: William Lieurance (william.lieurance)
Component: Power-OtherAssignee: Zhang Rui (rui.zhang)
Status: CLOSED CODE_FIX    
Severity: normal CC: jwrdegoede, nicola-smart, rui.zhang, youling257
Priority: P1    
Hardware: Intel   
OS: Linux   
Kernel Version: 4.17.12 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: [PATCH] ACPI / LPSS: Ensure LPIOEP is always set on resume
patch to force lpss quirk during boot
debug patch
acpidump from a Braswell Lenovo 11e
ls -l /sys/bus/i2c/devices from a Braswell Lenovo 11e

Description William Lieurance 2018-09-01 07:45:22 UTC
Created attachment 278225 [details]
[PATCH] ACPI / LPSS: Ensure LPIOEP is always set on resume

Hi there.

I've got a 3rd generation Lenovo 11e that locks up on boot. The processor inside is an Intel N3150, which is family 6, model 0x4c, which is one that turns on lpss_quirks. I was able to bisect the problem commit to the merge of 12864ff8545f6b8144fdf1bb89b5663357f29ec4 which skipped over some IOSF updates when exiting D3 that wasn't matched by a call to the handler for entering D3.  

What I've been able to see is that on boot, the acpi_lpss_resume() function gets called, but the acpi_lpss_suspend() hasn't.  Because of this, the IOSF write to LPSS_IOSF_UNIT_LPIOEP never happens, and for some reason that freezes the computer.  I've got a minimal patch that allows my system to boot normally by ensuring that write happens whether the matching acpi_lpss_suspend() call takes place or not.

I don't have any other relevant systems to test this patch on so I can't say whether it introduces any regressions, nor do I have a very good handle on why this freezes the boot in the first place.  I'm happy to discuss in whatever way I can.
Comment 1 William Lieurance 2018-09-01 07:52:15 UTC
Comment on attachment 278225 [details]
[PATCH] ACPI / LPSS: Ensure LPIOEP is always set on resume

For clarity, this patch is against 4.19-rc1, where this problem is still happening.
Comment 2 Nicola Nicolov 2018-09-01 17:32:54 UTC
Why bother even reporting this bug with the patch instead of just committing to git? (Just to be clear - not any sort of critique, I'm just wondering)
Comment 3 William Lieurance 2018-09-01 18:00:10 UTC
No good reason.  I'm new to kernel development and wasn't sure what the best way was to start discussion of my issue.  I'll send this patch to the relevant dev lists as well.
Comment 4 Nicola Nicolov 2018-09-01 18:11:31 UTC
I'm pretty new, too, so I just came here to have a look at some bugs and think about solutions. In your case, you pretty much figured it out - congrats. Someone will probably commit that for you (I won't cause I don't want to be that guy stealing your patch authorship). Things can be faster though if you just push it. (Again, not a critique, trying to help as I can).
Comment 5 Zhang Rui 2018-09-03 02:40:17 UTC
Created attachment 278255 [details]
patch to force lpss quirk during boot

please confirm if the patch attached helps or not.
Comment 6 William Lieurance 2018-09-03 05:23:17 UTC
Yes, that patch also works to let the computer boot. :-)
Comment 7 Hans de Goede 2018-09-08 11:28:50 UTC
Hi William,

I'm afraid that Zhang Rui's patch breaks the usage of S0ix usage during suspend / resume on many Bay Trail / Cherry Trail devices, making them drain their battery quite fast while suspended. This has been broken for a long time, but the change which causes your issues fixed it, and Zhang Rui's patch breaks it again.

I'm not saying we should revert Zhang Rui's patch. There is another way to fix it, which I will start working on, but I would like to make sure that Zhang Rui's patch is really necessary and that you are not being bitten by the issue fixed by:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/i2c/busses/i2c-designware-platdrv.c?id=9d9a152ebaa86a9dede4624919566483c955d0a7

Can you run:

acpidump -o acpidump.lenovo-yoga11e-braswell

And then attach the generated acpidump.lenovo-yoga11e-braswell file here?

Please also provide the output of:

ls /sys/bus/i2c/devices

And perhaps you can try a kernel without Zhang Rui's )or your own) patch and with the patch I linked above added? I doubt this will work in your case, but it would be good to try it.

Regards,

Hans
Comment 8 youling257 2018-09-10 03:45:34 UTC
4.19 rc3 can't reach s0i3,Dev: 0  - LPSS1_F0_DMA     State: Enabled  [D0]

must Revert "ACPI / LPSS: Force LPSS quirks on boot" ! This reverts commit f11fc4bc669b8622510c1039499f5a9d24248fec.
Comment 9 Zhang Rui 2018-09-10 06:11:24 UTC
Created attachment 278399 [details]
debug patch

please check if this patch helps
Comment 10 Zhang Rui 2018-09-10 06:14:59 UTC
@youling257@gmail.com
please check if the patch in comment #9 helps or not (apply on top of clean 4.19-rc3).

and please check if the patches at
https://patchwork.kernel.org/patch/10593141/
and
https://patchwork.kernel.org/patch/10593137/
helps or not (apply on top of clean 4.19-rc3).
Comment 11 youling257 2018-09-10 07:12:09 UTC
(In reply to Zhang Rui from comment #10)
> @youling257@gmail.com
> please check if the patch in comment #9 helps or not (apply on top of clean
> 4.19-rc3).
> 
> and please check if the patches at
> https://patchwork.kernel.org/patch/10593141/
> and
> https://patchwork.kernel.org/patch/10593137/
> helps or not (apply on top of clean 4.19-rc3).

yes,ACPI / LPSS: Exclude I2C busses shared with PUNIT from pmc_atom_d3_mask is useful,can reach s0i3,but LPSS1_F0_DMA still D0,
i also test ACPI / LPSS: Remove overriding of powerstate for LPSS DMA device,can reach s0i3,and LPSS1_F0_DMA D3.
Comment 12 Zhang Rui 2018-09-10 14:04:25 UTC
Hi, Hans,

Thanks for catching this, but I'm still not quite clear about the full story.

The key point to reach S0ix on these platforms is to make DMA controller
a) powered on (by iosf quirks) during boot, and then keeps on during suspend/resume cycles (by invoking dev_pm_syscore_device())
or
b) leave it in default power state, but force resuming (iosf quirks) during early resume stage

Previously, we are using approach a), but it is broken by commit  12864ff8545f6b8144fdf1bb89b5663357f29ec4 (ACPI / LPSS: Avoid PM quirks on suspend and resume from hibernation) because it bypasses the iosf quirks during boot, and leaves the DMA controller in default state.
Thus commit 9d9a152ebaa86a9dede4624919566483c955d0a7 (i2c: designware: Re-init controllers with pm_disabled set on resume) is introduced to power on the DMA controller during resume to follow approach b).

Then the patch in comment #6 powers the DMA controllers on again, but it is not suspended properly during suspend (because of the suspend changes made by commit commit 9d9a152ebaa86a9dede4624919566483c955d0a7 (i2c: designware: Re-init controllers with pm_disabled set on resume)), and this breaks S0ix, right?
Comment 13 Zhang Rui 2018-09-10 14:14:39 UTC
(In reply to youling257 from comment #11)
> (In reply to Zhang Rui from comment #10)
> > @youling257@gmail.com
> > please check if the patch in comment #9 helps or not (apply on top of clean
> > 4.19-rc3).

at least the patch in comment #5 is broken, so it would be nice if you can give this patch a try.

> > 
> > and please check if the patches at
> > https://patchwork.kernel.org/patch/10593141/
> > and
> > https://patchwork.kernel.org/patch/10593137/
> > helps or not (apply on top of clean 4.19-rc3).
> 
> yes,ACPI / LPSS: Exclude I2C busses shared with PUNIT from pmc_atom_d3_mask
> is useful,can reach s0i3,but LPSS1_F0_DMA still D0,
> i also test ACPI / LPSS: Remove overriding of powerstate for LPSS DMA
> device,can reach s0i3,and LPSS1_F0_DMA D3.

where do you found this patch?

Hans, I see there are a couple of related patches, and I'm confused which is your preferred fix, could you please point it out?
Comment 14 Hans de Goede 2018-09-10 14:39:48 UTC
(In reply to Zhang Rui from comment #12)
> Hi, Hans,
> 
> Thanks for catching this, but I'm still not quite clear about the full story.
> 
> The key point to reach S0ix on these platforms is to make DMA controller
> a) powered on (by iosf quirks) during boot,

Correct.

> and then keeps on during
> suspend/resume cycles (by invoking dev_pm_syscore_device())

No, lpss_iosf_enter_d3_state() actually is meant to turn them off during
suspend, as they need to be OFF / in D3 to reach S0ix, the
dev_pm_syscore_device() stuff is related to one of the LPSS i2c controllers being connected to the PMIC and as such needing special treatment and is
only somewhat related to all this.

> b) leave it in default power state, but force resuming (iosf quirks) during
> early resume stage

Huh, this is the same as a), the code sofar has been force-resuming them indeed, which means that they are no left in  their default power-state, but rather forces them to be on / in D0.

The default power state for the DMA controllers is to be automatically put in
D3 by the hardware when not needed.

The lpss_iosf_exit_d3_state() code from drivers/acpi/acpi_lpss.c however disables the automatic hardware power-management and forces them on.

On suspend lpss_iosf_enter_d3_state() should put them back in D3 on suspend (and sometimes also during runtime pm), but it only does this if certain conditions become true and on many systems these conditions never become true.

Starting with commit 12864ff8545f6b8144fdf1bb89b5663357f29ec4 (ACPI / LPSS: Avoid PM quirks on suspend and resume from hibernation) lpss_iosf_exit_d3_state() become a no-op unless the conditions for enter_d3_state() became true at least once which never happens on many systems.

So this (for the first time in a long time) actually left the DMA controllers in their default "auto-managed" power-state.

This no longer forcing the DMA controllers to D0 all the time allowed these systems to reach S0ix.

> Previously, we are using approach a), but it is broken by commit 
> 12864ff8545f6b8144fdf1bb89b5663357f29ec4 (ACPI / LPSS: Avoid PM quirks on
> suspend and resume from hibernation) because it bypasses the iosf quirks
> during boot, and leaves the DMA controller in default state.

Correct.

> Thus commit 9d9a152ebaa86a9dede4624919566483c955d0a7 (i2c: designware:
> Re-init controllers with pm_disabled set on resume) is introduced to power
> on the DMA controller during resume to follow approach b).

No that does not touch the DMA controller at all, it is for the I2C controller, now that we properly reach S0ix the I2C controller needs to be re-initialized after a suspend/resume since it looses state.

> Then the patch in comment #6 powers the DMA controllers on again

It forces them to be ON / in D0 all the time again.

> but it is
> not suspended properly during suspend (because of the suspend changes made
> by commit commit 9d9a152ebaa86a9dede4624919566483c955d0a7 (i2c: designware:
> Re-init controllers with pm_disabled set on resume)), and this breaks S0ix,
> right?

No commit 9d9a152ebaa86a9dede4624919566483c955d0a7 (i2c: designware:
Re-init controllers with pm_disabled set on resume) is necessary for things to not break when the system reaches S0ix but it has nothing to do with whether the system reaches S0ix states or not.

> Hans, I see there are a couple of related patches, and I'm confused which is
> your preferred fix, could you please point it out?

I've 2 patches on the list related to this:

"ACPI / LPSS: Remove overriding of powerstate for LPSS DMA device"

This completely removes the lpss_iosf_exit_d3_state() / lpss_iosf_enter_d3_state() workaround, leaving the DMA controllers in their default auto-managed power-state.

This removes a lot of ugly code and also leads to the lowest power consumption so this is my prefered solution, but this bug suggests that the lpss_iosf_exit_d3_state() forcing of D0 is necessary on at least the Braswell Lenovo 11e since otherwise it will not boot.

So I wrote an other patch which makes the conditions for lpss_iosf_enter_d3_state() become true on devices where this sofar did not happen so that we can reach S0ix there:
"ACPI / LPSS: Exclude I2C busses shared with PUNIT from pmc_atom_d3_mask"

I hope we can find another fix for the Braswell Lenovo 11e not booting, but otherwise we will have to go with this fix (or something similar) rather then just remove the whole DMA controller workaround.

To be clear I prefer the option where we simply stop messing with the DMA controllers all together.
Comment 15 William Lieurance 2018-09-11 05:20:55 UTC
Created attachment 278445 [details]
acpidump from a Braswell Lenovo 11e
Comment 16 William Lieurance 2018-09-11 05:22:03 UTC
Created attachment 278447 [details]
ls -l /sys/bus/i2c/devices from a Braswell Lenovo 11e
Comment 17 William Lieurance 2018-09-11 05:30:38 UTC
Hi Hans,

I've attached the files you requested in comment #7.  I can also verify that the issue still happens (system freezes during boot) with the i2c changeset you recommended but without either my or Zhang Rui's patch.  I'm happy to keep testing as y'all see fit.
Comment 18 Hans de Goede 2018-09-11 06:38:24 UTC
(In reply to William Lieurance from comment #17)
> Hi Hans,
> 
> I've attached the files you requested in comment #7.  I can also verify that
> the issue still happens (system freezes during boot) with the i2c changeset
> you recommended but without either my or Zhang Rui's patch.  I'm happy to
> keep testing as y'all see fit.

Thank you for the files. So it seems that the 11e 3th gen is much more like a regular PC then other Bay Trail / Cherry Trail / Braswell devices and does not really expose the PMIC to the OS in anyway (their might be some ACPI AML code accessing it).

This also means that there really is not a lot which we can do about the issue you are seeing where the system will not boot without lpss_iosf_exit_d3_state() executing at least once and forcing the DMA controllers to be on / in D0. I assume this is necessary to workaround a firmware bug somewhere.

Zhang Rui, that means that it looks like we need to keep your patch and go with my "ACPI / LPSS: Exclude I2C busses shared with PUNIT from pmc_atom_d3_mask" on top to restore S0ix functionality on Bay and Cherry Trail devices.

I will send a reply to the relevant list to get that patch moving upstream.

I believe that this bug can be closed now, if you agree please close it.
Comment 19 Zhang Rui 2018-09-11 07:32:49 UTC
Hi Hans,
Given that you know the full history of this story, do you know why acpi_lpss_resume() can be called without acpi_lpss_suspend() have been called?

Plus, IMO, there is an error in my previous patch, and we need some changes like below
diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
index bf64cfa..c752062 100644
--- a/drivers/acpi/acpi_lpss.c
+++ b/drivers/acpi/acpi_lpss.c
@@ -895,6 +895,11 @@ static void lpss_iosf_enter_d3_state(void)
        u32 func_dis, d3_sts_0, pmc_status, pmc_mask = 0xfe000ffe;
        int ret;
 
+
+       mutex_lock(&lpss_iosf_mutex);
+       lpss_iosf_d3_entered = false;
+       mutex_unlock(&lpss_iosf_mutex);
+
        ret = pmc_atom_read(PMC_FUNC_DIS, &func_dis);
        if (ret)
                return;

or else, it is possible that, if acpi_lpss_suspend() is invoked for the first time, and lpss_iosf_enter_d3_state() returns early (without iosf code executed), lpss_iosf_d3_entered is not cleared, thus results in iosf code been executed during resume, could this bring any problem?
Comment 20 Hans de Goede 2018-09-11 07:50:13 UTC
Hi Zhang,

(In reply to Zhang Rui from comment #19)
> Hi Hans,
> Given that you know the full history of this story, do you know why
> acpi_lpss_resume() can be called without acpi_lpss_suspend() have been
> called?

What makes you think that acpi_lpss_suspend() doe snot have been called first? What I believe is happening is that one of the LPSS devices runtime suspends and resumes again in time for the DMA controllers to get forced on before their
power-state becomes a problem.

> Plus, IMO, there is an error in my previous patch, and we need some changes
> like below
> diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c
> index bf64cfa..c752062 100644
> --- a/drivers/acpi/acpi_lpss.c
> +++ b/drivers/acpi/acpi_lpss.c
> @@ -895,6 +895,11 @@ static void lpss_iosf_enter_d3_state(void)
>         u32 func_dis, d3_sts_0, pmc_status, pmc_mask = 0xfe000ffe;
>         int ret;
>  
> +
> +       mutex_lock(&lpss_iosf_mutex);
> +       lpss_iosf_d3_entered = false;
> +       mutex_unlock(&lpss_iosf_mutex);
> +
>         ret = pmc_atom_read(PMC_FUNC_DIS, &func_dis);
>         if (ret)
>                 return;
> 
> or else, it is possible that, if acpi_lpss_suspend() is invoked for the
> first time, and lpss_iosf_enter_d3_state() returns early (without iosf code
> executed), lpss_iosf_d3_entered is not cleared, thus results in iosf code
> been executed during resume, could this bring any problem?

The whole purpose of your patch as merged is to make sure that lpss_iosf_exit_d3_state() runs at least once, to fix the Lenovo 11e not booting.

If you add the above changes then if lpss_iosf_enter_d3_state() runs once before  
lpss_iosf_exit_d3_state() runs which I believe it will, lpss_iosf_exit_d3_state() will once again never proceed and the 11e will not boot again.
Comment 21 William Lieurance 2018-09-14 05:02:04 UTC
I've verified that the current linux-stable which include Zhang Rui's patch works on the Braswell Lenovo 11e in question.  That resolves the immediate issue which caused me to open this bug.  If there's more discussion that needs to happen, feel free to reopen or continue the conversation in whatever way you feel best.

Thanks, all, for stepping in and helping with such expertise. I really appreciate it. :-)