Bug 218198 - Suspend/Resume Regression with attached ATA devices
Summary: Suspend/Resume Regression with attached ATA devices
Status: NEW
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: Intel Linux
: P3 high
Assignee: scsi_drivers-other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-11-28 07:06 UTC by Dieter Mummenschanz
Modified: 2024-03-05 13:32 UTC (History)
2 users (show)

See Also:
Kernel Version: 6.7-rc1
Subsystem:
Regression: Yes
Bisected commit-id: d035e4eb38b3ea5ae9ead342f888fd3c394b0fe0


Attachments
dmesg partial log (7.09 KB, text/plain)
2023-11-28 08:24 UTC, Dieter Mummenschanz
Details
dmesg partial log stuck at pc2 (7.02 KB, text/plain)
2023-11-28 09:55 UTC, Dieter Mummenschanz
Details
dmesg 6.7-rc3 with device info debug (66.64 KB, text/plain)
2023-11-29 18:10 UTC, Dieter Mummenschanz
Details
logs from kernel 6.6 (66.29 KB, text/plain)
2023-11-29 18:11 UTC, Dieter Mummenschanz
Details
hdparm -I output for sda (3.43 KB, text/plain)
2023-12-18 10:20 UTC, Dieter Mummenschanz
Details
hdparm -I output for sdb (3.43 KB, text/plain)
2023-12-18 10:21 UTC, Dieter Mummenschanz
Details
dmesg with no pc8 transition (58.67 KB, text/plain)
2023-12-18 10:21 UTC, Dieter Mummenschanz
Details
dmesg with pc8 transition (66.29 KB, text/plain)
2023-12-18 10:22 UTC, Dieter Mummenschanz
Details
kernel config 6.7-rc (114.73 KB, text/plain)
2023-12-18 10:22 UTC, Dieter Mummenschanz
Details

Description Dieter Mummenschanz 2023-11-28 07:06:45 UTC
Hello,

the following commit from Kernel 6.7-rc1:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/drivers/ata/libata-core.c?id=d035e4eb38b3ea5ae9ead342f888fd3c394b0fe0

introduced a regression on my system where after successful resuming the CPU won't enter lower Package Sates below pc2 even after letting it sit for 15+ minutes. Reverting this commit fixed the issue on my system with two ata drives. Anyone experiencing the same issue? 

I'm happy to try any troubleshooting steps or provide more details if needed.
Comment 1 dlemoal 2023-11-28 07:47:50 UTC
On 11/28/23 16:06, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218198
> 
>             Bug ID: 218198
>            Summary: Suspend/Resume Regression with attached ATA devices
>            Product: SCSI Drivers
>            Version: 2.5
>           Hardware: Intel
>                 OS: Linux
>             Status: NEW
>           Severity: high
>           Priority: P3
>          Component: Other
>           Assignee: scsi_drivers-other@kernel-bugs.osdl.org
>           Reporter: dmummenschanz@web.de
>         Regression: No
> 
> Hello,
> 
> the following commit from Kernel 6.7-rc1:
> 
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/drivers/ata/libata-core.c?id=d035e4eb38b3ea5ae9ead342f888fd3c394b0fe0
> 
> introduced a regression on my system where after successful resuming the CPU
> won't enter lower Package Sates below pc2 even after letting it sit for 15+
> minutes. Reverting this commit fixed the issue on my system with two ata
> drives. Anyone experiencing the same issue? 
> 
> I'm happy to try any troubleshooting steps or provide more details if needed.

Could you try adding this:

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 09ed67772fae..8d4871fff099 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6185,6 +6185,9 @@ void ata_pci_shutdown_one(struct pci_dev *pdev)
        for (i = 0; i < host->n_ports; i++) {
                struct ata_port *ap = host->ports[i];

+               /* Wait for EH to complete before freezing the port */
+               ata_port_wait_eh(ap);
+
                ap->pflags |= ATA_PFLAG_FROZEN;

                /* Disable port interrupts */

and see if this changes anything ? Beside that, I am at a loss seeing what is
going on. The commit you mention essentially reverted an earlier change that is
not necessary, bringing back ata_pci_shutdown_one() to the how it was for ages...

When you say "successful resuming", what exactly are you talking about ? System
resume from suspend-to-ram ? from hibernation (suspend to disk) ?

I can always revert this revert, but I would rather understand why that is
needed. Do you see any suspicious libata EH activity in dmesg ?

Also, are the lower Package Sates transitions automatic or driven by the kernel
PM core ? I do not know that. If it is the latter, does the pc2 state also
include adapters ? Isn't it limited to the CPU power state ? I ask because if it
is and libata EH is in a bad loop constantly running after resume, that would
explain why you cannot reach a lower CPU power state.
Comment 2 Dieter Mummenschanz 2023-11-28 08:24:01 UTC
Thanks for the swift reply.

I've applied your patch. Booted up my machine and waited until it transitions into lower package states (pc8 at the lowest). After that I closed the laptop LID and let the machine suspend to RAM (S3). After that I reopened the LID and gave the machine 1-3 minutes time to transition to lower package states which it now does.
I've attached the dmesg part including your patch when the machine enters suspend. One thing is odd though:

[  109.424369] ata5.00: qc timeout after 5000 msecs (cmd 0xe0)
[  109.424397] ata5.00: STANDBY IMMEDIATE failed (err_mask=0x4)

this shouldn't be there, right?

Regarding automatic transitioning I'm not sure how this works. However even though I've set CONFIG_SATA_MOBILE_LPM_POLICY=3 in the kernel config, I have to call an init script explicitly forcing the scsi host to use low power when idle:

for foo in /sys/class/scsi_host/host*/link_power_management_policy;
  do echo med_power_with_dipm > $foo;
done

Otherwise the machine is stuck at PC2 state always.

So we have a fix I guess?

Anyway if there is anything alse I can try or provide you with more debug output please let me know.
Comment 3 Dieter Mummenschanz 2023-11-28 08:24:34 UTC
Created attachment 305490 [details]
dmesg partial log
Comment 4 Dieter Mummenschanz 2023-11-28 09:54:57 UTC
Update: Please disregard my last report. After a second longer suspend I'm stuck at pc2 again :(. logs attached.
Comment 5 Dieter Mummenschanz 2023-11-28 09:55:50 UTC
Created attachment 305491 [details]
dmesg partial log stuck at pc2
Comment 6 Niklas.Cassel 2023-11-29 10:42:55 UTC
On Tue, Nov 28, 2023 at 08:24:01AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218198
> 
> --- Comment #2 from Dieter Mummenschanz (dmummenschanz@web.de) ---
> Thanks for the swift reply.
> 
> I've applied your patch. Booted up my machine and waited until it transitions
> into lower package states (pc8 at the lowest). After that I closed the laptop
> LID and let the machine suspend to RAM (S3). After that I reopened the LID
> and
> gave the machine 1-3 minutes time to transition to lower package states which
> it now does.
> I've attached the dmesg part including your patch when the machine enters
> suspend. One thing is odd though:
> 
> [  109.424369] ata5.00: qc timeout after 5000 msecs (cmd 0xe0)
> [  109.424397] ata5.00: STANDBY IMMEDIATE failed (err_mask=0x4)
> 
> this shouldn't be there, right?
>

Hello Dieter,


I took a look at your logs, but they are very stripped.

It would be nice if you could test with latest v6.7-rcX.

And then provide these messages at boot:

[   50.101909] ahci 0000:00:17.0: flags: 64bit ncq sntf pm led clo only pio slum part ems deso sadm sds apst 
[   50.375109] ata10: SATA max UDMA/133 abar m524288@0xa5700000 port 0xa5700480 irq 270 lpm-pol 4
[   50.783496] ata10.00: Features: Dev-Sleep NCQ-sndrcv NCQ-prio


Because, for devsleep to be enabled, you need support in:
1) The SATA device
2) The SATA controller
3) The SATA port (even if the controller supports devsleep,
                  not all SATA ports have to support it)



For devsleep to get enabled, you need "sadm sds" in the SATA controller print,
and "Dev-Sleep" in the SATA device print.
Unfortunately, there does not seem to be a print for the port.

Additionally, your lpm-policy (lpm-pol) has to be either ATA_LPM_MIN_POWER or
ATA_LPM_MIN_POWER_WITH_PARTIAL (i.e. lpm-pol has to print either 4 or 5).


> Regarding automatic transitioning I'm not sure how this works. However even
> though I've set CONFIG_SATA_MOBILE_LPM_POLICY=3 in the kernel config, I have
> to
> call an init script explicitly forcing the scsi host to use low power when
> idle:

Note that even if you have LPM_POLICY=3 (ATA_LPM_MED_POWER_WITH_DIPM) in your
Kconfig, ahci_update_initial_lpm_policy() will possibly override this by default
to either ATA_LPM_MIN_POWER_WITH_PARTIAL or ATA_LPM_MIN_POWER, see:
https://github.com/torvalds/linux/blob/master/drivers/ata/ahci.c#L1639

The best way is to show the:
[   50.375109] ata10: SATA max UDMA/133 abar m524288@0xa5700000 port 0xa5700480 irq 270 lpm-pol 4
print as this is printed after ahci_update_initial_lpm_policy().

The reason why many platforms do this override, is because:
"One of the requirement for modern x86 system to enter lowest power mode
(SLP_S0) is SATA IP block to be off. This is true even during when
platform is suspended to idle and not only in opportunistic (runtime)
suspend."

"SATA IP block doesn't get turned off till SATA is in DEVSLP mode. Here
user has to either use scsi-host sysfs or tools like powertop to set
the sata-host link_power_management_policy to min_power."

See:
https://github.com/torvalds/linux/commit/b1a9585cc396cac5a9e5a09b2721f3b8568e62d0


> 
> for foo in /sys/class/scsi_host/host*/link_power_management_policy;
>   do echo med_power_with_dipm > $foo;
> done

I would really not recommend you doing this, because when you force set
lpm policy via sysfs, ahci_update_initial_lpm_policy() is not called,
so if your platform requires ATA_LPM_MIN_* to enter lower power states,
you forcing lpm-policy to ATA_LPM_MED_POWER_WITH_DIPM will ensure that
you never enter lower power states.


Looking at your logs, we see:
[ 2022.700556] ahci 0000:00:17.0: port does not support device sleep

Which comes from:
https://github.com/torvalds/linux/blob/master/drivers/ata/libahci.c#L2260

So it appears that your port does not support devsleep...

PxDEVSLP.DSP (in AHCI specification) is the bit that determines if devsleep
is supported for a specific port. This bit is initialized by BIOS.

So this could be a BIOS bug...
But you said that it works if you revert Damien's patch...



So the question is, is PxDEVSLP.DSP always 0?
(Even on the first boot, before you have done any suspend/resume).
If so, it could be a BIOS bug...

Perhaps you could test this patch:

And show us all the prints, and tell us which prints are before/after
the suspend/resume.

--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -2255,6 +2255,9 @@ static void ahci_set_aggressive_devslp(struct ata_port *ap, bool sleep)
        int rc;
        unsigned int err_mask;
 
+       dev_info(ap->host->dev, "setting devsleep to: %d port support %d\n",
+                sleep, readl(port_mmio + PORT_DEVSLP));
+
        devslp = readl(port_mmio + PORT_DEVSLP);
        if (!(devslp & PORT_DEVSLP_DSP)) {
                dev_info(ap->host->dev, "port does not support device sleep\n");



You mentioned that it works when you revert Damien's patch.
It could be interesting to see these prints, before/after the
suspend/resume both with and without the revert.

I would expect us to be able to read PxDEVSLP even when the SATA device
is in suspend state... I could imagine that we could get a bogus value
back from the read if the from the SATA controller itself is in a suspend
state, but I don't see how Damien's patch that you bisected to:
https://github.com/torvalds/linux/commit/fd3a6837d8e18cb7be80dcca1283276290336a7a

changed any of that. It does touch ata_pci_shutdown_one(), which should
only get called on shutdown... not suspend/resume AFAICT.

So the fact that this patch changes things for you is weird in the first
place. Damien, is it possible that ata_pci_shutdown_one() is incorrectly
called during suspend/resume? Got any ideas?


Kind regards,
Niklas
Comment 7 Dieter Mummenschanz 2023-11-29 18:10:20 UTC
 (In reply to Niklas.Cassel from comment #6)

Hello Niklas,

thanks for looking into this.

> It would be nice if you could test with latest v6.7-rcX.

Okay I took the latest 6.7-rc3 from Linux's git and included your patch. Booted up my Laptop, put it back to sleep and brought it up again. See attached log.

I've disabed my link_power_management_policy override so all the time my system was stuck at pc2 package state according to powertop.
 
> For devsleep to get enabled, you need "sadm sds" in the SATA controller

Yep I can see that:
[    0.379531] ahci 0000:00:17.0: flags: 64bit ncq sntf stag pm clo only pio slum part ems sxs deso sadm sds apst 

> and "Dev-Sleep" in the SATA device print.

I guess this is it?
[    1.031169] ata5.00: Features: Trust Dev-Sleep NCQ-sndrcv

> Additionally, your lpm-policy (lpm-pol) has to be either ATA_LPM_MIN_POWER or
> ATA_LPM_MIN_POWER_WITH_PARTIAL (i.e. lpm-pol has to print either 4 or 5).

I'm afraid this is not the case here:
[    0.399005] ata1: SATA max UDMA/133 abar m2048@0x42233000 port 0x42233100 irq 125 lpm-pol 0
 
> Note that even if you have LPM_POLICY=3 (ATA_LPM_MED_POWER_WITH_DIPM) in your
> Kconfig, ahci_update_initial_lpm_policy() will possibly override this by
> default

That would explain why I'm seeing "max_performance" in link_power_management_policy without overriding it.


> The reason why many platforms do this override, is because:
> "One of the requirement for modern x86 system to enter lowest power mode
> (SLP_S0) is SATA IP block to be off. This is true even during when
> platform is suspended to idle and not only in opportunistic (runtime)
> suspend."
> 
> "SATA IP block doesn't get turned off till SATA is in DEVSLP mode. Here
> user has to either use scsi-host sysfs or tools like powertop to set
> the sata-host link_power_management_policy to min_power."
> 
> See:
> https://github.com/torvalds/linux/commit/
> b1a9585cc396cac5a9e5a09b2721f3b8568e62d0

Thanks for the insight.
 
> I would really not recommend you doing this, because when you force set
> lpm policy via sysfs, ahci_update_initial_lpm_policy() is not called,
> so if your platform requires ATA_LPM_MIN_* to enter lower power states,
> you forcing lpm-policy to ATA_LPM_MED_POWER_WITH_DIPM will ensure that
> you never enter lower power states.

I understand however I do not know of any other way to enable lower package states on my machine.

> So it appears that your port does not support devsleep...

Anything we can do to preserve the old bwhaviour until 6.6?
 
> PxDEVSLP.DSP (in AHCI specification) is the bit that determines if devsleep
> is supported for a specific port. This bit is initialized by BIOS.
> 
> So this could be a BIOS bug...
> But you said that it works if you revert Damien's patch...

Yes, I had no issues up to 6.7-rc1. 
 
> Perhaps you could test this patch:

Yweah, see logs.  

> You mentioned that it works when you revert Damien's patch.
> It could be interesting to see these prints, before/after the
> suspend/resume both with and without the revert.

I've attached the dmesg logs from kernel 6.6.
 

Thanks again for your help
Dieter
Comment 8 Dieter Mummenschanz 2023-11-29 18:10:50 UTC
Created attachment 305512 [details]
dmesg 6.7-rc3 with device info debug
Comment 9 Dieter Mummenschanz 2023-11-29 18:11:13 UTC
Created attachment 305513 [details]
logs from kernel 6.6
Comment 10 Phillip Susi 2023-11-29 18:47:26 UTC
Niklas Cassel <Niklas.Cassel@wdc.com> writes:

> The reason why many platforms do this override, is because:
> "One of the requirement for modern x86 system to enter lowest power mode
> (SLP_S0) is SATA IP block to be off. This is true even during when
> platform is suspended to idle and not only in opportunistic (runtime)
> suspend."
>
> "SATA IP block doesn't get turned off till SATA is in DEVSLP mode. Here
> user has to either use scsi-host sysfs or tools like powertop to set
> the sata-host link_power_management_policy to min_power."

What?  I'm confused.  Here are several things that come to mind that
when taken together with this statement confuse me.  Perhaps I am wrong
about one or more of them?

1)  DEVSLP is an odd thing they invented to have ACPI twiggle a GPIO bit
to cut the power rails to CDROM drives that aren't in active use.

2)  SATA ALPM has the SATA controller automatically transition the sata
link to lower power states when it isn't being used, and back again on
use.

3)  The SATA controller can not be suspended until all of its ports are
suspended.

4)  Suspending a SATA port does not happen during ALPM, but rather only
either during system suspend, or when you enable runtime pm on the disk
and the ata port in sysfs ( both are disabled by default ).

A quick google led me to this: https://smarthdd.com/device_sleep.htm

Which indicates that this DEVSLP thing is now being used to send an OOB
signal to a SATA SSD so that it can power down its PHY completely, but
isn't that what ATA SLEEP is for?  aka. hdparm -Y.  That command tells
the drive that it can power down its PHY and then the host can power
down its PHY and it takes a sata link reset to wake the drive back up.
The link reset is easy to detect with minimum circuitry that is far
simpler than the full sata PHY.

It sounds like somebody decided to hack an OOB signal into ALPM rather
than use ATA SLEEP, but why?
Comment 11 Niklas.Cassel 2023-11-30 08:49:12 UTC
On Wed, Nov 29, 2023 at 01:47:16PM -0500, Phillip Susi wrote:
> Niklas Cassel <Niklas.Cassel@wdc.com> writes:
> 
> > The reason why many platforms do this override, is because:
> > "One of the requirement for modern x86 system to enter lowest power mode
> > (SLP_S0) is SATA IP block to be off. This is true even during when
> > platform is suspended to idle and not only in opportunistic (runtime)
> > suspend."
> >
> > "SATA IP block doesn't get turned off till SATA is in DEVSLP mode. Here
> > user has to either use scsi-host sysfs or tools like powertop to set
> > the sata-host link_power_management_policy to min_power."

Note that I copied this text from the original patch authored by someone
at Intel.

I've seen similar claims from people at AMD.


> 
> What?  I'm confused.  Here are several things that come to mind that
> when taken together with this statement confuse me.  Perhaps I am wrong
> about one or more of them?
> 
> 1)  DEVSLP is an odd thing they invented to have ACPI twiggle a GPIO bit
> to cut the power rails to CDROM drives that aren't in active use.
> 
> 2)  SATA ALPM has the SATA controller automatically transition the sata
> link to lower power states when it isn't being used, and back again on
> use.

Yes, the HBA can initiate a transition to DevSleep automatically using ALPM,
however, the specs says that:
-DevSleep is disabled by the device on power up (and has to be enabled by a
SET FEATURES command).
-PxDEVSLP.ADSE (Aggressive Device Sleep Enable) for the port is 0 on reset
(the kernel resets the HBA on boot, so ADSE will be disabled for the port).

(DevSleep is never initiated by the device.)

The kernel will only do:
-SET FEATURES to enable the DevSleep feature on the device, and
-Set PxDEVSLP.ADSE to 1

If _all_ of the following are true:
1) SADM and SDS bits are set in the HBA
2) the device reports that it supports DevSleep
3) PxDEVSLP.DSP is set for the port
4) the kernel LPM policy is either ATA_LPM_MIN_POWER or
   ATA_LPM_MIN_POWER_WITH_PARTIAL

See:
https://github.com/torvalds/linux/blob/v6.7-rc3/drivers/ata/libahci.c#L2322-L2324


This means that if any of 1-5 it not true, the HBA ALPM will never transition
the device into DevSleep state.

AFAICT, the HBA asserts the DEVSLP signal as long as being in DevSleep state.

To exit from DevSleep state you can just issue an I/O like normal using PxCI,
or write to PxCMD.ICC to force a state change.


> 3)  The SATA controller can not be suspended until all of its ports are
> suspended.
> 
> 4)  Suspending a SATA port does not happen during ALPM, but rather only
> either during system suspend, or when you enable runtime pm on the disk
> and the ata port in sysfs ( both are disabled by default ).
> 
> A quick google led me to this: https://smarthdd.com/device_sleep.htm
> 
> Which indicates that this DEVSLP thing is now being used to send an OOB
> signal to a SATA SSD so that it can power down its PHY completely, but
> isn't that what ATA SLEEP is for?  aka. hdparm -Y.  That command tells
> the drive that it can power down its PHY and then the host can power
> down its PHY and it takes a sata link reset to wake the drive back up.
> The link reset is easy to detect with minimum circuitry that is far
> simpler than the full sata PHY.
> 
> It sounds like somebody decided to hack an OOB signal into ALPM rather
> than use ATA SLEEP, but why?

I guess you need to ask Intel.

I assume that their firmware simply requires the DEVSLP signal to be
asserted in order to enter lower CPU power states.

If you have a signal, it is easy for the HW logic to detect if the signal
is set or not.

If you just send a command to the device, if it not easy for HW logic
to determine which state is in. It would need to read some registers
or similar. Sounds way more complex than just having a logic gate.


Kind regards,
Niklas
Comment 12 Phillip Susi 2023-12-03 19:50:46 UTC
bugzilla-daemon@kernel.org writes:

> I guess you need to ask Intel.
>
> I assume that their firmware simply requires the DEVSLP signal to be
> asserted in order to enter lower CPU power states.

I would say it's the hardware that requires the ata link to be powered
down, not the firmware.  They seem to have decided that the way to get
the hardware to power down was to add a new state to ALPM instead of
using runtime pm to actually disable the port.  I would love to hear
from Intel why they went this route.

> If you just send a command to the device, if it not easy for HW logic
> to determine which state is in. It would need to read some registers
> or similar. Sounds way more complex than just having a logic gate.

I'm not quite sure what you are getting at here.  The point of the ATA
SLEEP command is to inform the device that it should power down
everything, including the SATA PHY, and only keep a simple logic gate
active that can detect the big and obvious RESET signal from the host.
It seems to accomplish the same goal as DEVSLP without requiring hacking
in another logic line somewhere that wasn't meant for that purpose.
Comment 13 Dieter Mummenschanz 2023-12-04 09:31:35 UTC
Maybe it helps to mention that if I issue 

hdparm -Y /dev/sda && hdparm -Y /dev/sdb

a couple of seconds after successful resume my system transitions down to pc8. hdparm manual doesn't explicitly tell what command is issued with the -Y switch.
Comment 14 Phillip Susi 2023-12-05 21:37:21 UTC
bugzilla-daemon@kernel.org writes:

> a couple of seconds after successful resume my system transitions down to
> pc8.
> hdparm manual doesn't explicitly tell what command is issued with the -Y
> switch.

SLEEP.  One problem with using it though is that several things
tend to wake the drive up from it soon.  I've had patches to address
this for years.  Maybe I'll finally get them merged soon.  Those things
are:

1)  smartd or udisks2 polling the SMART status of the drive.  They
already issue CHECK POWER CONDITION first to see if the drive is in
STANDBY and if so, don't bother with the SMART read.  That avoids waking
up a disk in STANDBY mode, but in SLEEP, you can't even ask the drive
whether it's in standby without waking it up.

2)  A mounted filesystem periodically issues a FLUSH CACHE command, even
if nothing has been written.  A drive in STANDBY just ignores it but in
SLEEP, it wakes up.

I would imagine at once you access the disk in some way and so it wakes
up, you won't get back to pc8 again without another hdparm -Y?  Is that
correct?
Comment 15 Dieter Mummenschanz 2023-12-06 06:51:06 UTC
(In reply to Phillip Susi from comment #14)
> bugzilla-daemon@kernel.org writes:
> 
> 
> 2)  A mounted filesystem periodically issues a FLUSH CACHE command, even
> if nothing has been written.  A drive in STANDBY just ignores it but in
> SLEEP, it wakes up.
> 
> I would imagine at once you access the disk in some way and so it wakes
> up, you won't get back to pc8 again without another hdparm -Y?  Is that
> correct?

No or at least I don't see this behaviour. However the drives occasional seem to have some trouble waking up after resume and hdparm -Y:

[59984.831894] ata5.00: exception Emask 0x0 SAct 0x2000000 SErr 0x10000 action 0x6
[59984.831907] ata5.00: waking up from sleep
[59984.831914] ata5: hard resetting link
[59985.145261] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[59985.148840] ata5.00: supports DRM functions and may not be fully accessible
[59985.153322] ata5.00: supports DRM functions and may not be fully accessible
[59985.156822] ata5.00: configured for UDMA/133
[59985.166903] ahci 0000:00:17.0: port does not support device sleep
[59985.167045] ata5: EH complete
[59985.167268] ata5.00: Enabling discard_zeroes_data

When that happens the system is stuck in pc2/pc3 for a couple of minutes but then transitions back in partial to pc8.

Regarding your patches: Anything I can test for you? :)
Comment 16 Phillip Susi 2023-12-07 13:55:45 UTC
bugzilla-daemon@kernel.org writes:

> No or at least I don't see this behaviour. However the drives occasional seem
> to have some trouble waking up after resume and hdparm -Y:

That's strange.  Once the drive is woken back up that should once again,
prevent pc8, just like it did before being put to SLEEP.

> [59984.831894] ata5.00: exception Emask 0x0 SAct 0x2000000 SErr 0x10000
> action
> 0x6
> [59984.831907] ata5.00: waking up from sleep
> [59984.831914] ata5: hard resetting link
> [59985.145261] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [59985.148840] ata5.00: supports DRM functions and may not be fully
> accessible
> [59985.153322] ata5.00: supports DRM functions and may not be fully
> accessible
> [59985.156822] ata5.00: configured for UDMA/133
> [59985.166903] ahci 0000:00:17.0: port does not support device sleep
> [59985.167045] ata5: EH complete
> [59985.167268] ata5.00: Enabling discard_zeroes_data

That's normal.  The way to wake the drive up from SLEEP is to issue a
link hard reset.  That's just the kernel showing that it's doing that.
At least the first part is normal, I'm not sure about the bit about DRM
functions.  The bit about the port not supporting device sleep seems to
be your main issue with it not getting to pc8 using ALPM.

> Regarding your patches: Anything I can test for you? :)

Keep an eye on the libata list.  Hopefully I'll clean them up and post
them in the next few days.  Initially it will just be to keep hdparm -Y
from being woken up for silly reasons.  Sorting out the bigger problems
with runtime pm so that the kernel can automatically put things to sleep
when idle is going to be a bigger issue than I first thought.
Comment 17 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-12-15 10:51:44 UTC
What's the status of this? I have this in my list of tracked regressions and it seems nothing happened for about a week -- but I might have missed something?

Or is this not considered a regression for some good reason?
Comment 18 Niklas.Cassel 2023-12-15 15:28:36 UTC
Hello Dieter,


I'm really sorry for missing your reply.
The updates from bugzilla were not set to the libata mailing list
(only to the scsi mailing list).

Please consider writing to the libata mailing list instead of using
bugzilla, I'm quite sure that you will get more eyes on your problem
that way.

On Wed, Nov 29, 2023 at 06:10:20PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218198
> 
> --- Comment #7 from Dieter Mummenschanz (dmummenschanz@web.de) ---
>  (In reply to Niklas.Cassel from comment #6)
> 
> Hello Niklas,
> 
> thanks for looking into this.
> 
> > It would be nice if you could test with latest v6.7-rcX.
> 
> Okay I took the latest 6.7-rc3 from Linux's git and included your patch.
> Booted
> up my Laptop, put it back to sleep and brought it up again. See attached log.
> 
> I've disabed my link_power_management_policy override so all the time my
> system
> was stuck at pc2 package state according to powertop.
> 
> > For devsleep to get enabled, you need "sadm sds" in the SATA controller
> 
> Yep I can see that:
> [    0.379531] ahci 0000:00:17.0: flags: 64bit ncq sntf stag pm clo only pio
> slum part ems sxs deso sadm sds apst

Ok, good 1/4.

> 
> > and "Dev-Sleep" in the SATA device print.
> 
> I guess this is it?
> [    1.031169] ata5.00: Features: Trust Dev-Sleep NCQ-sndrcv

Yes, good, 2/4.

> 
> > Additionally, your lpm-policy (lpm-pol) has to be either ATA_LPM_MIN_POWER
> or
> > ATA_LPM_MIN_POWER_WITH_PARTIAL (i.e. lpm-pol has to print either 4 or 5).
> 
> I'm afraid this is not the case here:
> [    0.399005] ata1: SATA max UDMA/133 abar m2048@0x42233000 port 0x42233100
> irq 125 lpm-pol 0

It appears that you have built your kernel with:
CONFIG_SATA_MOBILE_LPM_POLICY=0

All major distros use:
CONFIG_SATA_MOBILE_LPM_POLICY=3

Could you please try with CONFIG_SATA_MOBILE_LPM_POLICY=3 ?

If you build with CONFIG_SATA_MOBILE_LPM_POLICY=0, then set_lpm() in libata
will never be called, which means that we will never enable (or disable
devsleep), so you will use whatever your boot firmware has configured.
(And boot firmware might have enabled devsleep in the device.)



In your v6.7-rcX dmesg, we don't see any:
"port does not support device sleep" or
"setting devsleep to %d" (from my debug patch)
prints in your dmesg from v6.7-rcX, so set_lpm() is never called at all,
most likely because you have built with CONFIG_SATA_MOBILE_LPM_POLICY=0,
so DevSleep could be enabled if platform firmware enabled it.



In your v6.6 dmesg, we see:
"port does not support device sleep"
(but not "setting devsleep to %d" from my debug patch, so you
probably forgot to apply it to your v6.6 kernel).

However, the fact that you see "port does not support device sleep"
suggests that you either built your kernel with
CONFIG_SATA_MOBILE_LPM_POLICY set to something other than 0,
or you have manually overriden the policy via sysfs.

Note that since your platform firmware claims that the port does
not support DevSleep (PxDEVSLP.DSP is set to 0), this means that
ahci_set_aggressive_devslp() will return early:
https://github.com/torvalds/linux/blob/v6.6/drivers/ata/libahci.c#L2258-L2262
So it will neither enable nor disable DevSleep in the device,
so DevSleep could be enabled if platform firmware enabled it.


In both of your cases, it looks like you should have DevSleep set
to whatever platform firmware has set it to.
But you say that, for these logs, v6.6 can enter low CPU power states,
but v6.7-rcX can not?


You can run:
hdparm -I /dev/<your_device>
to see if DEVSLP is enabled in your device (hdparm prints a * in front
of the feature if the feature is enabled).

It could also be worth checking your BIOS, I've seen some cases where you had
to enable aggressive devsleep in BIOS for it to get enabled for the port.


> 
> > Note that even if you have LPM_POLICY=3 (ATA_LPM_MED_POWER_WITH_DIPM) in
> your
> > Kconfig, ahci_update_initial_lpm_policy() will possibly override this by
> > default
> 
> That would explain why I'm seeing "max_performance" in
> link_power_management_policy without overriding it.

The sysfs reporting for LPM is really broken in libata...

$ git grep max_performance drivers/ata/libata-sata.c
drivers/ata/libata-sata.c:      [ATA_LPM_UNKNOWN]               = "max_performance",
drivers/ata/libata-sata.c:      [ATA_LPM_MAX_POWER]             = "max_performance",

So it reports "max_performance" both for LPM_POLICY == 0 (ATA_LPM_UNKNOWN)
and LPM_POLICY == 1 (ATA_LPM_MAX_POWER).

In your case, it is because of ATA_LPM_UNKNOWN, which means use whatever
platform firmware has configured.

ahci_update_initial_lpm_policy() will only override your policy if your
LPM_POLICY >= 3, so it will not override it for your case.


> > I would really not recommend you doing this, because when you force set
> > lpm policy via sysfs, ahci_update_initial_lpm_policy() is not called,
> > so if your platform requires ATA_LPM_MIN_* to enter lower power states,
> > you forcing lpm-policy to ATA_LPM_MED_POWER_WITH_DIPM will ensure that
> > you never enter lower power states.
> 
> I understand however I do not know of any other way to enable lower package
> states on my machine.

I do not understand why that helps you.

If you set:
ATA_LPM_MED_POWER_WITH_DIPM (LPM_POLICY=3) on v6.6 in sysfs,
you can enter low CPU power states?

If you set 
CONFIG_SATA_MOBILE_LPM_POLICY=3 on v6.6,
can you enter low CPU power states?

Looking at AHCI 1.3.1
PM:Aggr:

State PM:DevSleep is gated with:

PxDEVSLP.ADSE = ‘1’ and CAP2.SDS = ‘1’ and CAP2.SADM
= ‘1’ and PxDEVSLP.DSP = 1’ and PxSCTL.IPM != ‘4h’ and
PxSCTL.IPM != ‘5h’ and PxSCTL.IPM != ‘6h’ and PxSCTL.IPM
!= ‘7h’ and CAP2.DESO = ‘0’ and pDitoTimeout = ‘1’

So AFAICT, ALPM by the HBA should never be able to transition
to DevSleep if PxDSP.DSP == 0.

Perhaps your platform actually does NOT support devsleep and
simply requires the device to enter slumber or partial in order
to enter the lower CPU power states?

We could test this by you disabling devslp on the device via hdparm,
and see if you can still enter the lower CPU power states.


Kind regards,
Niklas
Comment 19 Dieter Mummenschanz 2023-12-18 10:18:43 UTC
Hallo Niklas,

thanks for getting back to me on that.

> Please consider writing to the libata mailing list instead of using
> bugzilla, I'm quite sure that you will get more eyes on your problem
> that way.

Could point me to that mailing list please. I can't seem to find it. Do you want me to repost the issue there or is it okay to just copy & past the messages from this bug?
 

> It appears that you have built your kernel with:
> CONFIG_SATA_MOBILE_LPM_POLICY=0


Definately not! I've attached my kernel config. LPM policy is definately set tp "3" CONFIG_SATA_MOBILE_LPM_POLICY=3. Thats puzzles me because after booting the 
link_power_management_policy is set to performance. Somehow the kernel overrides the config or uses parameters suggested by the BIOS I don't know. I've also tried to to switch the ASPM polity to Powersave instead of BIOS default. Didn't help.

> In your v6.7-rcX dmesg, we don't see any:
> "port does not support device sleep" or
> "setting devsleep to %d" (from my debug patch)
> prints in your dmesg from v6.7-rcX, so set_lpm() is never called at all,
> most likely because you have built with CONFIG_SATA_MOBILE_LPM_POLICY=0,
> so DevSleep could be enabled if platform firmware enabled it.

The print is not called, right but not because of CONFIG_SATA_MOBILE_LPM_POLICY=0 since It is set to "3". However when I force 
link_power_management_policy to "echo med_power_with_dipm" the message appears in dmesg:

[    9.434663] ahci 0000:00:17.0: setting devsleep to: 0 port support 0
[    9.434668] ahci 0000:00:17.0: port does not support device sleep
[    9.434674] ahci 0000:00:17.0: setting devsleep to: 0 port support 0
[    9.434678] ahci 0000:00:17.0: port does not support device sleep

so somehow the LPM Policy is either not set or reset by the kernel I assume.

> However, the fact that you see "port does not support device sleep"
> suggests that you either built your kernel with
> CONFIG_SATA_MOBILE_LPM_POLICY set to something other than 0,
> or you have manually overriden the policy via sysfs.
> 
> Note that since your platform firmware claims that the port does
> not support DevSleep (PxDEVSLP.DSP is set to 0), this means that
> ahci_set_aggressive_devslp() will return early:
> https://github.com/torvalds/linux/blob/v6.6/drivers/ata/libahci.c#L2258-L2262
> So it will neither enable nor disable DevSleep in the device,
> so DevSleep could be enabled if platform firmware enabled it.
> 
> 
> In both of your cases, it looks like you should have DevSleep set
> to whatever platform firmware has set it to.
> But you say that, for these logs, v6.6 can enter low CPU power states,
> but v6.7-rcX can not?

Not quite. In 6.6 the PC8 transition worked only when forcing link_power_management_policy to "echo med_power_with_dipm". The policy was still performance after bootup like it is in 6.7-rc. The only difference I see is that after resume my laptop won't enter PC8 again. For 6.7-rc I have to call hdparm -Y for sda and sdb after resume to enable the transition. 

 
> You can run:
> hdparm -I /dev/<your_device>
> to see if DEVSLP is enabled in your device (hdparm prints a * in front
> of the feature if the feature is enabled).

Attached to this bug for both (identical) drives.

> It could also be worth checking your BIOS, I've seen some cases where you had
> to enable aggressive devsleep in BIOS for it to get enabled for the port.

My Laptop BIOS (INSYDE) does not offer any knobs to tune dleep for any ATA or PCI devices. Only thing I have set is the powersave profile.

> The sysfs reporting for LPM is really broken in libata...
> 
> $ git grep max_performance drivers/ata/libata-sata.c
> drivers/ata/libata-sata.c:      [ATA_LPM_UNKNOWN]               =
> "max_performance",
> drivers/ata/libata-sata.c:      [ATA_LPM_MAX_POWER]             =
> "max_performance",
> 
> So it reports "max_performance" both for LPM_POLICY == 0 (ATA_LPM_UNKNOWN)
> and LPM_POLICY == 1 (ATA_LPM_MAX_POWER).
> 
> In your case, it is because of ATA_LPM_UNKNOWN, which means use whatever
> platform firmware has configured.

Thats confusing :(
 
> ahci_update_initial_lpm_policy() will only override your policy if your
> LPM_POLICY >= 3, so it will not override it for your case.
 
> > I understand however I do not know of any other way to enable lower package
> > states on my machine.
> 
> I do not understand why that helps you.

Me neither but it works. :(
 
> If you set:
> ATA_LPM_MED_POWER_WITH_DIPM (LPM_POLICY=3) on v6.6 in sysfs,
> you can enter low CPU power states?

> If you set 
> CONFIG_SATA_MOBILE_LPM_POLICY=3 on v6.6,
> can you enter low CPU power states?

As described above I'm doing this for years during bootup otherwise I'm stuck at PC2.

> 
> Looking at AHCI 1.3.1
> PM:Aggr:
> 
> State PM:DevSleep is gated with:
> 
> PxDEVSLP.ADSE = ‘1’ and CAP2.SDS = ‘1’ and CAP2.SADM
> = ‘1’ and PxDEVSLP.DSP = 1’ and PxSCTL.IPM != ‘4h’ and
> PxSCTL.IPM != ‘5h’ and PxSCTL.IPM != ‘6h’ and PxSCTL.IPM
> != ‘7h’ and CAP2.DESO = ‘0’ and pDitoTimeout = ‘1’
> 
> So AFAICT, ALPM by the HBA should never be able to transition
> to DevSleep if PxDSP.DSP == 0.
> 
> Perhaps your platform actually does NOT support devsleep and
> simply requires the device to enter slumber or partial in order
> to enter the lower CPU power states?

How can I tell?
 
> We could test this by you disabling devslp on the device via hdparm,
> and see if you can still enter the lower CPU power states.

Okay and how do I so that?
 
Kind regards,
Dieter
Comment 20 Dieter Mummenschanz 2023-12-18 10:20:44 UTC
Created attachment 305626 [details]
hdparm -I output for sda

hdparm -I output for sda
Comment 21 Dieter Mummenschanz 2023-12-18 10:21:11 UTC
Created attachment 305627 [details]
hdparm -I output for sdb

hdparm -I output for sdb
Comment 22 Dieter Mummenschanz 2023-12-18 10:21:44 UTC
Created attachment 305628 [details]
dmesg with no pc8 transition

dmesg with no pc8 transition
Comment 23 Dieter Mummenschanz 2023-12-18 10:22:06 UTC
Created attachment 305629 [details]
dmesg with pc8 transition

dmesg with pc8 transition
Comment 24 Dieter Mummenschanz 2023-12-18 10:22:55 UTC
Created attachment 305630 [details]
kernel config 6.7-rc

kernel config 6.7-rc
Comment 25 dlemoal 2023-12-18 10:29:52 UTC
On 2023/12/18 19:18, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218198
> 
> --- Comment #19 from Dieter Mummenschanz (dmummenschanz@web.de) ---
> Hallo Niklas,
> 
> thanks for getting back to me on that.
> 
>> Please consider writing to the libata mailing list instead of using
>> bugzilla, I'm quite sure that you will get more eyes on your problem
>> that way.
> 
> Could point me to that mailing list please. I can't seem to find it. Do you
> want me to repost the issue there or is it okay to just copy & past the
> messages from this bug?

For any subsystem, you can see the emails to use by running:

scripts/get_maintainer.pl

So for ATA:

scripts/get_maintainer.pl drivers/ata/
Damien Le Moal <dlemoal@kernel.org> (maintainer:LIBATA SUBSYSTEM (Serial and
Parallel ATA drivers))
linux-ide@vger.kernel.org (open list:LIBATA SUBSYSTEM (Serial and Parallel ATA
drivers))
linux-kernel@vger.kernel.org (open list)

Given the high traffic, no need to add linux-kernel@vger.kernel.org. But please
add linux-ide@vger.kernel.org to the bugzilla case.

And also please add Niklas and myself.

Niklas Cassel <Niklas.Cassel@wdc.com>
Damien Le Moal <dlemoal@kernel.org>
Comment 26 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-12-18 10:36:55 UTC
Please also include the regressions list:

Linux kernel regressions list <regressions@lists.linux.dev>
Comment 27 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-03-05 13:32:36 UTC
Dieter (or Damien/Niklas), what is the status of this? Was this fixed?

Note You need to log in before you can comment on or make changes to this bug.