Bug 215880

Summary: Resume process hangs for 5-6 seconds starting sometime in 5.16
Product: SCSI Drivers Reporter: ericspero
Component: OtherAssignee: scsi_drivers-other
Status: NEW ---    
Severity: normal CC: bvanassche, damien.lemoal, damien.lemoal, gzhqyz, jason600.groome, kernel, nilsdev, paula, regressions, rui.zhang
Priority: P1    
Hardware: AMD   
OS: Linux   
Kernel Version: 5.16 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg after suspend-resume cycle using patched kernel
Script that enables kernel debugging options

Description ericspero 2022-04-24 15:39:22 UTC
Previously reported on the Gentoo bug tracker: https://bugs.gentoo.org/837500

Basic issue: 

Starting sometime in 5.16, my GUI freezes up during resume for 5-6 seconds: the screen is frozen, and it seems key presses and mouse movements are not registered. If I suspend from a tty, the 'loginctl suspend' command takes about 5-6 seconds to finish after resuming, however here my key presses are registered. 

It seems like the first thing it does on resume is to suspend the device, as if the system powered off too quickly or something.

I used git bisect to identify the commit where the problem begins: 

========
commit a19a93e4c6a98c9c0f2f5a6db76846f10d7d1f85
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Wed Oct 6 14:54:51 2021 -0700

    scsi: core: pm: Rely on the device driver core for async power management
    
    Instead of implementing asynchronous resume support in the SCSI core, rely
    on the device driver core for resuming SCSI devices asynchronously.
    Instead of only supporting asynchronous resumes, also support asynchronous
    suspends.
    
    Link: https://lore.kernel.org/r/20211006215453.3318929-2-bvanassche@acm.org
    Cc: Alan Stern <stern@rowland.harvard.edu>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Hannes Reinecke <hare@suse.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Martin Kepplinger <martin.kepplinger@puri.sm>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

 drivers/scsi/hosts.c      |  1 +
 drivers/scsi/scsi.c       |  8 --------
 drivers/scsi/scsi_pm.c    | 44 ++------------------------------------------
 drivers/scsi/scsi_priv.h  |  4 +---
 drivers/scsi/scsi_scan.c  | 17 +++++++++++++++++
 drivers/scsi/scsi_sysfs.c |  1 +
 drivers/scsi/sd.c         |  1 -
 7 files changed, 22 insertions(+), 54 deletions(-)
========

For git bisect I followed the instructions provided on the gentoo wiki: https://wiki.gentoo.org/wiki/Kernel_git-bisect, which uses the linux-stable repository.

Here is a sample dmesg output when I do not experience the bug ('good') and when I do ('bad'). Nothing to me seems obviously wrong. You do see some 'extra' lines in the 'bad' case at the end, but these lines also appear in the 'good' case later on; they don't have enough time to appear in the 'good' case when the system resumes quickly as normal. There are xHCI errors related to USB buses, but they appear in both cases.

Good: https://dpaste.org/S90gh
Bad:  https://dpaste.org/9KQ6C

Hardware info:
AMD Ryzen 3600
AMD Radeon RX580 
AMD Radeon 270X (usually bound to vfio-pci, otherwise amdgpu)
ASUS Strix ROG B450-F (latest BIOS)
Seasonic Prime GX-750 (brand new)
I also have an M.2 NVME drive (boot), a SATA SSD, and one HDD.


I'm happy to try any troubleshooting steps.
Comment 1 ericspero 2022-04-24 15:44:58 UTC
Forgot to mention some software info:

OS: Gentoo
WM: sway
init: OpenRC
login manager: elogind
Comment 2 Bart Van Assche 2022-05-01 19:16:23 UTC
In the Gentoo bug report there are links to the dmesg output for the 'good' and 'bad' cases. In the 'bad' dmesg output I found the following:

ata2: link is slow to respond, please be patient (ready=0)

Does that message still appear if the following three patches are reverted on top of kernel v5.16:
* 9131bff6a9f1 ("scsi: core: pm: Only runtime resume if necessary").
* 1c9575326a4a ("scsi: sd: Rename sd_resume() into sd_resume_system()").
* a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management").

If so, Damien, can you please help with checking whether or not this is perhaps an ATA regression?
Comment 3 ericspero 2022-05-01 20:02:49 UTC
Thank you for the response.

Sorry that I didn't make this clear in the original post, but I do get this message in the 'good' case as well - I just did not wait long enough for it to appear before copying the dmesg output. In the 'bad' case, the 5-6 second pause I get on resume gives it enough time for it to appear before I call dmesg. 

Here is a dmesg output from a 'good' case (5.15.33) where I waited a few seconds after resuming: 

https://dpaste.org/rdvvD

I assume those patches are not present in 5.15.33. If you would still like me to try reverting these patches please let me know.

I'm also noticing that in both 'good' and 'bad' cases the first line of output I see in dmesg after starting the resume process is "Suspending system...". I waited 10-15 seconds before resuming, but there are no gaps between messages in the timestamps. This is not how it works on my laptop that uses plain old systemd, where I get the equivalent message "PM: suspend entry (s2idle)" before the system suspends.
Comment 4 Damien Le Moal 2022-05-02 12:40:21 UTC
Hmmm... The message comes from ata_wait_ready(), which as the name suggests waits for the link to become ready (PHY and device present detected). And if the link does not become ready after 5s, that message is printed but the wait still continues. And given that the link *does* become ready, it looks like the drive/adapter are simply slow to wake-up. What is on ata2 ? Is it an HDD or a SSD ? If it is an HDD, I am not suprised that it is slow. Need time to spin-up :)
Comment 5 ericspero 2022-05-02 13:35:59 UTC
Why yes, it is a 4TB HDD. When I unplug it and then do suspend-resume I do not get the 'Waiting for ata2' error, and I don't get the 5-6 second system hang either! 

The drive is probably around 5 years old now, but it seems to work fine, and a smartctl short test shows no errors.
Comment 6 Damien Le Moal 2022-05-02 13:40:18 UTC
HDD disk spinup can take more than 10s. That is not something strange. The HDD controller electronic will "boot" faster than that and may start replying to commands without waiting for the full spin-up to complete. However, depending on the HDD internal implementation, the HDD controller may have to wait for the spinup to complete first to access its metadata (drive information) stored on the disk platters. For such HDDs, the drive will not respond until it is fully spun-up.

All things considered, I would say that the message is "normal" and not to be worried about.
Comment 7 ericspero 2022-05-02 14:04:40 UTC
I see, good to know. Thanks for the explanation, and for the original pointer as well.
Comment 8 Bart Van Assche 2022-05-05 22:24:18 UTC
I'm working on a patch.
Comment 9 ericspero 2022-05-06 20:49:15 UTC
Ok. Please let me know if you need any testing done.
Comment 10 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-05-30 13:20:01 UTC
(In reply to Bart Van Assche from comment #8)
> I'm working on a patch.

Bart, was there any progress? From here it looks like there wasn't any to get this regression fixed, but maybe I just didn't look at the right place.
Comment 11 Bart Van Assche 2022-05-30 16:58:53 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #10)
> Bart, was there any progress? From here it looks like there wasn't any to
> get this regression fixed, but maybe I just didn't look at the right place.

The past two weeks I was on vacation. I plan to continue working on this bug soon.
Comment 12 Bart Van Assche 2022-06-06 00:50:50 UTC
Can anyone help with testing this kernel branch: https://github.com/bvanassche/linux/tree/sd-resume ? The following patches exist on that kernel branch on top of kernel v5.18:

      block: Export blk_status_to_str()
      scsi: core: Move the definition of SCSI_QUEUE_DELAY
      scsi: core: Retry after a delay if the device is becoming ready
      scsi: sd: Rework asynchronous resume support
      scsi: sd: Add asynchronous resume debugging code
Comment 13 ericspero 2022-06-06 16:48:08 UTC
Created attachment 301109 [details]
dmesg after suspend-resume cycle using patched kernel
Comment 14 ericspero 2022-06-06 16:52:13 UTC
Thanks for sharing the branch. I've just done a few tests which I think I've done correctly, but I'll provide the steps I followed just in case.

1. git clone https://github.com/bvanassche/linux.git into /usr/src/linux
2. git checkout sd-resume
3. make, etc. etc.

To test I called `loginctl suspend`, and then hit enter to resume. I did this both in and out of a graphical environment.

After the first suspend -> resume cycle the system comes back without any 5-6 second delay. Here is a snippet of a dmesg output when this is the case (also attached in previous message):

https://dpaste.org/v5tFk/raw

However, when I `loginctl suspend` again, my screen goes black and the system does not respond to any keyboard inputs that normally cause it to resume, but the system seems to stay powered on (all fans still running). It seems to behave as if it is in S2 sleep, except it can't be resumed.

I will try to fetch a log snippet of when it is in this un-resumeable state.
Comment 15 ericspero 2022-06-06 18:19:18 UTC
These are the only log entries I see after I attempt to `loginctl suspend` a second time during a single boot:

Jun 6 13:54:36 gentoo sudo[2098]: pam_unix(sudo:session): session closed for user root
Jun 6 13:54:43 gentoo polkitd[1946]: Registered Authentication Agent for unix-process:2105:14523 (system bus name :1.7 [/usr/bin/pkttyagent --notify-fd 6 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_CA.utf8)
Comment 16 Bart Van Assche 2022-06-07 02:38:27 UTC
Thank you for having tested the sd-resume branch and also for having shared these results. I will see whether I can reproduce the reported behavior in a VM.
Comment 17 Bart Van Assche 2022-06-07 21:47:01 UTC
Please retest with the new version of the sd-resume branch. The deadlock triggered by the previous version has been fixed as follows:

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 836e13dae4d5..88f2d4737b47 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3576,9 +3576,11 @@ static void sd_start_done(struct request *req, blk_status_t status)
        }
 
 start_failed:
-       mutex_lock(&sdev->state_mutex);
+       /*
+        * This function may be called from interrupt context so it is not
+        * allowed here to call mutex_lock(&sdev->state_mutex).
+        */
        scsi_device_set_state(sdev, SDEV_OFFLINE);
-       mutex_unlock(&sdev->state_mutex);
 }
Comment 18 ericspero 2022-06-07 23:40:32 UTC
I am unfortunately still seeing the same 'deadlock' after updating to the latest on the sd-resume branch. 

I tried it twice, the second time I wiped the git repo and started a new pull from scratch just to make sure I wasn't somehow rebuilding the old sd-resume.

And again it looks like nothing is logged when I do the second suspend.
Comment 19 Bart Van Assche 2022-06-07 23:50:39 UTC
How about comparing git commit IDs to make sure that we are testing the same kernel branch? The topmost commit of the sd-resume branch on my development system is as follows: 1d48600df88f ("scsi: sd: Add asynchronous resume debugging code"). If we are using the same kernel branch, please enable kernel debugging in the kernel configuration file by running the attached enable-kernel-debugging-options script.
Comment 20 Bart Van Assche 2022-06-07 23:51:30 UTC
Created attachment 301120 [details]
Script that enables kernel debugging options
Comment 21 ericspero 2022-06-08 00:15:34 UTC
git log shows the same commit:

commit 1d48600df88f2c14b68e4884ceddc2f2d3e5bb4f (HEAD -> sd-resume, origin/sd-resume)
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Sun May 8 16:31:52 2022 -0700

    scsi: sd: Add asynchronous resume debugging code

    Signed-off-by: Bart Van Assche <bvanassche@acm.org>

I have run the attached script in the kernel source directory, which makes some changes to the .config file. It looks like some of the options in "enable_y_options" are not being set to y, which I assume is because that option is absent from the .config file. Just thought I would mention. 

I will compile this kernel and we will see how it goes.
Comment 22 ericspero 2022-06-08 00:43:50 UTC
It looks like running 'make oldconfig' after running the script introduced most of the necessary options to .config, and running the script again set them '=y'. Some, like CONFIG_PM_TRACE, were not added, and are still '=n'.

When I make and install this kernel it has a '-dbg+' suffix. 

Strangely, when I try to resume from 'loginctl suspend' when running this kernel, the system either reboots or does a cold boot. The first thing I see is my BIOS splash. Sorry, I wish I better able to understand what is going on.
Comment 23 Bart Van Assche 2022-06-08 01:53:34 UTC
The reboot or cold boot may indicate that a kernel crash happened in interrupt context. How about enabling netconsole to send the kernel messages over the network and capturing these messages on a second system with Wireshark?

A possible alternative is to apply the patches at the top of the sd-resume branch with git cherry-pick on top of a kernel version that is known to work on your setup, e.g. v5.16.20. I can prepare a branch that has the result of this process if that would help.
Comment 24 ericspero 2022-06-08 02:52:44 UTC
Thanks for the suggestions. I've not done either of these things before but both seem doable. It's getting late here so it will have to wait until tomorrow. 

I will try netconsole first, and then the cherry pick method. My usual kernel is 5.18.2, so it shouldn't be an issue with 5.18. If you think this means the cherry pick option won't be helpful then please let me know.
Comment 25 ericspero 2022-06-16 17:02:07 UTC
Sorry for the delay - I got busy with other things, and it took me a while to figure out how to set up netconsole.

I made sure I was up to date with the latest on your github repo. The kernel version appears to be 5.19 now. When attempting to resume from suspend, it still boots up cold.

As soon as I got netconsole to print messages to the other machine I suspended the sender (with the problematic kernel). Here are the logs it spits out. No further messages were emitted after attempting to resume from suspend.

https://dpaste.org/Lg1jc

Is it maybe time to try the other method of applying your patches on top of a known working kernel? I mentioned last time that my day-to-day kernel is pretty up-to-date: 5.18.2 (I am a couple patch releases behind).
Comment 26 jason600 2022-06-28 21:17:18 UTC
I have been experiencing the same problem since 5.16. I have a hd and ssd (boot from the ssd). Since 5.16 resume from suspend to ram is taking 6-8 seconds (with 5.15 and before, it was less then 2 seconds. 

I narrowed it down to the hd waiting for spinup to finish suspend. Hdparm had not effect, also unmounting the drive had no effect. The only was to fix it was to remove the hd (mounted on sda) with 

# echo 1 > /sys/block/sda/device/delete

Then it resumed in 2 seconds again, but I couldn’t access the hd again until reboot.

I installed the patches from https://github.com/bvanassche/linux/tree/sd-resume on Opensuse Tumbleweed kernel 5.18.6.

I can confirm the patches work, resume from suspend is now back to under 2 seconds, with no obvious problems.

Thank you very much.
Comment 27 Bart Van Assche 2022-06-28 22:30:05 UTC
Thanks for testing! The patches from the sd-resume branch have been posted on the linux-scsi mailing list. See also https://lore.kernel.org/linux-scsi/20220628222131.14780-1-bvanassche@acm.org/T/#t
Comment 28 Nils 2022-07-15 18:44:27 UTC
Possibly related:

https://bugzilla.kernel.org/show_bug.cgi?id=216087

"Only Toshiba MG08/MG09 SATA hard disks disappear after resume from S3 suspend (reproducible) - AMD Ryzen 5 PRO 4650G"

on an X570 board (0x7901).
Comment 29 Nils 2022-08-15 11:02:50 UTC
(In reply to Bart Van Assche from comment #27)
> Thanks for testing! The patches from the sd-resume branch have been posted
> on the linux-scsi mailing list. See also
> https://lore.kernel.org/linux-scsi/20220628222131.14780-1-bvanassche@acm.org/
> T/#t

When will the changes be merged into mainline please?
Comment 30 Bart Van Assche 2022-08-15 13:36:06 UTC
I think that commit 88f1669019bd ("scsi: sd: Rework asynchronous resume support") got merged in Linus' tree (kernel v6.0-rc1) about two days ago.
Comment 31 gzhqyz 2022-08-16 11:08:54 UTC
Hi, on my laptop (500GB SATA HDD, Disk model: WDC WD5000LPVX-2), commit 88f1669019bd cause system hang after resume from suspend and reverting the commit fixes my problem.
Comment 32 Damien Le Moal 2022-08-16 15:44:59 UTC
(In reply to gzhqyz from comment #31)
> Hi, on my laptop (500GB SATA HDD, Disk model: WDC WD5000LPVX-2), commit
> 88f1669019bd cause system hang after resume from suspend and reverting the
> commit fixes my problem.

Can you try kernel 6.0-rc1 ?
Comment 33 Nils 2022-08-16 16:10:26 UTC
(In reply to Bart Van Assche from comment #30)
> I think that commit 88f1669019bd ("scsi: sd: Rework asynchronous resume
> support") got merged in Linus' tree (kernel v6.0-rc1) about two days ago.

Great, thanks.  Looking forward to testing it once my distributor ships it.
Comment 34 Damien Le Moal 2022-08-16 16:14:23 UTC
(In reply to Nils from comment #33)
> (In reply to Bart Van Assche from comment #30)
> > I think that commit 88f1669019bd ("scsi: sd: Rework asynchronous resume
> > support") got merged in Linus' tree (kernel v6.0-rc1) about two days ago.
> 
> Great, thanks.  Looking forward to testing it once my distributor ships it.

The patch has a fixes tag, so it will get backported to the latest 5.19 stable too.
Comment 35 gzhqyz 2022-08-16 17:11:55 UTC
(In reply to Damien Le Moal from comment #32)
> (In reply to gzhqyz from comment #31)
> > Hi, on my laptop (500GB SATA HDD, Disk model: WDC WD5000LPVX-2), commit
> > 88f1669019bd cause system hang after resume from suspend and reverting the
> > commit fixes my problem.
> 
> Can you try kernel 6.0-rc1 ?

Hi, the issue I reported is found in 6.0-rc1. Here is the kernel I am using (without commit 88f1669019bd):

Linux archlinux 6.0.0-rc1-2-mainline #1 SMP Tue, 16 Aug 2022 03:18:24 +0000 x86_64 GNU/Linux
Comment 36 jason600 2022-08-25 20:01:41 UTC
(In reply to Bart Van Assche from comment #27)
> Thanks for testing! The patches from the sd-resume branch have been posted
> on the linux-scsi mailing list. See also
> https://lore.kernel.org/linux-scsi/20220628222131.14780-1-bvanassche@acm.org/
> T/#t

Hi Bart, just an update for you. I noticed this had been removed from the 6.0-rc1 for freezing after suspend.

I've been compiling my kernel with this fix on various 5.18 kernels (with opensuse tumbleweed), it has worked fine, no freezing on resume as others have mentioned.

Yesterday, I updated to 5.19.2 kernel, applied the fix, recompiled, and it froze after the first suspend. Rebooted and the same thing happened again. I recompiled the kernel with the fix, just to make sure i didn't mess it up, and the same happened again.

When you originally did this fix, you based it on 5.18, and indeed, it works fine on 5.18 for me. There were a lot of changes to the drivers/scsi/sd.c file for 5.19, presumably it was those changes that made this fix start freezing after suspend.

Perhaps you could check if the other people that experienced freezing were using either 5.19 or 6.0-rc1.

Hopefully, this information might help in finding the cause and re-fixing this issue.

Thanks for all you help
Comment 37 Bart Van Assche 2022-08-25 20:31:02 UTC
On 8/25/22 13:01, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215880
> 
> --- Comment #36 from jason600 (jason600.groome@gmail.com) ---
> (In reply to Bart Van Assche from comment #27)
>> Thanks for testing! The patches from the sd-resume branch have been posted
>> on the linux-scsi mailing list. See also
>>
>> https://lore.kernel.org/linux-scsi/20220628222131.14780-1-bvanassche@acm.org/
>> T/#t
> 
> Hi Bart, just an update for you. I noticed this had been removed from the
> 6.0-rc1 for freezing after suspend.
> 
> I've been compiling my kernel with this fix on various 5.18 kernels (with
> opensuse tumbleweed), it has worked fine, no freezing on resume as others
> have
> mentioned.
> 
> Yesterday, I updated to 5.19.2 kernel, applied the fix, recompiled, and it
> froze after the first suspend. Rebooted and the same thing happened again. I
> recompiled the kernel with the fix, just to make sure i didn't mess it up,
> and
> the same happened again.
> 
> When you originally did this fix, you based it on 5.18, and indeed, it works
> fine on 5.18 for me. There were a lot of changes to the drivers/scsi/sd.c
> file
> for 5.19, presumably it was those changes that made this fix start freezing
> after suspend.
> 
> Perhaps you could check if the other people that experienced freezing were
> using either 5.19 or 6.0-rc1.

Multiple people reported issues with freezes during suspend with kernel 
v6.0-rc1. Please take a look at the following report: 
https://lore.kernel.org/all/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@redhat.com/. 
It shows that if zoned ATA disks are present that blk_mq_freeze_queue() 
may be called from inside ata_scsi_dev_rescan() on the context of a work 
queue. ATA rescanning happens from inside the SCSI error handler. So 
there is potential for a lockup because of the following:
* Execution of the START command being postponed because the SCSI error 
handler is active.
* blk_mq_freeze_queue() waiting for the START command to finish.
* The START completion handler not being executed because it got queued 
on the same work queue as the ATA rescan work.

Unfortunately I do not know enough about the ATA core to proceed. I need 
help from an ATA expert.
Comment 38 damien.lemoal 2022-08-25 22:15:41 UTC
On 8/26/22 05:31, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215880
> 
> --- Comment #37 from Bart Van Assche (bvanassche@acm.org) ---
> On 8/25/22 13:01, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>>
>> --- Comment #36 from jason600 (jason600.groome@gmail.com) ---
>> (In reply to Bart Van Assche from comment #27)
>>> Thanks for testing! The patches from the sd-resume branch have been posted
>>> on the linux-scsi mailing list. See also
>>>
>>>
>>> https://lore.kernel.org/linux-scsi/20220628222131.14780-1-bvanassche@acm.org/
>>> T/#t
>>
>> Hi Bart, just an update for you. I noticed this had been removed from the
>> 6.0-rc1 for freezing after suspend.
>>
>> I've been compiling my kernel with this fix on various 5.18 kernels (with
>> opensuse tumbleweed), it has worked fine, no freezing on resume as others
>> have
>> mentioned.
>>
>> Yesterday, I updated to 5.19.2 kernel, applied the fix, recompiled, and it
>> froze after the first suspend. Rebooted and the same thing happened again. I
>> recompiled the kernel with the fix, just to make sure i didn't mess it up,
>> and
>> the same happened again.
>>
>> When you originally did this fix, you based it on 5.18, and indeed, it works
>> fine on 5.18 for me. There were a lot of changes to the drivers/scsi/sd.c
>> file
>> for 5.19, presumably it was those changes that made this fix start freezing
>> after suspend.
>>
>> Perhaps you could check if the other people that experienced freezing were
>> using either 5.19 or 6.0-rc1.
> 
> Multiple people reported issues with freezes during suspend with kernel 
> v6.0-rc1. Please take a look at the following report: 
> https://lore.kernel.org/all/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@redhat.com/. 
> It shows that if zoned ATA disks are present that blk_mq_freeze_queue() 
> may be called from inside ata_scsi_dev_rescan() on the context of a work 
> queue. ATA rescanning happens from inside the SCSI error handler. So 
> there is potential for a lockup because of the following:
> * Execution of the START command being postponed because the SCSI error 
> handler is active.
> * blk_mq_freeze_queue() waiting for the START command to finish.
> * The START completion handler not being executed because it got queued 
> on the same work queue as the ATA rescan work.
> 
> Unfortunately I do not know enough about the ATA core to proceed. I need 
> help from an ATA expert.

I will have a look today.
Comment 39 damien.lemoal 2022-08-26 07:00:38 UTC
On 8/26/22 07:15, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215880
> 
> --- Comment #38 from damien.lemoal@opensource.wdc.com ---
> On 8/26/22 05:31, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>>
>> --- Comment #37 from Bart Van Assche (bvanassche@acm.org) ---
>> On 8/25/22 13:01, bugzilla-daemon@kernel.org wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=215880
>>>
>>> --- Comment #36 from jason600 (jason600.groome@gmail.com) ---
>>> (In reply to Bart Van Assche from comment #27)
>>>> Thanks for testing! The patches from the sd-resume branch have been posted
>>>> on the linux-scsi mailing list. See also
>>>>
>>>>
>>>>
>>>> https://lore.kernel.org/linux-scsi/20220628222131.14780-1-bvanassche@acm.org/
>>>> T/#t
>>>
>>> Hi Bart, just an update for you. I noticed this had been removed from the
>>> 6.0-rc1 for freezing after suspend.
>>>
>>> I've been compiling my kernel with this fix on various 5.18 kernels (with
>>> opensuse tumbleweed), it has worked fine, no freezing on resume as others
>>> have
>>> mentioned.
>>>
>>> Yesterday, I updated to 5.19.2 kernel, applied the fix, recompiled, and it
>>> froze after the first suspend. Rebooted and the same thing happened again.
>>> I
>>> recompiled the kernel with the fix, just to make sure i didn't mess it up,
>>> and
>>> the same happened again.
>>>
>>> When you originally did this fix, you based it on 5.18, and indeed, it
>>> works
>>> fine on 5.18 for me. There were a lot of changes to the drivers/scsi/sd.c
>>> file
>>> for 5.19, presumably it was those changes that made this fix start freezing
>>> after suspend.
>>>
>>> Perhaps you could check if the other people that experienced freezing were
>>> using either 5.19 or 6.0-rc1.
>>
>> Multiple people reported issues with freezes during suspend with kernel 
>> v6.0-rc1. Please take a look at the following report: 
>>
>> https://lore.kernel.org/all/dd6844e7-f338-a4e9-2dad-0960e25b2ca1@redhat.com/. 
>> It shows that if zoned ATA disks are present that blk_mq_freeze_queue() 
>> may be called from inside ata_scsi_dev_rescan() on the context of a work 
>> queue. ATA rescanning happens from inside the SCSI error handler. So 
>> there is potential for a lockup because of the following:
>> * Execution of the START command being postponed because the SCSI error 
>> handler is active.
>> * blk_mq_freeze_queue() waiting for the START command to finish.
>> * The START completion handler not being executed because it got queued 
>> on the same work queue as the ATA rescan work.

Checking the code, the dev pm resume call chain for ATA look like this.

ata_port_resume() -> ata_port_request_pm() -> ata_port_schedule_eh() ->
scsi_error_handler() thread runs -> shost->transportt->eh_strategy_handler
== ata_scsi_error() -> ata_scsi_port_error_handler() ->
ata_eh_handle_port_resume() -> ap->ops->port_resume() ==
ahci_port_resume() for AHCI adapters.

There are no commands issued to the device by this chain, only
registers/port settings being changed. So this should always complete
quickly and in itself not be the reason for the START_STOP command issued
by the sd driver to get stuck.

After calling ata_eh_handle_port_resume(), ata_scsi_port_error_handler()
will trigger a reset through ata_std_error_handler() -> ata_do_eh() and
that will cause the device rescan in EH context. Device rescan will
definitely spin up the device so the START_STOP command that sd_resume()
issues seems rather useless...

As a hack, it would be good to try something like this:

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 29e2f55c6faa..1bc92c04f048 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1081,6 +1081,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev,
struct ata_device *dev)
 	} else {
 		sdev->sector_size = ata_id_logical_sector_size(dev->id);
 		sdev->manage_start_stop = 1;
+		sdev->no_start_on_resume = 1;
 	}

 	/*
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8f79fa6318fe..4c28ca4d038b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3761,7 +3761,7 @@ static int sd_suspend_runtime(struct device *dev)
 static int sd_resume(struct device *dev)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
-	int ret;
+	int ret = 0;

 	if (!sdkp)	/* E.g.: runtime resume at the start of sd_probe() */
 		return 0;
@@ -3770,7 +3770,8 @@ static int sd_resume(struct device *dev)
 		return 0;

 	sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-	ret = sd_start_stop_device(sdkp, 1);
+	if (!sdkp->device->no_start_on_resume)
+		ret = sd_start_stop_device(sdkp, 1);
 	if (!ret)
 		opal_unlock_from_suspend(sdkp->opal_dev);
 	return ret;
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 3113471ca375..92e141536c6c 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -192,6 +192,7 @@ struct scsi_device {
 	unsigned no_start_on_add:1;	/* do not issue start on add */
 	unsigned allow_restart:1; /* issue START_UNIT in error handler */
 	unsigned manage_start_stop:1;	/* Let HLD (sd) manage start/stop */
+	unsigned no_start_on_resume:1; /* Do not issue START_STOP_UNIT on resume */
 	unsigned start_stop_pwr_cond:1;	/* Set power cond. in START_STOP_UNIT */
 	unsigned no_uld_attach:1; /* disable connecting to upper level drivers */
 	unsigned select_no_atn:1;

I am not sure at all this is correct though. This actually may break other
suspend/resume flavors. If we could somehow synchronize scsi pm resume to
run *after* ata pm resume, all problems should go away I think.
Comment 40 jason600 2022-10-06 15:37:22 UTC
I have been using Damien Lemoal's patch for the last couple of days on kernel 5.19.8, it works fine for me, and no obvious issues.

I also tried in on kernel 6.0, again works fine and no obvious issues.

(In reply to damien.lemoal from comment #39)
> I am not sure at all this is correct though. This actually may break other
> suspend/resume flavors. If we could somehow synchronize scsi pm resume to
> run *after* ata pm resume, all problems should go away I think.

With regard to Damiens comments, and the problems with this patch last time it was submitted, I think it would be a good idea to have more people testing this patch before re-submitting it.
Comment 41 Damien Le Moal 2022-10-06 23:48:06 UTC
(In reply to jason600 from comment #40)
> I have been using Damien Lemoal's patch for the last couple of days on
> kernel 5.19.8, it works fine for me, and no obvious issues.
> 
> I also tried in on kernel 6.0, again works fine and no obvious issues.
> 
> (In reply to damien.lemoal from comment #39)
> > I am not sure at all this is correct though. This actually may break other
> > suspend/resume flavors. If we could somehow synchronize scsi pm resume to
> > run *after* ata pm resume, all problems should go away I think.
> 
> With regard to Damiens comments, and the problems with this patch last time
> it was submitted, I think it would be a good idea to have more people
> testing this patch before re-submitting it.

I did not submit that patch on the lists, because I am not convinced this is the right approach. It is a "hack" right now. As mentioned, the problem is that scsi and ata pm resume are running at the same time because the dev/pm code does not seem to know the dependency between the devices (struct dev). Would need to dig further into that first to see if a cleaner solution can be coded. Will try, but busy, so it may take some time. I first need to be able to recreate the issue on my systems. Will start with that :)
Comment 42 Bart Van Assche 2022-10-07 00:10:46 UTC
Has it been considered to use device_link_add() to enforce the order in which devices are resumed?
Comment 43 damien.lemoal 2022-10-07 00:15:49 UTC
On 10/7/22 09:10, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215880
> 
> --- Comment #42 from Bart Van Assche (bvanassche@acm.org) ---
> Has it been considered to use device_link_add() to enforce the order in which
> devices are resumed?
> 

I am not very familiar with that API level and pm code in general. So I
cannot say. Will need to look into it.
Comment 44 Paul Ausbeck 2023-07-08 23:17:54 UTC
I'd like to add that I've also noticed this regression after upgrading from Debian 11/5.10 to Debian 12/6.1

I've copied the resume portion of the dmesg log below. Note that the pci bus activity and related gpu messages don't happen until after my 10 year old 2TB hdd spins up. Restarting of tasks is also deferred until then. I also note that the mouse cursor is also frozen until all those platters spin up..

To my mind, it would be a major improvement/tour de force if linux could defer spinning up the hdd at all until it is needed. Think of all the drive wear and energy savings that would ensue. I use my 2TB hdd for backing up smaller drives and for storing movies. It would be quite nice if the drive were to not spin up at all until accessed.


[167498.654484] ACPI: PM: Waking up from system sleep state S3
[167498.655607] pcieport 0000:00:1c.0: Enabling MPC IRBNCE
[167498.655608] pcieport 0000:00:1c.7: Enabling MPC IRBNCE
[167498.655613] pcieport 0000:00:1c.0: Intel PCH root port ACS workaround enabled
[167498.655614] pcieport 0000:00:1c.7: Intel PCH root port ACS workaround enabled
[167498.657222] parport_pc 00:05: activated
[167498.657638] serial 00:07: activated
[167498.666692] sd 2:0:0:0: [sdb] Starting disk
[167498.666709] sd 3:0:0:0: [sdd] Starting disk
[167498.666720] sd 1:0:0:0: [sdc] Starting disk
[167498.666725] sd 0:0:0:0: [sda] Starting disk
[167498.751660] input input0: event field not found
[167499.041464] ata6: SATA link down (SStatus 0 SControl 300)
[167499.041525] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[167499.041556] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[167499.041581] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[167499.042450] ata2.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[167499.042455] ata2.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[167499.042479] ata1.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[167499.042484] ata1.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[167499.042605] ata1.00: supports DRM functions and may not be fully accessible
[167499.043225] ata2.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[167499.043233] ata2.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[167499.043464] ata2.00: configured for UDMA/133
[167499.053394] ata5.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[167499.053402] ata5.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[167499.074315] ata5.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[167499.074323] ata5.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[167499.079649] ata5.00: configured for UDMA/100
[167499.130641] ata1.00: NCQ Send/Recv Log not supported
[167499.132638] ata1.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[167499.132646] ata1.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[167499.132837] ata1.00: supports DRM functions and may not be fully accessible
[167499.132990] ata1.00: NCQ Send/Recv Log not supported
[167499.134317] ata1.00: configured for UDMA/133
[167499.247035] firewire_core 0000:03:01.0: rediscovered device fw0
[167499.247089] br0: port 1(eth0) entered disabled state
[167501.553034] e1000e 0000:00:19.0 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[167501.553075] br0: port 1(eth0) entered blocking state
[167501.553078] br0: port 1(eth0) entered forwarding state
[167504.079109] ata4: link is slow to respond, please be patient (ready=0)
[167504.079117] ata3: link is slow to respond, please be patient (ready=0)
[167504.799136] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[167504.917319] ata4.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[167504.917324] ata4.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[167504.918915] ata4.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[167504.918922] ata4.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[167504.919646] ata4.00: configured for UDMA/133
[167508.699207] ata3: COMRESET failed (errno=-16)
[167510.871278] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[167511.040204] ata3.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[167511.040212] ata3.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[167511.078770] ata3.00: ACPI cmd f5/00:00:00:00:00:00(SECURITY FREEZE LOCK) filtered out
[167511.078778] ata3.00: ACPI cmd b1/c1:00:00:00:00:00(DEVICE CONFIGURATION OVERLAY) filtered out
[167511.082920] ata3.00: configured for UDMA/133
[167511.107913] pci_bus 0000:03: Allocating resources
[167511.107932] pci 0000:02:00.0: bridge window [io  0x1000-0x0fff] to [bus 03] add_size 1000
[167511.107939] pci 0000:02:00.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 03] add_size 200000 add_align 100000
[167511.107949] pci 0000:02:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[167511.107952] pci 0000:02:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[167511.107955] pci 0000:02:00.0: BAR 13: no space for [io  size 0x1000]
[167511.107958] pci 0000:02:00.0: BAR 13: failed to assign [io  size 0x1000]
[167511.107962] pci 0000:02:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[167511.107965] pci 0000:02:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[167511.107967] pci 0000:02:00.0: BAR 13: no space for [io  size 0x1000]
[167511.107970] pci 0000:02:00.0: BAR 13: failed to assign [io  size 0x1000]
[167511.108011] OOM killer enabled.
[167511.108014] Restarting tasks ... done.
[167511.111619] random: crng reseeded on system resumption
[167511.111760] PM: suspend exit
Comment 45 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-07-09 07:09:00 UTC
(In reply to Paul Ausbeck from comment #44)
> I'd like to add that I've also noticed this regression after upgrading from
> Debian 11/5.10 to Debian 12/6.1

Can't remember for sure (Bart and Damien might), but I think this issue should have been fixed by this revert that went into 6.0:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=785538bfdd6

So if you see similar symptoms it's likely a different issue. You thus might want to open a new report, as things otherwise get messy; it's also in your own interest, see: https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/ especially https://linux-regtracking.leemhuis.info/post/frequent-reasons-why-linux-kernel-bug-reports-are-ignored/#you-reported-your-issue-in-a-reply-to-an-earlier-report
Comment 46 Damien Le Moal 2023-07-09 23:18:37 UTC
(In reply to Paul Ausbeck from comment #44)
> I'd like to add that I've also noticed this regression after upgrading from
> Debian 11/5.10 to Debian 12/6.1

> I've copied the resume portion of the dmesg log below. Note that the pci bus
> activity and related gpu messages don't happen until after my 10 year old
> 2TB hdd spins up. Restarting of tasks is also deferred until then. I also
> note that the mouse cursor is also frozen until all those platters spin up..

The GPU related messages are unrelated to ATA subsystem. Obviously, libata does not attempt to use the GPU for anything :) Restarting of tasks being differed until the hdd spins up is also normal. hdd spinning up and then starting responding to commands is part of the normal resume process and tasks will be resumed only after the kernel has finished resuming all devices. Resuming of devices is asynchronous and devices that are unrelated can be resumed simultaneously. So I am not sure why it seems that your GPU and mouse resuming is delaying only after the HDD spin up. Note that this change to asynchronous resume did cause some issues with ata/scsi (see below).

> To my mind, it would be a major improvement/tour de force if linux could
> defer spinning up the hdd at all until it is needed. Think of all the drive
> wear and energy savings that would ensue. I use my 2TB hdd for backing up
> smaller drives and for storing movies. It would be quite nice if the drive
> were to not spin up at all until accessed.

Having the drive sleep when unused can be achieved using power state timers setting on the drive itself. Resume is different: the kernel must ensure that the device that was there before suspend is still here, and in the case of storage, that the device is still the same. This is necessary to avoid issues with file system (that will resume later) and also with tasks that were doing IOs when the system was suspended. If file systems or tasks were resumed without the storage device ready and checked first, all sorts of bad things can happen.

Note that commit 6aa0365a3c85 ("ata: libata-scsi: Avoid deadlock on rescan after device resume") was added to kernel 6.4 (at rc7) to fix another issue with resume: a hard hang (not a delay). This patch synchronizes scsi and ata resume by having scsi resume wait for ata resume to complete, which means waiting for the device re-scan to complete. And that in turn means that the HDD must complete spin up first.
Comment 47 Paul Ausbeck 2023-07-10 00:18:34 UTC
The PCI/GPU messages are only connected to the ATA messages in the sense that the PCI/GPU activity is unexpectedly deferred until after ATA init is complete. Since resuming of devices is supposedly asynchronous, one would have thought that PCI/GPU init activity would be completed long before hdd spin up is complete.

I realize that deferring drive spin up until needed would not be easy, that's why I called such an idea a tour de force. In the past it may have been common for an ata device to not resume reliably, but today even the 10 year old ata devices on my Ivy Bridge machine resume just as reliably as they normally operate, which is quite reliably. If it weren't for power outages, I'd have years of continuous uptime. It's just a thought, but it may be time to revisit how disks, especially spinning disks, are resumed. It seems to me that the chance of a hdd failure during resume is not any greater than at any other time. It should be theoretically possible to queue up resume commands and execute them only when needed to service actual demand. The latency would have to eventually be absorbed, but if the api's are designed and implemented properly that would just happen when needed.
Comment 48 Damien Le Moal 2023-07-10 00:47:13 UTC
(In reply to Paul Ausbeck from comment #47)
> The PCI/GPU messages are only connected to the ATA messages in the sense
> that the PCI/GPU activity is unexpectedly deferred until after ATA init is
> complete. Since resuming of devices is supposedly asynchronous, one would
> have thought that PCI/GPU init activity would be completed long before hdd
> spin up is complete.

Maybe try enabling PM debug messages ? That could tell us what is going on.

> I realize that deferring drive spin up until needed would not be easy,
> that's why I called such an idea a tour de force. In the past it may have
> been common for an ata device to not resume reliably, but today even the 10
> year old ata devices on my Ivy Bridge machine resume just as reliably as
> they normally operate, which is quite reliably. If it weren't for power
> outages, I'd have years of continuous uptime. It's just a thought, but it
> may be time to revisit how disks, especially spinning disks, are resumed. It
> seems to me that the chance of a hdd failure during resume is not any
> greater than at any other time. It should be theoretically possible to queue
> up resume commands and execute them only when needed to service actual
> demand. The latency would have to eventually be absorbed, but if the api's
> are designed and implemented properly that would just happen when needed.

Spinning up the disk on resume is needed as many drives require the disk to be spun up to reply to commands, and that even applies to commands that do not access the media (e.g. IDENTIFY, READ LOG, etc). The reason is that many drives save their meta-data on a reserved area of the media (the spinning disk).

So even if the kernel were to explicitly defer spinning up the disk, the disk itself would in many instances automatically spinup, and that would manifest itself with a long delay to the reply to the initial IDENTIFY command issued during the rescan initiated by resume. And delaying that rescan is also not desired as that would force the PM code to report "OK, this device is resumed and ready" while it is in fact not ready at all to receive commands. So the "tour de force" is really not desired at all in practice.

You likely can avoid the hdd spinup on resume by soft-removing the device before suspend. Then we could add a libata option to ignore some ports on resume so that they are not probed. But then, reusing the drive after resume will require a manual rescan for the disk to show up again. And all of this also assumes that there is no file system mounted on the disk, of course.

Given that most laptop these days do not use HDDs anymore, I do not think all of this is worth the effort.
Comment 49 Paul Ausbeck 2023-07-10 01:02:57 UTC
My reflex memory says that when resuming Debian 11/5.10 the mouse cursor was live immediately after the LCD monitor came on. Current testing says that when resuming Debian 12/6.1 the mouse cursor is frozen until at least five seconds after the monitor comes on.

In order to verify that my reflex memory was correct, I booted Debian 12 but with the 5.10 kernel. Sure enough, upon resume the mouse cursor is immediately alive once the LCD monitor comes on.

I've attached the end portion of the Debian 12/5.10 resume dmesg log below. Note that the restarting of tasks and the PCI/GPU activity that happened at the end of resume for the Debian 12/6.1 combination happens before ATA bring up with the Debian 12/5.10 combination.

This is starting to look like a regression to me.

[   45.516511] OOM killer enabled.
[   45.516512] Restarting tasks ... 
[   45.516726] pci_bus 0000:03: Allocating resources
[   45.516743] pci 0000:02:00.0: bridge window [io  0x1000-0x0fff] to [bus 03] add_size 1000
[   45.516748] pci 0000:02:00.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 03] add_size 200000 add_align 100000
[   45.516756] pci 0000:02:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[   45.516759] pci 0000:02:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[   45.516762] pci 0000:02:00.0: BAR 13: no space for [io  size 0x1000]
[   45.516764] pci 0000:02:00.0: BAR 13: failed to assign [io  size 0x1000]
[   45.516768] pci 0000:02:00.0: BAR 15: no space for [mem size 0x00200000 64bit pref]
[   45.516769] pci 0000:02:00.0: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
[   45.516771] pci 0000:02:00.0: BAR 13: no space for [io  size 0x1000]
[   45.516772] pci 0000:02:00.0: BAR 13: failed to assign [io  size 0x1000]
[   45.518520] done.
[   45.518693] video LNXVIDEO:00: Restoring backlight state
[   45.518696] PM: suspend exit
[   45.641461] br0: port 1(eth0) entered disabled state
[   45.734342] ata6: SATA link down (SStatus 0 SControl 300)
[   45.734372] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   45.734405] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   45.734435] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   45.735377] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   45.735382] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   45.735384] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   45.735433] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   45.735437] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   45.735440] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   45.735499] ata1.00: supports DRM functions and may not be fully accessible
[   45.736244] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   45.736249] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   45.736251] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   45.736531] ata2.00: configured for UDMA/133
[   45.746293] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   45.746298] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   45.746301] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   45.767136] ata5.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   45.767141] ata5.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   45.767144] ata5.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   45.772405] ata5.00: configured for UDMA/100
[   45.819126] ata1.00: NCQ Send/Recv Log not supported
[   45.820557] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   45.820571] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   45.820573] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   45.820687] ata1.00: supports DRM functions and may not be fully accessible
[   45.820786] ata1.00: NCQ Send/Recv Log not supported
[   45.821701] ata1.00: configured for UDMA/133
[   45.947955] firewire_core 0000:03:01.0: rediscovered device fw0
[   48.373868] e1000e 0000:00:19.0 eth0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   48.373927] br0: port 1(eth0) entered blocking state

[   48.373930] br0: port 1(eth0) entered forwarding state
[   50.775921] ata3: link is slow to respond, please be patient (ready=0)
[   50.783922] ata4: link is slow to respond, please be patient (ready=0)
[   51.143938] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   51.262481] ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   51.262485] ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   51.262488] ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   51.264139] ata4.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   51.264144] ata4.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   51.264147] ata4.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   51.264802] ata4.00: configured for UDMA/133
[   55.403919] ata3: COMRESET failed (errno=-16)
[   56.799938] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   56.981696] ata3.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   56.981702] ata3.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   56.981705] ata3.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   57.008271] ata3.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   57.008277] ata3.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   57.008280] ata3.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   57.012179] ata3.00: configured for UDMA/133
Comment 50 Damien Le Moal 2023-07-10 01:07:33 UTC
Hmmm... PCI device scan is supposed to be asynchronous. So having to wait for the hdd to spinup should not impact the GPU/mouse (which I assume is USB ?). The messages "pci 0000:02:00.0: BAR 15: failed to assign" are weird... What is the device on port 0000:02:00.0 ? (see lspci).

Might want to raise this issue on linux-pci list.
Comment 51 Paul Ausbeck 2023-07-10 02:51:12 UTC
The PCIe BAR (Base address register) is usually GPU related as lspci verifies. BAR mapping failures are pretty routine on my older machines. My assessment is that they aren't relevant except to show that at least some GPU activity is deferred until after ATA bring up.

I think more important is that Restarting tasks ...  and even PM: suspend exit are happening before ATA bring up on 5.10 but after ATA bring up on 6.1.

This seems like a pretty serious change to me.
Comment 52 Damien Le Moal 2023-07-10 03:48:33 UTC
(In reply to Paul Ausbeck from comment #51)
> The PCIe BAR (Base address register) is usually GPU related as lspci
> verifies. BAR mapping failures are pretty routine on my older machines. My
> assessment is that they aren't relevant except to show that at least some
> GPU activity is deferred until after ATA bring up.
> 
> I think more important is that Restarting tasks ...  and even PM: suspend
> exit are happening before ATA bring up on 5.10 but after ATA bring up on 6.1.
> 
> This seems like a pretty serious change to me.

The "serious change" is a bug fix. "PM: suspend exit" sets the system out of suspend mode but before commit 6aa0365a3c85, libata was "lying" about the state of its devices. They were not really resumed and that was causing issues: some users reported inability to use storage because of a deadlock between ata resume and scsi resume. So we are not going back on that. This commit fixes a serious issue and makes libata compliant with PM semantic.

I can try to revisit that fix to see if I can shorten the time to signaling a "resumed" state for ata devices, but I doubt much is possible.

In any case, for now, having resume wait for the hdd spinup is safer than the alternative. Your other issue about the GPU and mouse is I think different and not related to libata resume. Not sure exactly what is happening. As I said, I think it is worth signaling this to the pci, drm and or pm dev lists.
Comment 53 Paul Ausbeck 2023-07-10 17:00:44 UTC
I sense that we are in disagreement. In my way of thinking disagreement is overcome through logical argument. So here goes.

I argue that you can't call a bug fixed if the fix introduces a serious, entirely separate, regression. I further argue that at resume the mouse pointer should be interactive at the same time that the display becomes active. I further argue that if the mouse pointer is not interactive at any time that the display is active, that is a serious UX bug. I further argue that before the recent kernel ATA changes this particular bug did not exist and therefore it is a regression. I further argue that though the regression is UX related it cannot be patched over outside of the kernel. I further argue that this regression likely affects any personal machine that contains a spinning disk and that this class of machines is substantial in size. I further argue that your reported deadlock between ata resume and scsi resume likely affects a far smaller class of machines. I further argue that though for a single machine deadlock is a more important problem than temporary lack of mouse pointer interactivity, for the kernel as a whole, problem importance is proportional to the multiplication of the individual seriousness and the size of the set of affected machines. I further argue that your assertion that the non-interactive mouse pointer is not related to the libata resume changes is incorrect. I further argue that the two dmesg fragments that I have posted to this thread explain the situation quite clearly.

My argument contains 10 assertions. If you disagree with any of them, please give details. In particular I don't have a clear idea of the size of the set of machines that might experience the "fixed" ata/scsi resume deadlock.
Comment 54 Damien Le Moal 2023-07-10 22:48:47 UTC
(In reply to Paul Ausbeck from comment #53)
> I sense that we are in disagreement. In my way of thinking disagreement is
> overcome through logical argument. So here goes.
> 
> I argue that you can't call a bug fixed if the fix introduces a serious,
> entirely separate, regression. I further argue that at resume the mouse
> pointer should be interactive at the same time that the display becomes
> active. I further argue that if the mouse pointer is not interactive at any
> time that the display is active, that is a serious UX bug. I further argue
> that before the recent kernel ATA changes this particular bug did not exist
> and therefore it is a regression. I further argue that though the regression
> is UX related it cannot be patched over outside of the kernel. I further
> argue that this regression likely affects any personal machine that contains
> a spinning disk and that this class of machines is substantial in size. I
> further argue that your reported deadlock between ata resume and scsi resume
> likely affects a far smaller class of machines. I further argue that though
> for a single machine deadlock is a more important problem than temporary
> lack of mouse pointer interactivity, for the kernel as a whole, problem
> importance is proportional to the multiplication of the individual
> seriousness and the size of the set of affected machines. I further argue
> that your assertion that the non-interactive mouse pointer is not related to
> the libata resume changes is incorrect. I further argue that the two dmesg
> fragments that I have posted to this thread explain the situation quite
> clearly.
> 
> My argument contains 10 assertions. If you disagree with any of them, please
> give details. In particular I don't have a clear idea of the size of the set
> of machines that might experience the "fixed" ata/scsi resume deadlock.

Repeating the same things again and again without new technical information we (developers) can act on is not very productive. So let's dig into this issue to verify *if* this is really an issue introduced by libata, which as I said, I doubt it is. But let's make sure. So could you please try the following:
1) Build the latest 6.5-rc1 kernel from git and try with that kernel to see if the issue is still there.
2) If the issue is not resolved, with that same kernel, please try to revert commit 6aa0365a3c85.
3) Regardless of the result for (2) please do a bisect to see if the commit responsible for the issue is indeed 6aa0365a3c85. You can find information on how to do that in the Linux documentation at Documentation/admin-guide/bug-bisect.rst. You may also want to read  Documentation/admin-guide/quickly-build-trimmed-linux.rst
4) Please provide the output of lspci and a full dmesg for your system. dmesg output after boot and after resume would be helpful.

With the above information, we will likely have a better idea of what is going on with your machine.

Also please know that the number of machines affected by an issue only determines the priority/urgency of a fix. Even if a single user is affected, issues will still be addressed as long as enough information is provided and the person affected helps with testing (as I cannot recreate that issue with the hardware I have).

Note that I am on vacation this week, so I am only checking emails and replying as a courtesy. I will not do anything until Tuesday next week.
Comment 55 Paul Ausbeck 2023-07-11 21:39:22 UTC
lspci output is relatively small and easy:

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation IvyBridge GT2 [HD Graphics 4000] (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.7 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 8 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation H77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)
01:00.0 Multimedia video controller: Conexant Systems, Inc. CX23887/8 PCIe Broadcast Audio and Video Decoder with 3D Comb (rev 0f)
02:00.0 PCI bridge: Integrated Technology Express, Inc. IT8892E PCIe to PCI Bridge (rev 30)
03:01.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link)

I'll wait a bit for full dmesg output and/or bisecting. When I initially read the bug report, ericspero seemed to do a pretty good job of bisecting the change.

It seems that we still don't agree that libata is responsible for the newly introduced lack of mouse pointer interactivity following resume. Therefore, I will spend some more time characterizing the problem. I have a relatively new Chuwi laptop with a spare m.2/sata slot. I've ordered an m.2/sata to sata 7 pin adapter so that I can plug a sata hard drive into this machine. As far as I can find, this type of adapter is only available directly from China so it will take a couple of weeks to get them. I'll post the results then. If I can muster to two machines as orthogonal as I can make them both exhibiting the same problem, perhaps we might come together. Or perhaps the Chuwi machine won't exhibit the problem and I'll have to rethink.

One other observation. It seems to me that libata maintainers should have a machine containing a spinning disk at their disposal. I have a bunch of spare machines with hard drives and would be amenable to donating one for this purpose. Also, in case you are interested, the previously described m.2/sata to 7pin sata adapter is available on eBay at:

https://www.ebay.com/itm/394315621714?hash=item5bcf0ae552:g:YGYAAOSwAjRjYcUd&amdata=enc%3AAQAIAAAA8BpM7jdIQ4JS2IAPozAs7uLQETGPFQcybkuhO3nBfjImXm%2BMSA5oCG2CcetiX4fOY579FQd0JR8Pj81mS%2FW8k9MyCtUCg1hGuVhX4aMzkBQrzfXXjZcz%2F%2BldcLdwbLHcyTpN%2BjrSKmKiyGprAr1aZEH27ur%2FTJReKBBziJpJAz8qlG9j6GCGEYsA5R2kLpW0bSX4s8it5oZ1xTHDNhiidRnGvgTi16IvgMCZiLuGnOgTeSJuVA4kQdpsli2lYjmMwuU4i3USWppY5mh54l0AGV3Nf%2Ff4%2FCwhCoEogLghu62kFYpd7h3fofI8mCPTFLjh%2FA%3D%3D%7Ctkp%3ABk9SR7LBpuuoYg
Comment 56 Damien Le Moal 2023-07-11 22:55:08 UTC
(In reply to Paul Ausbeck from comment #55)
> lspci output is relatively small and easy:
> 
> 00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core
> processor DRAM Controller (rev 09)
> 00:02.0 VGA compatible controller: Intel Corporation IvyBridge GT2 [HD
> Graphics 4000] (rev 09)
> 00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset
> Family USB xHCI Host Controller (rev 04)
> 00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset
> Family MEI Controller #1 (rev 04)
> 00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network
> Connection (rev 04)
> 00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB
> Enhanced Host Controller #2 (rev 04)
> 00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High
> Definition Audio Controller (rev 04)
> 00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI
> Express Root Port 1 (rev c4)
> 00:1c.7 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family
> PCI Express Root Port 8 (rev c4)
> 00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB
> Enhanced Host Controller #1 (rev 04)
> 00:1f.0 ISA bridge: Intel Corporation H77 Express Chipset LPC Controller
> (rev 04)
> 00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset
> Family 6-port SATA Controller [AHCI mode] (rev 04)

I think I have some servers that have the same chipset. But being servers, suspend/resume is not well supported. Can try suspend to disk at least, but not suspend to RAM.

> 00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus
> Controller (rev 04)
> 01:00.0 Multimedia video controller: Conexant Systems, Inc. CX23887/8 PCIe
> Broadcast Audio and Video Decoder with 3D Comb (rev 0f)
> 02:00.0 PCI bridge: Integrated Technology Express, Inc. IT8892E PCIe to PCI
> Bridge (rev 30)
> 03:01.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000
> Controller (PHY/Link)
> 
> I'll wait a bit for full dmesg output and/or bisecting. When I initially
> read the bug report, ericspero seemed to do a pretty good job of bisecting
> the change.

You could try with the revert of 6aa0365a3c85. What I think you are seeing is the X client start (Wayland ?) being delayed by the fact that with 6aa0365a3c85, resume waits for the HDD to be revalidated. Hence the delay for X & mouse to start working.

> It seems that we still don't agree that libata is responsible for the newly
> introduced lack of mouse pointer interactivity following resume. Therefore,

See above. It could be. But permanently reverting 6aa0365a3c85 is not a solution as that commit fixes a real problem that some users reported and that you could also hit yourself by reverting it. As I said, if you confirm that this commit is indeed the cause of the delay, I can try to see how to reduce the delay, if possible. The main issue is that the resume code is a bit of a mess between libata and scsi, and libata resume is mostly done using EH (error handler) context, which when running prevents using the drive for anything. A rework of the resume path may be needed but that is extremely dangerous as that could introduce lots of regressions (because libata is in a sense the accumulation of decades of "magic" code to deal with buggy adapters and drives, and there are a lot out these out there). So extreme caution is needed when touching such code.

> I will spend some more time characterizing the problem. I have a relatively
> new Chuwi laptop with a spare m.2/sata slot. I've ordered an m.2/sata to
> sata 7 pin adapter so that I can plug a sata hard drive into this machine.
> As far as I can find, this type of adapter is only available directly from
> China so it will take a couple of weeks to get them. I'll post the results
> then. If I can muster to two machines as orthogonal as I can make them both
> exhibiting the same problem, perhaps we might come together. Or perhaps the
> Chuwi machine won't exhibit the problem and I'll have to rethink.
> 
> One other observation. It seems to me that libata maintainers should have a
> machine containing a spinning disk at their disposal. I have a bunch of
> spare machines with hard drives and would be amenable to donating one for
> this purpose. Also, in case you are interested, the previously described
> m.2/sata to 7pin sata adapter is available on eBay at:

I am the libata maintainer, and if you check my email address, you will see that it is not hard for me to get SATA HDDs and SSDs. I have 3 racks of machine in my lab with literally hundreds of HDDs and SSDs usable, and also some PCs with CD/DVD, old parallel ATA drives, port multiplier enclosures, etc. Many favors of hardware.

But that does not mean that I can reproduce all issues notified by users. I could never reproduce the issue that commit 6aa0365a3c85 fixes so my tests only confirmed that I was not seeing any regression (and I did not see a bad delay with anything as I do not run any GUI on my machines).

Rather than hardware, which I have plenty, I would be happier with people spending time testing for-next and RC trees to verify that patches do not have unintended side effects. libata is not the sexiest of subsystems and it is hard to get people to review and test patches.
Comment 57 Paul Ausbeck 2023-07-20 21:52:14 UTC
The m.2/SATA to 7pin SATA adapter that I mentioned previously as on order, arrived. Using this adapter I connected a spare Barracuda 4TB drive to one of the m.2 slots on my Chuwi Gemibook Pro laptop. With this setup, the lack of mouse pointer interactivity on resume exhibits exactly the same as on my Intel dh77kc desktop.

The mouse pointer is not interactive until after "PM: suspend exit" is written to the kernel log. Additionally, an active GUI terminal is not interactive until the same time. This is noticeable by typing "sudo pm-suspend" in a terminal. The resulting CR/LF and next command prompt do not appear following resume until the same time that the mouse cursor comes alive. The Barracuda drive connected to the Gemibook isn't as slow to spin up as the relatively old WD 2TB in my desktop so the delay to interactivity is only about 5 seconds instead of 10+.

The interactivity delay is present on both machines when using kernel 6.1x and not present on both machines when using kernel 5.10x. When using kernel 5.10x, the dmesg log on both machines shows "PM suspend exit" happens before the slower ATA drives are "configured for UDMA/133". When using kernel 6.1x, the dmesg log shows "PM suspend exit" occurring after the slower ATA dries are "configured for UDMA/133".

At this point, I am convinced that this is a regression related to changes in libata. I will take this offline now to see what I can do to help recover the previous resume interactivity behavior.