Bug 216413

Summary: [BISECT INCLUDED] scsi/sd Rework asynchronous resume support breaks S2idle and S3 on several systems
Product: IO/Storage
Reporter: Todd Brandt (todd.e.brandt)
Component: SCSI
Assignee: Bart Van Assche (bvanassche)
Status: RESOLVED CODE_FIX
Severity: blocking
CC: bvanassche, lenb, linux-scsi, martin.petersen
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 6.0.0-rc1
Subsystem:
Regression: No
Bisected commit-id:
Bug Depends on:
Bug Blocks: 178231
Attachments: otcpl-hp-x360-bsw-dmesg.txt
otcpl-hp-x360-system-info.txt
otcpl-dell-3493-icl-system-info.txt

Description Todd Brandt 2022-08-25 21:24:23 UTC
A commit in 6.0.0-rc1 has caused S2idle and S3 (freeze & mem) to completely hang the system on these 4 machines in our lab:

1) Clevo System76 Lemur 6
2) Lenovo Yoga 2 Pro
3) Dell Inspiron 3493
4) HP Pavilion x360

To reproduce the issue simply run kernel 6.0.0-rc1 or newer on these systems and run "sudo sleepgraph -m freeze" or "sudo sleepgraph -m mem". The system will hang after that.
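
The reproduction step above can be sketched as a small helper that builds the two sleepgraph invocations. This is only an illustration: the function names are hypothetical, the mode labels follow this report's own pairing of "freeze" with S2idle and "mem" with S3, and it assumes the kernel lists its supported sleep states as space-separated tokens in /sys/power/state. It prints commands and does not suspend anything itself.

```python
from pathlib import Path

# Labels follow the wording of this report (freeze = S2idle, mem = S3).
# On a given machine the meaning of "mem" may differ; this is an assumption.
REPORT_MODES = {"freeze": "S2idle", "mem": "S3"}


def supported_modes(state_text):
    """Return the report's modes that appear in the given /sys/power/state text."""
    tokens = state_text.split()
    return [mode for mode in REPORT_MODES if mode in tokens]


def sleepgraph_command(mode):
    """Build the argv for the sleepgraph run quoted in the description."""
    if mode not in REPORT_MODES:
        raise ValueError(f"mode not covered by this report: {mode!r}")
    return ["sudo", "sleepgraph", "-m", mode]


def read_supported_modes(path="/sys/power/state"):
    """Hypothetical convenience wrapper: read the state file if it exists."""
    state_file = Path(path)
    if not state_file.exists():
        return []
    return supported_modes(state_file.read_text())
```

For example, calling `sleepgraph_command(mode)` for each entry of `read_supported_modes()` and joining the result with spaces gives the command lines to run by hand on a test machine.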

I've bisected the problem to this specific commit:

88f1669019bd62b3009a3cebf772fbaaa21b9f38 is the first bad commit
commit 88f1669019bd62b3009a3cebf772fbaaa21b9f38
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Thu Jun 30 12:57:03 2022 -0700

    scsi: sd: Rework asynchronous resume support
    
    For some technologies, e.g. an ATA bus, resuming can take multiple
    seconds. Waiting for resume to finish can cause a very noticeable delay.
    Hence this commit that restores the behavior from before "scsi: core: pm:
    Rely on the device driver core for async power management" for most SCSI
    devices.
    
    This commit introduces a behavior change: if the START command fails, do
    not consider this as a SCSI disk resume failure.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=215880
    Link: https://lore.kernel.org/r/20220630195703.10155-3-bvanassche@acm.org
    Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
    Cc: Ming Lei <ming.lei@redhat.com>
    Cc: Hannes Reinecke <hare@suse.de>
    Cc: John Garry <john.garry@huawei.com>
    Cc: ericspero@icloud.com
    Cc: jason600.groome@gmail.com
    Tested-by: jason600.groome@gmail.com
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

:040000 040000 dbd390c19cfddba2b559b06691404aee4c165384 54c7fa67e3a1605878999bdf1e39a95ca793238a M	drivers
Comment 1 Bart Van Assche 2022-08-25 21:27:26 UTC
A revert for that patch has been posted: https://lore.kernel.org/linux-scsi/20220816172638.538734-1-bvanassche@acm.org/
Comment 2 Todd Brandt 2022-08-25 21:30:55 UTC
Created attachment 301662
otcpl-hp-x360-bsw-dmesg.txt

S3 suspend failure dmesg log for the HP Pavilion x360
Comment 3 Todd Brandt 2022-08-25 21:32:02 UTC
One other thing to note: I tried removing this one commit from the very latest upstream 6.0.0-rc2 code, and that fixed the hang completely. So there's no doubt this one commit is the sole cause of the hang.
Comment 4 Todd Brandt 2022-08-25 21:35:52 UTC
Created attachment 301663
otcpl-hp-x360-system-info.txt

System info for the HP Pavilion x360
Comment 5 Todd Brandt 2022-08-25 21:38:40 UTC
Created attachment 301664
otcpl-dell-3493-icl-system-info.txt

Dell Inspiron 3493 system info
Comment 6 Todd Brandt 2022-08-25 21:46:48 UTC
(In reply to Bart Van Assche from comment #1)
> A revert for that patch has been posted:
> https://lore.kernel.org/linux-scsi/20220816172638.538734-1-bvanassche@acm.org/

When will this make it upstream? In 6.0.0-rc3 hopefully?
Comment 7 Bart Van Assche 2022-08-25 21:55:46 UTC
On 8/25/22 14:46, bugzilla-daemon@kernel.org wrote:
> When will this make it upstream? In 6.0.0-rc3 hopefully?

I'm not sure. This depends on the SCSI maintainer.
Comment 8 Todd Brandt 2022-08-25 22:12:13 UTC
(In reply to Bart Van Assche from comment #7)
> On 8/25/22 14:46, bugzilla-daemon@kernel.org wrote:
> > When will this make it upstream? In 6.0.0-rc3 hopefully?
> 
> I'm not sure. This depends on the SCSI maintainer.

OK, thank you. I'll keep monitoring and will close this bug when it lands upstream.
Comment 9 Todd Brandt 2022-08-29 22:37:04 UTC
Looks like the commit was successfully reverted in 6.0.0-rc3. S2idle works again on the HP Pavilion x360 and Dell Inspiron 3493.
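
As a rough aid for triage, the window described in this thread (hang first seen in 6.0.0-rc1, still present in 6.0.0-rc2, gone in 6.0.0-rc3) can be expressed as a check on a kernel release string such as the output of `uname -r`. This is a sketch under stated assumptions: the function names are hypothetical, it covers only the mainline release candidates named in this report, and it says nothing about stable or distribution kernels that may or may not carry the commit.

```python
import re

# Matches strings like "6.0.0-rc2" or "6.0.0-rc2+"; the rc suffix is optional.
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?")


def parse_release(release):
    """Return (major, minor, patch, rc) with rc=None for non-rc releases, or None."""
    match = _RELEASE_RE.match(release)
    if match is None:
        return None
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    rc = int(match.group(4)) if match.group(4) is not None else None
    return major, minor, patch, rc


def in_reported_window(release):
    """True only for the mainline release candidates this report names as affected."""
    parsed = parse_release(release)
    if parsed is None:
        return False
    major, minor, patch, rc = parsed
    # Assumption: only 6.0.0-rc1 and 6.0.0-rc2 are considered, per this thread.
    return (major, minor, patch) == (6, 0, 0) and rc in (1, 2)
```

A check like `in_reported_window("6.0.0-rc2")` is only a hint; the bisected commit id in the description remains the authoritative way to tell whether a given tree contains the change.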