Bug 208871 - [BISECTED][REGRESSION] 5.8.0-1, 5.7.11 and .14 fail to suspend on Dell XPS L502X
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: Intel Linux
Importance: P1 normal
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-11 04:02 UTC by Bob Hepple
Modified: 2020-11-04 08:31 UTC
CC List: 2 users

See Also:
Kernel Version: 5.8
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
git bisection on kernel (14.21 KB, text/plain)
2020-08-14 23:57 UTC, Bob Hepple

Description Bob Hepple 2020-08-11 04:02:03 UTC
5.6.19 and prior were able to suspend/hibernate just fine using 'systemctl suspend' etc

On upgrade to 5.7.0, 5.7.11, 5.7.14 or 5.8.0 it fails:

# cat /sys/power/state
freeze mem disk
# echo mem >/sys/power/state
bash: echo: write error: Device or resource busy
# echo freeze >/sys/power/state
bash: echo: write error: Device or resource busy
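
A small read-only check can confirm that a sleep state is advertised before writing it. This is only a sketch: the function name state_supported and its optional second argument (a path standing in for /sys/power/state) are made up for illustration.

```shell
# Hypothetical helper: succeed if STATE is listed in the sleep-state file.
# The second argument defaults to /sys/power/state; it exists only so the
# check can be pointed at a copy of that file.
state_supported() {
    state="$1"
    state_file="${2:-/sys/power/state}"
    # The file holds a space-separated list such as "freeze mem disk".
    for s in $(cat "$state_file"); do
        [ "$s" = "$state" ] && return 0
    done
    return 1
}
```

Usage might look like `state_supported mem && echo mem >/sys/power/state`, so that a write is only attempted for a state the kernel lists.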

Hardware is Dell XPS L502X

System is Fedora-31 and -32
Kernels were downloaded from kernel.org, compiled and installed.
Fedora kernels 5.7+ also fail
Comment 1 Bob Hepple 2020-08-12 00:33:58 UTC
Awesome!

I fixed it myself - all it needed was a BIOS update!
Comment 2 Bob Hepple 2020-08-12 00:34:54 UTC
No need to bother the gods of power management any more - so I'll close this.
Comment 3 Bob Hepple 2020-08-13 07:47:30 UTC
I made a mistake. I'm back to no suspend on 5.7.0. Either I made an error when I reported that the BIOS update fixed it, or something odd happened when I booted to Windows to do the BIOS update that left the machine in a suspendable state.
Comment 4 Bob Hepple 2020-08-14 23:57:48 UTC
Created attachment 290907
git bisection on kernel
Comment 5 Bob Hepple 2020-08-14 23:58:32 UTC
I did a 'git bisect' on https://github.com/torvalds/linux - it seems the problem originates
in commit ffc1c20c46f74e24c3f03147688b4af6e429654a of April 1st (All Fools Day!) - please 
see attachment for details.
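
For anyone wanting to repeat the bisection, the steps might be sketched as follows. The wrapper name start_bisect is hypothetical, and v5.6/v5.7 as the good/bad endpoints are an assumption based on the versions reported above, not necessarily the exact refs used in the attachment.

```shell
# Hypothetical wrapper: begin a bisection in REPO between a known-good and
# a known-bad revision.
# Assumed example for this report: start_bisect /path/to/linux v5.6 v5.7
start_bisect() {
    repo="$1"; good="$2"; bad="$3"
    git -C "$repo" bisect start "$bad" "$good"
}

# After each step: build, install and boot the checked-out kernel, try
#   echo mem >/sys/power/state
# and record the result with one of:
#   git -C "$repo" bisect good
#   git -C "$repo" bisect bad
# When git names the first bad commit, save the history and clean up:
#   git -C "$repo" bisect log
#   git -C "$repo" bisect reset
```

Because every step needs a reboot into the freshly built kernel, the good/bad verdicts are recorded by hand here instead of through `git bisect run`.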

At this point, I'm at the end of my expertise - if someone can offer a patch, I'd be 
happy to try it out!!

Thanks!
Comment 6 Bob Hepple 2020-08-16 22:24:00 UTC
Just thought I'd double check as 'git bisect' seems to have sent me on a longer trip than necessary ... I manually checked out, compiled and tested these commits:

First failing commit:
ffc1c20c46f74e24c3f03147688b4af6e429654a Wed Apr  1 15:33:12 2020 -0700

Last successful commit:
f365ab31efacb70bed1e821f7435626e0b2528a6 Wed Apr  1 15:24:20 2020 -0700

ffc1c20 is definitely the breaking commit:

commit ffc1c20c46f74e24c3f03147688b4af6e429654a
Merge: f365ab31efac 81d5553d1288
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Apr 1 15:33:12 2020 -0700

    Merge tag 'for-5.7/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
    
    Pull device mapper updates from Mike Snitzer:
    
     - Add DM writecache "cleaner" policy feature that allows cache to be
       flushed while userspace monitors for completion to then discommision
       use of caching.
    
     - Optimize DM writecache superblock writing and also yield CPU while
       initializing writecache on large PMEM devices to avoid CPU stalls.
    
     - Various fixes to DM integrity target while preparing for the ability
       to resize a DM integrity device. In addition to resize support, add
       optional discard support with the "allow_discards" feature.
    
     - Fix DM clone target's discard handling and overflow bugs which could
       cause data corruption.
    
     - Fix memory leak in destructor for DM verity FEC support.
    
     - Fix DM zoned target's redundant increment of nr_rnd_zones.
    
     - Small cleanup in DM crypt to use crypt_integrity_aead() helper.
    
    * tag 'for-5.7/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
      dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions()
      dm clone: Add missing casts to prevent overflows and data corruption
      dm clone: Add overflow check for number of regions
      dm clone: Fix handling of partial region discards
      dm writecache: add cond_resched to avoid CPU hangs
      dm integrity: improve discard in journal mode
      dm integrity: add optional discard support
      dm integrity: allow resize of the integrity device
      dm integrity: factor out get_provided_data_sectors()
      dm integrity: don't replay journal data past the end of the device
      dm integrity: remove sector type casts
      dm integrity: fix a crash with unusually large tag size
      dm zoned: remove duplicate nr_rnd_zones increase in dmz_init_zone()
      dm verity fec: fix memory leak in verity_fec_dtr
      dm writecache: optimize superblock write
      dm writecache: implement gradual cleanup
      dm writecache: implement the "cleaner" policy
      dm writecache: do direct write if the cache is full
      dm integrity: print device name in integrity_metadata() error message
      dm crypt: use crypt_integrity_aead() helper
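
Since the first bad commit is a merge, one way to see which individual commits it brought in is to list what is reachable from its second parent but not from its first. A minimal sketch, with a hypothetical helper name merged_commits:

```shell
# Hypothetical helper: list the commits a merge pulled in, i.e. those
# reachable from its second parent (^2) but not from its first parent (^1).
merged_commits() {
    repo="$1"; merge="$2"
    git -C "$repo" log --oneline "${merge}^1..${merge}^2"
}
```

For this report that would be `merged_commits /path/to/linux ffc1c20c46f74e24c3f03147688b4af6e429654a` (the repository path is a placeholder). Given the "Merge:" line above, that is the same range as `git log --oneline f365ab31efac..81d5553d1288`, and it should correspond to the shortlog in the merge message.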
Comment 7 Bob Hepple 2020-08-18 23:41:08 UTC
I just tried today's latest (https://github.com/torvalds/linux.git commit 18445bf405cb331117bc98427b1ba6f12418ad17), which is a 5.9 pre-release, and suspend started to work again!
Comment 8 Bob Hepple 2020-10-25 08:19:39 UTC
Tested today against master 5.9 d76913908102044f14381df865bb74df17a538cb and all is well with suspend.
Comment 9 Chen Yu 2020-11-04 08:27:13 UTC
Thanks for bisecting. It looks like the bad commit is a pull request that includes many changes. But since you mentioned that suspend now works as expected, I assume the bug has been fixed?
Comment 10 Bob Hepple 2020-11-04 08:31:27 UTC
Hi Chen Yu,

Yes - I think that all is well. For now :-)
