Bug 211809 - Sudden poweroffs on ThinkPad T490s with PCIe RTS522A card reader since v5.10-rc1
Summary: Sudden poweroffs on ThinkPad T490s with PCIe RTS522A card reader since v5.10-rc1
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 high
Assignee: Rafael J. Wysocki
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-16 23:23 UTC by Marius Bakke
Modified: 2021-03-25 17:20 UTC
0 users

See Also:
Kernel Version: 5.10
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
kernel config (191.63 KB, text/plain)
2021-02-16 23:23 UTC, Marius Bakke
dmesg (72.53 KB, text/plain)
2021-02-16 23:24 UTC, Marius Bakke
bisect.log (3.72 KB, text/plain)
2021-02-21 12:24 UTC, Marius Bakke

Description Marius Bakke 2021-02-16 23:23:13 UTC
Created attachment 295329
kernel config

I have a ThinkPad T490s that suddenly freezes and shuts down when idle, starting with 5.10-rc1 (and in later versions, including 5.10.16).

There is no log or kernel trace to be seen, and it only seems to happen when idle.

I started bisecting this issue, and it went smoothly until I realized that the problem does not necessarily manifest overnight, which means many commits I marked "good" may actually be bad. At other times it appears in less than an hour.
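Why an intermittent failure undermines the bisection can be made concrete with a simple model. The sketch below assumes (a modelling assumption, not something measured in this report) that a bad kernel fails at random while idle, as a Poisson process with a fixed mean time to failure T; the chance that a bad commit survives an idle test of t hours, and is wrongly marked "good", is then exp(-t/T). The function names and the 8-hour mean are hypothetical.

```python
import math


def p_false_good(idle_hours: float, mtbf_hours: float) -> float:
    """Probability that a bad kernel survives `idle_hours` of idle testing
    without failing, i.e. gets wrongly marked "good".

    Assumes failures follow a Poisson process with mean time to failure
    `mtbf_hours` (a modelling assumption, not measured data)."""
    return math.exp(-idle_hours / mtbf_hours)


def idle_hours_needed(mtbf_hours: float, max_false_good: float) -> float:
    """Idle time needed so that a bad kernel is marked "good" with
    probability at most `max_false_good`, under the same assumption."""
    return -mtbf_hours * math.log(max_false_good)


if __name__ == "__main__":
    mtbf = 8.0  # hypothetical mean idle time to failure, in hours
    for window in (1, 8, 24):
        print(f"{window:>2} h idle: P(false good) = {p_false_good(window, mtbf):.3f}")
    print(f"idle time for <= 5% false good: {idle_hours_needed(mtbf, 0.05):.1f} h")
```

Under this hypothetical 8-hour mean, one 8-hour night still leaves about a 37% chance (e^-1) of a false "good", and roughly 24 idle hours are needed to push that below 5%.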

Progress is slow because this system is rarely idle for long periods. I'll update this report occasionally; any feedback is appreciated.

The earliest bad commit so far is v5.9-7630-g5a32c3413d33.

dmesg and kernel config attached.

5.9.11 runs fine for weeks on the exact same userland.
Comment 1 Marius Bakke 2021-02-16 23:24:34 UTC
Created attachment 295331
dmesg

dmesg of v5.9-3823-gb4e1bce85fd8
Comment 2 Marius Bakke 2021-02-21 12:23:13 UTC
59cdb23ca2dfef3b93411d1105409dfe9cd1f62f is the first bad commit
commit 59cdb23ca2dfef3b93411d1105409dfe9cd1f62f
Author: Scott Branden <scott.branden@broadcom.com>
Date:   Fri Oct 2 10:38:27 2020 -0700

    firmware: Add request_partial_firmware_into_buf()
    
    Add request_partial_firmware_into_buf() to allow for portions of a
    firmware file to be read into a buffer. This is needed when large firmware
    must be loaded in portions from a file on memory constrained systems.
    
    Signed-off-by: Scott Branden <scott.branden@broadcom.com>
    Co-developed-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20201002173828.2099543-16-keescook@chromium.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/base/firmware_loader/firmware.h |   4 ++
 drivers/base/firmware_loader/main.c     | 101 ++++++++++++++++++++++++++------
 include/linux/firmware.h                |  12 ++++
 3 files changed, 99 insertions(+), 18 deletions(-)

I hit it early during the bisection and have not had any lock-ups or poweroffs since.

Not sure where to redirect this bug report; apologies for the spam.
Comment 3 Marius Bakke 2021-02-21 12:24:51 UTC
Created attachment 295379
bisect.log
Comment 4 Marius Bakke 2021-02-24 19:07:24 UTC
After this finding, I updated my copy of the linux-firmware repository (from 20201218 to f7915a0c29fee27a310cebd7155b9e3a6eb71a1d), and the system was stable on 5.10.17 for three days.

However, the system just crashed again in the same way. I'm not sure whether the firmware update made a difference or I just got lucky.

Reassigned to Drivers. Suggestions for how to debug this issue further are welcome.
Comment 5 Marius Bakke 2021-03-03 14:39:45 UTC
I'm still experiencing crashes on 5.10.19, even with the latest firmware. I also tried reverting 59cdb23ca2dfef3b93411d1105409dfe9cd1f62f, to no avail.

Resuming the bisect.
Comment 6 Marius Bakke 2021-03-20 12:13:30 UTC
The problem was introduced in the range bc28369c6189..859d510e58da (15 commits).

Of these, only two commits seem relevant to my hardware:

  0268eed10f12f785a618880920d90ee306fb2a50 misc: rtsx: Fix power down flow
  7c33e3c4c79ac5def79e7c773e38a7113eb14204 misc: rtsx: Add power saving functions and fix driving parameter
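Each remaining bisection step costs a long idle test, so it helps to estimate how many steps are left. Assuming a plain binary search over the 15 candidate commits (which is what `git bisect` does), the worst case is ceil(log2(15)) = 4 test builds. The helper name and the 24 hours per test below are hypothetical.

```python
def worst_case_bisect_steps(candidates: int) -> int:
    """Worst-case number of builds a binary search must test to identify the
    first bad commit among `candidates` possibilities (ceil(log2(candidates)))."""
    if candidates < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (candidates - 1).bit_length()


if __name__ == "__main__":
    steps = worst_case_bisect_steps(15)  # the bc28369c6189..859d510e58da range
    hours_per_test = 24  # hypothetical idle time spent on each build
    print(f"{steps} builds, about {steps * hours_per_test} idle hours")
```

With one full day of idling per build, that amounts to roughly four days of testing, before accounting for any retests of doubtful "good" results.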

For reference, the system I'm using is a ThinkPad T490s with a Realtek RTS522A PCIe card reader:

01:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
        Subsystem: Lenovo Device 2286
        Flags: bus master, fast devsel, latency 0, IRQ 140, IOMMU group 13
        Memory at c9600000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-00-00-01-00-4c-e0-00
        Capabilities: [150] Latency Tolerance Reporting
        Capabilities: [158] L1 PM Substates
        Kernel driver in use: rtsx_pci

I also run "powertop --auto-tune" from a boot script, which may contribute to this issue.
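One way to check whether `powertop --auto-tune` is involved is to undo its runtime PM setting for the card reader alone. powertop's "Runtime PM for PCI Device" tunables set each device's sysfs `power/control` attribute to "auto"; writing "on" keeps the device from being runtime-suspended. The sketch below assumes the card reader (01:00.0 in the lspci output) is in PCI domain 0000; the helper names are hypothetical. It only changes runtime PM and does not disable ASPM or L1 substates, so a negative result would not rule those out.

```python
from pathlib import Path

# Assumption: lspci's "01:00.0" is in PCI domain 0000, so the sysfs name is 0000:01:00.0.
CARD_READER = "0000:01:00.0"
SYSFS_PCI_DEVICES = Path("/sys/bus/pci/devices")


def _control_file(dev: str, root: Path) -> Path:
    # Runtime PM policy attribute for a PCI device in sysfs.
    return root / dev / "power" / "control"


def runtime_pm_control(dev: str = CARD_READER, root: Path = SYSFS_PCI_DEVICES) -> str:
    """Return "auto" (runtime suspend allowed) or "on" (device kept powered)."""
    return _control_file(dev, root).read_text().strip()


def set_runtime_pm_control(value: str, dev: str = CARD_READER,
                           root: Path = SYSFS_PCI_DEVICES) -> None:
    """Set the runtime PM policy; writing to the real sysfs requires root."""
    if value not in ("auto", "on"):
        raise ValueError("power/control accepts only 'auto' or 'on'")
    _control_file(dev, root).write_text(value + "\n")
```

Run as root after the powertop boot script, `set_runtime_pm_control("on")` followed by `runtime_pm_control()` should report "on" for the card reader while every other device keeps powertop's settings.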

Armed with this knowledge, I was hopeful when I saw 920fd8a70619074eac7687352c8f1c6f3c2a64a5 in v5.10.20, but I still experience sudden poweroffs on v5.10.23.

More investigation pending once I've pinpointed the exact commit.
Comment 7 Marius Bakke 2021-03-25 17:18:29 UTC
For posterity:

7c33e3c4c79ac5def79e7c773e38a7113eb14204 is the first bad commit
commit 7c33e3c4c79ac5def79e7c773e38a7113eb14204
Author: Ricky Wu <ricky_wu@realtek.com>
Date:   Mon Sep 7 18:07:31 2020 +0800

    misc: rtsx: Add power saving functions and fix driving parameter
    
    v4:
    split power down flow and power saving function to two patch
    
    v5:
    fix up modified change under the --- line
    
    Add rts522a L1 sub-state support
    Save more power on rts5227 rts5249 rts525a rts5260
    Fix rts5260 driving parameter
    
    Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
    Link: https://lore.kernel.org/r/20200907100731.7722-1-ricky_wu@realtek.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/misc/cardreader/rts5227.c  | 112 +++++++++++++++++++++++++++-
 drivers/misc/cardreader/rts5249.c  | 145 ++++++++++++++++++++++++++++++++++++-
 drivers/misc/cardreader/rts5260.c  |  28 +++----
 drivers/misc/cardreader/rtsx_pcr.h |  17 +++++
 4 files changed, 283 insertions(+), 19 deletions(-)

I have not been able to reproduce the problem on an AMD ThinkPad T14 Gen 1 with the same RTS522A PCIe card reader. Digging deeper...
