Created attachment 295329 [details]
kernel config

I have a ThinkPad T490s that suddenly freezes and shuts down when idle, on 5.10-rc1 and every later kernel I have tried (including 5.10.16). Nothing shows up in the logs and there is no kernel trace, and it only seems to happen when the system is idle.

I started bisecting this issue and it went swimmingly until I realized the problem does not necessarily manifest overnight, which means many of the commits I marked "good" may actually be bad. Sometimes it takes less than an hour. Progress is slow because this system is rarely idle for long periods. I'll update this occasionally; any feedback is appreciated.

The earliest bad commit so far is v5.9-7630-g5a32c3413d33. dmesg and kernel config attached. 5.9.11 runs fine for weeks on the exact same userland.
Created attachment 295331 [details]
dmesg

dmesg of v5.9-3823-gb4e1bce85fd8
59cdb23ca2dfef3b93411d1105409dfe9cd1f62f is the first bad commit
commit 59cdb23ca2dfef3b93411d1105409dfe9cd1f62f
Author: Scott Branden <scott.branden@broadcom.com>
Date:   Fri Oct 2 10:38:27 2020 -0700

    firmware: Add request_partial_firmware_into_buf()

    Add request_partial_firmware_into_buf() to allow for portions
    of a firmware file to be read into a buffer. This is needed when
    large firmware must be loaded in portions from a file on memory
    constrained systems.

    Signed-off-by: Scott Branden <scott.branden@broadcom.com>
    Co-developed-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/20201002173828.2099543-16-keescook@chromium.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/base/firmware_loader/firmware.h |   4 ++
 drivers/base/firmware_loader/main.c     | 101 ++++++++++++++++++++++++++------
 include/linux/firmware.h                |  12 ++++
 3 files changed, 99 insertions(+), 18 deletions(-)

I hit it early during bisecting and have not had any lock-ups/poweroffs since. Not sure where to redirect this bug report; apologies for the spam.
Created attachment 295379 [details] bisect.log
After this finding I tried updating my copy of the linux-firmware repository (from 20201218 to f7915a0c29fee27a310cebd7155b9e3a6eb71a1d), and was stable on 5.10.17 for three days. But the system crashed again just now in the same way. Not sure if the firmware update made a difference or if I just got lucky.

Reassigned to drivers. Suggestions for how to debug this issue further are welcome.
I'm still experiencing crashes on 5.10.19, even with the latest firmware. I also tried reverting 59cdb23ca2dfef3b93411d1105409dfe9cd1f62f, to no avail. Resuming the bisect.
The problem was introduced in the range bc28369c6189..859d510e58da (15 commits). Out of these, only two commits seem relevant for my hardware:

0268eed10f12f785a618880920d90ee306fb2a50 misc: rtsx: Fix power down flow
7c33e3c4c79ac5def79e7c773e38a7113eb14204 misc: rtsx: Add power saving functions and fix driving parameter

For reference, the system I'm using is a ThinkPad T490s with a Realtek RTS522A PCIe card reader:

01:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
        Subsystem: Lenovo Device 2286
        Flags: bus master, fast devsel, latency 0, IRQ 140, IOMMU group 13
        Memory at c9600000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-00-00-01-00-4c-e0-00
        Capabilities: [150] Latency Tolerance Reporting
        Capabilities: [158] L1 PM Substates
        Kernel driver in use: rtsx_pci

I also run "powertop --auto-tune" as a boot script, which may contribute to this issue.

Armed with this knowledge, I was hopeful when I saw 920fd8a70619074eac7687352c8f1c6f3c2a64a5 in v5.10.20, but I still experience sudden poweroffs on v5.10.23. More investigation pending once I've pinpointed the exact commit.
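Since "powertop --auto-tune" may be a factor, it could help to record the card reader's runtime power management settings before and after the boot script runs. The following is only a rough sketch: it reads the standard sysfs attributes power/control and power/runtime_status for one PCI device. The address 0000:01:00.0 assumes the reader sits in PCI domain 0000 (lspci above only shows 01:00.0), and the function name runtime_pm_state is made up for illustration.

```python
from pathlib import Path

# Standard sysfs location for PCI devices on Linux.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def runtime_pm_state(bdf: str, root: Path = SYSFS_PCI) -> dict:
    """Return the runtime-PM attributes of one PCI device.

    bdf is the full address, e.g. "0000:01:00.0" (domain 0000 is an
    assumption here). Attributes that cannot be read are reported as None.
    """
    power_dir = root / bdf / "power"
    state = {}
    for name in ("control", "runtime_status"):
        try:
            state[name] = (power_dir / name).read_text().strip()
        except OSError:
            # Attribute missing or unreadable on this system.
            state[name] = None
    return state


if __name__ == "__main__":
    # Hypothetical address for the RTS522A reader from the lspci output.
    print(runtime_pm_state("0000:01:00.0"))
```

A control value of "auto" means the kernel is allowed to runtime-suspend the device, while "on" keeps it active. Writing "on" to power/control as root might be one way to test whether the powertop setting matters, although that is untested here.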
For posterity:

7c33e3c4c79ac5def79e7c773e38a7113eb14204 is the first bad commit
commit 7c33e3c4c79ac5def79e7c773e38a7113eb14204
Author: Ricky Wu <ricky_wu@realtek.com>
Date:   Mon Sep 7 18:07:31 2020 +0800

    misc: rtsx: Add power saving functions and fix driving parameter

    v4:
    split power down flow and power saving function to two patch

    v5:
    fix up modified change under the --- line

    Add rts522a L1 sub-state support
    Save more power on rts5227 rts5249 rts525a rts5260
    Fix rts5260 driving parameter

    Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
    Link: https://lore.kernel.org/r/20200907100731.7722-1-ricky_wu@realtek.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/misc/cardreader/rts5227.c  | 112 +++++++++++++++++++++++++++-
 drivers/misc/cardreader/rts5249.c  | 145 ++++++++++++++++++++++++++++++++++++-
 drivers/misc/cardreader/rts5260.c  |  28 +++----
 drivers/misc/cardreader/rtsx_pcr.h |  17 +++++
 4 files changed, 283 insertions(+), 19 deletions(-)

I have not been able to reproduce the problem on an AMD ThinkPad T14 Gen 1 with the same PCIe rts522a card reader. Digging deeper...