Bug 217664
Summary: | Medion AMD Laptop doesn't wake up from s2idle with SATA disk | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | popus_czy_to_ty (penteljapan) |
Component: | Serial ATA | Assignee: | Tejun Heo (tj) |
Status: | RESOLVED DOCUMENTED | ||
Severity: | high | CC: | alexdeucher, bagasdotme, colombojrj, mario.limonciello |
Priority: | P3 | ||
Hardware: | AMD | ||
OS: | Linux | ||
Kernel Version: | 6.2.0-27-generic | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | kernel 6.4.11 install; kernel log; journalctrl spam; python script - results; second run; journalctl from 26/08/23; test nr 2; acpi dump; s |
Description
popus_czy_to_ty
2023-07-12 09:59:14 UTC
second card. somehow didnt add it, cant edit.
01:00.0 VGA compatible controller: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] (rev a1) (prog-if 00 [VGA controller])
Subsystem: CLEVO/KAPOK Computer GA107M [GeForce RTX 3050 Mobile]
Physical Slot: 0
Flags: bus master, fast devsel, latency 0, IRQ 80, IOMMU group 9
Memory at d0000000 (32-bit, non-prefetchable) [size=16M]
Memory at fb00000000 (64-bit, prefetchable) [size=4G]
Memory at fc00000000 (64-bit, prefetchable) [size=32M]
I/O ports at 3000 [size=128]
Expansion ROM at d1080000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [100] Virtual Channel
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Physical Resizable BAR
Capabilities: [c1c] Physical Layer 16.0 GT/s <?>
Capabilities: [d00] Lane Margining at the Receiver <?>
Capabilities: [e00] Data Link Feature <?>
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

Can you show system log instead (output of journalctl)?

Stripped part, might be interesting? https://pastebin.com/MgpCf7Xf

https://www.youtube.com/shorts/vuCrMdrtGdU - video of symptom

BOE NV156FHM-N4G - display model

NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.0

Tested Debian 12 - dead.
Tested MX Linux with the card chosen during installation (chose amdgpu) - dead.

Hi, I am having the same problem. My laptop is a Thinkpad T480 with an Intel GPU. I discovered that it suspends and wakes up properly with AC plugged in, but it fails to wake up in battery mode.
OpenSuse Leap and Tumbleweed -> dead
Debian 12 -> dead
Gentoo with kernel 6 -> dead
I will try with Gentoo kernel 5.
Any ideas of how to debug this?

Does this laptop use a display mux? Does it work without the nvidia driver loaded? Can you suspend via something other than the lid (e.g., via the menu in your GUI)? Does that work?

After reinstalling and trying many times, the answers to your questions are:
1. (mux) whatever it is --> no
2. no
3. no, via command it's dead also
4. no
5. Insyde Corp (the BIOS manufacturer) doesn't respond to my emails; even in the BIOS I can't disable the dedicated card.

If I may: have you tried to disable (or even better, uninstall) tlp? Most modern GNU/Linux distributions come with it preinstalled.

The display mux is a switch that controls what GPU the built in display is routed to (dGPU or APU). If it's not set correctly it can route the display to the wrong GPU. AFAIK, it should be handled by the nvidia driver. I'm not sure if they support this or not on Linux. You indicated that the system is still accessible after resume, can you get the dmesg output after resume? Are any of the external display connectors driven by the APU? If so, do any of those work correctly on resume?

After resume the fan is spinning; I tried to reconnect over ssh from a client and it's not possible. tlp is not installed. The biggest success was when I played with drivers and was able to get the keyboard backlight on. Now I'm on stock Kubuntu and it doesn't even light the keyboard. I used to control the mux in the nvidia control panel (performance mode - nvidia, on demand - AMD). I don't use or have any external monitors, so I can't say; probably not, since the network stack doesn't come back up.

Does suspend work with gpu drivers backlisted?
*blacklisted

Banned all in grub ( GRUB_CMDLINE_LINUX_DEFAULT="quiet splash module_blacklist=nvidia,nvidia-current,nvidia_drm,nvidia_uvm,nvidia_modeset,nouveau" ), still doesn't wake up. Now nvidia is using some intel driver somehow xD

sd@Crawler-E25:~$ lspci -knn | grep VGA -A 5
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] [10de:25a2] (rev a1)
Subsystem: CLEVO/KAPOK Computer GA107M [GeForce RTX 3050 Mobile] [1558:5e00]
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:2291] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:0000]
Kernel driver in use: snd_hda_intel
--
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1638] (rev c6)
Subsystem: CLEVO/KAPOK Computer Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] [1558:5e00]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
05:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller [1002:1637]
Subsystem: CLEVO/KAPOK Computer Renoir Radeon High Definition Audio

(In reply to popus_czy_to_ty from comment #18)
> banned all in grub ( GRUB_CMDLINE_LINUX_DEFAULT="quiet splash
> module_blacklist=nvidia,nvidia-current,nvidia_drm,nvidia_uvm,nvidia_modeset,
> nouveau" )
> still doesnt wake up

All GPU drivers, including amdgpu. I'm trying to understand if this is a general platform issue or something specific to one of the GPUs.

Now on those backup drivers? vesa? It doesn't go to suspend; it reacts to mouse clicks (touchpad) and touchpad movements, but not the keyboard. I'm not good at Linux, but for some reason the tty hangs up and I need to run that from a second tty after disabling amdgpu.
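The back-and-forth above turns on which modules were actually named on the kernel command line. A minimal sketch of that check follows; the helper names are hypothetical and not part of any tool used in this thread, and the hyphen/underscore normalization is an assumption about how module names are usually compared.

```python
def _norm(name: str) -> str:
    # Assumption: treat '-' and '_' as equivalent in module names.
    return name.replace("-", "_")


def blacklisted_modules(cmdline: str) -> set[str]:
    """Collect names given via module_blacklist= or modprobe.blacklist=."""
    names: set[str] = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in ("module_blacklist", "modprobe.blacklist"):
            names.update(_norm(n) for n in value.split(",") if n)
    return names


def missing_from_blacklist(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the wanted modules that the command line does not block."""
    listed = blacklisted_modules(cmdline)
    return [name for name in wanted if _norm(name) not in listed]


# Parameters copied from the GRUB line quoted in this thread.
sample = ("quiet splash module_blacklist=nvidia,nvidia-current,nvidia_drm,"
          "nvidia_uvm,nvidia_modeset,nouveau")
print(missing_from_blacklist(sample, ["nvidia", "nouveau", "nvidiafb", "amdgpu"]))
```

On a running system the same function could be fed the contents of `/proc/cmdline`, which shows what the booted kernel actually received rather than what was written into the GRUB configuration.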
I made a video to show what it looks like: https://www.youtube.com/watch?v=HqxWxdpFVFU
https://mega.nz/file/rkwHSRrb#MGBASmlT95wzofl5ZCg6PjyU4Y6CYTC5yl4bc03Hjik - kernel log
dmesg - https://pastebin.com/jBgktfa1 [on vesa drivers]

Is the system accessible on resume? I.e., can you get ssh access if the display is not active?

> Kernel Version: 6.2.0-25-generic (64-bit)

This is the upstream kernel bug tracker and you're filing a bug on a distro kernel. Can you please try against a supported upstream mainline kernel, not a distro kernel? This might be missing patches. You can find mainline kernel builds for Ubuntu here: https://kernel.ubuntu.com/~kernel-ppa/mainline/ I suggest trying 6.4.11.

> [ 145.070506] PM: suspend entry (s2idle)
> [ 152.723268] amd_pmc AMDI0005:00: Last suspend didn't reach deepest state

This system is using s2idle. In this case, disabling amdgpu won't be useful to identify a platform issue because the system won't reach the deepest state without it.

sudo add-apt-repository ppa:cappelikan/ppa
sudo apt update
sudo apt install mainline
sudo mainline
lsd@Crawler-E25:~$ sudo mainline install 6.4.11
mainline 1.4.8
Updating Kernels...
Downloading 6.4.11
Installing 6.4.11

After this, and a format, I'm landing in initramfs on a fresh Kubuntu 23.04 for some reason.

I've never used that tool before, but please make sure that you have all the necessary packages installed. You need both the linux-image and linux-modules packages.

@alex, as I remember it was accessible by remote shell after suspending (on LLVMPIPE) [rest banned], but it didn't go into deep sleep anyway as far as I remember(?)

@mario formatted 3x, changing the filesystem in between to make sure everything is erased; grub is now rebuilt correctly from scratch. Freshly installed Kubuntu 23.04.

https://www.youtube.com/watch?v=d8z1gjoIoUk symptom is exactly the same as on kernel 6.2.xxx, maybe a few more complaints in the console.
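The two kernel messages quoted above (the `PM: suspend entry` line and the `amd_pmc` complaint) can be paired up mechanically when a log contains many suspend attempts. This is a rough sketch that matches only those two message texts as they appear in this thread; the function name is hypothetical, and the absence of the `amd_pmc` line should not be read as proof that the deepest state was reached.

```python
import re

ENTRY = re.compile(r"PM: suspend entry \((?P<mode>[^)]+)\)")
SHALLOW = "Last suspend didn't reach deepest state"


def summarize_suspends(lines):
    """Return one dict per suspend entry, noting whether amd_pmc complained."""
    cycles = []
    for line in lines:
        match = ENTRY.search(line)
        if match:
            # A new suspend attempt starts here.
            cycles.append({"mode": match.group("mode"), "shallow_reported": False})
        elif SHALLOW in line and cycles:
            # The complaint refers to the most recent suspend attempt.
            cycles[-1]["shallow_reported"] = True
    return cycles


# The two lines quoted in this thread.
log = [
    "[ 145.070506] PM: suspend entry (s2idle)",
    "[ 152.723268] amd_pmc AMDI0005:00: Last suspend didn't reach deepest state",
]
print(summarize_suspends(log))
```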
https://mega.nz/file/OoBmED6L#6Tw4c1kwsUirYXK_XQ7pw6vtLyghrho8MMyB5rzmYYw - ppa install
https://mega.nz/file/3gRwgDyD#trmYYtnvYSP8z03U4Ggr3BvKFG-KFMOjhJvGnowBjFU - kern log
https://mega.nz/file/X8gjCZJB#go4CsVkVdbtQNAlWcgEhshci8SGSe4bjYqeDyBQESLg - journalctrl (be aware huge spam batch)

> https://mega.nz/file/OoBmED6L#6Tw4c1kwsUirYXK_XQ7pw6vtLyghrho8MMyB5rzmYYw -
> ppa install
> https://mega.nz/file/3gRwgDyD#trmYYtnvYSP8z03U4Ggr3BvKFG-KFMOjhJvGnowBjFU -
> kern log
> https://mega.nz/file/X8gjCZJB#go4CsVkVdbtQNAlWcgEhshci8SGSe4bjYqeDyBQESLg -
> journalctrl (be aware huge spam batch)

Please attach your logs directly to kernel Bugzilla. I can't access this resource.

> https://www.youtube.com/watch?v=d8z1gjoIoUk symptom is exactly same as on
> kernel 6.2.xxx, maybe few more complains in console.

This looks "like" what I would expect happens when you don't have nouveau/nvidia blacklisted to me, but I don't know for sure since I can't access your logs.

Please also look in /var/lib/systemd/pstore to see if anything was captured.

As this is s2idle not s3, can you reproduce it under the context of this script?
https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py

Created attachment 304936 [details]
kernel 6.4.11 install
Created attachment 304937 [details]
kernel log
Created attachment 304938 [details]
journalctrl spam
Created attachment 304939 [details]
python script - results
Created attachment 304940 [details]
second run
Banned nouveau following the nvidia guide --> https://docs.nvidia.com/ai-enterprise/deployment-guide-vmware/0.1.0/nouveau.html

lsd@Crawler-E25:~$ lspci -k | grep -EA3 'VGA|3D|Display'
01:00.0 VGA compatible controller: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] (rev a1)
Subsystem: CLEVO/KAPOK Computer GA107M [GeForce RTX 3050 Mobile]
Kernel modules: nvidiafb, nouveau
01:00.1 Audio device: NVIDIA Corporation Device 2291 (rev a1)
--
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c6)
Subsystem: CLEVO/KAPOK Computer Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
----
/var/lib/systemd/pstore is empty. It doesn't come back from suspend with your script (I'm trying to do it manually, but it doesn't do anything, as before).
a) before banning nouveau
b) after

> Kernel modules: nvidiafb, nouveau
> Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.4.11-060411-generic
> root=UUID=218238de-a398-4391-b329-f9e1a095db36 ro quiet splash vt.handoff=7 202

I don't see nouveau blocked. Furthermore I see problems with nouveau changing power states.

> 2023-08-24T10:25:01.981451+01:00 Crawler-E25 kernel: [ 25.196621] nouveau
> 0000:01:00.0: Unable to change power state from D3cold to D0, device
> inaccessible
> 2023-08-24T10:25:02.041542+01:00 Crawler-E25 kernel: [ 25.257667] nouveau
> 0000:01:00.0: Unable to change power state from D3cold to D0, device
> inaccessible
> 2023-08-24T10:25:02.041554+01:00 Crawler-E25 kernel: [ 25.257703] nouveau
> 0000:01:00.0: Unable to change power state from D3cold to D0, device
> inaccessible
> 2023-08-24T10:25:02.041557+01:00 Crawler-E25 kernel: [ 25.257811] nouveau
> 0000:01:00.0: timer: stalled at ffffffffffffffff

It seems like the card might be in a bad state to me. I suggest blacklisting ALL the relevant drivers for it (nvidia, nouveau, nvidiafb) on your kernel command line so we can remove it from the equation for suspend.

Created attachment 304944 [details]
journalctl from 26/08/23
Doesn't work now. The first time I try, it doesn't want to go into suspend; when I repeat, it suspends but doesn't recover.
Can you please use the s2idle script without the modules loaded? It will capture other debugging information I'd like to see to come up with a next step.

Created attachment 304948 [details]
test nr 2
> 2023-08-24T10:52:38.933500+01:00 Crawler-E25 kernel: [ 3.671686] ahci
> 0000:06:00.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
> 2023-08-24T10:52:38.933501+01:00 Crawler-E25 kernel: [ 3.671690] ahci
> 0000:06:00.0: flags: 64bit ncq sntf ilck pm led clo only pmp fbs pio slum
> part
> 2023-08-24T10:28:47.625144+01:00 Crawler-E25 kernel: [ 4.672965] ata1.00:
> ATA-11: Samsung SSD 860 EVO 250GB, RVT04B6Q, max UDMA/133
> 2023-08-24T10:28:47.625144+01:00 Crawler-E25 kernel: [ 4.677878] ata1.00:
> Features: Trust Dev-Sleep NCQ-sndrcv
1) What do you have SATA_MOBILE_LPM_POLICY set to in your kernel?
2) Can you please try to remove your SATA disks from the system and only run with the NVME?
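For question 1, the value can be read from the distribution's installed kernel config without building anything. The sketch below assumes the Ubuntu-style location `/boot/config-<release>` (the same file the reporter later copies with `cp -v /boot/config-$(uname -r) .config`); the function names are hypothetical.

```python
import platform


def kconfig_value(config_text, option):
    """Return the value of CONFIG_<option> from .config text, or None if not set."""
    wanted = f"CONFIG_{option}="
    for line in config_text.splitlines():
        line = line.strip()
        # Unset options appear as '# CONFIG_X is not set' and are skipped here.
        if line.startswith(wanted):
            return line[len(wanted):].strip('"')
    return None


def running_kernel_config_path():
    """Assumed Ubuntu-style path of the config for the running kernel."""
    return f"/boot/config-{platform.release()}"


# Small stand-in for a real config file.
sample = "CONFIG_ATA=y\nCONFIG_SATA_MOBILE_LPM_POLICY=3\n# CONFIG_FOO is not set\n"
print(kconfig_value(sample, "SATA_MOBILE_LPM_POLICY"))
```

To check a real system, read the file at `running_kernel_config_path()` and pass its contents as `config_text`; other distributions may keep the config elsewhere.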
1) Sorry for the very long spam, but I'm not much into compiling my own kernel. First I looked at https://cateee.net/lkddb/web-lkddb/SATA_MOBILE_LPM_POLICY.html for the option you mentioned, then I stopped at https://davidaugustat.com/linux/how-to-compile-linux-kernel-on-ubuntu step `$ cp -v /boot/config-$(uname -r) .config` , which gave me CONFIG_SATA_MOBILE_LPM_POLICY=3, potentially in both cases (make localmodconfig) as well.
2) The SATA disk is itself the Kubuntu 23.04 install; I will try to install that on the main NVMe (after I learn how to build from source, not from the distro).

Was that NVME something you added to the system or it came with it? I am suspecting that your platform or the NVME doesn't end up activating a feature needed for s2idle to work properly called DevSlp at all or at the right timing.

Can you please share to me an acpidump? I want to check if you have some _DSD properties set up appropriately. You can see more about this in 7c5f641a5914 ("ata: libahci: Adjust behavior when StorageD3Enable _DSD is set")

If it's a missing _DSD but the system and disk both support DevSlp we can probably find a way to work around that in the kernel.

s/NVME/SATA/

Created attachment 304974 [details]
acpi dump
SAMSUNG MZVLB512HBJQ-00000 512,1 GB (came bundled with the laptop)
Samsung SSD 860 EVO 250GB 250,0 GB - SATA (added just for Linux)
I will reply to your other questions later if I can.

Also, I changed the value of CONFIG_SATA_MOBILE_LPM_POLICY from =3 to =0 and I was able to compile it smoothly and run it, but still no change.

(In reply to popus_czy_to_ty from comment #43)
> Created attachment 304974 [details]
> acpi dump

It appears that your system doesn't have the StorageD3Enable _DSD on any of the AHCI ports, it's only for NVME. However we already have a force for this in the kernel for your system since kernel 6.3-rc1:

> 2023-08-27 07:32:12,410 INFO: ✅ AMD Ryzen 5 5600H with Radeon Graphics
> (family 19 model 50)

https://github.com/torvalds/linux/commit/e2a56364485e7789e7b8f342637c7f3a219f7ede

> also i changed value from =3 to =0 on CONFIG_SATA_MOBILE_LPM_POLICY and i was
> able to compile it smoothly and run, but still no change

3 is the right value (medium power with DIPM enabled)

> Aug 26 17:57:15 Crawler-E25 kernel: ata1.00: Features: Trust Dev-Sleep
> NCQ-sndrcv

The disk supports the feature we need, DevSlp.

> Aug 26 17:57:15 Crawler-E25 kernel: ahci 0000:06:00.0: flags: 64bit ncq sntf
> ilck pm led clo only pmp fbs pio slum part

The controller does not support 'sds' or 'sadm'.
'sds' is DevSlp
'sadm' is aggressive DevSlp

The 'sds' is the important one, this is what allows the disk to transition to DevSlp during suspend. Does your BIOS have any settings for this?

> 2) sata is itself kubuntu 23.04, i will try to install that on main nvme
> (after how to learn build from source not from distro

Something else you can do if it makes your experimentation easier is to take a live USB key of Ubuntu and boot off that with your SATA disk disconnected. This should let you test whether this feature works without the SATA disk.

No, my BIOS is totally stripped. Nothing to find there, to be honest.
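The diagnosis above comes down to reading two kernel log lines: the AHCI controller's `flags:` list and the disk's `Features:` list. A small sketch of that comparison, using the exact lines quoted in this thread, is shown here; the function names are hypothetical and it only looks for the tokens named in the thread (`sds`, `sadm`, `Dev-Sleep`).

```python
def words_after(line, marker):
    """Return the whitespace-separated words that follow marker, or [] if absent."""
    _, found, tail = line.partition(marker)
    return tail.split() if found else []


def devslp_report(ahci_flags_line, ata_features_line):
    """Compare controller and disk DevSlp support from two kernel log lines."""
    flags = words_after(ahci_flags_line, "flags:")
    features = words_after(ata_features_line, "Features:")
    return {
        "controller_sds": "sds" in flags,      # DevSlp
        "controller_sadm": "sadm" in flags,    # aggressive DevSlp
        "disk_devslp": "Dev-Sleep" in features,
    }


# Lines quoted in this thread, rejoined after Bugzilla's quote wrapping.
controller = ("ahci 0000:06:00.0: flags: 64bit ncq sntf ilck pm led clo "
              "only pmp fbs pio slum part")
disk = "ata1.00: Features: Trust Dev-Sleep NCQ-sndrcv"
print(devslp_report(controller, disk))
```

For this system the report shows a disk that advertises Dev-Sleep behind a controller that advertises neither `sds` nor `sadm`, which matches the mismatch described in the comment.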
1) https://www.youtube.com/watch?v=biqpvyrZb-A (bios choices)
2) YES YES YES, that was the thing. I can't remake the live ISO with all the nvidia stuff banned, but it still WORKS. It doesn't wake on a keyboard press (but badly wants to); any input from the touchpad works, or the power button. https://www.youtube.com/watch?v=CD-PKlgpH9U
THANKS to you all!

> 1) https://www.youtube.com/watch?v=biqpvyrZb-A (bios choices)

On that first "Main" page where it listed SATA stuff there are a few ">" arrows. Do those dig any further? If not, could you contact the manufacturer and ask them to enable the feature?

> 2) YES YES YES that was the thing, i cant remake live iso with banned all
> nvidia stuff but it still WORKS.

That's great news. It confirms that all the nvidia/nouveau stuff was a red herring to the real problem which is that SATA DevSlp isn't set up properly by the platform firmware. I don't think there is anything that can be done by the kernel for this, it needs the firmware to set it up properly.

> Doesnt work on keyboard press (but badly wants),

The lack of keyboard wakeup is an AMD AGESA bug [1]. It's fixed in newer AGESA; if you contact your manufacturer and ask them to upgrade, this can get fixed.

[1] https://github.com/torvalds/linux/commit/8e60615e8932167057b363c11a7835da7f007106

Created attachment 304980 [details]
s
a
I e-mailed them, but they don't care $$$$. I will ask Medion instead; I'm out of this topic for now. I knew it from the beginning. Nothing under the arrows, Mario.

OK - as this is a BIOS bug that we can't work around from Linux, I'm going to close the issue as "DOCUMENTED". If Medion doesn't fix the issue in BIOS to add DevSlp to your controller, your options are:
1) Return the SATA disk and get a second NVME disk instead and use that.
2) modprobe.blacklist=amd_pmc. This will prevent the system from going into hardware sleep and triggering the bug. It will consume a lot of power in s2idle, but it will work.

https://www.youtube.com/watch?v=_OVQR1prUyQ - under arrows (nothing)

So the problem has been found, but I'm still wondering: if it's a BIOS bug, how is Windows able to suspend and recover?

At least for AMD platforms the Linux SATA driver and Windows SATA driver don't follow the exact same sequence for suspend, and the Windows sequence can handle disks without utilizing DevSlp. Linux requires DevSlp.
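Option 2 means adding `modprobe.blacklist=amd_pmc` to the kernel command line. One way to do that on a GRUB-based install like the one in this report is to extend the `GRUB_CMDLINE_LINUX_DEFAULT` line. The sketch below only rewrites text passed to it; the function name is hypothetical, and the assumptions that the setting lives in `/etc/default/grub` and that the GRUB configuration must be regenerated afterwards (for example with `update-grub` on Ubuntu) are not taken from this thread.

```python
import re

CMDLINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_text, param):
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT unless it is already present."""
    def rewrite(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    return CMDLINE.sub(rewrite, grub_text)


before = 'GRUB_TIMEOUT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
after = add_kernel_param(before, "modprobe.blacklist=amd_pmc")
print(after)
```

Applying the function twice leaves the line unchanged, so it is safe to re-run. As noted in the comment, this trades the hang for much higher power draw while suspended.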