Created attachment 305817 [details] log. Kernel: Linux fedora 6.8.0-0.rc0.20240112git70d201a40823.5.fc40.x86_64. Laptop: Chuwi Freebook 2023. The system freezes at random times, regardless of whether it is in use or idle, and has to be rebooted to work again. Changing /sys/module/nvme_core/parameters/default_ps_max_latency_us makes no difference. Could this be a kernel bug, or is it more likely a hardware failure?
Created attachment 305818 [details] smartctl info
Have you tried booting with nvme_core.default_ps_max_latency_us=0 pcie_aspm=off?
Yes, and I verified that the settings were applied, but it did not help. I have also tested this laptop with Windows, and the problem does not happen there. I understand the hardware may be at fault if it ignores disabled power states, but if Windows can cope with it somehow, there should be a way to work around it in Linux as well. The problem doesn't occur on the debug kernel, but I guess that may be because the debug kernel creates constant load by writing logs, which keeps the drive from going idle. The BIOS identifies the drive as Airdisk; their website is in Chinese only, so it is hard to find any information, and Chuwi declares support only for Windows.
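For anyone who wants to repeat the check mentioned above, here is a minimal Python sketch of one way to confirm that both settings took effect. It assumes the sysfs path quoted in the original report and the standard /proc/cmdline file; the helper names `parse_cmdline` and `check_settings` are made up for this illustration, and the command-line parsing is deliberately naive (it does not handle quoted values).

```python
from pathlib import Path

# Path quoted in the original report.
PARAM_PATH = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")
# Command line of the currently running kernel.
CMDLINE_PATH = Path("/proc/cmdline")


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def check_settings(cmdline: str, latency_value: str) -> dict:
    """Return a {description: bool} report for the two suggested settings."""
    params = parse_cmdline(cmdline)
    return {
        "pcie_aspm=off on command line":
            params.get("pcie_aspm") == "off",
        "nvme_core.default_ps_max_latency_us=0 on command line":
            params.get("nvme_core.default_ps_max_latency_us") == "0",
        "default_ps_max_latency_us is 0 in sysfs":
            latency_value.strip() == "0",
    }


if __name__ == "__main__":
    if PARAM_PATH.exists() and CMDLINE_PATH.exists():
        report = check_settings(CMDLINE_PATH.read_text(), PARAM_PATH.read_text())
        for label, ok in report.items():
            print(f"{'OK     ' if ok else 'MISSING'}  {label}")
    else:
        print("nvme_core parameter or /proc/cmdline not found on this system")
```

If all three lines report OK and the freezes continue, as described in this report, the settings themselves are probably not the missing piece.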
There probably is a way to make it work in Linux, but if the device isn't behaving in a spec-compliant manner, then either the maker needs to disclose how it actually works, or someone who has the device could reverse engineer what the operating systems do differently and make a fix from there.
I just wonder whether this is purely an NVMe issue (on Windows it works with the generic driver without this problem) or whether the problem lies somewhere in how Linux deals with the file system after, let's say, an unexpected NVMe restart. In the end, the log does not indicate that the storage could not be restarted; the final problem is that the file system gets remounted read-only to protect its consistency, so maybe the solution should not be at the driver level. I'm not sure if this is the right place to elaborate on such suppositions.
It is common for vendors to get WHQL certification for their hardware before releasing it, which can lead to changes in the Windows driver to accommodate quirky behavior. Windows also has non-standard ACPI settings that we don't know about in Linux; D3Storage was one such example, but I believe that has been standardized now. The error message indicates we can't do MMIO. That's at the PCIe Link Layer, well below the filesystem in the software stack.
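To illustrate the MMIO point: when a PCIe device stops responding, register reads typically come back as all ones, so a controller status value of 0xffffffff suggests the read itself failed rather than that the controller reported a real status. The Python sketch below assumes the kernel log line carries the register as `CSTS=0x...`; that format is an assumption for this example, the attached log is not reproduced here, and the helper names are hypothetical.

```python
import re
from typing import Optional

# A 32-bit read that returns all ones usually means the device did not answer.
ALL_ONES_32 = 0xFFFFFFFF

# Assumed log format: the register value appears as "CSTS=0x<hex>".
CSTS_PATTERN = re.compile(r"CSTS=0x([0-9a-fA-F]+)")


def extract_csts(line: str) -> Optional[int]:
    """Return the CSTS value found in a log line, or None if there is none."""
    match = CSTS_PATTERN.search(line)
    return int(match.group(1), 16) if match else None


def mmio_read_failed(csts: int) -> bool:
    """True when the value is all ones, i.e. the register read likely failed."""
    return csts == ALL_ONES_32
```

A value such as 0x1 would instead mean the controller answered and reported a status, which would point to a different class of problem.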
Surprisingly, the issue has been resolved in kernel 6.9 :)