Bug 214025
| Summary: | Better error message for PCI devices killed during boot? | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Sam Edwards (CFSworks) |
| Component: | PCI | Assignee: | drivers_pci (drivers_pci) |
| Status: | NEW --- | | |
| Severity: | low | CC: | bczhc0, mastag |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 5.13.8 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Sam Edwards
2021-08-10 17:27:17 UTC
Hi there, I have the exact same problem. I've bought a Razer laptop (11th gen Intel), and I'm getting:

pci 0000:02:00.0: CLS mismatch (64 != 1020), using 64 bytes

And then later on:

nvme 0000:02:00.0: can't change power state from D3hot to D0 (config space inaccessible)

The NVMe drive is no longer detected. It used to work in 5.11, but in 5.13 and 5.14 it's broken. I've tried booting with pci_aspm=off, but that didn't fix the problem either. Is there a known workaround for this?

Hey! Same laptop. "CLS mismatch (... != 1020)" appears because the PCIe device is being shut off by ACPI after it's discovered by enumeration but before it's been fully initialized. This bug report only requests a specific check, and a more helpful error message, for that circumstance. The underlying problem you're encountering is actually #214035 (please make it known over there that you are affected by this too; the ACPI subsystem maintainer hasn't yet taken notice of it). I worked around it by throwing in a big unconditional return before the body of acpi_turn_off_unused_power_resources().

Speaking of the NVMe drive, check its SMART information to see if you have the SSSTC CA6, firmware ERA0901. If you do, beware: I recently encountered some filesystem corruption due to a write issue on that SSD. I don't know if I just had bad luck, but make sure you're taking backups regularly just in case. (It might help to keep the SSD's write cache disabled by adding a "hdparm -W0" to your startup scripts.)

Thanks for the information, Sam! For the record, I'm running the exact same firmware version: ERA0901. AFAIK there are no firmware updates available either.

I've found out that you can work around the bug by enabling the Intel VMD RST chipset in the BIOS. Even if you don't create a RAID set or use their Optane caching technology, the kernel will at least be able to detect the drive. On the other hand, if you use Windows, you will probably need to reinstall it, because it needs a separate driver from Intel.
For testing I've also added an older Samsung 970 EVO NVMe drive, and it seems to work fine on all kernels. Now that you also mention possible corruption, I think I'll just replace the SSSTC drive with something else. Razer has probably chosen this drive because it's cheap (no proper testing with Linux, no firmware updates, possible corruption, etc.).

Hi Sam, small update. I've replaced the SSSTC NVMe SSD with another one from WD and the same thing happens. So for the record, the first (internal) one is now from WD and the second one is the Samsung 970 EVO SSD. So it seems the NVMe SSD type/brand is not to blame here. It just doesn't initialize properly when it's in the first slot, regardless of the brand. If you don't have a second SSD installed, you could try moving it to the optional slot, which is right above it. Alternatively, enable the Intel VMD Rapid Storage chipset, but that will require you to reinstall Windows with the floppy driver from Intel.

To be precise, the only thing to blame *here* is that the kernel isn't giving a clear indication that the PCI device has been shut off. Again: the underlying problem you're encountering is actually #214035. The insight about enabling VMD is probably very useful over there. That bug is about preventing the hardware from being shut off. This bug is about detecting when that has happened.
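
For anyone wondering where the 1020 comes from, here is a rough userspace sketch of the kind of check this bug is asking for. It is not the kernel's code and not a proposed patch. It relies on two things from the standard PCI configuration header: the Cache Line Size register is one byte at offset 0x0C counted in 4-byte units, and reads that do not reach the device typically come back as all ones, so 0xFF * 4 = 1020 and the vendor ID reads as 0xFFFF. The helper names (`vendor_id`, `cache_line_bytes`, `looks_inaccessible`, `read_config`) are made up for illustration, and whether sysfs really returns all ones for a device in this state is an assumption that may vary by kernel version.

```python
import struct
from pathlib import Path

# Offsets in the standard PCI configuration header.
PCI_VENDOR_ID = 0x00        # 16-bit, little-endian
PCI_CACHE_LINE_SIZE = 0x0C  # 8-bit, counted in 4-byte (dword) units


def vendor_id(config: bytes) -> int:
    """Return the 16-bit vendor ID from raw config-space bytes."""
    return struct.unpack_from("<H", config, PCI_VENDOR_ID)[0]


def cache_line_bytes(config: bytes) -> int:
    """Return the cache line size in bytes (the register holds dwords)."""
    return config[PCI_CACHE_LINE_SIZE] * 4


def looks_inaccessible(config: bytes) -> bool:
    """Return True if the header reads back as all ones.

    0xFFFF is not a valid vendor ID, so it is the usual sign that
    config reads are not reaching the device.
    """
    return vendor_id(config) == 0xFFFF


def read_config(bdf: str) -> bytes:
    """Read the first 64 bytes of config space via sysfs (Linux only).

    Hypothetical helper: assumes the device still has a sysfs entry.
    """
    return Path(f"/sys/bus/pci/devices/{bdf}/config").read_bytes()[:64]


if __name__ == "__main__":
    # Simulated header of a device that was powered off after enumeration.
    dead = bytes([0xFF]) * 64
    print(cache_line_bytes(dead))    # 1020, the value in the CLS message
    print(looks_inaccessible(dead))  # True
```

On an affected machine, something like `looks_inaccessible(read_config("0000:02:00.0"))` could be tried against the real device. A check along these lines would let the message say that the device is inaccessible instead of reporting a cache line size mismatch.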