Bug 211573
| Summary: | Samsung 970 EVO Plus Generates NVME Errors | | |
|---|---|---|---|
| Product: | IO/Storage | Reporter: | gs |
| Component: | NVMe | Assignee: | IO/NVME Virtual Default Assignee (io_nvme) |
| Status: | NEW | | |
| Severity: | normal | CC: | agurenko, gs, kbusch, kernel.org, kolAflash, peter+linux, robert, szotsaki |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 5.10.13 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
gs
2021-02-05 03:37:18 UTC
You should report these kinds of errors to your vendor. The driver isn't doing anything wrong here.

I am seeing exactly this as well. It only started recently, but then I had not booted into this Linux installation since around February 2022. That was exactly when I upgraded from Debian buster (oldstable) to bullseye (current stable), moving from a 4.19-based kernel to a 5.10-based one. So presumably some difference between those *Debian* kernel versions has caused SMART to start logging these "Device: /dev/nvme0, number of Error Log entries increased from 2479 to 2482" (and similar) reports. It consistently detects that the count has increased by a few on every boot.

The only output in `dmesg` for `nvme0`:

```
08:52:29 0$ dmesg | grep nvme0
[    1.096802] nvme nvme0: pci function 0000:0a:00.0
[    1.103753] nvme nvme0: missing or invalid SUBNQN field.
[    1.103774] nvme nvme0: Shutdown timeout set to 8 seconds
[    1.114520] nvme nvme0: 8/0/0 default/read/poll queues
[    1.117133]  nvme0n1: p1 p2 p3 p4 p5
```

In fact, I just got the smartctl email for this boot: it has consistently been +3 on Error Logs for the past two boots.

I shall try to remember to check whether Windows has similar SMART logging anywhere, given that the drive in question is purely for my Windows 10 install (it's the C: drive). Linux only touches it because I mount it with ntfs-3g in case I need to check something in there. The fstab entry for it:

    UUID=<uuid> /Win10-C ntfs-3g rw,exec,user,noatime,uid=athan,gid=athan,umask=02,nofail 0 0

So, it turns out that the best way to get full SMART information in Windows 10 is ... to install smartmontools for Windows. Doing so and running `smartctl -x <nvme drive>` there shows another +3 in the "Error Information Log Entries" count. At this stage it *could* be that every reboot causes this, or it could be that both boot-up and reboot in Linux cause it. I'll investigate further.
And, yes, it's entirely possible that this is just a misfeature of the drive, or an actual indication of problems with my unit.

The nvme specification is not very consistent on how to identify what features the controller supports, so in some cases the driver just has to try it and see if it worked. The log entries are likely harmless driver-initiated admin commands (SqId 0) checking whether a particular feature is supported. The SSD doesn't *need* to log an error entry for such commands, as they have no impact on media health (which is what SMART is supposed to care about), but it is allowed to save the error if it wants. I personally find these types of errors to be less than useless.

Indeed, the three showing up in Windows 10 are:

Error Information (NVMe Log 0x01, 16 of 64 entries)

| Num | ErrCount | SQId | CmdId | Status | PELoc | LBA | NSID | VS |
|---|---|---|---|---|---|---|---|---|
| 0 | 2485 | 0 | 0x0089 | 0x4212 | 0x028 | 0 | - | - |
| 1 | 2484 | 0 | 0x001d | 0x4212 | 0x028 | 0 | - | - |
| 2 | 2483 | 0 | 0x0002 | 0x4004 | 0x028 | 0 | 0 | - |

In Linux that section of the `smartctl -x <device>` output was empty.

In my case you can blame the Debian buster->bullseye upgrade for suddenly highlighting these. Either it upgraded smartmontools to a version that sends the alert emails, or the different kernel version has tickled something, or both. All the rest of the SMART output suggests there are no issues with the drive's health, so I'll just ignore those specific emails. Thanks.

Maybe related / helpful:

Bug 217445 - standby-resume cycle increases NVMe error count (maybe bad NVMe commands)
https://bugzilla.kernel.org/show_bug.cgi?id=217445
(rebooting also increases the error count in that bug report)

Smartd should ignore non-error entries from NVMe Error Information log
https://www.smartmontools.org/ticket/1222
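
For anyone wanting to check what the Status values in the error log entries quoted above actually mean, here is a minimal Python sketch that splits them into the fields defined for the NVMe completion status. It rests on one assumption that should be verified against your smartmontools version: that `smartctl` prints the raw 16-bit status word from the error log entry, with the phase tag in bit 0 and the status field in bits 15:1. The function name `decode_status` and the two small lookup tables are illustrative only, and the tables cover just a few codes from the NVMe base specification.

```python
# Sketch: decode the "Status" column of smartctl's NVMe error log output.
# ASSUMPTION: the printed value is the raw 16-bit word from the error log
# entry, i.e. bit 0 is the phase tag and bits 15:1 are the status field.

# Deliberately tiny subsets of the status code tables in the NVMe base spec.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}
COMMAND_SPECIFIC_STATUS = {
    0x09: "Invalid Log Page",
}
STATUS_TABLES = {0: GENERIC_STATUS, 1: COMMAND_SPECIFIC_STATUS}


def decode_status(raw):
    """Split a raw 16-bit status word into its NVMe status sub-fields."""
    field = raw >> 1                # drop the phase tag bit
    sc = field & 0xFF               # Status Code
    sct = (field >> 8) & 0x7        # Status Code Type
    more = bool((field >> 13) & 1)  # More: further detail is in the error log
    dnr = bool((field >> 14) & 1)   # Do Not Retry
    name = STATUS_TABLES.get(sct, {}).get(sc, "not in this sketch's tables")
    return {"sct": sct, "sc": sc, "more": more, "dnr": dnr, "name": name}


if __name__ == "__main__":
    # The two distinct Status values from the entries quoted in this report.
    for raw in (0x4212, 0x4004):
        info = decode_status(raw)
        print(f"0x{raw:04x}: SCT={info['sct']} SC=0x{info['sc']:02x} "
              f"More={info['more']} DNR={info['dnr']} ({info['name']})")
```

Under that assumption, both values decode to "invalid request" style completions for admin commands rather than media errors, which fits the explanation that the host is probing for features the drive does not support.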