Samsung 970 EVO Plus Generates NVME Errors, whether it is mounted or not.

Sample size of 2 different hardware units. When the drives were used in a FreeBSD machine, the error count did not increase. Sample two has higher power on hours, however the majority was spent on a different (FreeBSD) machine. Sample 1 has only been in Linux machines. Thus I believe this to be a Linux driver/kernel bug.

I have noticed no adverse effects of this error, simply an increment in the number of log entries. Please let me know how I can assist in debugging, or if this is not the correct location to report this issue.

To my novice eye, commands sent to the 0xFFFFFFFF namespace by the NVME standard should be sent to all namespaces, but it appears to be considered an invalid namespace below.

Steps to reproduce:
Insert drive into system and power up

Expected result:
No errors generated

Actual Results:
Errors generated, although no adverse effects observed thus far (although no exhaustive search performed)

Build & Hardware:
Linux NAME 5.10.13-arch1-1 #1 SMP PREEMPT Wed, 03 Feb 2021 23:44:07 +0000 x86_64 GNU/Linux
Hardware: Dell Precision 7530 with Samsung 970 EVO Plus

Error always has same STATUS and PELoc, examples from SMART:
  0     801903     0  0x0004  0x4016  0x004            0     -     -
  1     801902     0  0x0014  0x4016  0x004            0     -     -
  2     801901     0  0x0003  0x4016  0x004            0     -     -
  3     801900     0  0x0006  0x4016  0x004            0     -     -
  4     801899     0  0x0016  0x4016  0x004            0     -     -
  5     801898     0  0x0014  0x4016  0x004            0     -     -
  6     801897     0  0x0001  0x4016  0x004            0     -     -
  7     801896     0  0x0003  0x4016  0x004            0     -     -
  8     801895     0  0x0001  0x4016  0x004            0     -     -
  9     801894     0  0x0003  0x4016  0x004            0     -     -
 10     801893     0  0x0004  0x4016  0x004            0     -     -
 11     801892     0  0x0015  0x4016  0x004            0     -     -
 12     801891     0  0x0001  0x4016  0x004            0     -     -
 13     801890     0  0x0003  0x4016  0x004            0     -     -
 14     801889     0  0x0006  0x4016  0x004            0     -     -
 15     801888     0  0x0014  0x4016  0x004            0     -     -

Examples from nvme error-log:
 Entry[61]
.................
error_count     : 801847
sqid            : 0
cmdid           : 0x1
status_field    : 0x4016(INVALID_NS: The namespace or the format of that namespace is invalid)
parm_err_loc    : 0x4
lba             : 0
nsid            : 0xffffffff
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
 Entry[62]
.................
error_count     : 801846
sqid            : 0
cmdid           : 0x4
status_field    : 0x4016(INVALID_NS: The namespace or the format of that namespace is invalid)
parm_err_loc    : 0x4
lba             : 0
nsid            : 0xffffffff
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................
 Entry[63]
.................
error_count     : 801845
sqid            : 0
cmdid           : 0x2
status_field    : 0x4016(INVALID_NS: The namespace or the format of that namespace is invalid)
parm_err_loc    : 0x4
lba             : 0
nsid            : 0xffffffff
vs              : 0
trtype          : The transport type is not indicated or the error is not transport related.
cs              : 0
trtype_spec_info: 0
.................

SMART information of unit 1 (newer):
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.13-arch1-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      XXXXXXXXXXXXXXXXX
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization:            1,969,516,711,936 [1.96 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5a01aec1a4
Local Time is:                      Thu Feb 4 21:23:18 2021 CST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning Comp. Temp. Threshold:      85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.50W       -        -    0  0  0  0        0       0
 1 +     5.90W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        44 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    31,326,159 [16.0 TB]
Data Units Written:                 35,929,602 [18.3 TB]
Host Read Commands:                 300,068,596
Host Write Commands:                239,589,759
Controller Busy Time:               532
Power Cycles:                       37
Power On Hours:                     818
Unsafe Shutdowns:                   18
Media and Data Integrity Errors:    0
Error Information Log Entries:      103,607
Warning Comp. Temperature Time:     0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               44 Celsius
Temperature Sensor 2:               38 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0     103607     0  0x000d  0x4016  0x004            0     -     -
  1     103606     0  0x000f  0x4016  0x004            0     -     -
  2     103605     0  0x000b  0x4016  0x004            0     -     -
  3     103604     0  0x0009  0x4016  0x004            0     -     -
  4     103603     0  0x001c  0x4016  0x004            0     -     -
  5     103602     0  0x001c  0x4016  0x004            0     -     -
  6     103601     0  0x0017  0x4016  0x004            0     -     -
  7     103600     0  0x001c  0x4016  0x004            0     -     -
  8     103599     0  0x001c  0x4016  0x004            0     -     -
  9     103598     0  0x001c  0x4016  0x004            0     -     -
 10     103597     0  0x0016  0x4016  0x004            0     -     -
 11     103596     0  0x000d  0x4016  0x004            0     -     -
 12     103595     0  0x001d  0x4016  0x004            0     -     -
 13     103594     0  0x000f  0x4016  0x004            0     -     -
 14     103593     0  0x001d  0x4016  0x004            0     -     -
 15     103592     0  0x000d  0x4016  0x004            0     -     -
... (48 entries not read)

SMART information unit 2:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.13-arch1-1] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 2TB
Serial Number:                      XXXXXXXXXXXXXx
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      4
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Utilization:            1,657,011,032,064 [1.65 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5901ae3f49
Local Time is:                      Thu Feb 4 21:23:56 2021 CST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         512 Pages
Warning Comp. Temp. Threshold:      85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.50W       -        -    0  0  0  0        0       0
 1 +     5.90W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3      210    1200
 4 -   0.0050W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        46 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    4,854,764 [2.48 TB]
Data Units Written:                 22,988,874 [11.7 TB]
Host Read Commands:                 68,667,495
Host Write Commands:                450,184,240
Controller Busy Time:               376
Power Cycles:                       136
Power On Hours:                     516
Unsafe Shutdowns:                   63
Media and Data Integrity Errors:    0
Error Information Log Entries:      801,903
Warning Comp. Temperature Time:     0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               46 Celsius
Temperature Sensor 2:               41 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0     801903     0  0x0004  0x4016  0x004            0     -     -
  1     801902     0  0x0014  0x4016  0x004            0     -     -
  2     801901     0  0x0003  0x4016  0x004            0     -     -
  3     801900     0  0x0006  0x4016  0x004            0     -     -
  4     801899     0  0x0016  0x4016  0x004            0     -     -
  5     801898     0  0x0014  0x4016  0x004            0     -     -
  6     801897     0  0x0001  0x4016  0x004            0     -     -
  7     801896     0  0x0003  0x4016  0x004            0     -     -
  8     801895     0  0x0001  0x4016  0x004            0     -     -
  9     801894     0  0x0003  0x4016  0x004            0     -     -
 10     801893     0  0x0004  0x4016  0x004            0     -     -
 11     801892     0  0x0015  0x4016  0x004            0     -     -
 12     801891     0  0x0001  0x4016  0x004            0     -     -
 13     801890     0  0x0003  0x4016  0x004            0     -     -
 14     801889     0  0x0006  0x4016  0x004            0     -     -
 15     801888     0  0x0014  0x4016  0x004            0     -     -
... (48 entries not read)
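For anyone reading the Status and PELoc columns by hand, here is a minimal Python sketch of how the two values can be unpacked. The bit layout is an assumption based on my reading of the NVMe 1.3 base specification (Status Field in bits 15:1 with the phase tag in bit 0; Parameter Error Location with the byte offset in bits 7:0 and the bit offset in bits 10:8). The function names are made up for illustration and are not part of nvme-cli or smartmontools.

```python
def decode_status_field(raw: int) -> dict:
    """Split the 16-bit Status Field of an Error Information log entry.

    Assumed layout (NVMe 1.3): bit 0 phase tag, bits 8:1 status code,
    bits 11:9 status code type, bit 14 More, bit 15 Do Not Retry.
    """
    return {
        "phase": raw & 0x1,
        "sc": (raw >> 1) & 0xFF,
        "sct": (raw >> 9) & 0x7,
        "more": bool((raw >> 14) & 0x1),
        "dnr": bool((raw >> 15) & 0x1),
    }


def decode_parm_err_loc(raw: int) -> dict:
    """Split the Parameter Error Location into byte, bit, and command dword."""
    byte = raw & 0xFF
    return {"byte": byte, "bit": (raw >> 8) & 0x7, "dword": byte // 4}


if __name__ == "__main__":
    status = decode_status_field(0x4016)
    loc = decode_parm_err_loc(0x004)
    print(f"SCT={status['sct']:#x} SC={status['sc']:#04x} "
          f"More={status['more']} DNR={status['dnr']}")
    print(f"byte {loc['byte']}, bit {loc['bit']}, command dword {loc['dword']}")
```

Under that layout, 0x4016 is a generic status (SCT 0) with status code 0x0B, which matches the INVALID_NS text nvme-cli prints, and PELoc 0x004 points at byte 4 of the command, i.e. dword 1, where the NSID field lives. That is consistent with the controller rejecting the 0xffffffff NSID itself.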
You should report these kinds of errors to your vendor. The driver isn't doing anything wrong here.
I am seeing exactly this as well. It only started recently, but then I'd not been booting into this Linux installation since around February 2022. That was exactly the time when I upgraded from Debian buster (oldstable) to bullseye (current stable), moving from a 4.19-based kernel to a 5.10-based one. So presumably some difference between those *Debian* kernel versions has caused SMART to start logging these "Device: /dev/nvme0, number of Error Log entries increased from 2479 to 2482" (and similar) reports. It consistently detects the count has increased by a few every boot.

The only output in `dmesg` for `nvme0`:

```
08:52:29 0$ dmesg | grep nvme0
[    1.096802] nvme nvme0: pci function 0000:0a:00.0
[    1.103753] nvme nvme0: missing or invalid SUBNQN field.
[    1.103774] nvme nvme0: Shutdown timeout set to 8 seconds
[    1.114520] nvme nvme0: 8/0/0 default/read/poll queues
[    1.117133]  nvme0n1: p1 p2 p3 p4 p5
```
In fact, now I just got the smartctl email for this boot up... it's consistently +3 on Error Logs for the past two boots.
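If it helps anyone track the per-boot increment, a small Python sketch along these lines could pull the deltas out of saved smartd messages. It assumes the message wording is exactly as in the report quoted above; the function name and the idea of feeding it lines from a log or mailbox are my own, not anything smartmontools provides.

```python
import re

# Matches the smartd wording quoted in this thread (assumed to be stable).
PATTERN = re.compile(
    r"Device: (?P<dev>[^,\s]+), number of Error Log entries "
    r"increased from (?P<old>\d+) to (?P<new>\d+)"
)


def error_log_deltas(lines):
    """Return (device, increase) for every matching smartd message."""
    deltas = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            increase = int(match["new"]) - int(match["old"])
            deltas.append((match["dev"], increase))
    return deltas


if __name__ == "__main__":
    sample = [
        "Device: /dev/nvme0, number of Error Log entries "
        "increased from 2479 to 2482",
        "some unrelated log line",
    ]
    for dev, increase in error_log_deltas(sample):
        print(f"{dev}: +{increase}")
```

A constant increase on every boot would point at a fixed set of probe commands issued during initialisation rather than anything related to I/O load.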
I shall try to remember to check if Windows has similar SMART logging anywhere, given that the drive in question is purely for my Windows 10 install (it's the C: drive). Linux is only concerned with it due to mounting, using ntfs-3g, in case I need to check something in there. fstab for it:

UUID=<uuid> /Win10-C ntfs-3g rw,exec,user,noatime,uid=athan,gid=athan,umask=02,nofail 0 0
So, it turns out that the best way to get full SMART information in Windows 10 is ... to install smartmontools for Windows. Doing so and running `smartctl -x <nvme drive>` there shows another +3 to the "Error Information Log Entries" count. At this stage it *could* be that every reboot causes this, or it could be that both boot-up and reboot in Linux cause it. I'll investigate further. And, yes, it's entirely possible that this is just a misfeature of the drive, or an actual indication of problems with my unit.
The NVMe specification is not very consistent on how to identify what features the controller supports, so in some cases the driver just has to try it and see if it worked. The log entries are likely harmless driver-initiated admin commands (SqId 0) checking if a particular feature is supported. The SSD doesn't *need* to log an error entry for such commands as it has no impact on media health (which is what SMART is supposed to care about), but it is allowed to save the error if it wants. I personally find these types of errors to be less than useless.
Indeed, the three showing up in Windows 10 are:

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0       2485     0  0x0089  0x4212  0x028            0     -     -
  1       2484     0  0x001d  0x4212  0x028            0     -     -
  2       2483     0  0x0002  0x4004  0x028            0     0     -

In Linux that section of `smartctl -x <device>` output was empty.

In my case you can blame the Debian buster->bullseye upgrade for suddenly highlighting these. Either it upgraded smartmontools to a version that sends the alert emails and/or the different kernel version has tickled something.

All the rest of the SMART output suggests there's no issues with the drive health, so I'll just ignore those specific emails. Thanks.
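For comparing which Status/PELoc pairs show up under each OS, a rough Python sketch like the following could tally the rows of the smartctl "Error Information" table. It assumes the nine-column layout shown in this thread (Num, ErrCount, SQId, CmdId, Status, PELoc, LBA, NSID, VS); the function name is hypothetical.

```python
from collections import Counter


def tally_status_peloc(table_text: str) -> Counter:
    """Count (Status, PELoc) pairs in smartctl Error Information rows."""
    counts = Counter()
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows have nine columns and start with the entry number;
        # the header row and any "..." trailer are skipped.
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        counts[(fields[4], fields[5])] += 1
    return counts


if __name__ == "__main__":
    windows_rows = """\
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 2485 0 0x0089 0x4212 0x028 0 - -
1 2484 0 0x001d 0x4212 0x028 0 - -
2 2483 0 0x0002 0x4004 0x028 0 0 -
"""
    for (status, peloc), n in tally_status_peloc(windows_rows).items():
        print(f"Status {status} at PELoc {peloc}: {n} entries")
```

On the three Windows rows this groups two entries under 0x4212 and one under 0x4004, all at PELoc 0x028, which is a different pattern from the 0x4016 / 0x004 entries in the original report.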
Maybe related / helpful:

Bug 217445 - standby-resume cycle increases NVMe error count (maybe bad NVMe commands)
https://bugzilla.kernel.org/show_bug.cgi?id=217445
(rebooting also increases the error count in that bugreport)

Smartd should ignore non-error entries from NVMe Error Information log
https://www.smartmontools.org/ticket/1222