Bug 202665
Summary: | NVMe AMD-Vi IO_PAGE_FAULT only with hardware IOMMU and fstrim/discard | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | aladjev.andrew (aladjev.andrew) |
Component: | Block Layer | Assignee: | Jens Axboe (axboe) |
Status: | NEW --- | ||
Severity: | normal | CC: | agroszer, aladjev.andrew, alexander.jenisch, andreas.thalhammer, andrew.kavalov, arne.keller, bjoern, bsandhu, bugs+kernel, civiloid, claudius.ptolemaeus, clement, code, developer, erhard_f, george, hamelg, ilya, joona-matias.heikkila, kbusch, kernel, kernelbugs.philipl, knweiss, konoha02, mail, marti, mastercatz, michael.class, noodles, nutodafozo, ono.kirin, poseidon+o1zah, pseus7, rocky, rogerreubsaet, serg, spartanagm, stf_xl, stuart.w.hayes, swk, timm366, travneff, tworaz, umamahesh.allenki |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.20.12 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
4.20.12 kernel config
dmesg with page fault dmesg with iommu=soft dmesg without discard dmesg with hardware iommu and fstrim fstrim / function_graph trace for fstrim ftrace for fstrim / ftrace for (failing) fstrim /mnt/work 5.0.5 trace-cmd trace-cmd results trace_report trace report + dmesg attachment-10825-0.html kernel-5.3-nvme-discard-align-to-page-size.patch log with iommu=pt log with iommu=on log with AMD virtualization OFF kernel dmesg (kernel 5.14-rc5, AMD FX-8370) |
Created attachment 281317 [details]
dmesg with page fault
Created attachment 281319 [details]
dmesg with iommu=soft
Created attachment 281321 [details]
dmesg without discard
I've read a recommendation from vendors to disable discard and run fstrim daily instead. I've reproduced the same issue with the hardware IOMMU and fstrim.
Created attachment 281357 [details]
dmesg with hardware iommu and fstrim
This is the only issue I have with my Ryzen build for now. Everything else just works, including the radeonsi video driver. I think this issue may be related to AMD StoreMI (https://www.amd.com/system/files/2018-04/AMD-StoreMI-FAQ.pdf), which is not yet implemented in Linux but exists in the B450 chipset.
I can confirm this issue on an Asus X470 PRO motherboard (latest BIOS as of today, 4406) and an HP EX920 SSD (same sm2265 controller): running fstrim spams dmesg with IO_PAGE_FAULT. OS is Ubuntu 18.04.2 (Linux machine 4.18.0-16-generic #17~18.04.1-Ubuntu SMP Tue Feb 12 13:35:51 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux)
[ 3415.904659] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fbf96000 flags=0x0000]
...
[ 3415.908733] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fd1bb000 flags=0x0000]
[ 3415.908768] amd_iommu_report_page_fault: 28 callbacks suppressed
[ 3415.908769] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0x00000000fd1b0000 flags=0x0000]
...
[ 3415.924844] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0x00000000fd1a8000 flags=0x0000]
*** Bug 198733 has been marked as a duplicate of this bug. ***
BTW, still present in 5.0.0. For details see Bug 198733 (Linux ZenMachine 5.0.0-gentoo-RYZEN #1 SMP Wed Mar 6 10:39:58 CET 2019 x86_64 AMD Ryzen 7 1800X Eight-Core Processor AuthenticAMD GNU/Linux).
Here is another setup with the same problems (on Linux 5.0.0):
* AMD Ryzen TR 1950X
* MSI X399 SLI PLUS
* Corsair MP510 960GB
Status with this setup is identical to the OP:
* iommu=soft + fstrim => OK
* iommu active + discard => AMD-Vi IO_PAGE_FAULT
* iommu active + fstrim => AMD-Vi IO_PAGE_FAULT
So at the moment it looks like the problem is restricted to AMD Ryzen.
Also worth noting is that the NVMe shows errors in its log (example of one message from "nvme smart-log"):
error_count : 65526
sqid : 6
cmdid : 0x4c
status_field : 0x400c(INTERNAL)
parm_err_loc : 0xffff
lba : 0x80000000
nsid : 0x1
vs : 0
I can confirm this issue on an AMD Ryzen 3 2200G, ASRock B450M Pro4 and Adata XPG SX8200 Pro 256GB (NVMe). I didn't face this issue with only one installed SATA SSD (Kingston HyperX Savage 240GB) on the same kernel.
Ah! I've had this error with only one NVMe SSD from the start. I just recently added a second NVMe SSD, using a PCI Express expansion card. It didn't change anything for me.
There is an upstream fix for an AMD IOMMU DMA mapping issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4e50ce03976fbc8ae995a000c4b10c737467beaa
It's already in 5.0.x and 4.19.x (4.20 is end of life). There is a chance it fixes this issue.
I can confirm the above fix does not resolve the IOMMU issue. I'm running the latest Manjaro kernel (5.0.3-1) and the problem still occurs. The iommu=soft workaround does avoid the issue for me, thus I'm confident my issue is the same as this one.
(In reply to Tim Murphy from comment #14)
> I can confirm the above fix does not resolve the IOMMU issue. I'm running
> latest manjaro kernel (5.0.3-1) and the problem still occurs. The iommu=soft
> workaround does avoid the issue for me, thus I'm confident my issue is the
> same as this one.
The fix in the 5.0.x stream only landed in 5.0.5, so it is worth trying a later kernel. I no longer see the IO_PAGE_FAULT message with 5.0.5, but I still see
[ 40.479512] print_req_error: I/O error, dev nvme0n1, sector 1390960 flags 803
messages when I do "fstrim -v /", which I don't see when I have "iommu=pt" passed on the kernel command line.
Thanks. Sadly, my setup still shows page fault errors with 5.0.5-arch1, which I pulled and built earlier today. FWIW, the error occurs only on my 2nd 'data' NVMe, which is attached via an add-in card in the 2nd (x4) PCIE slot.
My boot NVMe has never displayed the error, on any kernel. My setup is an Asrock B450M Pro4, Ryzen 1700.
My hardware is also an Asrock B450M Pro4, but with a Ryzen 2700. I'm running 2 NVMe SSDs, one in the board slot and one in an x4 PCIE slot. I see the print_req_error and IO_PAGE_FAULT errors for both before 5.0.5, but only the print_req_error errors on 5.0.5. Reliably triggered by "fstrim -v /", and neither appears with "iommu=pt".
If you have not done so already, please configure the kernel with
CONFIG_DYNAMIC_FTRACE
CONFIG_FUNCTION_GRAPH_TRACER
and run this script as root with the IOMMU enabled:
#!/bin/bash
mount -t debugfs debugfs /sys/kernel/debug
cd /sys/kernel/debug/tracing/
function_graph > current_tracer
echo 1 > tracing_on
fstrim /
echo 0 > tracing_on
cat trace > ~/trace.txt
Then provide the trace.txt file here (compressed if too big for bugzilla).
(In reply to Stanislaw Gruszka from comment #18)
> function_graph > current_tracer
echo function_graph > current_tracer
Created attachment 282075 [details]
fstrim / function_graph trace for fstrim
Trace from 5.0.5 with no iommu= option passed to the kernel. No IO_PAGE_FAULT errors, but still the I/O errors, which aren't seen with "iommu=pt".
Created attachment 282077 [details]
ftrace for fstrim /
I'm adding the requested result from my 5.0.5-arch1-1-custom kernel.
Thanks, Tim
Created attachment 282079 [details]
ftrace for (failing) fstrim /mnt/work
This is the requested output for the failing operation on my 5.0.5-arch1-1-custom rig (fstrim of PCIE add-in card attached NVME device) - obsoletes my prior attachment which was done on my non-failing root NVME device (fstrim /).
Here's what shows in dmesg when the fstrim fails.
[ 69.396195] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0 flags=0x0000]
[ 69.396458] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0 flags=0x0000]
[ 69.396645] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x200 flags=0x0000]
[ 69.396898] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0 flags=0x0000]
[ 69.397079] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x400 flags=0x0000]
[ 69.397258] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0 flags=0x0000]
[ 69.397439] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x400 flags=0x0000]
[ 69.397618] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0 flags=0x0000]
[ 69.404931] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x200 flags=0x0000]
[ 69.405166] print_req_error: I/O error, dev nvme0n1, sector 76088 flags 803
[ 69.405226] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0 flags=0x0000]
[ 69.405373] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0000 address=0x200 flags=0x0000]
[tim1@pearl ~]$
Thanks
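Since the faulting addresses differ noticeably between reporters (small values such as 0x0 and 0x200 in the log above, versus addresses near 0xf... in other logs), it can help to tally them. The following is a minimal sketch, not part of any tool mentioned in this thread; the helper name fault_addresses is hypothetical, and the sample lines are copied from the dmesg output above.

```python
import re
from collections import Counter

# Matches the address field of an AMD-Vi IO_PAGE_FAULT line, one line at a time.
ADDRESS_RE = re.compile(r"IO_PAGE_FAULT\b.*?\baddress=(0x[0-9a-fA-F]+)")


def fault_addresses(dmesg_text):
    """Return the faulting addresses (as ints) found in a dmesg excerpt."""
    return [int(m.group(1), 16) for m in ADDRESS_RE.finditer(dmesg_text)]


sample = """\
[ 69.396645] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x200 flags=0x0000]
[ 69.396898] nvme 0000:23:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0 flags=0x0000]
[ 69.405166] print_req_error: I/O error, dev nvme0n1, sector 76088 flags 803
[ 69.405373] AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x0000 address=0x200 flags=0x0000]
"""

counts = Counter(fault_addresses(sample))
for addr, n in sorted(counts.items()):
    print(f"address=0x{addr:x}: {n} fault(s)")
```

Feeding it the full output of dmesg instead of the sample gives a quick view of whether the faults cluster at very low addresses or are spread over ordinary DMA addresses.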
Sorry guys, but it is harder than I thought to see where the problem is. Perhaps tracing events will give a better picture. Please install:
https://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
and do
$ ./trace-cmd/tracecmd/trace-cmd record -e block -e nvme -e iommu fstrim /
$ ./trace-cmd/tracecmd/trace-cmd report > trace_report.txt
and attach trace_report.txt(.xz). All block, nvme and iommu events should be available; if not, the kernel will need to be recompiled with the proper options.
Created attachment 282093 [details]
5.0.5 trace-cmd
Comment on attachment 282093 [details]
5.0.5 trace-cmd
Stanislaw, here's my trace attached, it is 5.0.5-050005-generic #201903271212 SMP Wed Mar 27 16:14:07 UTC 2019 x86_64.
linux@pc:/tmp$ sudo ./trace-cmd/tracecmd/trace-cmd record -e block -e nvme -e iommu fstrim -v /
/: 869,9 GiB (934002778112 bytes) trimmed
CPU0 data recorded at offset=0x55d000
8192 bytes in size
CPU1 data recorded at offset=0x55f000
0 bytes in size
CPU2 data recorded at offset=0x55f000
4096 bytes in size
CPU3 data recorded at offset=0x560000
20480 bytes in size
CPU4 data recorded at offset=0x565000
139264 bytes in size
CPU5 data recorded at offset=0x587000
0 bytes in size
CPU6 data recorded at offset=0x587000
0 bytes in size
CPU7 data recorded at offset=0x587000
8192 bytes in size
CPU8 data recorded at offset=0x589000
45056 bytes in size
CPU9 data recorded at offset=0x594000
0 bytes in size
CPU10 data recorded at offset=0x594000
4096 bytes in size
CPU11 data recorded at offset=0x595000
16384 bytes in size
CPU12 data recorded at offset=0x599000
0 bytes in size
CPU13 data recorded at offset=0x599000
3526656 bytes in size
CPU14 data recorded at offset=0x8f6000
12582912 bytes in size
CPU15 data recorded at offset=0x14f6000
315392 bytes in size
linux@pc:/tmp$ sudo ./trace-cmd/tracecmd/trace-cmd report > trace_report.txt
trace-cmd: No such file or directory
[nvme:nvme_sq] function nvme_trace_disk_name not defined
[nvme:nvme_setup_cmd] function nvme_trace_disk_name not defined
[nvme:nvme_complete_rq] function nvme_trace_disk_name not defined
dmesg errors:
[ 230.001807] nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xf9d9b000 flags=0x0000]
[ 230.003210] nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xff783000 flags=0x0000]
[ 230.004647] nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xf9e74000 flags=0x0000]
[ 230.006076] nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xff77c000 flags=0x0000]
[ 230.007463] nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xe4eda000 flags=0x0000]
[ 230.008914] nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xfa2fe000 flags=0x0000]
[ 230.010305] nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xfb8b1000 flags=0x0000]
[ 230.011688] nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xfa432000 flags=0x0000]
[ 230.013064] nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xfa430000 flags=0x0000]
[ 230.014437] nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xfa435000 flags=0x0000]
[ 230.015819] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xfa4f7000 flags=0x0000]
[ 230.017197] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xfa42d000 flags=0x0000]
[ 230.018580] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xfa749000 flags=0x0000]
[ 230.019956] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xfa74a000 flags=0x0000]
[ 230.021333] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xfa74c000 flags=0x0000]
[ 230.022783] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xf9dcc000 flags=0x0000]
[ 230.023828] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xfa09e000 flags=0x0000]
[ 230.025200] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xfd37d000 flags=0x0000]
[ 230.026585] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xfd3f3000 flags=0x0000]
[ 230.028008] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xfd37d000 flags=0x0000]
Created attachment 282111 [details]
trace-cmd results
Stanislaw, here is the output you requested from my system (apologies for delay). Note, the failing device in my case is mounted on /mnt/work. Thanks.
[tim1@pearl 202665]$ sudo trace-cmd record -e block -e nvme -e iommu fstrim /mnt/work
[sudo] password for tim1:
fstrim: /mnt/work: FITRIM ioctl failed: Input/output error
CPU0 data recorded at offset=0x57e000
0 bytes in size
CPU1 data recorded at offset=0x57e000
0 bytes in size
CPU2 data recorded at offset=0x57e000
0 bytes in size
CPU3 data recorded at offset=0x57e000
0 bytes in size
CPU4 data recorded at offset=0x57e000
4096 bytes in size
CPU5 data recorded at offset=0x57f000
4096 bytes in size
CPU6 data recorded at offset=0x580000
0 bytes in size
CPU7 data recorded at offset=0x580000
0 bytes in size
CPU8 data recorded at offset=0x580000
0 bytes in size
CPU9 data recorded at offset=0x580000
0 bytes in size
CPU10 data recorded at offset=0x580000
0 bytes in size
CPU11 data recorded at offset=0x580000
0 bytes in size
CPU12 data recorded at offset=0x580000
0 bytes in size
CPU13 data recorded at offset=0x580000
0 bytes in size
CPU14 data recorded at offset=0x580000
0 bytes in size
CPU15 data recorded at offset=0x580000
0 bytes in size
[tim1@pearl 202665]$ sudo trace-cmd report > trace_report.txt
trace-cmd: No such file or directory
[nvme:nvme_sq] function nvme_trace_disk_name not defined
[nvme:nvme_setup_cmd] function nvme_trace_disk_name not defined
[nvme:nvme_complete_rq] function nvme_trace_disk_name not defined
[tim1@pearl 202665]$
Created attachment 282131 [details]
trace_report
I can also confirm that the bug is still around on 5.0.6.
Board: ASUS PRIME X370-PRO, BIOS 4406 02/28/2019
CPU: AMD Ryzen 7 1800X
NVMe #1: Intel M.2 600p (onboard M.2)
NVMe #2: Crucial P1 (SilverStone SST-ECM20 PCIe 3.0 x4 to M.2)
/ is on NVMe #1 from the onboard M.2, but the problem also occurs on the NVMe from the PCIe expansion card.
And just to make it clear:
* iommu=soft + discard/fstrim => OK
* iommu=pt + discard/fstrim => OK
* iommu active + discard or fstrim => nvme/AMD-Vi IO_PAGE_FAULT
I currently use iommu=pt. Would it help to see a log with iommu=soft and iommu=pt to see the difference?
Using btrfs in a RAID (with one partition on my NVMe #1 and one on my NVMe #2) I had various other serious errors as well, like:
print_req_error: I/O error, dev nvme1n1, sector 1214060160
BUT this occurred after the complete failure of NVMe #2 and its loss from the list of devices (/dev/nvme1n1 was gone), like this:
[ 605.403827] INFO: task systemd:2816 blocked for more than 120 seconds.
[ 605.403830] Tainted: G O T 4.20.1-gentoo-RYZEN #1
[ 605.403831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.403833] systemd D 0 2816 1 0x00000000
[ 605.403836] Call Trace:
[ 605.403844] __schedule+0x21c/0x720
[ 605.403847] schedule+0x27/0x80
[ 605.403850] io_schedule+0x11/0x40
[ 605.403853] wait_on_page_bit+0x11d/0x200
[ 605.403855] ? __page_cache_alloc+0x20/0x20
[ 605.403859] read_extent_buffer_pages+0x257/0x300
[ 605.403863] btree_read_extent_buffer_pages+0xc2/0x230
[ 605.403865] ? alloc_extent_buffer+0x35e/0x390
[ 605.403868] read_tree_block+0x5c/0x80
[ 605.403871] read_block_for_search.isra.13+0x1a9/0x380
[ 605.403874] btrfs_search_slot+0x226/0x970
[ 605.403876] btrfs_lookup_inode+0x63/0xfc
[ 605.403879] btrfs_iget_path+0x67e/0x770
[ 605.403882] btrfs_lookup_dentry+0x478/0x570
[ 605.403885] btrfs_lookup+0x18/0x40
[ 605.403888] path_openat+0xbbd/0x13e0
[ 605.403891] do_filp_open+0xa7/0x110
[ 605.403894] do_sys_open+0x18e/0x230
[ 605.403896] __x64_sys_openat+0x1f/0x30
[ 605.403899] do_syscall_64+0x55/0x100
[ 605.403901] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 605.403904] RIP: 0033:0x7f57bc1a731a
[ 605.403909] Code: Bad RIP value.
[ 605.403911] RSP: 002b:00007ffe14628540 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 605.403913] RAX: ffffffffffffffda RBX: 00007ffe14628638 RCX: 00007f57bc1a731a
[ 605.403914] RDX: 00000000000a0100 RSI: 0000562ae1fd7dd0 RDI: 00000000ffffff9c
[ 605.403915] RBP: 0000000000000008 R08: 91824bee752ca339 R09: 00007f57bbf11540
[ 605.403917] R10: 0000000000000000 R11: 0000000000000246 R12: 0000562ae1fd7de6
[ 605.403918] R13: 0000562ae1fd7b10 R14: 00007ffe146285c0 R15: 0000562ae1fa6168
[ 655.735860] nvme nvme1: Device not ready; aborting reset
Without the btrfs being in RAID mode the device somehow isn't lost, although I don't have any other partition in real use on NVMe #2 at the moment, other than a swap partition. But I cannot say that my system swaps that much, as it has 32 GB of RAM which rarely gets used up completely.
I don't see the connection to this error, but if there is one, it could help to diagnose it. So if it helps I can set up a test partition on that other NVMe device. Just tell me what you need me to set up...
Add me to the list of affected people.
- Asrock X399 Taichi ATX-Size
- Threadripper 1950X
- 2x Corsair force M510P 480GB (Phison Electronics Corporation E12 NVMe Controller)
- Kernel 5.0.10 customized.
Adding trace report + kernel log.
Created attachment 282603 [details]
trace report + dmesg
Not directly related to this issue, but the workaround iommu=pt has more side effects. I had to disable Secure Memory Encryption as the Megaraid SAS and radeon driver were unable to initialize properly with SME: mpt3sas 0000:09:00.0: SME is active, device will require DMA bounce buffers mpt2sas_cm0: reply_post_free pool: dma_pool_alloc failed mpt2sas_cm0: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:10506/_scsih_probe()! radeon 0000:07:00.0: SME is active, device will require DMA bounce buffers radeon 0000:07:00.0: SME is active, device will require DMA bounce buffers software IO TLB: SME is active and system is using DMA bounce buffers [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD) radeon 0000:07:00.0: disabling GPU acceleration I tried iommu=on mem_encrypt=off, but discarding the NVMe failed like before. Out of interest, the issue still exists in kernel 5.0.12. Slightly OT, I use a Radeon RX Vega (VEGA10) graphics card and SME never ever worked. I tried iommu=soft, iommu=pt and no iommu kernel cmdline option (i.e. on). Whenever I use mem_encrypt=on, the last line I see on the screen is this: fb0: switching to amdgpudrmfb from simple The system doesn't panic or stall or anything, it's just that I don't see any more screen updates at all. I'd have to do everything in the blind, or over ssh. BTW, I looked more closely at /Documentation/admin-guide/kernel-parameters.txt in the Linux kernel source tree, and there is also the following: amd_iommu= [HW,X86-64] Pass parameters to the AMD IOMMU driver in the system. Possible values are: fullflush - enable flushing of IO/TLB entries when they are unmapped. Otherwise they are flushed before they will be reused, which is a lot of faster off - do not initialize any AMD IOMMU found in the system force_isolation - Force device isolation for all devices. The IOMMU driver is not allowed anymore to lift isolation requirements as needed. 
This option does not override iommu=pt amd_iommu_dump= [HW,X86-64] Enable AMD IOMMU driver option to dump the ACPI table for AMD IOMMU. With this option enabled, AMD IOMMU driver will print ACPI tables for AMD IOMMU during IOMMU initialization. amd_iommu_intr=[legacy|vapic] iommu=[off|force|noforce|biomerge|panic|nopanic|merge|nomerge|soft|pt|nopt] Maybe I need some amd_iommu* tweaks as well? What options are there for amd_iommu_dump? enable/disable maybe? And is the amdgpu bug with SME enabled somehow related? amd_iommu_dump=1 And during ACPI IVRS is printed on the console/log: ------- example, not related to this issue [ 0.851042] AMD-Vi: Using IVHD type 0x11 [ 0.851401] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: b0 info 0000 [ 0.851401] AMD-Vi: mmio-addr: 00000000feb80000 [ 0.851430] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:01.0 flags: 00 [ 0.851431] AMD-Vi: DEV_RANGE_END devid: ff:1f.6 [ 0.851870] AMD-Vi: DEV_ALIAS_RANGE devid: ff:00.0 flags: 00 devid_to: 00:14.4 [ 0.851871] AMD-Vi: DEV_RANGE_END devid: ff:1f.7 [ 0.851875] AMD-Vi: DEV_SPECIAL(HPET[0]) devid: 00:14.0 [ 0.851876] AMD-Vi: DEV_SPECIAL(IOAPIC[33]) devid: 00:14.0 [ 0.851877] AMD-Vi: DEV_SPECIAL(IOAPIC[34]) devid: 00:00.1 [ 1.171028] AMD-Vi: IOMMU performance counters supported ------- example, not related to this issue Maybe worth having a look into these. I need only a text console, but using the radeon driver saves me a few watts than just using VGA framebuffer. I am not sure how SME works with DMA, but device read/writes would also need to be encrypted. The same here. I've to apply the workaround iommu=soft to be able trim my ssd. - MSI B450 GAMING PLUS (MS-7B86), BIOS 1.4 - CPU: AMD Ryzen 2700 - SSD nvme Force MP510 (FW ECFM12.2) - Kernel 5.1.7 (Archlinux) AMD broke VFIO since agesa 0072 (https://www.reddit.com/r/Amd/comments/bh3qqz/agesa_0072_pci_quirk/). There's a patch for 5.1 kernel that makes it work, could somebody test if it helps with our problem here? 
patch: https://clbin.com/VCiYJ (In reply to nutodafozo from comment #35) > AMD broke VFIO since agesa 0072 > (https://www.reddit.com/r/Amd/comments/bh3qqz/agesa_0072_pci_quirk/). > There's a patch for 5.1 kernel that makes it work, could somebody test if it > helps with our problem here? > > patch: https://clbin.com/VCiYJ Tried it on kernel 5.1.15. It doesn't fix the discard (trim) problem for me. Same here with 5.1.15. Without iommu=pt the errors come back. Same here, the patch doesn't make any difference :( My bios has AGESA code 1.0.0.6, the patch is related to version 1.0.0.7.x+. did somebody try with new agesa's (1.0.0.2/1.0.0.3ab)? (In reply to nutodafozo from comment #39) > did somebody try with new agesa's (1.0.0.2/1.0.0.3ab)? Can reproduce with a Ryzen 3600x, x570 motherboard, AGESA 1.0.0.3ab. Kernel 5.2.1-arch1. hardware: threadripper 2990wx, x399, AGESA 1.1.0.2 Force MP510, Fw: ECFM12.3 kernel: 5.3.0-rc5 boot param: iommu=pt, avic=1 this issue is gone, but with iommu=pt removed io_page_fault error comes back. Is it possible that relaxed ordering is enabled on the NVME device? (lspci -vvv, look for RlxdOrd+ or RlxdOrd-) A few months ago I was getting I/O page faults with an NVME drive on an AMD system that had relaxed ordering enabled, because (as I recall) the drive's write to the completion queue got reordered before the last data write, and the amd_iommu driver had already unmapped the data buffer before the last write went through the hardware, which caused the IOMMU fault. If it is enabled, you could try disabling it. > If it is enabled, you could try disabling it.
Can you give me some instructions or directions how to do this? Haven't found much on the internet.
I think you'll need the pciutils package for lspci / setpci. First find the bus/device/function number of your NVMe drive... probably "lspci |grep -i nvme" will show you. It'll be some numbers like 0000:05:00.0 if it's on PCI bus 5, device 0, function 0. Once you have that you can do...
lspci -vvvv -s 0000:05:00.0 |grep Rlxd
(use your numbers, not 0000:05:00.0)
That will show you if it is even enabled... if you see RlxdOrd+, then read the device control register in the PCI express capability structure. I don't have a system in front of me to test this on, so it may not work, but I think you should be able to do that with:
setpci -s 0000:05:00.0 CAP_EXP+0x08.w
That will show you the 16-bit value of the pci express device control register. You want to write that value, except set bit 4 to 0. That means subtract 0x10 from the value, assuming bit 4 is set... so if you read 0x1234, clearing bit 4 would result in 0x1224. Write that value with:
setpci -s 0000:05:00.0 CAP_EXP+0x08.w=0x1224
(use your number, not 0x1224!)
Then do the original "lspci -vvv -s 0000:05:00.0 | grep Rlxd" to make sure it now says "RlxdOrd-" instead of "RlxdOrd+". Your change will be wiped out if you unplug the drive or reboot the system, though... it isn't permanent.
$ sudo lspci -vvvv -s 01:00.0
01:00.0 Non-Volatile memory controller: Silicon Motion, Inc. Device 2262 (rev 03) (prog-if 02 [NVM Express])
Subsystem: Silicon Motion, Inc. Device 2262
....
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
....
user@pc:~$ sudo setpci -s 01:00.0 CAP_EXP+0x08.w
201f
>>> bin(0x201f)
'0b10000000011111'
$ sudo setpci -s 01:00.0 CAP_EXP+0x08.w=0x200f
$ sudo lspci -vvvv -s 01:00.0|grep Rlx
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
$ sudo systemctl start fstrim.service
Hm, is that it? No errors in dmesg now...
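The "clear bit 4" step from the instructions above can be sketched in Python, continuing the bin(0x201f) check. This is only an illustration of the arithmetic: clear_relaxed_ordering is a hypothetical helper name, it computes the value to hand to setpci, and it does not touch any hardware. It masks the bit instead of subtracting 0x10, so it stays correct if bit 4 is already clear.

```python
RLXD_ORD_BIT = 1 << 4  # bit 4 of the Device Control register, as described above


def clear_relaxed_ordering(devctl):
    """Return the 16-bit Device Control value with bit 4 (RlxdOrd) cleared."""
    return devctl & ~RLXD_ORD_BIT & 0xFFFF


# The two values that appear in this thread: the 0x1234 example and the real 0x201f.
for value in (0x1234, 0x201F):
    print(f"read 0x{value:04x} -> write 0x{clear_relaxed_ordering(value):04x}")
```

For the values in this thread this gives 0x1224 and 0x200f, matching the setpci commands shown above.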
The workaround doesn't work here, I still get the kernel error "nvme...IO_PAGE_FAULT" and fstrim fails with "Input/output error".
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
Workaround also fails for me. Here is the lspci output after disabling RlxdOrd:
41:00.0 Non-Volatile memory controller: Device 1987:5012 (rev 01) (prog-if 02 [NVM Express])
Subsystem: Device 1987:5012
...
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
Same here. I booted without iommu=pt and risked using the mount option discard. The error messages (AMD-Vi IO_PAGE_FAULT) appeared both before and after setting RlxdOrd-.
# lspci | grep -i "Non-Volatile memory controller"
01:00.0 Non-Volatile memory controller: Intel Corporation SSD 600P Series (rev 03)
04:00.0 Non-Volatile memory controller: Micron/Crucial Technology Device 2263 (rev 03)
# lspci -vvvv -s 0000:01:00.0 |grep Rlxd
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
# lspci -vvvv -s 0000:04:00.0 |grep Rlxd
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
# setpci -s 01:00.0 CAP_EXP+0x08.w
201f
# setpci -s 04:00.0 CAP_EXP+0x08.w
201f
# setpci -s 01:00.0 CAP_EXP+0x08.w=0x200f
# lspci -vvvv -s 0000:01:00.0 |grep Rlxd
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
# setpci -s 04:00.0 CAP_EXP+0x08.w=0x200f
# lspci -vvvv -s 0000:04:00.0 |grep Rlxd
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
# uname -a
Linux ZenMachine 5.2.11-gentoo-RYZEN #1 SMP Fri Aug 30 23:51:50 CEST 2019 x86_64 AMD Ryzen 7 1800X Eight-Core Processor AuthenticAMD GNU/Linux
Found a workaround for the problem with my MP510: when I change the kmalloc_array call in nvme_setup_discard in drivers/nvme/host/core.c to unconditionally allocate a full page, "range = kmalloc_array(256, sizeof(*range), GFP_ATOMIC|__GFP_NOWARN)", the IO_PAGE_FAULT messages are gone.
Not sure what is going on here, but I suspect a firmware bug on the MP510 (my firmware is ECFM12.1). The NVMe behaves as if it expects the data area for the NVM-1.3 "6.7 Dataset Management command" to always be a full page in size. When kmalloc_array is called with a smaller size, it returns addresses which are not aligned to the page size. The NVMe then sees that the "offset portion of the PBAO field of PRP1 is non-zero" and assumes the address of the 2nd page to be present in PRP2. On my host PRP2 is always set to zero, and when the NVMe tries to read it, this gives IO_PAGE_FAULT messages with a zero address:
nvme 0000:aa:bb.c: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0 flags=0x0000]
So another conclusion is that setting iommu=pt does not really fix my problem with the MP510 but just hides the discard problem.
Incredible analysis :-) I haven't looked into the code, but maybe rounding up to a full page would be a safer alternative to always requesting one. My Force MP510 has firmware ECFM12.2, but it seems Corsair does not offer FW upgrades.
(In reply to Eduard Hasenleithner from comment #49)
> Not sure what is going on here but I suspect a firmware bug on the MP510 (my
> firmware is ECFM12.1).
Some comments report the same issue with different SSD controllers, not only the Phison E12.
(In reply to hamelg from comment #51)
> some comments reports the same issue with different SSD controllers, not
> only Phison E12.
True. But I'm being so specific here since I guess that the reason for the others failing is different from my case. E.g. the other logs contain IO_PAGE_FAULT messages with a nonzero address.
(In reply to Eduard Hasenleithner from comment #49)
> So another conclusion is that when setting iommu=pt it does not really fix
> my problem with MP510 but just hides the discard problem.
What's the implication? TRIM isn't actually being used? Is there a way to see if and when and how many blocks have been "trimmed"?
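To make the PRP reasoning from comment #49 more concrete, here is a small sketch of the page-crossing arithmetic. It assumes a 4096-byte page and the 16-byte Dataset Management range layout (32-bit attributes, 32-bit block count, 64-bit starting LBA); the helper name prp2_needed and the example address are made up for illustration, and the "device view" line models only the suspected firmware behaviour, which nothing in this thread confirms.

```python
import struct

PAGE_SIZE = 4096  # assumed memory page size
# One discard range: 32-bit attributes, 32-bit block count, 64-bit start LBA.
DSM_RANGE_SIZE = struct.calcsize("<IIQ")


def prp2_needed(addr, length, page_size=PAGE_SIZE):
    """True if `length` bytes starting at `addr` extend past the end of the page."""
    offset = addr % page_size
    return offset + length > page_size


# Hypothetical, non-page-aligned buffer address as kmalloc might return it.
addr = 0xFD1B0200
one_range = 1 * DSM_RANGE_SIZE

print("host view, 16 bytes:", prp2_needed(addr, one_range))
print("device view, full page:", prp2_needed(addr, PAGE_SIZE))
print("256 ranges fill one page:", 256 * DSM_RANGE_SIZE == PAGE_SIZE)
```

Under these assumptions the host sees no need for PRP2 because a single 16-byte range fits in the page, while a device that always fetches a full page from an unaligned PRP1 would go on to PRP2, which the host left at zero. It also shows why allocating 256 ranges amounts to exactly one page.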
Created attachment 284879 [details]
attachment-10825-0.html
The -v option to fstrim lists how many blocks were trimmed.
On Sat, Sep 7, 2019 at 12:32 PM <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=202665
>
> --- Comment #53 from Andreas (andreas.thalhammer@linux.com) ---
> (In reply to Eduard Hasenleithner from comment #49)
> > So another conclusion is that when setting iommu=pt it does not really fix
> > my problem with MP510 but just hides the discard problem.
>
> What's the implication? TRIM isn't actually being used?
> Is there a way to see if and when and how many blocks have been "trimmed"?
(In reply to Tim Murphy from comment #54)
> The -v option to fstrim lists how many blocks were trimmed
Thanks.
> # fstrim -v /
> /: 80,9 GiB (86825152512 Bytes) trimmed
I ran the command again after one minute:
> /: 1.2 GiB (1303777280 bytes) trimmed
I don't really understand this, but maybe it has to do with me using the discard mount option, not a periodic fstrim. Anyway, if this is correct, TRIM should work despite 1) the errors and 2) iommu=pt.
This is only a summary. It may be one large contiguous block or many small fragments. The MP510 has a very high maximum discard size:
/sys/block/nvme0n1/queue/discard_granularity 512
/sys/block/nvme0n1/queue/discard_max_bytes 2199023255040
/sys/block/nvme0n1/queue/discard_max_hw_bytes 2199023255040
/sys/block/nvme0n1/queue/discard_zeroes_data 0
So a blkdiscard should be able to do this in one discard request. If you left an unpartitioned area, you can create a partition there and test blkdiscard. Or if you have a swap partition on it, swapon will do a discard AFAIK.
At least for SATA/SAS, there were multiple commands/ways for trimming a device, but no idea how this is implemented with NVMes (In reply to Christoph Nelles from comment #56) > /sys/block/nvme0n1/queue/discard_granularity 512 > /sys/block/nvme0n1/queue/discard_max_bytes 2199023255040 > /sys/block/nvme0n1/queue/discard_max_hw_bytes 2199023255040 > /sys/block/nvme0n1/queue/discard_zeroes_data 0 All my NVMe-SSDs show the same values. That is a Intel 600p (nvme0n1) on the on-board NVMe connector of the motherboard and a Crucial P1 (nvme1n1) on the PCIe NVMe expansion card (SilverStone SST-ECM20 PCIe 3.0 x4 to M.2). Anyway, fstrim -v doesn't seem to work on swap devices (as they cannot be mounted) and blkdiscard only does discards, but doesn't give summaries. (Or I'm just too stupid to find the required command line options...) I have the same issue on one of my NVMe SSDs. System: MSI X570 Ace (AGESA both stock and 1.0.0.3abb), Ryzen 9 3900X on Crucial MP600 (NVMe, Firmware Version EGFM11.0, Phison E16). I have a second NVMe drive (Samsung 950 Pro) which is not affected. The messages I got were the same as in comment 49: nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x0 flags=0x0000] So I've tried workaround from the same workaround (unconditionally allocate a full page) and it also helped to get rid of those error messages. (In reply to Christoph Nelles from comment #50) > Incredibale analysis :-) Haven't looked into the code, but maybe rounding up > to a full page may be the safer alternative instead of using requesting one. > > My Force MP510 have firmware ECFM12.2, but it seems Corsair does not offer > FW upgrades. here is ECFM12.3 for MP510 http://ssd.borecraft.com/photos/Phison%20ECFM12.3%20Firmware%20update-20190720T063056Z-001.zip by the way I have tried it, there is no change with respect to this issue. but you get little performance improvement. I've now investigated the situation also for an intel 660p device with firmware 002C. 
(This should have a Silicon Motion SM2263EN controller.) With this I'm also getting IO_PAGE_FAULT logs. The controller behaves differently, but IMHO it is also non-conformant to the NVMe spec:
* The controller always reads a multiple of 512 bytes.
* When the 512 bytes don't fit within the remaining part of the page, the controller continues reading in the subsequent page. The subsequent page is really the adjacent one; it doesn't use the value given in PRP2 (which happens to be 0).
So for this model it is sufficient to align the discard info to a 512-byte boundary. Considering all the trouble with different controllers, it is probably best to allocate a multiple of a page (4096 bytes) for the discard command. (I guess a page is also the maximum needed for discard.) Is it realistic to get such a kernel patch accepted?

How come this problem arises only on Ryzen?

(In reply to nutodafozo from comment #61)
> How come this problem arises only on Ryzen?
I have a similar problem on:
Cpu: fx8350
Mb: M5A99FX PRO R2.0
Ssd: Crucial P1 1TB (CT1000P1SSD8)
So... not only Ryzen, probably AMD-Vi in general.

If it's AMD-Vi, then why does patching the generic kernel code help for the Phison E12? As for my hp ex920 (SiliconMotion 2262 controller), I ran trim again and can again confirm that setting RlxdOrd- helps: no errors.
So it seems this topic has at least 3 different cases for these AMD-Vi IO_PAGE_FAULT errors:
1) SM2262 controller. Solution: set RlxdOrd-. #gotta wait for another confirmation, adata 8200 user would be nice.
2) phison e12/e16 ssds. Solution: kernel patch by Eduard Hasenleithner
3) intel 660p (SM2263EN). Solution: kernel patch by Eduard Hasenleithner #gotta wait for another confirmation, crucial p1 user would be nice.
...
4) intel 600p (SM2260)?
PS I should probably try Eduard's patch on my SM2262 to see whether it helps too, without setting RlxdOrd-

Okay, yes, changing only "segments" to 256 fixed the error for me as well.
diff for drivers/nvme/host/core.c:
diff -Nau1r a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
--- a/drivers/nvme/host/core.c 2019-09-14 11:27:34.986373747 +0200
+++ b/drivers/nvme/host/core.c 2019-09-13 16:14:16.937812531 +0200
@@ -564,3 +564,3 @@
 
-	range = kmalloc_array(segments, sizeof(*range),
+	range = kmalloc_array(256, sizeof(*range),
 			GFP_ATOMIC | __GFP_NOWARN);
That's for both my Intel 600p and Crucial P1.
More details:
Board: ASUS PRIME X370-PRO, BIOS 5204 07/29/2019
CPU: AMD Ryzen 7 1800X
NVMe #1: Intel M.2 600p (onboard M.2)
NVMe #2: Crucial P1 (SilverStone SST-ECM20 PCIe 3.0 x4 to M.2)
Distro: Gentoo Linux
Kernel: 5.2.14
TRIM usage: AMD-Vi IO_PAGE_FAULT occurred constantly with the filesystems' "discard" mount option
I applied the segments-->256 patch and removed iommu=pt from the kernel cmdline.
Result:
* iommu active + discard or fstrim => No more AMD-Vi IO_PAGE_FAULTs
About the patch: I don't understand what I'm doing here. So: What does this fix do and is this the final way to fix it? Are there some negative side effects (I assume using a variable size in kmalloc_array does make sense)? Why does it work with some other NVMe SSDs without this fix and why does RlxdOrd- make a difference on some other systems?

One more datapoint and possible resolution with this specific hardware.
Base Board Information
Manufacturer: Micro-Star International Co., Ltd
Product Name: B450-A PRO (MS-7B86)
Version: 2.0
BIOS Information
Vendor: American Megatrends Inc.
Version: A.A0
Release Date: 08/28/2019
01:00.0 Non-Volatile memory controller: Silicon Motion, Inc.
Device 2262 (rev 03) (prog-if 02 [NVM Express]) --snip-- Capabilities: [70] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset- MaxPayload 128 bytes, MaxReadReq 512 bytes --snip-- BOOT_IMAGE=/boot/vmlinuz-5.0.0-27-generic root=/dev/mapper/kubuntu--vg-root ro video=vesafb:off quiet splash vt.handoff=1 The "AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0... " messages logged when trimming "fstrim -v /". Setting RlxOrd+ using "sudo setpci -s 01:00.0 CAP_EXP+0x08.w=0x200f" did not have an effect, page fault messages were still logged. System would go into a catatonic state for a few minutes while starting VM's under QEMU/KVM. Setting the BIOS setting IOMMU=Enable from the default IOMMU=Auto resolved the issue. If it matters, in my case (Corsair MP600, MSI X570 Ace motherboard) I had IOMMU=Enabled since first attempts to workaround that and only patch from #49 helped. Earlier I've said that setting RlxdOrd- helped me with hp ex920 (sm2262), turns out it doesn't. Just now got another bunch of IO_PAGE_FAULT's after fstrim.service kicked in. [190614.943785] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xfcdd1000 flags=0x0000] [190614.947494] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xc0986000 flags=0x0000] (In reply to Norik from comment #66) > One more datapoint and possible resolution with this specific hardware. > > Base Board Information > Manufacturer: Micro-Star International Co., Ltd > Product Name: B450-A PRO (MS-7B86) > Version: 2.0 > > BIOS Information > Vendor: American Megatrends Inc. > Version: A.A0 > Release Date: 08/28/2019 > > 01:00.0 Non-Volatile memory controller: Silicon Motion, Inc. 
Device 2262 > (rev 03) (prog-if 02 [NVM Express]) > > --snip-- > Capabilities: [70] Express (v2) Endpoint, MSI 00 > DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s > unlimited, L1 > unlimited > ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ > SlotPowerLimit 0.000W > DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ > Unsupported+ > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset- > MaxPayload 128 bytes, MaxReadReq 512 bytes > > --snip-- > > BOOT_IMAGE=/boot/vmlinuz-5.0.0-27-generic root=/dev/mapper/kubuntu--vg-root > ro video=vesafb:off quiet splash vt.handoff=1 > > > > The "AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0... " messages logged > when trimming "fstrim -v /". Setting RlxOrd+ using "sudo setpci -s 01:00.0 > CAP_EXP+0x08.w=0x200f" did not have an effect, page fault messages were > still logged. System would go into a catatonic state for a few minutes while > starting VM's under QEMU/KVM. > > Setting the BIOS setting IOMMU=Enable from the default IOMMU=Auto resolved > the issue. Update: This report seems to be reducing the frequency of page fault logs. Does not eliminate them. Seems related to the size of the trim activity. For the record, I also tried changing the IOMMU value in the BIOS (UEFI) setup. No change. ASUS PRIME X370-PRO, BIOS 5204 07/29/2019 UEFI BIOS Setting: Advanced\AMD CBS\IOMMU=[Disabled|Enabled|Auto] Auto is enabled, and setting it to enabled doesn't change anything: I get the same errors with both settings without iommu=pt or without the patch (tested with kernel 5.2.14). I used the patch from Eduard Hasenleithner on drivers/nvme/host/core.c function kmalloc_array, segments --> 256, again on kernel 5.3.0. I use it now on a daily basis, with mount options "discard" in fstab for all filesystems which support it, and all seems stable. BIG THANKS. My version of the fix for Linux 5.3.1. Probably many errors in these few lines, but currently this works for me. 
--- a/drivers/nvme/host/core.c 2019-09-21 07:19:47.000000000 +0200
+++ b/drivers/nvme/host/core.c 2019-09-29 18:01:13.533381568 +0200
@@ -563,5 +563,11 @@
 	struct bio *bio;
+	size_t space_required = sizeof(*range) * segments;
+	size_t space_padded = round_up(space_required, PAGE_SIZE);
 
-	range = kmalloc_array(segments, sizeof(*range),
-			GFP_ATOMIC | __GFP_NOWARN);
+	if (space_required > PAGE_SIZE) {
+		pr_warning("Discard request larger than one page. Segments: %lu, struct size: %lu, total size: %lu, padded: %lu\n ",
+			(unsigned long) segments, (unsigned long) sizeof(*range), (unsigned long) space_required, (unsigned long) space_padded);
+	}
+
+	range = kmalloc(space_padded, GFP_ATOMIC | __GFP_NOWARN);
 	if (!range)
Not sure if more than 256 segments are possible and not sure if pr_warning is allowed in this context.

(In reply to Christoph Nelles from comment #71)
> My version of the fix for Linux 5.3.1. Probably many errors in these few
> lines, but currently this works for me.
>
Hi,
Your patch is working like a charm on my system:
CentOS 7.7
ELrepo kernel 5.3.1, patched (also) with your patch
release: Linux home.dmfn.ru 5.3.1-1.el7.dmfn.x86_64 #4 SMP Tue Oct 1 01:19:52 MSK 2019 x86_64 x86_64 x86_64 GNU/Linux
hardware IOMMU (2 x Intel Xeon 2683v3):
dmesg | grep -i iommu
[ 1.088429] DMAR: IOMMU enabled
...
NVMEs:
[root@home ~]# nvme list
...
Transcend 1GB TME110S TS1TMTE110S WDC WDS256G1X0C-00ENX0 256GB
No more issues with discard! THANKS!
My version of the fix:
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -559,10 +559,15 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 		struct nvme_command *cmnd)
 {
 	unsigned short segments = blk_rq_nr_discard_segments(req), n = 0;
+	unsigned short alloc_size = segments;
 	struct nvme_dsm_range *range;
 	struct bio *bio;
 
-	range = kmalloc_array(segments, sizeof(*range),
+	if (ns->ctrl->quirks & NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE) {
+		alloc_size = round_up(segments, PAGE_SIZE);
+	}
+
+	range = kmalloc_array(alloc_size, sizeof(*range),
 				GFP_ATOMIC | __GFP_NOWARN);
 	if (!range) {
 		/*
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2d678fb968c7..5abcd1bd6028 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -97,6 +97,10 @@ enum nvme_quirks {
 	 * Force simple suspend/resume path.
 	 */
 	NVME_QUIRK_SIMPLE_SUSPEND		= (1 << 10),
+	/*
+	 * Discard command should be aligned to a PAGE_SIZE
+	 */
+	NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE	= (1 << 11),
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 732d5b63ec05..af3faa468682 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3012,9 +3012,13 @@ static const struct pci_device_id nvme_id_table[] = {
 				NVME_QUIRK_DEALLOCATE_ZEROES, },
 	{ PCI_VDEVICE(INTEL, 0xf1a5),	/* Intel 600P/P3100 */
 		.driver_data = NVME_QUIRK_NO_DEEPEST_PS |
-				NVME_QUIRK_MEDIUM_PRIO_SQ },
+				NVME_QUIRK_MEDIUM_PRIO_SQ |
+				NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE, },
 	{ PCI_VDEVICE(INTEL, 0xf1a6),	/* Intel 760p/Pro 7600p */
-		.driver_data = NVME_QUIRK_IGNORE_DEV_SUBNQN, },
+		.driver_data = NVME_QUIRK_IGNORE_DEV_SUBNQN |
+				NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE, },
+	{ PCI_VDEVICE(INTEL, 0xf1a8),	/* Intel 660P */
+		.driver_data = NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE },
 	{ PCI_VDEVICE(INTEL, 0x5845),	/* Qemu emulated controller */
 		.driver_data = NVME_QUIRK_IDENTIFY_CNS |
 				NVME_QUIRK_DISABLE_WRITE_ZEROES, },
@@ -3028,6 +3032,20 @@ static const struct pci_device_id nvme_id_table[] = {
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
 	{ PCI_DEVICE(0x144d, 0xa821),   /* Samsung PM1725 */
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
+	{ PCI_DEVICE(0x1987, 0x5016),	/* Phison E16 */
+		.driver_data = NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE, },
+	{ PCI_DEVICE(0x1987, 0x5012),	/* Phison E12 */
+		.driver_data = NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE, },
+	{ PCI_DEVICE(0x126f, 0x2265),	/* Silicon Motion SM2265 */
+		.driver_data = NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE, },
+	{ PCI_DEVICE(0x126f, 0x2263),	/* Silicon Motion SM2263 */
+		.driver_data = NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE, },
+	{ PCI_DEVICE(0x126f, 0x2262),	/* Silicon Motion SM2262 */
+		.driver_data = NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE, },
+	{ PCI_DEVICE(0x126f, 0x2260),	/* Silicon Motion SM2260 */
+		.driver_data = NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE, },
+	{ PCI_DEVICE(0xc0a9, 0x2263),	/* Crucial P1 (SM2263) */
+		.driver_data = NVME_QUIRK_DISCARD_ALIGN_TO_PAGE_SIZE, },
 	{ PCI_DEVICE(0x144d, 0xa822),   /* Samsung PM1725a */
 		.driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, },
 	{ PCI_DEVICE(0x1d1d, 0x1f1f),	/* LighNVM qemu device */
It's done by introducing a new quirk that is currently applied only to Phison E16 and E12 devices and some Device IDs I've found for the SM226x SSDs (I bet it's not all of them though).

Best solution :-) But are you sure you calculated the alloc size correctly?
//given there's one segment
+ unsigned short alloc_size = segments; //alloc_size = 1
+ alloc_size = round_up(segments, PAGE_SIZE); //when quirk mode, alloc_size will be round up to 4096
+ range = kmalloc_array(alloc_size, sizeof(*range), GFP_ATOMIC | __GFP_NOWARN); //allocating 4096 * sizeof(struct nvme_dsm_range) (16 bytes) = 64kb

Right, it should be, I think:
round_up(segments, PAGE_SIZE / sizeof(*range));
to preserve the semantics.

Created attachment 285295 [details]
kernel-5.3-nvme-discard-align-to-page-size.patch
This is a fixed version of the previous patch (with a quirk for specific models).
(In reply to Serg Shipaev from comment #72) > Your patch is working like a charm on my system: > hardware IOMMU (2 x Intel Xeon 2683v3): Just to be clear, Serg, did you have this issue on an Intel system with Intel VT-d? Because I'm seeing lots of reports here with AMD-Vi but you're the only person who even suggested this affects an Intel IOMMU. But you did not post any logs etc from such a system. (In reply to Marti Raudsepp from comment #77) > (In reply to Serg Shipaev from comment #72) > > Your patch is working like a charm on my system: > > > hardware IOMMU (2 x Intel Xeon 2683v3): > > Just to be clear, Serg, did you have this issue on an Intel system with > Intel VT-d? Hi, Marti Indeed. The Intel platform also has such an issue. And the patches above are making a pretty workaround. I think the messages on Intel Platform are different and comes from DMAR Patch works on my system. Again the specs: Board: ASUS PRIME X370-PRO, BIOS 5220 09/12/2019 CPU: AMD Ryzen 7 1800X NVMe #1: Intel M.2 600p [8086:f1a5] NVMe #2: Crucial P1 [c0a9:2263] Distro: Gentoo Linux Kernel: 5.3.2 Patch: kernel-5.3-nvme-discard-align-to-page-size.patch Thanks! I have enclosed 2 logs with and without iommu=pt Enabled IOMMU in the bios during both type of test case. Setup 1: X570 + Ryzen 3600X + Gigabyte AORUS Gen 4 1TB + Kernel 5.4.0rc1 Setup 2: X399 + TR2990WX + Corsair MP510 + Kernel 5.4.0rc1 I have not applied the PAGE alignment code modification. as per my observation with iommu=pt, it works even if the memory is 4KB unaligned but it does not when iommu=on (not pt) Created attachment 285313 [details]
log with iommu=pt
Created attachment 285315 [details]
log with iommu=on
watch for [NVME_DSM] tag (In reply to swk from comment #81) > I have enclosed 2 logs with and without iommu=pt > > Enabled IOMMU in the bios during both type of test case. > Setup 1: X570 + Ryzen 3600X + Gigabyte AORUS Gen 4 1TB + Kernel 5.4.0rc1 > Setup 2: X399 + TR2990WX + Corsair MP510 + Kernel 5.4.0rc1 > > I have not applied the PAGE alignment code modification. > > as per my observation with iommu=pt, it works even if the memory is 4KB > unaligned but it does not when iommu=on (not pt) as it's mentioned somewhere earlier on a bug, `iommu=pt` only masks the issue (iommu is enabled only for devices that are passed-through to the VM, so it cannot detect and notify about a page fault) and discard continue to work anyway. Created attachment 285319 [details]
log with AMD virtualization OFF
With IOMMU and virtualization enabled, we use the virtual bus address for the DMA between the host and the NVMe device; for some reason, I think, it's messed up.
With IOMMU pass-through set, the kernel is told not to apply the virtual bus address translation to devices that do not support it, so our NVMe drive, which falls into this category, now operates using only CPU-side virtual memory and not a virtual bus address.
The enclosed log file confirms that the root cause is IOMMU + virtualization. In this run I disabled virtualization and the IOMMU in the BIOS, so the kernel never uses virtual bus memory, which is the same as IOMMU pass-through.
(In reply to swk from comment #86) > Created attachment 285319 [details] > log with AMD virtualization OFF > > with IOMMU and virtualization enabled, we use the virtual bus address for > the DMA between host and nvme device, for some reason i thinks its messed > up. > > with IOMMU pass through set, it tells the kernel not to apply the virtual > bus address translation on the devices which does not support, so our nvme > which comes under this category now operates only using cpu side virtual > memory and not virtual bus address. > > the enclosed log file confirms that root cause is iommu+virtualization. in > this run I have disabled the virtualization and iommu in bios there by > kernel never uses virtual bus memory which is same as IOMMU pass through. There is another view on the root cause of the Page Fault described by original workaround author: https://bugzilla.kernel.org/show_bug.cgi?id=202665#c49 (for Phishon controllers) and https://bugzilla.kernel.org/show_bug.cgi?id=202665#c60 (for Silicon Motion controllers) I'm not qualified enough to judge, but it sounds more like the controller's firmware is the actual root cause of the issue and IOMMU only allows to detect it. Out of interest: wouldn't the fastest solution be to just set a fixed size in function nvme_setup_discard to allocate, without adding the overhead to check if a specific PCI device is actually affected or not? Naturally this would only be meaningfull if nvme_setup_discard is called often, saving cycles when being called again and again. This also assumes that memory is not limited, which in my case it isn't. I also assume that the memory page is freed after the discard command finishes. With kernel 5.3.4 I reverted to the patch by Eduard Hasenleithner in Comment 49 to unconditionally allocate 4096, one full page size. 
I've just tried to rewrite patch by Eduard Hasenleithner to look more like it's done in other parts of the driver - where for such cases it's more common to use quirks as overhead of 1 if statement is not a big deal. However I'm not familiar with kernel development (I just happen to build a desktop at home, that's affected by that) and it seems that there were no comments from Jens about what's the best approach to solve that. Vladimir Smirnov, your patch worked fine. I'm also just a user of a desktop system that happend to be hit by this issue. I don't know which approach is the correct one. But I hope this issue will be fixed in the kernel - for good. Both disabling PCIe relaxed ordering or using Vladimir's patch fixes the issues mentioned here, but disabling RlxdOrd is kind of useless when scheduled fstrim from systemd kicks in early during the boot process so you have to make a script to disable PCIe relaxed ordering before fstrim service doing their job. I have ADATA SX8200PNP (aka SX8200 Pro) running on a Ryzen 2500U laptop. SX8200PNP uses SM2262EN controller but have a different VID and PID, not the generic one used in the patch, so I have to add new entries to nvme_id_table with the workaround made by Vladimir and additional quirk for another problem related with SX8200PNP (NVME_QUIRK_IGNORE_DEV_SUBNQN). The way to get this fixed in the kernel is to send the patch (Vladimir's probably) to the linux-nvme mailing list. The devs rarely pay much attention to what goes on in bugzilla. Vladimir, if you can send the patch to the list, it will bring it to the right people's attention and if it needs modification, they'll tell you. I'd offer to do it myself, but I can't provide a Sign-Off on it; only you can. For the record, I've got a Phison E12 based ssd (MyDigitalSSD BPX Pro) and an intel CPU, so the error messages look different but the problem and solution are the same. 
I'd like to confirm that patch also helped me with another phison e12 ssd SiliconPower P34A80 (fw12.3) Want to ask if anybody else gets these errors: pcieport 0000:00:01.1: AER: Corrected error received: 0000:01:00.0 nvme 0000:01:00.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID) nvme 0000:01:00.0: AER: device [1987:5012] error status/mask=00001000/00006000 nvme 0000:01:00.0: AER: [12] Timeout I have them. There seems no way to stop the corrected message, so I commented them out: --- linux-5.3.1-stock/drivers/pci/pcie/aer.c 2019-09-21 07:19:47.000000000 +0200 +++ linux-5.3.1/drivers/pci/pcie/aer.c 2019-09-29 21:49:39.579714115 +0200 @@ -1178,6 +1178,6 @@ e_info.multi_error_valid = 0; - aer_print_port_info(pdev, &e_info); + //aer_print_port_info(pdev, &e_info); - if (find_source_device(pdev, &e_info)) - aer_process_err_devices(&e_info); + //if (find_source_device(pdev, &e_info)) + // aer_process_err_devices(&e_info); } I made a docker image to build kernel 4.15.0 of Ubuntu 16.04 to apply a patch to fix problem. After this, fstrim works correctly. https://github.com/fx-kirin/docker-ubuntu-kernel-build/tree/ubuntu16.04-kernel4.15.0 The explaination is here. http://fx-kirin.com/ubuntu/fix-amd-vi-io_page_fault/ Ok. Alignment patch helped me with fx-8350 cpu and Crucial P1 1TB (CT1000P1SSD8). Relaxed ordering wasn't active in my case. But looks like bug still not mentioned in mailing lists. We don't necessary need to send a signed patch, just mention the problem and existing solution. Actually I've already started discussion in this thread: https://lists.infradead.org/pipermail/linux-nvme/2019-November/027822.html Does it mean the fix will be present with vanilla kernel 5.4 ? (In reply to hamelg from comment #99) > Does it mean the fix will be present with vanilla kernel 5.4 ? It's staged for 5.5; we can set it for stable once that merge window opens so it can it all the LTS's. I think there should be an option to default to a 4k page. 
And the reason is: (In reply to Vladimir Smirnov from comment #87) > ... it sounds more like the controller's > firmware is the actual root cause of the issue and IOMMU only allows to > detect it. If so, it should be the default in case the IOMMU+Virtualization is not active, because otherwise there is no way to detect it. Also, I'd like to know if the QUIRK is checked on every call of nvme_setup_discard -- if so, wouldn't it be the cleaner solution to check for affected hardware once (on initialization), then set the kmalloc_array accordingly, and statically? Hi. Don't know if you're still looking for affected devices, but I think I have one (or two): The error is as follows: [ 1924.242628] pcieport 0000:00:01.1: AER: Corrected error received: 0000:01:00.0 [ 1924.242634] nvme 0000:01:00.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID) [ 1924.242637] nvme 0000:01:00.0: AER: device [1987:5013] error status/mask=00001000/00006000 [ 1924.242640] nvme 0000:01:00.0: AER: [12] Timeout [ 1925.192262] pcieport 0000:00:01.1: AER: Corrected error received: 0000:01:00.0 [ 1925.192269] nvme 0000:01:00.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID) [ 1925.192274] nvme 0000:01:00.0: AER: device [1987:5013] error status/mask=00001000/00006000 [ 1925.192276] nvme 0000:01:00.0: AER: [12] Timeout [ 1925.855331] pcieport 0000:00:01.1: AER: Corrected error received: 0000:01:00.0 [ 1925.855337] nvme 0000:01:00.0: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID) [ 1925.855341] nvme 0000:01:00.0: AER: device [1987:5013] error status/mask=00001000/00006000 [ 1925.855343] nvme 0000:01:00.0: AER: [12] Timeout ... 
The devices are two Gigabyte SSD: root@enterprise:~# nvme list Node SN Model Namespace Usage Format FW Rev ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- -------- /dev/nvme0n1 SN193808924309 GIGABYTE GP-GSM2NE3100TNTD 1 1,02 TB / 1,02 TB 512 B + 0 B EDFM00.2 /dev/nvme1n1 SN193808927365 GIGABYTE GP-GSM2NE3512GNTD 1 512,11 GB / 512,11 GB 512 B + 0 B EDFM00.2 Both sitting on an "ASUS ROG STRIX B450-I GAMING" Mini-ITX board with a Ryzen 5-3600 CPU running a PopOS! 19.10 with kernel Linux version 5.3.0-20-generic (buildd@lgw01-amd64-060) (gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2)) #21+system76~1572304854~19.10~8caa3e6-Ubuntu SMP Tue Oct 29 00:4 (Ubuntu 5.3.0-20.21+system76~1572304854~19.10~8caa3e6-generic 5.3.7) Nov 28 12:02:33 pop-os kernel: [ 0.970766] nvme nvme0: pci function 0000:01:00.0 Nov 28 12:02:33 pop-os kernel: [ 0.970814] nvme nvme1: pci function 0000:07:00.0 Nov 28 12:02:33 pop-os kernel: [ 1.190008] nvme nvme1: missing or invalid SUBNQN field. Nov 28 12:02:33 pop-os kernel: [ 1.195375] nvme nvme0: missing or invalid SUBNQN field. Nov 28 12:02:33 pop-os kernel: [ 1.226915] nvme nvme1: allocated 128 MiB host memory buffer. Nov 28 12:02:33 pop-os kernel: [ 1.249453] nvme nvme1: 8/0/0 default/read/poll queues Nov 28 12:02:33 pop-os kernel: [ 1.254760] nvme0n1: p1 Nov 28 12:02:33 pop-os kernel: [ 1.263947] nvme nvme0: allocated 128 MiB host memory buffer. Nov 28 12:02:33 pop-os kernel: [ 1.300798] nvme nvme0: 8/0/0 default/read/poll queues Nov 28 12:02:33 pop-os kernel: [ 1.332509] nvme1n1: p1 p2 p3 p4 p5 p6 p7 p8 p9 ... 
Nov 28 12:02:33 pop-os kernel: [ 0.776499] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported Nov 28 12:02:33 pop-os kernel: [ 0.776499] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported Nov 28 12:02:33 pop-os kernel: [ 0.779361] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40 Nov 28 12:02:33 pop-os kernel: [ 0.779361] pci 0000:00:00.2: AMD-Vi: Extended features (0x58f77ef22294ade): Nov 28 12:02:33 pop-os kernel: [ 0.779362] PPR X2APIC NX GT IA GA PC GA_vAPIC Nov 28 12:02:33 pop-os kernel: [ 0.779364] AMD-Vi: Interrupt remapping enabled Nov 28 12:02:33 pop-os kernel: [ 0.779364] AMD-Vi: Virtual APIC enabled Nov 28 12:02:33 pop-os kernel: [ 0.779364] AMD-Vi: X2APIC enabled Nov 28 12:02:33 pop-os kernel: [ 0.779442] AMD-Vi: Lazy IO/TLB flushing enabled Nov 28 12:02:33 pop-os kernel: [ 0.780098] amd_uncore: AMD NB counters detected Nov 28 12:02:33 pop-os kernel: [ 0.780101] amd_uncore: AMD LLC counters detected Nov 28 12:02:33 pop-os kernel: [ 0.780221] LVT offset 0 assigned for vector 0x400 Nov 28 12:02:33 pop-os kernel: [ 0.780278] perf: AMD IBS detected (0x000003ff) Nov 28 12:02:33 pop-os kernel: [ 0.780282] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank). ... Please tell me, if you need more data. (I will install Buster in the next days, so I can also double check the error messages.) Regards, Dirk Your message doesn't sound like the same issue discussed in this bug. However you can still try this patch and just verify that pci I'd of your ssd is there. These AER timeouts are probably phison e12 specific, I have them, Christoph Nelles above confirmed, he has them, too. For the record, this is the latest version of the patch: http://git.infradead.org/nvme.git/commitdiff/530436c45ef2e446c12538a400e465929a0b3ade?hp=400b6a7b13a3fd71cff087139ce45dd1e5fff444 Shouldn’t it also be backported to older stable kernels? Forgive me if this is a dumb question, but are there some instructions regarding how one could implement this fix? 
I see that the fix is within: drivers/nvme/host/core.c ...and replacing 'segments' with '256' - :range = kmalloc_array(segments, sizeof(*range), + : range = kmalloc_array(256, sizeof(*range), 1) How does one get to/access (drivers/nvme/host/core.c) in order to make this change? 2) After making this change, does anything else need to be done, other than a reboot I'm guessing? (In reply to AM from comment #106) > 1) How does one get to/access (drivers/nvme/host/core.c) in order to make > this change? > > 2) After making this change, does anything else need to be done, other than > a reboot I'm guessing? The basics are: https://www.wikihow.com/Compile-the-Linux-Kernel 1. Get the source. Apply the patch. 2. Compile the kernel and the modules, then install it. 3. Boot that kernel. (Probably involves configuring a boot manager.) How you would do all that depends on the distribution you use. E.g. for Ubuntu, read this: https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel BTW, it will be in 5.5, so if you can wait, just wait... It seems like on 5.4.3 (upgraded from 5.3.6) those phison e12 'AER: Corrected error received' Timout errors are gone. Can somebody confirm? Although I've also upgraded agesa to latest 1.0.0.4 (In reply to Andreas from comment #107) > (In reply to AM from comment #106) > > 1) How does one get to/access (drivers/nvme/host/core.c) in order to make > > this change? > > > > 2) After making this change, does anything else need to be done, other > than > > a reboot I'm guessing? > > The basics are: > https://www.wikihow.com/Compile-the-Linux-Kernel > > 1. Get the source. Apply the patch. > 2. Compile the kernel and the modules, then install it. > 3. Boot that kernel. (Probably involves configuring a boot manager.) > > How you would do all that depends on the distribution you use. > E.g. for Ubuntu, read this: > https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel > > BTW, it will be in 5.5, so if you can wait, just wait... Thank you! 
I've been running 5.5 RC with no errors/obvious issues. I'll move to 5.5 stable when it's finally released Commit 'nvme: Discard workaround for non-conformant devices' have been applied to 5.4 and 4.19 stable kernels, i.e. 5.4.7 and 4.19.92. I've just upgraded to 5.4.7. I confirm the issue is solved :) Thanks for including the fix in 5.4.7, tested on Gentoo with gentoo-sources of the kernel. 5.4.8 and working I am using newer kernel: Linux x.home 5.4.12-200.fc31.x86_64 #1 SMP Tue Jan 14 20:07:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux Yet $ journalctl --no-hostname --no-pager --since=yesterday --system --priority=3 retuns the error very similar to above: Jan 23 06:57:40 kernel: iommu ivhd1: AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=00:00.0 pasid=0x00000 address=0xfffffffdf8000000 flags=0x0a00] Is this expected? Thanks. (In reply to NM from comment #115) > retuns the error very similar to above: > > Jan 23 06:57:40 kernel: iommu ivhd1: AMD-Vi: Event logged > [INVALID_DEVICE_REQUEST device=00:00.0 pasid=0x00000 > address=0xfffffffdf8000000 flags=0x0a00] > > Is this expected? That must be unrelated to this bugzilla as the device reported is 00:00.0. That's the host bridge, where this issue was dealing with an nvme end device. Sorry, for confusion. Thanks. Hi. 
I think I'm seeing a very similar issue: [51108.041160] amd_iommu_report_page_fault: 1978716 callbacks suppressed [51108.041161] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0018 address=0xfec8a96c flags=0x0000] uname -a Linux eNTi 5.5.7-2-ck-zen2 #1 SMP PREEMPT Sat, 29 Feb 2020 23:56:35 +0000 x86_64 GNU/Linux 03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 CPU: AMD Ryzen 7 3800X MB: ASUS X570P PRIME I also see the following but it is probably a seperate issue: [51103.031076] amd_iommu_report_page_fault: 1926858 callbacks suppressed [51103.031077] snd_emu10k1 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0xfec8a96c flags=0x0000] This will crash my sound card after a while. Notice(In reply to Alexander Jenisch from comment #118) > [51108.041160] amd_iommu_report_page_fault: 1978716 callbacks suppressed > [51108.041161] AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 > domain=0x0018 address=0xfec8a96c flags=0x0000] > > uname -a > Linux eNTi 5.5.7-2-ck-zen2 #1 SMP PREEMPT Sat, 29 Feb 2020 23:56:35 +0000 > x86_64 GNU/Linux > > 03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD > Controller SM981/PM981/PM983 Notice the error event references device 05:00.0, but your nvme is 03:00.0. Your issue is not related to this bugzilla. Created attachment 298273 [details]
kernel dmesg (kernel 5.14-rc5, AMD FX-8370)
Looks like I hit this issue too on my board (AM3+, AMD FX-8370), but on the current v5.14-rc5 kernel. Is this being worked on?
I'm getting lots of these (see the kernel dmesg for more info):
[ 119.343899] Bluetooth: RFCOMM ver 1.11
nvme 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xb5508000 flags=0x0050]
and
AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0015 address=0x80 flags=0x0000]
Some data about the NVMe-SSD:
# lspci -v -s 05:00.0
05:00.0 Non-Volatile memory controller: Shenzhen Longsys Electronics Co., Ltd. Device 2263 (rev 03) (prog-if 02 [NVM Express])
Subsystem: Shenzhen Longsys Electronics Co., Ltd. Device 2263
Flags: bus master, fast devsel, latency 0, IRQ 35, IOMMU group 20
Memory at fe900000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/16 Maskable+ 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [158] Secondary PCI Express
Capabilities: [178] Latency Tolerance Reporting
Capabilities: [180] L1 PM Substates
Kernel driver in use: nvme
Some data about my box:
$ inxi -bZ
System: Kernel: 5.14.0-rc5-bdver2 x86_64 bits: 64 Desktop: MATE 1.24.1 Distro: Gentoo Base System release 2.7
Machine: Type: Desktop System: Gigabyte product: N/A v: N/A serial: <superuser/root required>
Mobo: Gigabyte model: 970-GAMING v: x.x serial: <superuser/root required> UEFI: American Megatrends v: F2
date: 04/06/2016
CPU: Info: 8-Core AMD FX-8370 [MCP] speed: 1727 MHz min/max: 1400/4000 MHz
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Navi 14 [Radeon RX 5500/5500M / Pro 5500M] driver: amdgpu v: kernel
Display: x11 server: X.Org 1.20.11 driver: amdgpu,ati unloaded: fbdev,modesetting,radeon resolution: 1920x1080~60Hz
OpenGL: renderer: Radeon RX 5500 XT (NAVI14 DRM 3.42.0 5.14.0-rc5-bdver2 LLVM 12.0.1) v: 4.6 Mesa 21.1.4
Network: Device-1: Qualcomm Atheros Killer E2400 Gigabit Ethernet driver: alx
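
The two log shapes above carry the PCI address in different places: as a `nvme 0000:05:00.0:` prefix in one, and as a `device=05:00.0` field in the other. That address is what ties an event to an `lspci` entry. Below is a minimal Python sketch for pulling it out, assuming only the line formats quoted in this thread. The names `faulting_device`, `events_for` and `SAMPLE` are made up for illustration and are not part of any kernel tooling.

```python
import re

# Bus:device.function, with an optional 4-digit PCI domain in front
# (assumption: lowercase hex, as in the dmesg lines quoted in this thread).
BDF = r"(?:[0-9a-f]{4}:)?([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"

# Shape 1: "nvme 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=..."
PREFIX_RE = re.compile(r"(\S+) " + BDF + r": AMD-Vi: Event logged \[(\w+)")
# Shape 2: "AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=..."
FIELD_RE = re.compile(r"AMD-Vi: Event logged \[(\w+) device=" + BDF)


def faulting_device(line):
    """Return (event_name, 'bb:dd.f') for an AMD-Vi event line, else None."""
    m = PREFIX_RE.search(line)
    if m:
        return m.group(3), m.group(2)
    m = FIELD_RE.search(line)
    if m:
        return m.group(1), m.group(2)
    return None


def events_for(lines, bdf):
    """Keep only the event lines whose faulting device equals bdf."""
    kept = []
    for line in lines:
        parsed = faulting_device(line)
        if parsed is not None and parsed[1] == bdf:
            kept.append(line)
    return kept


# Lines copied from comments in this thread.
SAMPLE = [
    "nvme 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015 address=0xb5508000 flags=0x0050]",
    "AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0015 address=0x80 flags=0x0000]",
    "iommu ivhd1: AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=00:00.0 pasid=0x00000 address=0xfffffffdf8000000 flags=0x0a00]",
]

if __name__ == "__main__":
    # 05:00.0 is the NVMe address reported by lspci in the comment above.
    for line in events_for(SAMPLE, "05:00.0"):
        print(line)
```

Called with the address that `lspci` reports for the NVMe drive (here `05:00.0`), `events_for` keeps the two NVMe events and drops the host-bridge one, which is the same check applied by hand elsewhere in this thread.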

(In reply to Erhard F. from comment #120)
> Created attachment 298273 [details]
> kernel dmesg (kernel 5.14-rc5, AMD FX-8370)
>
> Looks like I hit this issue too on my board (AM3+, AMD FX-8370), but on the
> current v5.14-rc5 kernel. Is this being worked on?
>
> Getting lotsa (see kernel dmesg for more info)
> [ 119.343899] Bluetooth: RFCOMM ver 1.11
> nvme 0000:05:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0015
> address=0xb5508000 flags=0x0050]
> and
> AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x0015
> address=0x80 flags=0x0000]
I'm not sure it could be the same issue since the fix for the root cause is still in place for 5.14. Are you observing these errors from trim/discard requests?

(In reply to Keith Busch from comment #121)
> I'm not sure it could be the same issue since the fix for the root cause is
> still in place for 5.14. Are you observing these errors from trim/discard
> requests?
Yes, I can force these errors with 'trim -v /'. Also when booted with iommu=soft no such errors show up.

Fresh installation. Debian 11 (bullseye). Pair of identical 1TB INTEL SSDPEKNW010T9 in mdadm raid1. lvm on raid. Issue persists with or without a filesystem.
Motherboard - Gigabyte Technology Co., Ltd. X570 AORUS ELITE.
CPU - 16x AMD Ryzen 7 3700X 8-Core Processor.
American Megatrends International, LLC. BIOS - F36e (14 Oct 2021).
From dmesg:
[ 5.717671] nvme 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0xfedfc000 flags=0x0050]
[ 5.717734] nvme 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0018 address=0xfedfc080 flags=0x0050]
kernel_device +pci:0000:04:00.0
kernel_subsystem pci
transport kernel
udev_sysname 0000:04.00.0
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=soft" and update-grub makes no difference.
uname -r
5.10.0-9-amd64

Getting a very similar situation here with a legacy PCI raid controller when I enable IOMMU. System is Asus X370 with AMD Ryzen CPU.
LSPCI output related to the controller:
07:01.0 RAID bus controller: Broadcom / LSI MegaRAID (rev 01)
Subsystem: Broadcom / LSI MegaRAID SATA 150-4 RAID Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 30
IOMMU group: 9
Region 0: Memory at f0000000 (32-bit, prefetchable) [size=64K]
Expansion ROM at fcc00000 [disabled] [size=64K]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: megaraid
Kernel modules: megaraid_mbox
This is on Arch, Linux version 5.17.1-arch1-1 and also some other recent distros I tried. As soon as I mount the logical drive from the raid controller with IOMMU enabled in UEFI setup the errors start pouring out:
Apr 03 08:52:07 archiso kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x000a address=0xbfdb0000 flags=0x0000]
Apr 03 08:52:07 archiso kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x000a address=0xbfdb0080 flags=0x0000]
Apr 03 08:52:07 archiso kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x000a address=0xbfdb00c0 flags=0x0000]
Apr 03 08:52:07 archiso kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x000a address=0xbfdb0040 flags=0x0000]
Apr 03 08:52:07 archiso kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x000a address=0xbfdb0100 flags=0x0000]
Apr 03 08:52:07 archiso kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x000a address=0xbfdb0140 flags=0x0000]
Apr 03 08:52:07 archiso kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x000a address=0xbfda0000 flags=0x0000]
Apr 03 08:52:07 archiso kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x000a address=0xbfda0080 flags=0x0000]
Apr 03 08:52:07 archiso kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x000a address=0xbfda00c0 flags=0x0000]
Apr 03 08:52:07 archiso kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x000a address=0xbfda0100 flags=0x0000]
Apr 03 08:52:45 archiso kernel: amd_iommu_report_page_fault: 5235 callbacks suppressed
With IOMMU disabled there are no errors. Looks like this is related, I am wondering if the fix from Eduard Hasenleithner for the nvme drives is also applicable here somehow.

ASUS X570, ROG Strix X570-E Gaming, AMD Ryzen CPU.
Ubuntu, custom kernel Linux version 5.19.6-tkg-pds
"This wasn't an issue on < 5.18"
The whole computer freezes like it's out of memory and HDD light blinks with regular interval.
After reboot dmesg showed this:
[ 1.440475] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0xffffc000 flags=0x0050]
[ 1.444305] nvme nvme0: 15/0/0 default/read/poll queues
[ 1.447595] nvme0n1: p1
lspci
01:00.0 Non-Volatile memory controller: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive (rev 03)
lspci -vv -s
01:00.0 Non-Volatile memory controller: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive (rev 03) (prog-if 02 [NVM Express])
Subsystem: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 84
NUMA node: 0
IOMMU group: 14
Region 0: Memory at fcf00000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset- MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <8us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x4 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+ 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+ EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest- Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [b0] MSI-X: Enable+ Count=16 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00002100
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn- MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [158 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [178 v1] Latency Tolerance Reporting
Max snoop latency: 1048576ns
Max no snoop latency: 1048576ns
Capabilities: [180 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=32768ns
L1SubCtl2: T_PwrOn=10us
Kernel driver in use: nvme
Kernel modules: nvme
=== START OF INFORMATION SECTION ===
Model Number: ADATA SX8200PNP
Serial Number: 2K35291G16FL
Firmware Version: 42B2S7JA
PCI Vendor/Subsystem ID: 0x1cc1
IEEE OUI Identifier: 0x000000
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1

(In reply to Joona-Matias Heikkilä from comment #125)
> ASUS X570, ROG Strix X570-E Gaming, AMD Ryzen CPU.
>
> Ubuntu, custom kernel Linux version 5.19.6-tkg-pds
>
> "This wasn't an issue on < 5.18"
>
> The whole computer freezes like it's out of memory and HDD light blinks with
> regular interval.
>
MSI X470 GAMING PLUS (MS-7B79)
Ryzen 3600X
01:00.0 Non-Volatile memory controller: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive (rev 03) (prog-if 02 [NVM Express])
Subsystem: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive
Sort of the same going on here. On heavy disk writes occasionally the HDD light blinks, "life stops", but it resolves after some time -- within a minute or so. Going to watch dmesg...
Linux eryn 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
The situation has been present for a while now, can't tell when it started.

using iommu=soft

Made a query to ADATA and they basically told me to install their "ONLY for windows" software; if I couldn't do that, then ask the vendor for help, and if that is not possible, make an RMA. I asked for a Linux program and the answer was no. I have come across this same issue with a (SATA) KINGSTON SSD 2.5": it was freezing every 10 minutes on a 5.13 kernel. Somebody said that there are issues with certain SSD controllers. I updated the KINGSTON's firmware and it seems to not freeze anymore. This ADATA freezing only started to happen this summer, ironically after I updated the Kingston SSD's firmware.

No log entries here when this occurs. Looks like just disk access grinds to a halt, browsers happily play videos and whatnot. I'll throw out the SSD soonish, it nears 80% lifetime.

@Adam Groszer yeah, same here, I can't wait to change this thing.

[ 1.390941] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xffffc000 flags=0x0000]
[ 1.390976] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000b address=0xffffc080 flags=0x0000]
Still getting this, and this: https://bugzilla.kernel.org/show_bug.cgi?id=216809
Obviously a vendor that doesn't care (HP ex900)... well, it was "cheap" back then. Once it nears 80% I am also getting a decent one.

Created attachment 281315 [details]
4.20.12 kernel config

Hello. I think I've found a new issue with nvme that has not been reported yet. Yesterday I installed an amd 2400g, asus tuf b450m pro and nvme adata sx8200. First of all I updated to the latest BIOS 0604. Then I used sysresccd (kernel 4.19) with "iommu=soft" to boot and install a gentoo base system with the latest toolchain: gcc 8.2.0, binutils 2.30-r4, glibc 2.27-r6, linux-firmware 2019022, linux-headers 4.20, kernel 4.20.12. I configured the kernel with the recommended options for raven ridge. I will attach the ".config" file to this issue.

Then I created fstab with:
/ ext4 discard,noatime 0 1
/home ext4 discard,noatime 0 2
and made a first reboot. Everything worked as expected, so I proceeded further with the installation. After 20 minutes I checked dmesg and found that AMD-Vi was spamming errors:
# dmesg
[ 145.127297] nvme 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000fd769100 flags=0x0000]
[ 145.127301] AMD-Vi: Event logged [IO_PAGE_FAULT device=06:00.0 domain=0x0000 address=0x00000000fd769180 flags=0x0000]
I haven't seen any issues with my data on the SSD. Smartctl said that the disk is good, cool and feeling fine. I will attach "dmesg.bad".

Then I enabled "iommu=soft", rebooted, and the issue disappeared. Then I removed "iommu=soft" and "discard" from fstab, rebooted, and the issue disappeared too. I will attach "dmesg.iommu.soft" and "dmesg.without.discard".

So I can reproduce IO_PAGE_FAULT only with hardware IOMMU enabled together with discard. People on the archlinux forum reproduced the same issue on amd 2700, asus tuf b450 plus, and nvme intel SSDPEKKW256G8 without discard but with manual fstrim.

I think I will use the system with "iommu=soft" and discard for now. Please let me know how to debug this issue to provide more details. Thank you.

PS please do not look at "[drm]" errors, I will configure amdgpu later.