Bug 198733 - nvme AMD-Vi IO_PAGE_FAULT
Summary: nvme AMD-Vi IO_PAGE_FAULT
Status: RESOLVED DUPLICATE of bug 202665
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: IA-32 Linux
Importance: P1 normal
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-08 21:48 UTC by Andreas
Modified: 2023-10-07 03:01 UTC
CC List: 3 users

See Also:
Kernel Version: 4.15
Subsystem:
Regression: No
Bisected commit-id:



Description Andreas 2018-02-08 21:48:54 UTC
My hardware: ASUS PRIME X370-PRO, BIOS 3803 01/22/2018

The problem: I get continuous IO_PAGE_FAULT messages in the kernel log concerning the nvme PCIe SSD (Intel M.2 600p).

# dmesg | grep -e nvme -e IO_PAGE_FAULT -e AMD-Vi
[    0.685170] AMD-Vi: IOMMU performance counters supported
[    0.688863] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    0.688865] AMD-Vi: Extended features (0xf77ef22294ada):
[    0.688869] AMD-Vi: Interrupt remapping enabled
[    0.688870] AMD-Vi: virtual APIC enabled
[    0.689000] AMD-Vi: Lazy IO/TLB flushing enabled
[    1.511460] nvme nvme0: pci function 0000:01:00.0
[    1.722440]  nvme0n1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12
[    5.735095] BTRFS: device label Gentoo Linux devid 1 transid 5607 /dev/nvme0n1p9
[    5.736819] BTRFS info (device nvme0n1p9): disk space caching is enabled
[    5.737905] BTRFS info (device nvme0n1p9): has skinny extents
[    5.743099] BTRFS info (device nvme0n1p9): enabling ssd optimizations
[    6.185042] BTRFS info (device nvme0n1p9): turning on discard
[    6.185918] BTRFS info (device nvme0n1p9): disk space caching is enabled
[    6.237110] EXT4-fs (nvme0n1p12): mounted filesystem with ordered data mode. Opts: errors=remount-ro,acl,user_xattr,discard,commit=60
[    6.285184] BTRFS: device label Linux local devid 1 transid 1102 /dev/nvme0n1p11
[    6.287012] BTRFS: device label Debian testing devid 1 transid 5 /dev/nvme0n1p10
[   20.960456] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f7afc000 flags=0x0000]
[   36.388257] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fdf3c000 flags=0x0000]
[   36.436361] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f1bbf000 flags=0x0000]
[   36.495781] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f1b15000 flags=0x0000]
[ 1099.263465] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f287a000 flags=0x0000]
[ 1099.410352] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f7bb7000 flags=0x0000]
[ 1099.428882] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fac1d000 flags=0x0000]
[ 1099.434961] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fdf46000 flags=0x0000]
[ 1119.202606] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f2507000 flags=0x0000]
[ 1150.444529] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f8324000 flags=0x0000]
[ 1150.448632] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f6cea000 flags=0x0000]
[ 1150.452425] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f4322000 flags=0x0000]
[ 1150.454216] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f4427000 flags=0x0000]
[ 1150.455779] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f4322000 flags=0x0000]
[ 1150.457662] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fca31000 flags=0x0000]
[ 1150.459379] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f6cea000 flags=0x0000]
[ 1150.460935] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f3493000 flags=0x0000]
[ 1150.462369] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f8324000 flags=0x0000]
[ 1150.549726] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000ffdeb000 flags=0x0000]
[ 1334.587723] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f2893000 flags=0x0000]
[ 1389.760657] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fc56a000 flags=0x0000]
[ 1401.564916] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f24c2000 flags=0x0000]
[ 1401.569539] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f2893000 flags=0x0000]
[ 1403.089207] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f24d1000 flags=0x0000]
[ 1509.056984] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fccbf000 flags=0x0000]
[ 1527.431274] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f5f95000 flags=0x0000]
[ 1946.016857] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f1b80000 flags=0x0000]
[ 2066.958671] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fc55b000 flags=0x0000]
[ 2066.984956] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fb40a000 flags=0x0000]
[ 2066.992845] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fb40e000 flags=0x0000]
[ 2067.014898] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000ff920000 flags=0x0000]
[ 2067.092809] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fcc94000 flags=0x0000]
[ 2183.842616] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fbbbf000 flags=0x0000]
[ 3296.024896] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f1b7f000 flags=0x0000]
[ 3352.022548] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f7ae2000 flags=0x0000]
[ 3415.623062] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fdb3f000 flags=0x0000]
[ 3423.021804] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f4319000 flags=0x0000]
[ 3497.021662] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000fb960000 flags=0x0000]
[ 3570.022103] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f4338000 flags=0x0000]
[ 3580.730997] nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0x00000000f24d8000 flags=0x0000]

I don't see any side effects yet. The NVMe SSD seems to work just fine, but I don't like these IO_PAGE_FAULT messages very much, since the NVMe flash drive is a vital system component. The error doesn't occur with the kernel command-line option iommu=soft.
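
To see whether the faulting addresses repeat, the log above can be condensed into a per-address count. This is only a minimal sketch that assumes the message format shown in this report; the function name fault_addresses is made up for illustration.

```shell
# Hypothetical helper (not part of any tool): reads kernel log text on
# stdin and prints each faulting address preceded by how often it was
# logged, most frequent first.
fault_addresses() {
    # Keep only IO_PAGE_FAULT lines and reduce them to the address field.
    sed -n '/IO_PAGE_FAULT/s/.*address=\(0x[0-9a-fA-F]*\).*/\1/p' \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2,2
}

# Usage (needs permission to read the kernel log):
#   dmesg | fault_addresses
```

In the log above a few addresses do occur more than once (for example 0x00000000f4322000), which such a summary makes easier to spot.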

# cat /proc/cpuinfo (only last CPU):
processor       : 15
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 1
model name      : AMD Ryzen 7 1800X Eight-Core Processor
stepping        : 1
microcode       : 0x8001129
cpu MHz         : 2043.666
cache size      : 512 KB
physical id     : 0
siblings        : 16
core id         : 7
cpu cores       : 8
apicid          : 15
initial apicid  : 15
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme retpoline retpoline_amd vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
bugs            : sysret_ss_attrs null_seg spectre_v1 spectre_v2
bogomips        : 7685.46
TLB size        : 2560 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate eff_freq_ro [13] [14]

# lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1450]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device [1022:1451]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1460]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1461]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1462]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1463]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1464]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1465]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1467]
01:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:f1a5] (rev 03)
02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b9] (rev 02)
02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b5] (rev 02)
02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b0] (rev 02)
03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
03:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
03:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
07:00.0 USB controller [0c03]: ASMedia Technology Inc. Device [1b21:1343]
08:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
0a:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1470] (rev c1)
0b:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1471]
0c:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega [Radeon RX Vega] [1002:687f] (rev c1)
0c:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:aaf8]
0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
0d:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1456]
0d:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:145c]
0e:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
0e:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
0e:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Device [1022:1457]

# uname -a
Linux ZenMachine 4.15.1-gentoo #2 SMP Thu Feb 8 21:27:00 CET 2018 x86_64 AMD Ryzen 7 1800X Eight-Core Processor AuthenticAMD GNU/Linux

Searching for "AMD-Vi IO_PAGE_FAULT" reveals similar problems with other PCI hardware, such as Ethernet controllers...
Similar issues seem to be bug #194521, bug #94871, bug #108711 and bug #69421.
Comment 1 aladjev.andrew@gmail.com 2019-02-24 16:00:06 UTC
Hello. Today I reproduced a similar issue, bug #202665. I don't know whether it is connected with yours. Please try disabling the discard option in fstab, removing the iommu=soft boot parameter, rebooting, and running "fstrim -v /". If "IO_PAGE_FAULT" appears, we have the same issue.

We need a developer who works on the IOMMU code here. They could send us something like a debug patch so that we can provide a better log.

I think there is an interaction between the nvme and iommu implementations. fstrim may be just one way to reproduce it.
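
The preconditions for the test requested above could be checked roughly as follows. This is a sketch under assumptions: the helper names has_iommu_soft and discard_mounts are hypothetical, and the default paths are the usual Linux locations for the kernel command line and the mount table.

```shell
# Hypothetical helpers; each takes an optional file argument so it can
# be tried against a saved copy instead of the live system.

# Succeeds if the kernel command line contains iommu=soft.
has_iommu_soft() {
    grep -Eq '(^| )iommu=soft( |$)' "${1:-/proc/cmdline}"
}

# Prints the mount points that are currently mounted with discard.
discard_mounts() {
    awk '$4 ~ /(^|,)discard(=|,|$)/ { print $2 }' "${1:-/proc/mounts}"
}

# Reproduction attempt, run as root once both checks come back clean:
#   has_iommu_soft && echo "iommu=soft is still set"
#   discard_mounts
#   fstrim -v /
#   dmesg | grep IO_PAGE_FAULT
```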
Comment 2 Andreas 2019-03-06 14:08:16 UTC
Yes, I can confirm this behaviour, so it seems to be the same issue.

I removed all "discard" mount options for the NVMe filesystems from my /etc/fstab, and also removed the "iommu=soft" kernel boot option I had been using. No errors appeared. (With discard enabled I still get these errors on a regular basis.) Running the fstrim command then triggered these messages:

nvme 0000:01:00.0: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xXXXXXXXX flags=0x0000]
AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0000 address=0xXXXXXXXX flags=0x0000]

Only the addresses vary; domain is always 0x0000, and so are the flags.
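
The observation that domain and flags stay constant can be checked mechanically. A minimal sketch, assuming both message formats quoted above; the function name fault_domains_flags is hypothetical.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints every
# distinct "domain flags" pair seen in IO_PAGE_FAULT events. A single
# output line means both fields never changed.
fault_domains_flags() {
    sed -n '/IO_PAGE_FAULT/s/.*domain=\(0x[0-9a-fA-F]*\).*flags=\(0x[0-9a-fA-F]*\).*/\1 \2/p' \
        | sort -u
}

# Usage:
#   dmesg | fault_domains_flags
```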
Comment 3 nutodafozo 2019-03-08 11:16:24 UTC
I can confirm this issue on an Asus X470 PRO motherboard (latest BIOS as of today, 4406) and an HP EX920 SSD (same sm2265 controller): running fstrim spams dmesg with IO_PAGE_FAULT.

See https://bugzilla.kernel.org/show_bug.cgi?id=202665#c7
Comment 4 Andreas 2019-03-08 12:29:17 UTC

*** This bug has been marked as a duplicate of bug 202665 ***
