Bug 108711

Summary: AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x0000 address=0x0000000100000840 flags=0x0070]
Product: Platform Specific/Hardware
Reporter: Eik Binschek (binschek)
Component: x86-64
Assignee: platform_x86_64 (platform_x86_64)
Status: NEW
Severity: high
CC: alexander.schmiechen, animtim, arnaudh1, bill, edmund.laugasson, germano.massullo, kalle.widell, kaltofen, konoha02, lvml, mtpmoni, piotrszegda, reuben_p, vedran, ZeskoTron
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.2.2 4.2.3
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Hardware Information, dmesg file

Description Eik Binschek 2015-12-01 14:59:23 UTC
Created attachment 196261 [details]
Hardware Information

The kernel does not boot when AMD-Vi is enabled and the "[AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]" is in use.

The following line is shown on the console:
AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x0000 address=0x0000000100000840 flags=0x0070]

On our hardware (see the attached hardware information), this problem happens with the following distributions:
- Proxmox VE 4.0 (based on debian jessie with ubuntu 15.10 kernel)
-- Kernel: 4.2.2-1-pve
-- Kernel: 4.2.3-1-pve
- Ubuntu 15.10
-- Kernel: 4.2.3
- Fedora 23
-- Kernel: 4.2.3-300.fc23.x86_64

I think this is a common problem with AMD-Vi.
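
For triage, the fields of the fault line quoted above can be pulled apart programmatically. The following is a minimal Python sketch and not part of the original report: the names `FAULT_RE` and `parse_fault` are hypothetical, and the pattern assumes exactly the field order of the quoted line (other kernel versions may print it differently).

```python
import re

# Assumption: fields appear as "device=... domain=... address=... flags=...",
# exactly as in the console line quoted in this report.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT"
    r" device=(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r" domain=(?P<domain>0x[0-9a-f]+)"
    r" address=(?P<address>0x[0-9a-f]+)"
    r" flags=(?P<flags>0x[0-9a-f]+)"
)


def parse_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),            # PCI bus:device.function
        "domain": int(match.group("domain"), 16),   # IOMMU domain id
        "address": int(match.group("address"), 16), # faulting I/O address
        "flags": int(match.group("flags"), 16),     # raw flags, left undecoded
    }


if __name__ == "__main__":
    sample = ("AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 "
              "domain=0x0000 address=0x0000000100000840 flags=0x0070]")
    print(parse_fault(sample))
```

Applied to the line from this report, the sketch yields device 00:14.1, domain 0 and flags 0x70; the faulting address 0x100000840 lies just above the 4 GiB boundary. The flags are deliberately left as a raw number, since their bit meanings are not given in this report.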
Comment 1 Gordon Kaltofen 2015-12-01 15:57:15 UTC
Hi, I'm a colleague of the bug reporter.

There is an equivalent bug report on the Red Hat Bugzilla: Bug 1274527 (https://bugzilla.redhat.com/show_bug.cgi?id=1274527)

They use slightly different hardware, but the same components are affected: AMD-Vi (IOMMU) in combination with the chipset's AMD FCH SATA Controller.

However, since this error occurs with the 4.2.x kernels of different distributions, we believe the problem is in the kernel itself.
Comment 2 Bill Maidment 2016-01-31 07:29:39 UTC
Hi
I have recently come across this same issue with Scientific Linux 7.1, kernel-3.10.0-327.3 through 3.10.0-327.4.5.
When I revert to kernel-3.10.0-229.20.1 the issue goes away.

CPU = AMD FX(tm)-8120 Eight-Core Processor
MOBO = GA-990FXA-D3

If I boot with amd-iommu=off, the issue goes away, but I need the IOMMU switched on for USB devices used by virtual machines.

Cheers
Bill
Comment 3 Gordon Kaltofen 2016-02-01 17:30:59 UTC
Update: 

The latest kernel still has the problem:
- Proxmox VE 4.0 (based on debian jessie with ubuntu 15.10 kernel)
-- Kernel: 4.2.2-1-pve
-- Kernel: 4.2.3-1-pve
-- Kernel: 4.2.6-1-pve <-
Comment 4 Piotr Szegda 2016-04-01 06:23:46 UTC
Update:
In my case the problem still occurs on mainline Ubuntu kernels 4.4, 4.5 and 4.6-rc1. Setting amd-iommu=off or disabling the IOMMU in the BIOS makes the system hang during boot.
Comment 5 alexander.schmiechen 2016-04-05 13:19:23 UTC
Same here. AMD A8-5500, ASRock FM2A55M-DGS motherboard.

Kernel 4.5 from a fresh openSUSE Tumbleweed install throws AMD-Vi IO_PAGE_FAULTs. The boot process stops while trying to connect to the hard drives. Switching off the IOMMU in the BIOS leads to a full stop with an "sp5100_tco already in use" issue.

No problems occurred with older kernels up to 4.1.
Comment 6 arnaud 2016-04-18 20:28:09 UTC
Hello, I have also had this bug on my computer since the kernel was updated to version 4.2. Since April 16, a new version of the "linux-lts" package on Arch (which went from version 4.1 to version 4.4) has forced me to block updates, because I cannot boot the new versions.

AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0005 address=0x0000000000000000 flags=0x0020]

$ head -n7 /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 16
model name      : AMD A8-5500 APU with Radeon(tm) HD Graphics
stepping        : 1
microcode       : 0x6001119

$ lspci -v -s 00:11.0
00:11.0 RAID bus controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [RAID mode] (rev 40)
        Subsystem: Gigabyte Technology Co., Ltd Device b002
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 36
        I/O ports at f140 [size=8]
        I/O ports at f130 [size=4]
        I/O ports at f120 [size=8]
        I/O ports at f110 [size=4]
        I/O ports at f100 [size=16]
        Memory at feb51000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [50] MSI: Enable+ Count=8/8 Maskable- 64bit+
        Capabilities: [70] SATA HBA v1.0
        Kernel driver in use: ahci
        Kernel modules: ahci
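
The `device=` field of the fault line is the same bus:device.function address that `lspci -s` accepts, which is how the fault above is tied to the FCH SATA controller. A minimal Python sketch of that lookup follows; the helper names `fault_device`, `lspci_argv` and `driver_in_use` are hypothetical, and the parsing assumes the `lspci -v` layout shown in this comment.

```python
import re


def fault_device(line):
    """Extract the PCI address (bus:device.function) from an AMD-Vi fault line."""
    match = re.search(
        r"IO_PAGE_FAULT device=([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])", line)
    return match.group(1) if match else None


def lspci_argv(slot):
    """Build the lspci invocation used in this comment for one slot."""
    return ["lspci", "-v", "-s", slot]


def driver_in_use(lspci_text):
    """Return the driver named on the 'Kernel driver in use:' line, if any."""
    match = re.search(r"^\s*Kernel driver in use:\s*(\S+)",
                      lspci_text, re.MULTILINE)
    return match.group(1) if match else None


if __name__ == "__main__":
    fault = ("AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 "
             "domain=0x0005 address=0x0000000000000000 flags=0x0020]")
    slot = fault_device(fault)
    # Print the command instead of running it, so the sketch works anywhere.
    print(" ".join(lspci_argv(slot)))
```

The sketch only builds the command line; running it (for example with `subprocess.run`) and passing the output to `driver_in_use` is left to the reader, since the result depends on the machine.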
Comment 7 arnaud 2016-04-18 20:40:10 UTC
Created attachment 213261 [details]
dmesg file
Comment 8 Lutz Vieweg 2016-06-06 21:17:02 UTC
We saw what appears to be the same symptom for the first time after upgrading a server from (vanilla, stable) kernel 4.4.2 to (vanilla, stable) 4.6.1. After it happened, the ixgbe driver was unable to re-establish a link on an Intel 10GBase-T NIC; running "rmmod" and setting the interface up again revives it.

I have searched for and found many occurrences of the same kind of IO_PAGE_FAULT error message on the web, spread out over the years, but the surrounding text was always inconclusive as to whether this is a bug in AMD-Vi as such or whether a badly programmed kernel driver triggers the page fault.

Jun  6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=04:00.0 domain=0x000e address=0x000000001004ecc0 flags=0x0050]
Jun  6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=04:00.0 domain=0x000e address=0x000000001004ed00 flags=0x0050]
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <3>#012  TDH, TDT             <1ce>, <1e6>#012  next_to_use          <1e6>#012  next_to_clean        <1ce>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <1>#012  TDH, TDT             <fc>, <108>#012  next_to_use          <108>#012  next_to_clean        <fc>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b28c5>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <0>#012  TDH, TDT             <16b>, <16f>#012  next_to_use          <16f>#012  next_to_clean        <16b>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b21d0>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <4>#012  TDH, TDT             <69>, <8b>#012  next_to_use          <8b>#012  next_to_clean        <69>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 1, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 0, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <10>#012  TDH, TDT             <1c3>, <1c9>#012  next_to_use          <1c9>#012  next_to_clean        <1c3>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 4, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 10, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2 detected on queue 3, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0: master disable timed out
Jun  6 19:09:36 computer kernel: br0: port 1(enp4s0) entered disabled state
Jun  6 19:09:42 computer kernel: ixgbe 0000:04:00.0 enp4s0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jun  6 19:09:42 computer kernel: br0: port 1(enp4s0) entered blocking state
Jun  6 19:09:42 computer kernel: br0: port 1(enp4s0) entered forwarding state
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <12>#012  TDH, TDT             <0>, <2>#012  next_to_use          <2>#012  next_to_clean        <0>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b4c20>#012  jiffies              <10f7b544c>
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2 detected on queue 12, resetting adapter
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Comment 9 animtim 2016-08-19 12:04:08 UTC
Hi,
I had the exact same bug with kernels 4.2 through 4.7:

[IO_PAGE_FAULT device=00:11.0 domain=0x0005 address=0x0000000000000000 flags=0x0020]


I'm happy to say that with 4.8-rc2 it no longer happens and the system boots perfectly.
Comment 10 Vedran Miletić 2016-09-24 19:33:07 UTC
I have the same problem with

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT / Amethyst XT [Radeon R9 380X / R9 M295X] (rev f1)

getting errors like

[    2.142597] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f400622000 flags=0x0010]

The problem is still present on Fedora 25 with kernel 4.8.0-0.rc7.git0.1.fc25.x86_64.
Comment 11 Germano Massullo 2016-11-25 14:07:19 UTC
I can confirm the error message on a Gigabyte GA-970A-DS3P motherboard with IOMMU enabled, on CentOS 7 with kernel 3.10.
Comment 12 Edmund Laugasson 2022-06-08 18:29:21 UTC
I am also having similar issues; please see https://pastebin.com/wJ6x5ete
Comment 13 Edmund Laugasson 2022-06-08 18:30:46 UTC
Currently, when the screen turns off, the computer freezes, and the only thing that helps is holding the power button down until it forcibly powers off.
Comment 14 Kalle 2023-01-26 17:13:43 UTC
Running Ubuntu 22.04. The fault seems to occur randomly on resume after suspend.
Comment 15 mtpmoni 2023-01-30 07:34:19 UTC
I am running Ubuntu Server 22.04 with kernel 5.15.0-52-generic on an ASRockRack X470D4U board with Samsung_SSD_870_QVO_2TB SSDs.
After "kernel: ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x9520e000 flags=0x0000]",
the ATA ports report errors such as:
"ata8.00: exception Emask 0x10 SAct 0x200400b8 SErr 0x0 action 0x6 frozen"

The log ends with:
blk_update_request: I/O error, dev sdd,...

I have had this error since installing Ubuntu Server 22.04.
Comment 16 ZeskoTron 2023-03-08 11:07:22 UTC
I have a similar issue:

https://gitlab.freedesktop.org/drm/amd/-/issues/2451