Bug 108711 - AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x0000 address=0x0000000100000840 flags=0x0070]
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: x86-64 Linux
Importance: P1 high
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-01 14:59 UTC by Eik Binschek
Modified: 2016-12-04 12:58 UTC
CC List: 10 users

See Also:
Kernel Version: 4.2.2 4.2.3
Tree: Mainline
Regression: No


Attachments
Hardware Information (71.15 KB, application/zip)
2015-12-01 14:59 UTC, Eik Binschek
dmesg file (49.32 KB, text/x-log)
2016-04-18 20:40 UTC, arnaud

Description Eik Binschek 2015-12-01 14:59:23 UTC
Created attachment 196261
Hardware Information

The kernel does not boot when AMD-Vi is enabled and the "[AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]" is in use.

The following line is shown on the console:
AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x0000 address=0x0000000100000840 flags=0x0070]
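
For anyone comparing these events across machines, the line can be split into its fields with a few lines of Python. This is only a minimal sketch: the regular expression is an assumption derived from the message format quoted in this report, and parse_event is a hypothetical helper name, not part of any kernel or distribution tool. It does not try to interpret the individual flag bits.

```python
import re

# Assumed layout, taken from the message quoted in this report:
# AMD-Vi: Event logged [<EVENT> device=BB:DD.F domain=0x.. address=0x.. flags=0x..]
EVENT_RE = re.compile(
    r"AMD-Vi: Event logged \[(?P<event>\w+) "
    r"device=(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7]) "
    r"domain=0x(?P<domain>[0-9a-f]+) "
    r"address=0x(?P<address>[0-9a-f]+) "
    r"flags=0x(?P<flags>[0-9a-f]+)\]"
)


def parse_event(line):
    """Return the fields of one AMD-Vi event line as a dict, or None if no match."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    return {
        "event": m["event"],
        # PCI bus:device.function, as printed in the message
        "device": f'{m["bus"]}:{m["dev"]}.{m["fn"]}',
        "domain": int(m["domain"], 16),
        "address": int(m["address"], 16),
        "flags": int(m["flags"], 16),  # raw value only, bits are not decoded
    }


line = ("AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x0000 "
        "address=0x0000000100000840 flags=0x0070]")
event = parse_event(line)
print(event["device"], hex(event["address"]), hex(event["flags"]))
```

Run on the line above, the parsed address is 0x100000840, which lies just above the 4 GiB boundary (0x100000000). Whether that is significant for this bug is not established here.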

On our hardware (see the attached hardware information), this problem happens with the following distributions:
- Proxmox VE 4.0 (based on Debian jessie with the Ubuntu 15.10 kernel)
-- Kernel: 4.2.2-1-pve
-- Kernel: 4.2.3-1-pve
- Ubuntu 15.10
-- Kernel: 4.2.3
- Fedora 23
-- Kernel: 4.2.3-300.fc23.x86_64

I think this is a common problem with AMD-Vi.
Comment 1 Gordon Kaltofen 2015-12-01 15:57:15 UTC
Hi, I'm a colleague of the bug reporter.

There is an equivalent bug report in the Red Hat Bugzilla: Bug 1274527 (https://bugzilla.redhat.com/show_bug.cgi?id=1274527)

They use slightly different hardware, but the affected components are identical: AMD-Vi (IOMMU) in combination with the chipset's AMD FCH SATA Controller.

However, since this error occurs with the 4.2.x kernels of several different distributions, we believe the problem is in the kernel itself.
Comment 2 Bill Maidment 2016-01-31 07:29:39 UTC
Hi
I have recently come across this same issue with Scientific Linux 7.1, kernel-3.10.0-327.3 through 3.10.0-327.4.5.
When I revert to kernel-3.10.0-229.20.1, the issue goes away.

CPU = AMD FX(tm)-8120 Eight-Core Processor
MOBO = GA-990FXA-D3

If I boot with amd-iommu=off, the issue goes away, but I need the IOMMU switched on for USB devices used by virtual machines.

Cheers
Bill
Comment 3 Gordon Kaltofen 2016-02-01 17:30:59 UTC
Update: 

The latest kernel listed below still has the problem:
- Proxmox VE 4.0 (based on Debian jessie with the Ubuntu 15.10 kernel)
-- Kernel: 4.2.2-1-pve
-- Kernel: 4.2.3-1-pve
-- Kernel: 4.2.6-1-pve <-
Comment 4 Piotr Szegda 2016-04-01 06:23:46 UTC
Update:
In my case the problem still occurs on the mainline Ubuntu kernels 4.4, 4.5 and 4.6-rc1. Setting amd-iommu=off or disabling the IOMMU in the BIOS makes the system hang during boot.
Comment 5 alexander.schmiechen 2016-04-05 13:19:23 UTC
Same here. AMD A8-5500, ASRock FM2A55M-DGS motherboard.

Kernel 4.5 from a fresh openSUSE Tumbleweed install throws AMD-Vi IO_PAGE_FAULTs. The boot process stops while trying to connect to the hard drives. Switching off the IOMMU in the BIOS leads to a full stop with an sp5100_tco already in use issue.

No problems occurred with older kernels up to 4.1.
Comment 6 arnaud 2016-04-18 20:28:09 UTC
Hello, I have also had this bug on my computer since the kernel was updated to version 4.2. Since April 16, a new version of the "linux-lts" package on Arch (which went from version 4.1 to version 4.4) has forced me to block updates, because I cannot boot the new versions.

AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0005 address=0x0000000000000000 flags=0x0020]

$ head -n7 /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 16
model name      : AMD A8-5500 APU with Radeon(tm) HD Graphics
stepping        : 1
microcode       : 0x6001119

$ lspci -v -s 00:11.0
00:11.0 RAID bus controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [RAID mode] (rev 40)
        Subsystem: Gigabyte Technology Co., Ltd Device b002
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 36
        I/O ports at f140 [size=8]
        I/O ports at f130 [size=4]
        I/O ports at f120 [size=8]
        I/O ports at f110 [size=4]
        I/O ports at f100 [size=16]
        Memory at feb51000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [50] MSI: Enable+ Count=8/8 Maskable- 64bit+
        Capabilities: [70] SATA HBA v1.0
        Kernel driver in use: ahci
        Kernel modules: ahci
Comment 7 arnaud 2016-04-18 20:40:10 UTC
Created attachment 213261
dmesg file
Comment 8 Lutz Vieweg 2016-06-06 21:17:02 UTC
We first saw what appears to be the same symptom after upgrading a server from (vanilla, stable) kernel 4.4.2 to (vanilla, stable) 4.6.1. After it happened, the ixgbe driver was unable to re-establish a link on an Intel 10GBase-T NIC; "rmmod" followed by setting the interface up again revives it.

I have searched for and found many occurrences of the same kind of IO_PAGE_FAULT error message on the web, spread out over the years, but the surrounding text was always inconclusive as to whether this is a bug in AMD-Vi as such or the result of bad programming in a kernel driver that triggers the page fault.

Jun  6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=04:00.0 domain=0x000e address=0x000000001004ecc0 flags=0x0050]
Jun  6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=04:00.0 domain=0x000e address=0x000000001004ed00 flags=0x0050]
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <3>#012  TDH, TDT             <1ce>, <1e6>#012  next_to_use          <1e6>#012  next_to_clean        <1ce>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <1>#012  TDH, TDT             <fc>, <108>#012  next_to_use          <108>#012  next_to_clean        <fc>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b28c5>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <0>#012  TDH, TDT             <16b>, <16f>#012  next_to_use          <16f>#012  next_to_clean        <16b>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b21d0>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <4>#012  TDH, TDT             <69>, <8b>#012  next_to_use          <8b>#012  next_to_clean        <69>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 1, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 0, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <10>#012  TDH, TDT             <1c3>, <1c9>#012  next_to_use          <1c9>#012  next_to_clean        <1c3>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 4, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 10, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2 detected on queue 3, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0: master disable timed out
Jun  6 19:09:36 computer kernel: br0: port 1(enp4s0) entered disabled state
Jun  6 19:09:42 computer kernel: ixgbe 0000:04:00.0 enp4s0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jun  6 19:09:42 computer kernel: br0: port 1(enp4s0) entered blocking state
Jun  6 19:09:42 computer kernel: br0: port 1(enp4s0) entered forwarding state
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <12>#012  TDH, TDT             <0>, <2>#012  next_to_use          <2>#012  next_to_clean        <0>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b4c20>#012  jiffies              <10f7b544c>
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2 detected on queue 12, resetting adapter
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
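
The "#012" sequences in the ixgbe lines above make the hang reports hard to read. A minimal Python sketch for expanding them is shown below. It assumes that "#ooo" is the three-digit octal escape the syslog daemon wrote in place of a control character (#012 is octal 12, a line feed); expand_escapes is a hypothetical helper name, and a literal "#" followed by three octal digits in a message would be expanded as well.

```python
import re

# Assumption: "#ooo" is a three-digit octal escape written by the syslog
# daemon for a control character (#012 is octal 12, i.e. a line feed).
ESCAPE_RE = re.compile(r"#([0-7]{3})")


def expand_escapes(line):
    """Replace each #ooo escape with the character it encodes."""
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 8)), line)


# Shortened excerpt of one "Detected Tx Unit Hang" message from the log above
sample = ("ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012"
          "  Tx Queue             <3>#012"
          "  TDH, TDT             <1ce>, <1e6>")
expanded = expand_escapes(sample)
print(expanded)
```

With the escapes expanded, each hang report prints as one header line followed by its indented fields on separate lines.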
Comment 9 animtim 2016-08-19 12:04:08 UTC
Hi,
I had the exact same bug from kernel 4.2 to 4.7

[IO_PAGE_FAULT device=00:11.0 domain=0x0005 address=0x0000000000000000 flags=0x0020]


I'm happy to say that with 4.8-rc2 it no longer happens and the system boots perfectly.
Comment 10 Vedran Miletić 2016-09-24 19:33:07 UTC
I have the same problem with

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT / Amethyst XT [Radeon R9 380X / R9 M295X] (rev f1)

getting errors like

[    2.142597] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f400622000 flags=0x0010]

The problem still occurs on Fedora 25 with kernel 4.8.0-0.rc7.git0.1.fc25.x86_64.
Comment 11 Germano Massullo 2016-11-25 14:07:19 UTC
I can confirm the error message on a Gigabyte GA-970A-DS3P motherboard with the IOMMU enabled, on CentOS 7 with kernel 3.10.
