Bug 108711 - AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x0000 address=0x0000000100000840 flags=0x0070]
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: x86-64
Hardware: x86-64 Linux
Importance: P1 high
Assignee: platform_x86_64@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-01 14:59 UTC by Eik Binschek
Modified: 2022-06-08 18:30 UTC
CC List: 11 users

See Also:
Kernel Version: 4.2.2 4.2.3
Tree: Mainline
Regression: No


Attachments
Hardware Information (71.15 KB, application/zip)
2015-12-01 14:59 UTC, Eik Binschek
dmesg file (49.32 KB, text/x-log)
2016-04-18 20:40 UTC, arnaud

Description Eik Binschek 2015-12-01 14:59:23 UTC
Created attachment 196261
Hardware Information

The kernel does not boot when AMD-Vi is enabled and the "[AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]" is in use.

The following line is shown on the console:
AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x0000 address=0x0000000100000840 flags=0x0070]

On our hardware (see the attached hardware information) this problem happens with the following distributions:
- Proxmox VE 4.0 (based on debian jessie with ubuntu 15.10 kernel)
-- Kernel: 4.2.2-1-pve
-- Kernel: 4.2.3-1-pve
- Ubuntu 15.10
-- Kernel: 4.2.3
- Fedora 23
-- Kernel: 4.2.3-300.fc23.x86_64

I think this is a common problem with AMD-Vi.
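
For comparing the fault lines reported in this bug, it can help to split them into their fields. The following is a minimal sketch, assuming the line format is exactly the one quoted above (other kernel versions may print the fields differently). The function name `parse_fault` is made up for illustration.

```python
import re

# Pattern for the event line format quoted in this report. This is an
# assumption: other kernel versions may format the fields differently.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT device=(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"domain=0x(?P<domain>[0-9a-f]+) "
    r"address=0x(?P<address>[0-9a-f]+) "
    r"flags=0x(?P<flags>[0-9a-f]+)"
)


def parse_fault(line):
    """Return the fields of an IO_PAGE_FAULT line as a dict, or None."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "device": m.group("device"),            # PCI bus:device.function
        "domain": int(m.group("domain"), 16),   # IOMMU domain id
        "address": int(m.group("address"), 16), # faulting I/O address
        "flags": int(m.group("flags"), 16),     # raw flags, not decoded here
    }


line = ("AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x0000 "
        "address=0x0000000100000840 flags=0x0070]")
fault = parse_fault(line)
print(fault["device"], hex(fault["address"]), hex(fault["flags"]))
```

The flags value is kept as a raw integer because this sketch makes no claim about what the individual bits mean.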
Comment 1 Gordon Kaltofen 2015-12-01 15:57:15 UTC
Hi, I'm a colleague of the bug reporter.

There is an equivalent, specific Bug report on Red Hat Bugzilla: Bug 1274527 (https://bugzilla.redhat.com/show_bug.cgi?id=1274527)

They use slightly different hardware, but the same components are affected: AMD-Vi (IOMMU) in combination with the chipset's AMD FCH SATA Controller.

However, since this error occurs with the 4.2.x kernels of different distributions, we believe the problem is in the kernel itself.
Comment 2 Bill Maidment 2016-01-31 07:29:39 UTC
Hi
I have recently come across this same issue with Scientific Linux 7.1, kernel-3.10.0-327.3 through 3.10.0-327.4.5.
When I revert to kernel-3.10.0-229.20.1, the issue goes away.

CPU = AMD FX(tm)-8120 Eight-Core Processor
MOBO = GA-990FXA-D3

If I boot with amd-iommu=off then the issue goes away, but I need the IOMMU switched on for USB devices used by virtual machines.

Cheers
Bill
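
To check which IOMMU-related parameters a system was actually booted with, one can inspect the kernel command line. This is a minimal sketch: the helper name `iommu_params` and the example command line are made up for illustration, and treating "amd-iommu" and "amd_iommu" as the same parameter is an assumption of the sketch.

```python
def iommu_params(cmdline):
    """Return {name: value} for IOMMU-related kernel parameters.

    Dashes in parameter names are normalised to underscores, so
    "amd-iommu=off" and "amd_iommu=off" end up under the same key.
    That equivalence is an assumption of this sketch.
    """
    found = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        name = name.replace("-", "_")
        if name in ("iommu", "amd_iommu"):
            found[name] = value
    return found


# Hypothetical command line, not taken from any system in this report.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro amd-iommu=off quiet"
print(iommu_params(example))
```

On a running system the string to pass in would be the contents of /proc/cmdline.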
Comment 3 Gordon Kaltofen 2016-02-01 17:30:59 UTC
Update: 

The latest kernel still has the problem:
- Proxmox VE 4.0 (based on debian jessie with ubuntu 15.10 kernel)
-- Kernel: 4.2.2-1-pve
-- Kernel: 4.2.3-1-pve
-- Kernel: 4.2.6-1-pve <-
Comment 4 Piotr Szegda 2016-04-01 06:23:46 UTC
Update:
In my case the problem still occurs on mainline Ubuntu kernels 4.4, 4.5 and 4.6-rc1. Setting amd-iommu=off or disabling the IOMMU in the BIOS makes the system hang during boot.
Comment 5 alexander.schmiechen 2016-04-05 13:19:23 UTC
Same here. AMD A8-5500, ASRock FM2A55M-DGS motherboard.

Kernel 4.5 from a fresh openSUSE Tumbleweed install throws AMD-Vi IO_PAGE_FAULTs. The boot process stops while trying to connect to the hard drives. Switching off the IOMMU in the BIOS leads to a full stop with an "sp5100_tco already in use" issue.

No problems occurred with older kernels up to 4.1.
Comment 6 arnaud 2016-04-18 20:28:09 UTC
Hello, I have also had this bug on my computer since the kernel was updated to version 4.2. Since April 16, a new version of the "linux-lts" package on Arch (which went from version 4.1 to version 4.4) has forced me to block updates, because I cannot boot the new versions.

AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0005 address=0x0000000000000000 flags=0x0020]

$ head -n7 /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 16
model name      : AMD A8-5500 APU with Radeon(tm) HD Graphics
stepping        : 1
microcode       : 0x6001119

$ lspci -v -s 00:11.0
00:11.0 RAID bus controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [RAID mode] (rev 40)
        Subsystem: Gigabyte Technology Co., Ltd Device b002
        Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 36
        I/O ports at f140 [size=8]
        I/O ports at f130 [size=4]
        I/O ports at f120 [size=8]
        I/O ports at f110 [size=4]
        I/O ports at f100 [size=16]
        Memory at feb51000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [50] MSI: Enable+ Count=8/8 Maskable- 64bit+
        Capabilities: [70] SATA HBA v1.0
        Kernel driver in use: ahci
        Kernel modules: ahci
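
To see which IOMMU group a faulting device belongs to, the short address printed by lspci can be mapped to its sysfs entry. This is a minimal sketch, assuming PCI domain 0000 (lspci omits the domain on single-domain systems). The helper names are made up for illustration.

```python
import os


def sysfs_iommu_group(bdf, domain="0000"):
    """Build the sysfs iommu_group path for a PCI address such as 00:11.0.

    The PCI domain defaults to 0000, which is an assumption; an address
    that already carries a domain (two colons) is used unchanged.
    """
    full = bdf if bdf.count(":") == 2 else f"{domain}:{bdf}"
    return f"/sys/bus/pci/devices/{full}/iommu_group"


def read_iommu_group(bdf):
    """Return the IOMMU group number as a string, or None if the link is absent."""
    path = sysfs_iommu_group(bdf)
    if not os.path.islink(path):
        return None
    # The symlink target ends in the group number.
    return os.path.basename(os.readlink(path))


print(sysfs_iommu_group("00:11.0"))
```

`read_iommu_group` returns None when the link does not exist, for example when the IOMMU is disabled or the device is not present.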
Comment 7 arnaud 2016-04-18 20:40:10 UTC
Created attachment 213261
dmesg file
Comment 8 Lutz Vieweg 2016-06-06 21:17:02 UTC
We saw what appears to be the same symptom for the first time after upgrading a server from (vanilla, stable) kernel 4.4.2 to (vanilla, stable) 4.6.1. After it happened, the ixgbe driver was unable to establish a link again on an Intel 10Gbase-T NIC; "rmmod" followed by setting the interface up again revives it.

I have searched for and found many occurrences of the same kind of IO_PAGE_FAULT error message on the web, spread out over the years, but the surrounding text was always inconclusive as to whether this is a bug in AMD-Vi as such or whether bad programming in a kernel driver triggers the page fault.

Jun  6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=04:00.0 domain=0x000e address=0x000000001004ecc0 flags=0x0050]
Jun  6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=04:00.0 domain=0x000e address=0x000000001004ed00 flags=0x0050]
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <3>#012  TDH, TDT             <1ce>, <1e6>#012  next_to_use          <1e6>#012  next_to_clean        <1ce>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <1>#012  TDH, TDT             <fc>, <108>#012  next_to_use          <108>#012  next_to_clean        <fc>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b28c5>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <0>#012  TDH, TDT             <16b>, <16f>#012  next_to_use          <16f>#012  next_to_clean        <16b>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b21d0>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <4>#012  TDH, TDT             <69>, <8b>#012  next_to_use          <8b>#012  next_to_clean        <69>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 1, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 0, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <10>#012  TDH, TDT             <1c3>, <1c9>#012  next_to_use          <1c9>#012  next_to_clean        <1c3>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b215d>#012  jiffies              <10f7b3244>
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 4, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 10, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2 detected on queue 3, resetting adapter
Jun  6 19:09:35 computer kernel: ixgbe 0000:04:00.0: master disable timed out
Jun  6 19:09:36 computer kernel: br0: port 1(enp4s0) entered disabled state
Jun  6 19:09:42 computer kernel: ixgbe 0000:04:00.0 enp4s0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jun  6 19:09:42 computer kernel: br0: port 1(enp4s0) entered blocking state
Jun  6 19:09:42 computer kernel: br0: port 1(enp4s0) entered forwarding state
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012  Tx Queue             <12>#012  TDH, TDT             <0>, <2>#012  next_to_use          <2>#012  next_to_clean        <0>#012tx_buffer_info[next_to_clean]#012  time_stamp           <10f7b4c20>#012  jiffies              <10f7b544c>
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2 detected on queue 12, resetting adapter
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun  6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Comment 9 animtim 2016-08-19 12:04:08 UTC
Hi,
I had the exact same bug from kernel 4.2 to 4.7

[IO_PAGE_FAULT device=00:11.0 domain=0x0005 address=0x0000000000000000 flags=0x0020]


and I'm happy to say that with 4.8-rc2 it doesn't happen anymore and the system boots perfectly.
Comment 10 Vedran Miletić 2016-09-24 19:33:07 UTC
I have the same problem with

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT / Amethyst XT [Radeon R9 380X / R9 M295X] (rev f1)

getting errors like

[    2.142597] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f400622000 flags=0x0010]

still the same problem on Fedora 25 with kernel 4.8.0-0.rc7.git0.1.fc25.x86_64.
Comment 11 Germano Massullo 2016-11-25 14:07:19 UTC
I can confirm the error message on a Gigabyte GA-970A-DS3P motherboard with IOMMU enabled, on CentOS 7 with kernel 3.10.
Comment 12 Edmund Laugasson 2022-06-08 18:29:21 UTC
I am also having similar issues; please see https://pastebin.com/wJ6x5ete
Comment 13 Edmund Laugasson 2022-06-08 18:30:46 UTC
Currently, when the screen turns off, the computer freezes, and the only thing that helps is holding the power button down until it is forcibly powered off.
