Bug 108711
| Summary: | AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x0000 address=0x0000000100000840 flags=0x0070] | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | Eik Binschek (binschek) |
| Component: | x86-64 | Assignee: | platform_x86_64 (platform_x86_64) |
| Status: | NEW --- | | |
| Severity: | high | CC: | alexander.schmiechen, animtim, arnaudh1, bill, edmund.laugasson, germano.massullo, kalle.widell, kaltofen, konoha02, lvml, mtpmoni, piotrszegda, reuben_p, vedran, ZeskoTron |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 4.2.2 4.2.3 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | Hardware Information, dmesg file | | |
Hi, I'm a colleague of the bug reporter. There is an equivalent, specific bug report on Red Hat Bugzilla: Bug 1274527 (https://bugzilla.redhat.com/show_bug.cgi?id=1274527). They use slightly different hardware, but the same components are affected: AMD-Vi (IOMMU) in combination with the AMD FCH SATA Controller chipset. However, since this error occurred with the 4.2.x kernels of different distributions, we believe the problem is in the kernel itself.

Hi, I have recently come across this same issue with Scientific Linux 7.1, kernel-3.10.0-327.3 through 3.10.0-327.4.5. When I revert to kernel-3.10.0-229.20.1 the issue goes away.
CPU = AMD FX(tm)-8120 Eight-Core Processor
MOBO = GA-990FXA-D3
If I boot with amd-iommu=off the issue goes away, but I need IOMMU switched on for USB devices used by virtual machines.
Cheers, Bill

Update: The following latest kernel still has the problem:
- Proxmox VE 4.0 (based on Debian jessie with Ubuntu 15.10 kernel)
-- Kernel: 4.2.2-1-pve
-- Kernel: 4.2.3-1-pve
-- Kernel: 4.2.6-1-pve <-

Update: In my case the problem still occurs on mainline Ubuntu kernels: 4.4, 4.5, 4.6-rc1. Setting amd-iommu=off or disabling IOMMU in the BIOS makes the system hang during boot.

Same here. AMD A8-5500, ASRock FM2A55M-DGS motherboard. Kernel 4.5 from a fresh openSUSE Tumbleweed install throws AMD-Vi IO_PAGE_FAULTs. The boot process stops while trying to connect to the hard drives. Switching off IOMMU in the BIOS leads to a full stop with an "sp5100_tco already in use" issue. No problems occurred with older kernels up to 4.1.

Hello, I have also had this bug on my computer since the kernel was updated to version 4.2. Since April 16, a new version of the "linux-lts" package on Arch (which went from version 4.1 to version 4.4) forces me to block updates because I cannot boot the new versions.
IO_PAGE_FAULT device=00:11.0 domain=0x0005 address=0x0000000000000000 flags=0x0020]
AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0005 address=0x0000000000000000 flags=0x0020]

$ head -n7 /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 16
model name : AMD A8-5500 APU with Radeon(tm) HD Graphics
stepping : 1
microcode : 0x6001119

$ lspci -v -s 00:11.0
00:11.0 RAID bus controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [RAID mode] (rev 40)
Subsystem: Gigabyte Technology Co., Ltd Device b002
Flags: bus master, 66MHz, medium devsel, latency 32, IRQ 36
I/O ports at f140 [size=8]
I/O ports at f130 [size=4]
I/O ports at f120 [size=8]
I/O ports at f110 [size=4]
I/O ports at f100 [size=16]
Memory at feb51000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [50] MSI: Enable+ Count=8/8 Maskable- 64bit+
Capabilities: [70] SATA HBA v1.0
Kernel driver in use: ahci
Kernel modules: ahci

Created attachment 213261 [details]
dmesg file
We first saw what appears to be the same symptom after upgrading a server from (vanilla, stable) kernel 4.4.2 to (vanilla, stable) 4.6.1. After it happened, the ixgbe driver was unable to establish a link again on an Intel 10Gbase-T NIC; "rmmod" plus setting the interface up again revives it. I have searched for and found many occurrences of the same kind of IO_PAGE_FAULT error messages on the web, spread out over the years, but the surrounding text was always inconclusive as to whether this is a bug in AMD-Vi as such or whether bad programming in the kernel driver triggers the page fault.

Jun 6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=04:00.0 domain=0x000e address=0x000000001004ecc0 flags=0x0050]
Jun 6 19:09:31 computer kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=04:00.0 domain=0x000e address=0x000000001004ed00 flags=0x0050]
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <3>#012 TDH, TDT <1ce>, <1e6>#012 next_to_use <1e6>#012 next_to_clean <1ce>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b215d>#012 jiffies <10f7b3244>
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <1>#012 TDH, TDT <fc>, <108>#012 next_to_use <108>#012 next_to_clean <fc>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b28c5>#012 jiffies <10f7b3244>
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <0>#012 TDH, TDT <16b>, <16f>#012 next_to_use <16f>#012 next_to_clean <16b>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b21d0>#012 jiffies <10f7b3244>
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <4>#012 TDH, TDT <69>, <8b>#012 next_to_use <8b>#012 next_to_clean <69>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b215d>#012 jiffies <10f7b3244>
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 1,
resetting adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 0, resetting adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <10>#012 TDH, TDT <1c3>, <1c9>#012 next_to_use <1c9>#012 next_to_clean <1c3>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b215d>#012 jiffies <10f7b3244>
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 4, resetting adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 1 detected on queue 10, resetting adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due to tx timeout
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2 detected on queue 3, resetting adapter
Jun 6 19:09:35 computer kernel: ixgbe 0000:04:00.0: master disable timed out
Jun 6 19:09:36 computer kernel: br0: port 1(enp4s0) entered disabled state
Jun 6 19:09:42 computer kernel: ixgbe 0000:04:00.0 enp4s0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jun 6 19:09:42 computer kernel: br0: port 1(enp4s0) entered blocking state
Jun 6 19:09:42 computer kernel: br0: port 1(enp4s0) entered forwarding state
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Detected Tx Unit Hang#012 Tx Queue <12>#012 TDH, TDT <0>, <2>#012 next_to_use <2>#012 next_to_clean <0>#012tx_buffer_info[next_to_clean]#012 time_stamp <10f7b4c20>#012 jiffies <10f7b544c>
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: tx hang 2 detected on queue 12, resetting adapter
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: initiating reset due
to tx timeout
Jun 6 19:09:44 computer kernel: ixgbe 0000:04:00.0 enp4s0: Reset adapter

Hi, I had the exact same bug from kernel 4.2 to 4.7
[IO_PAGE_FAULT device=00:11.0 domain=0x0005 address=0x0000000000000000 flags=0x0020]
and I'm happy to say that with 4.8rc2 it doesn't happen anymore and the system boots perfectly.

I have the same problem with
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT / Amethyst XT [Radeon R9 380X / R9 M295X] (rev f1)
getting errors like
[ 2.142597] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f400622000 flags=0x0010]
Still the same problem on Fedora 25 with kernel 4.8.0-0.rc7.git0.1.fc25.x86_64.

I confirm the error message on a Gigabyte GA-970A-DS3P motherboard with IOMMU enabled, on CentOS 7 with kernel 3.10.

I am also having similar issues, please see https://pastebin.com/wJ6x5ete
Currently, when the screen turns off the computer freezes, and the only thing that helps is holding the power button down until it forcibly turns off. Running Ubuntu 22.04.

Seems to occur randomly on resume after suspend.

I am running Ubuntu Server 22.04 with kernel 5.15.0-52-generic on an ASRockRack board (Product Name: X470D4U) with Samsung_SSD_870_QVO_2TB SSDs. After
"kernel: ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000f address=0x9520e000 flags=0x0000]"
the ATA ports show the following error. Example:
"ata8.00: exception Emask 0x10 SAct 0x200400b8 SErr 0x0 action 0x6 frozen"
and the end reads: blk_update_request: I/O error, dev sdd,...
I have had the error since installing Ubuntu Server 22.04.

I have a similar issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2451
Created attachment 196261 [details]
Hardware Information

The kernel does not start when AMD-Vi is enabled and the "[AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]" is in use. The following lines are shown on the console:

AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 domain=0x0000 address=0x0000000100000840 flags=0x0070]

On our hardware (see the attached hardware information) this problem happens with the following distributions:
- Proxmox VE 4.0 (based on Debian jessie with Ubuntu 15.10 kernel)
-- Kernel: 4.2.2-1-pve
-- Kernel: 4.2.3-1-pve
- Ubuntu 15.10
-- Kernel: 4.2.3
- Fedora 23
-- Kernel: 4.2.3-300.fc23.x86_64

I think this is a common problem with AMD-Vi.
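
For anyone comparing reports across machines, the IO_PAGE_FAULT lines quoted in this bug can be pulled apart mechanically. The following is only a rough sketch, not part of any kernel tooling: the function name `parse_io_page_fault` and the regular expression are my own assumptions, based solely on the two message shapes that appear in this thread (with `device=` inside the brackets, and the newer form without it). It does not try to interpret the individual `flags` bits.

```python
import re
from typing import Dict, Optional

# Pattern derived only from the log lines quoted in this bug report.
# "device=" is optional because the 5.15 report above prints the device
# in the driver prefix instead of inside the brackets.
_FAULT_RE = re.compile(
    r"IO_PAGE_FAULT\s+"
    r"(?:device=(?P<device>[0-9a-fA-F:.]+)\s+)?"
    r"domain=(?P<domain>0x[0-9a-fA-F]+)\s+"
    r"address=(?P<address>0x[0-9a-fA-F]+)\s+"
    r"flags=(?P<flags>0x[0-9a-fA-F]+)"
)


def parse_io_page_fault(line: str) -> Optional[Dict[str, object]]:
    """Return the fields of one IO_PAGE_FAULT log line, or None if absent.

    Hypothetical helper for triage; numeric fields are converted from hex.
    """
    match = _FAULT_RE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),  # None for the newer message form
        "domain": int(match.group("domain"), 16),
        "address": int(match.group("address"), 16),
        "flags": int(match.group("flags"), 16),
    }


if __name__ == "__main__":
    sample = (
        "AMD-Vi: Event logged [IO_PAGE_FAULT device=00:14.1 "
        "domain=0x0000 address=0x0000000100000840 flags=0x0070]"
    )
    print(parse_io_page_fault(sample))
```

Feeding a saved dmesg through this line by line makes it easier to see whether the faults always come from the same device and address range, which is the main thing that differs between the reports here.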