Bug 42782

Summary: IO_PAGE_FAULT while starting xorg
Product: Virtualization
Reporter: edm (fuffi.il.fuffo)
Component: kvm
Assignee: virtualization_kvm
Status: NEW
Severity: normal
CC: alan, joro, rtguille, tango
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.4.9
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output
dmidecode output
lspci -vv output
dmesg at poweron, with "page fault"
dmesg when reset, no "page fault"
lspci -vvvnn

Description edm 2012-02-16 12:13:31 UTC
Hi,

I get this error when running startx on the 3.2.6 kernel, and xorg fails to start:

[   54.683907] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0018 address=0x0000000220583200 flags=0x0010]
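
For anyone comparing several of these events, here is a minimal Python sketch that splits an IO_PAGE_FAULT line into its fields. The regular expression is an assumption derived only from the line format quoted in this report, and the names FAULT_RE and parse_io_page_fault are made up for illustration; other kernel versions may print the event differently.

```python
import re

# Pattern inferred from the AMD-Vi event line quoted in this report
# (an assumption, not taken from the kernel source).
FAULT_RE = re.compile(
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT\s+device=(?P<device>[0-9a-f:.]+)\s+"
    r"domain=0x(?P<domain>[0-9a-f]+)\s+address=0x(?P<address>[0-9a-f]+)\s+"
    r"flags=0x(?P<flags>[0-9a-f]+)\]"
)


def parse_io_page_fault(line):
    """Return a dict with the fault fields, or None if the line does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "device": m.group("device"),              # PCI bus:device.function
        "domain": int(m.group("domain"), 16),     # IOMMU protection domain id
        "address": int(m.group("address"), 16),   # faulting DMA address
        "flags": int(m.group("flags"), 16),       # raw event flags
    }


line = ("[   54.683907] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 "
        "domain=0x0018 address=0x0000000220583200 flags=0x0010]")
fault = parse_io_page_fault(line)
print(fault["device"], hex(fault["address"]))
```

The device field can then be matched against the lspci output to identify which card issued the DMA request.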

I saw the same error on this mailing list
(http://www.spinics.net/lists/kvm/msg48472.html), so I assumed this was
the right list for reporting the problem. Am I wrong?

I attached the output of dmesg, lspci -vv and dmidecode.

Disabling amd_iommu allows X to start, but that is only a workaround.

This is the relevant part of my grub.cfg file:

# (0) Arch Linux
menuentry 'Arch Linux' {
 set root='(hd0,1)'; set legacy_hdbias='0'
 legacy_kernel   '/boot/vmlinuz-linux' '/boot/vmlinuz-linux' 'root=/dev/disk/by-uuid/a4259127-650c-4767-af64-9c86cdc1a5e1' 'ro' 'vga=773' 'amd_iommu_dump' '3'
 legacy_initrd '/boot/initramfs-linux.img' '/boot/initramfs-linux.img'

 # (1) Arch Linux
}


Thanks in advance,

EDM.
Comment 1 edm 2012-02-16 12:14:28 UTC
Created attachment 72398 [details]
dmesg output
Comment 2 edm 2012-02-16 12:17:11 UTC
Created attachment 72399 [details]
dmidecode output
Comment 3 edm 2012-02-16 12:17:43 UTC
Created attachment 72400 [details]
lspci -vv output
Comment 4 Joerg Roedel 2012-02-16 14:02:17 UTC
You are using the closed-source NVidia driver which apparently has a bug. The driver seems not to use the DMA-API properly and tries to DMA to an invalid handle.

Please try the open source NVidia drivers or report the problem to NVidia directly.
Comment 5 Reartes Guillermo 2012-06-25 13:36:37 UTC
I have it too (but using radeon kms):

RHBZ: Bug 827123 - f17 AMD-Vi: Event logged on device device=01:00.0 (it is the VGA card) fx6100 990fx caicos

In my case, it only happens at POWER-ON, not when doing a RESET.
So to reproduce it one must POWER-CYCLE.


AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000f002eaec0 flags=0x0010]
Comment 6 Reartes Guillermo 2012-06-25 13:50:00 UTC
Created attachment 74211 [details]
dmesg at poweron, with "page fault"
Comment 7 Reartes Guillermo 2012-06-25 13:50:24 UTC
Created attachment 74221 [details]
dmesg when reset, no "page fault"
Comment 8 Reartes Guillermo 2012-06-25 13:52:18 UTC
Created attachment 74231 [details]
lspci -vvvnn
Comment 9 Joerg Roedel 2012-06-25 14:06:54 UTC
The addresses where the page-fault happens look strange: 0xf00420800.

This address is beyond TOM2 which means it does not point to system RAM. To me this looks like this happens during firmware load (which explains why it happens at power-on but not on reset).
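
To make the arithmetic behind this explicit, here is a small Python sketch that converts the faulting address to GiB and compares it with a top-of-memory value. The TOM2 value used here is purely hypothetical; the real one has to be read from the affected machine and is not stated anywhere in this report.

```python
GIB = 1 << 30


def is_beyond_ram(address, tom2):
    """True if a DMA address lies at or above the top of system RAM (TOM2)."""
    return address >= tom2


fault_address = 0xF00420800   # address quoted in the comment above
assumed_tom2 = 0x43F000000    # HYPOTHETICAL top of memory, not from this report

# 0xF00000000 is 15 * 2**32 bytes, so the fault sits just above 60 GiB.
print(f"fault address: {fault_address / GIB:.2f} GiB")
print("beyond assumed TOM2:", is_beyond_ram(fault_address, assumed_tom2))
```

With any realistic amount of installed RAM for a desktop board of this kind, an address around 60 GiB cannot be backed by system memory, which is the point made above.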

I am Cc'ing Alex Deucher; maybe he has an idea.
Comment 10 Reartes Guillermo 2012-06-25 14:21:54 UTC
Thanks for the answer.

Check comment #3 from:
https://bugzilla.kernel.org/show_bug.cgi?id=42921

A similar motherboard Gigabyte GA-990FXA-UD3
(mine is sabertooth 990fx)

There the "page faults" are with the NIC.
Comment 11 Reartes Guillermo 2012-06-26 15:11:16 UTC
I booted with the iommu=pt kernel parameter and stopped getting these messages.

I am experimenting with PCIe KVM pass-through, so I ended up using that parameter.
Sadly, the whole host freezes when I power on the guest, but that is outside the scope of this bug report.

> In my case, it only happens at POWER-ON, not when doing a RESET.
> So to reproduce it one must POWER-CYCLE.

I think that was actually not 100% accurate.

~~~~~~~~

# dmesg | grep -i amd-vi
[    1.774021] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
[    1.774024] AMD-Vi:        mmio-addr: 00000000feb20000
[    1.774259] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:00.0 flags: 00
[    1.774262] AMD-Vi:   DEV_RANGE_END           devid: 00:00.2
[    1.774264] AMD-Vi:   DEV_SELECT                      devid: 00:02.0 flags: 00
[    1.774266] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 01:00.0 flags: 00
[    1.774268] AMD-Vi:   DEV_RANGE_END           devid: 01:00.1
[    1.774270] AMD-Vi:   DEV_SELECT                      devid: 00:04.0 flags: 00
[    1.774272] AMD-Vi:   DEV_SELECT                      devid: 02:00.0 flags: 00
[    1.774274] AMD-Vi:   DEV_SELECT                      devid: 00:05.0 flags: 00
[    1.774275] AMD-Vi:   DEV_SELECT                      devid: 03:00.0 flags: 00
[    1.774277] AMD-Vi:   DEV_SELECT                      devid: 00:06.0 flags: 00
[    1.774279] AMD-Vi:   DEV_SELECT                      devid: 04:00.0 flags: 00
[    1.774281] AMD-Vi:   DEV_SELECT                      devid: 00:09.0 flags: 00
[    1.774283] AMD-Vi:   DEV_SELECT                      devid: 05:00.0 flags: 00
[    1.774284] AMD-Vi:   DEV_SELECT                      devid: 00:11.0 flags: 00
[    1.774286] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:12.0 flags: 00
[    1.774288] AMD-Vi:   DEV_RANGE_END           devid: 00:12.2
[    1.774290] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:13.0 flags: 00
[    1.774292] AMD-Vi:   DEV_RANGE_END           devid: 00:13.2
[    1.774294] AMD-Vi:   DEV_SELECT                      devid: 00:14.0 flags: d7
[    1.774296] AMD-Vi:   DEV_SELECT                      devid: 00:14.3 flags: 00
[    1.774297] AMD-Vi:   DEV_SELECT                      devid: 00:14.4 flags: 00
[    1.774300] AMD-Vi:   DEV_ALIAS_RANGE                 devid: 06:00.0 flags: 00 devid_to: 00:14.4
[    1.774302] AMD-Vi:   DEV_RANGE_END           devid: 06:1f.7
[    1.774311] AMD-Vi:   DEV_SELECT                      devid: 00:14.5 flags: 00
[    1.774313] AMD-Vi:   DEV_SELECT                      devid: 00:15.0 flags: 00
[    1.774315] AMD-Vi:   DEV_SELECT                      devid: 07:00.0 flags: 00
[    1.774317] AMD-Vi:   DEV_SELECT                      devid: 00:15.1 flags: 00
[    1.774319] AMD-Vi:   DEV_SELECT                      devid: 08:00.0 flags: 00
[    1.774320] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:16.0 flags: 00
[    1.774322] AMD-Vi:   DEV_RANGE_END           devid: 00:16.2
[    1.774428] AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40
[    1.827755] AMD-Vi: Initialized for Passthrough Mode
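
For readers who want to compare such dumps between machines, here is a rough Python sketch that pulls the entry type and device id out of the DEV_* lines and checks for the passthrough message. The regular expression and the function names are assumptions based only on the output pasted above.

```python
import re
from collections import Counter

# Matches lines such as "AMD-Vi:   DEV_SELECT   devid: 00:02.0 flags: 00"
# (format inferred from the dump in this comment).
ENTRY_RE = re.compile(
    r"AMD-Vi:\s+(DEV_[A-Z_]+)\s+devid:\s+([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)


def parse_device_entries(dmesg_text):
    """Return (entry_type, devid) tuples for every DEV_* line in the text."""
    matches = (ENTRY_RE.search(line) for line in dmesg_text.splitlines())
    return [m.groups() for m in matches if m]


def is_passthrough(dmesg_text):
    """True if the driver reported that it initialized in passthrough mode."""
    return "AMD-Vi: Initialized for Passthrough Mode" in dmesg_text


sample = """\
[    1.774259] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:00.0 flags: 00
[    1.774262] AMD-Vi:   DEV_RANGE_END           devid: 00:00.2
[    1.774264] AMD-Vi:   DEV_SELECT                      devid: 00:02.0 flags: 00
[    1.827755] AMD-Vi: Initialized for Passthrough Mode
"""
entries = parse_device_entries(sample)
counts = Counter(kind for kind, _ in entries)
print(entries)
print(counts["DEV_SELECT"], is_passthrough(sample))
```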
Comment 12 Tango 2012-08-29 14:54:38 UTC
I have this problem as well.  In my case it seems to be a regression.  My 2.6.37-r4 kernel boots properly and X starts without error.

I have been trying since March to get any stable 3.x.x kernel to boot and start X properly.  They all boot without problems, but none of them allows me to start X.

Attempting to start X produces these errors in dmesg:
[    8.368285] nvidia: module license 'NVIDIA' taints kernel.
[    8.368289] Disabling lock debugging due to kernel taint
[    8.382108] vgaarb: device changed decodes: PCI:0000:08:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[    8.382524] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  295.71  Thu Aug  2 19:22:08 PDT 2012
[    9.329795] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x000000041a386200 flags=0x0010]
[    9.329812] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x00000004147d0000 flags=0x0010]
[    9.329823] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x00000004147d0040 flags=0x0010]
[    9.329834] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x0000000000006000 flags=0x0000]
[    9.329843] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x0000000000006040 flags=0x0000]
[    9.329853] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x0000000000006080 flags=0x0000]
[    9.329862] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x00000000000060c0 flags=0x0000]
[   17.341563] NVRM: RmInitAdapter failed! (0x27:0x38:1190)
[   17.341573] NVRM: rm_init_adapter(0) failed

Early in the kernel boot process all my kernels were reporting,
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
I verified that IOMMU was in fact enabled in my BIOS, so I am not sure whether that message is just a suggestion or whether the kernel isn't detecting the IOMMU setting.

After much research I narrowed the problem down to AMD_IOMMU.  I can boot my 3.4.9 kernel (configured and stripped down to the absolute essentials for testing) and start X properly only if I boot it with amd_iommu=off.

Kernel command line: root=/dev/sda3 amd_iommu=off video=uvesafb:mtrr:3,ywrap,1280x1024-32@75 splash=verbose,theme:livecd-2007.0 console=tty1
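
When comparing setups across the reports in this bug, it helps to see which IOMMU-related parameters a given command line carries. A minimal Python sketch follows; the function name iommu_options is made up for illustration, and the set of keys it looks for is limited to the parameters mentioned in this thread (iommu, amd_iommu, amd_iommu_dump).

```python
def iommu_options(cmdline):
    """Return the IOMMU-related parameters found on a kernel command line."""
    wanted = ("iommu", "amd_iommu", "amd_iommu_dump")
    opts = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key in wanted:
            # Parameters given without "=value" are recorded as None.
            opts[key] = value if sep else None
    return opts


cmdline = ("root=/dev/sda3 amd_iommu=off "
           "video=uvesafb:mtrr:3,ywrap,1280x1024-32@75 "
           "splash=verbose,theme:livecd-2007.0 console=tty1")
print(iommu_options(cmdline))
```

On a running system the same function could be fed the contents of /proc/cmdline.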

My Linux Distro is Gentoo
Portage 2.1.11.9 (default/linux/amd64/10.0, gcc-4.5.4, glibc-2.15-r2, 3.4.9-gentoo x86_64)

System uname: Linux-3.4.9-gentoo-x86_64-AMD_Phenom-tm-_II_X6_1100T_Processor-with-gentoo-2.1