Bug 42782
Summary: | IO_PAGE_FAULT while starting xorg | ||
---|---|---|---|
Product: | Virtualization | Reporter: | edm (fuffi.il.fuffo) |
Component: | kvm | Assignee: | virtualization_kvm |
Status: | NEW --- | ||
Severity: | normal | CC: | alan, joro, rtguille, tango |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.4.9 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg output
dmidecode output
lspci -vv output
dmesg at poweron, with "page fault"
dmesg when reset, no "page fault"
lspci -vvvnn |
Description
edm
2012-02-16 12:13:31 UTC
Created attachment 72398 [details]
dmesg output
Created attachment 72399 [details]
dmidecode output
Created attachment 72400 [details]
lspci -vv output
You are using the closed-source NVidia driver, which apparently has a bug. The driver does not seem to use the DMA API properly and tries to DMA to an invalid handle. Please try the open-source driver for NVidia cards or report the problem to NVidia directly.

I have it too (but using radeon KMS): RHBZ Bug 827123 - f17 AMD-Vi: Event logged on device device=01:00.0 (it is the VGA card; FX-6100, 990FX chipset, Caicos GPU).

In my case, it only happens at POWER-ON, not when doing a RESET. So to reproduce it, one must POWER-CYCLE.
~~~~~~~~
AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x0000000f002eaec0 flags=0x0010]
~~~~~~~~
Created attachment 74211 [details]
dmesg at poweron, with "page fault"
Created attachment 74221 [details]
dmesg when reset, no "page fault"
Created attachment 74231 [details]
lspci -vvvnn
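The fault messages quoted in this report all share the shape `AMD-Vi: Event logged [IO_PAGE_FAULT device=... domain=... address=... flags=...]`. When comparing the power-on and reset logs, it can help to pull those fields out mechanically. The following is a minimal Python sketch that assumes only the line format shown in these logs; the function name `parse_io_page_faults` is hypothetical, and the flags value is kept raw rather than decoded.

```python
import re
from typing import Dict, List

# Matches IO_PAGE_FAULT events as printed in the logs of this report, e.g.
# AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 address=0x... flags=0x0010]
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT device=(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_io_page_faults(log: str) -> List[Dict[str, object]]:
    """Return one dict per IO_PAGE_FAULT event found anywhere in a dmesg dump."""
    events: List[Dict[str, object]] = []
    for m in FAULT_RE.finditer(log):
        events.append({
            "device": m.group("device"),            # PCI bus:device.function
            "domain": int(m.group("domain"), 16),
            "address": int(m.group("address"), 16),
            "flags": int(m.group("flags"), 16),     # raw value, not interpreted
        })
    return events


if __name__ == "__main__":
    sample = ("AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x0016 "
              "address=0x0000000f002eaec0 flags=0x0010]")
    for ev in parse_io_page_faults(sample):
        print(ev["device"], hex(ev["address"]), hex(ev["flags"]))
```

Because `finditer` scans the whole string, the same function also works on logs where several events ended up on one line.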
The addresses where the page fault happens look strange: 0xf00420800. This address is beyond TOM2, which means it does not point to system RAM. To me it looks like this happens during firmware load, which would explain why it happens at power-on but not on reset. I am CC'ing Alex Deucher; maybe he has an idea.

Thanks for the answer. Check comment #3 of https://bugzilla.kernel.org/show_bug.cgi?id=42921, which involves a similar motherboard (Gigabyte GA-990FXA-UD3; mine is a Sabertooth 990FX). There the page faults come from the NIC. I booted with the iommu=pt kernel parameter and stopped getting these messages.
I am experimenting with PCIe pass-through in KVM, which is how I ended up using that parameter.
Sadly, the whole host freezes when I power on the guest, but that is outside the scope of this bug report.
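To put the fault address 0xf00420800 from the maintainer's comment in perspective: it lies at roughly 60 GiB. TOM2 marks the top of DRAM above 4 GiB, so an address at or past it cannot be system memory. The sketch below does that comparison in Python; the `tom2` value is a hypothetical placeholder (16.5 GiB), not a value read from the reporter's machine, and `beyond_tom2` is a hypothetical helper name.

```python
GIB = 1 << 30


def beyond_tom2(address: int, tom2: int) -> bool:
    """True if a DMA address lies at or above TOM2, i.e. outside DRAM,
    assuming DRAM above 4 GiB ends at TOM2."""
    return address >= tom2


fault_addr = 0xf00420800

# Hypothetical TOM2 for illustration only (16.5 GiB); substitute the real
# value for the machine in question.
tom2 = 0x420000000

print(f"fault address is at {fault_addr / GIB:.3f} GiB")
print("beyond TOM2:", beyond_tom2(fault_addr, tom2))
```

The point of the arithmetic is only that 0xf00420800 is far above any plausible amount of installed RAM on these boards, which is why it cannot be an ordinary system-memory buffer.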
> In my case, it only happens at POWER-ON, not when doing a RESET.
> So to reproduce it one must POWER-CYCLE.
I don't think that was 100% accurate.
~~~~~~~~
# dmesg | grep -i amd-vi
[ 1.774021] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
[ 1.774024] AMD-Vi: mmio-addr: 00000000feb20000
[ 1.774259] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:00.0 flags: 00
[ 1.774262] AMD-Vi: DEV_RANGE_END devid: 00:00.2
[ 1.774264] AMD-Vi: DEV_SELECT devid: 00:02.0 flags: 00
[ 1.774266] AMD-Vi: DEV_SELECT_RANGE_START devid: 01:00.0 flags: 00
[ 1.774268] AMD-Vi: DEV_RANGE_END devid: 01:00.1
[ 1.774270] AMD-Vi: DEV_SELECT devid: 00:04.0 flags: 00
[ 1.774272] AMD-Vi: DEV_SELECT devid: 02:00.0 flags: 00
[ 1.774274] AMD-Vi: DEV_SELECT devid: 00:05.0 flags: 00
[ 1.774275] AMD-Vi: DEV_SELECT devid: 03:00.0 flags: 00
[ 1.774277] AMD-Vi: DEV_SELECT devid: 00:06.0 flags: 00
[ 1.774279] AMD-Vi: DEV_SELECT devid: 04:00.0 flags: 00
[ 1.774281] AMD-Vi: DEV_SELECT devid: 00:09.0 flags: 00
[ 1.774283] AMD-Vi: DEV_SELECT devid: 05:00.0 flags: 00
[ 1.774284] AMD-Vi: DEV_SELECT devid: 00:11.0 flags: 00
[ 1.774286] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:12.0 flags: 00
[ 1.774288] AMD-Vi: DEV_RANGE_END devid: 00:12.2
[ 1.774290] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:13.0 flags: 00
[ 1.774292] AMD-Vi: DEV_RANGE_END devid: 00:13.2
[ 1.774294] AMD-Vi: DEV_SELECT devid: 00:14.0 flags: d7
[ 1.774296] AMD-Vi: DEV_SELECT devid: 00:14.3 flags: 00
[ 1.774297] AMD-Vi: DEV_SELECT devid: 00:14.4 flags: 00
[ 1.774300] AMD-Vi: DEV_ALIAS_RANGE devid: 06:00.0 flags: 00 devid_to: 00:14.4
[ 1.774302] AMD-Vi: DEV_RANGE_END devid: 06:1f.7
[ 1.774311] AMD-Vi: DEV_SELECT devid: 00:14.5 flags: 00
[ 1.774313] AMD-Vi: DEV_SELECT devid: 00:15.0 flags: 00
[ 1.774315] AMD-Vi: DEV_SELECT devid: 07:00.0 flags: 00
[ 1.774317] AMD-Vi: DEV_SELECT devid: 00:15.1 flags: 00
[ 1.774319] AMD-Vi: DEV_SELECT devid: 08:00.0 flags: 00
[ 1.774320] AMD-Vi: DEV_SELECT_RANGE_START devid: 00:16.0 flags: 00
[ 1.774322] AMD-Vi: DEV_RANGE_END devid: 00:16.2
[ 1.774428] AMD-Vi: Enabling IOMMU at 0000:00:00.2 cap 0x40
[ 1.827755] AMD-Vi: Initialized for Passthrough Mode
~~~~~~~~
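The IVRS dump above lists which PCI devices the firmware places behind the IOMMU, either singly (`DEV_SELECT`) or as inclusive ranges (`DEV_SELECT_RANGE_START` or `DEV_ALIAS_RANGE` followed by `DEV_RANGE_END`). The Python sketch below expands those entries so one can check whether a given device, such as the VGA card at 01:00.0, is covered. It assumes only the line format shown in this dump and encodes bus:device.function as the usual 16-bit ID (bus << 8 | device << 3 | function); the helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Longer keywords come first so DEV_SELECT does not shadow DEV_SELECT_RANGE_START.
ENTRY_RE = re.compile(
    r"AMD-Vi: (?P<kind>DEV_SELECT_RANGE_START|DEV_ALIAS_RANGE|DEV_RANGE_END|DEV_SELECT)"
    r" devid: (?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])",
    re.IGNORECASE,
)


def bdf_to_id(bdf: str) -> int:
    """Encode 'bb:dd.f' (hex bus, hex device, function) as a 16-bit device ID."""
    bus, rest = bdf.split(":")
    dev, fn = rest.split(".")
    return (int(bus, 16) << 8) | (int(dev, 16) << 3) | int(fn, 16)


def covered_ranges(log: str) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) device-ID ranges found in the IVRS dump.
    Alias ranges are treated as covered (their DMA is seen under the alias ID)."""
    ranges: List[Tuple[int, int]] = []
    start = None
    for m in ENTRY_RE.finditer(log):
        kind = m.group("kind").upper()
        devid = bdf_to_id(m.group("bdf"))
        if kind == "DEV_SELECT":
            ranges.append((devid, devid))
        elif kind in ("DEV_SELECT_RANGE_START", "DEV_ALIAS_RANGE"):
            start = devid
        elif kind == "DEV_RANGE_END" and start is not None:
            ranges.append((start, devid))
            start = None
    return ranges


def is_covered(bdf: str, ranges: List[Tuple[int, int]]) -> bool:
    """True if the device falls inside any of the given ranges."""
    devid = bdf_to_id(bdf)
    return any(lo <= devid <= hi for lo, hi in ranges)


if __name__ == "__main__":
    dump = ("[ 1.774266] AMD-Vi: DEV_SELECT_RANGE_START devid: 01:00.0 flags: 00\n"
            "[ 1.774268] AMD-Vi: DEV_RANGE_END devid: 01:00.1\n")
    r = covered_ranges(dump)
    print(is_covered("01:00.0", r), is_covered("02:00.0", r))
```

Feeding the full `dmesg | grep -i amd-vi` output into `covered_ranges` would show whether both functions of the GPU (01:00.0 and 01:00.1) are listed.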
I have this problem as well. In my case it seems to be a regression: my 2.6.37-r4 kernel boots properly and X starts without error. I have been trying since March to get any stable 3.x.x kernel to boot and start X properly. I have had no problems getting them to boot, but none would allow me to start X. Attempting to start X produces these errors in dmesg:
~~~~~~~~
[ 8.368285] nvidia: module license 'NVIDIA' taints kernel.
[ 8.368289] Disabling lock debugging due to kernel taint
[ 8.382108] vgaarb: device changed decodes: PCI:0000:08:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
[ 8.382524] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 295.71 Thu Aug 2 19:22:08 PDT 2012
[ 9.329795] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x000000041a386200 flags=0x0010]
[ 9.329812] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x00000004147d0000 flags=0x0010]
[ 9.329823] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x00000004147d0040 flags=0x0010]
[ 9.329834] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x0000000000006000 flags=0x0000]
[ 9.329843] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x0000000000006040 flags=0x0000]
[ 9.329853] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x0000000000006080 flags=0x0000]
[ 9.329862] AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x00000000000060c0 flags=0x0000]
[ 17.341563] NVRM: RmInitAdapter failed! (0x27:0x38:1190)
[ 17.341573] NVRM: rm_init_adapter(0) failed
~~~~~~~~
Early in the kernel boot process, all my kernels were reporting:
~~~~~~~~
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
~~~~~~~~
I verified that the IOMMU was in fact enabled in my BIOS, so I am not sure whether that is just a suggestion or whether the kernel isn't detecting the IOMMU settings. After much research I narrowed the problem down to AMD_IOMMU.
I can boot my 3.4.9 kernel (configured and stripped to the absolute essentials for testing) and start X properly only if I boot it with amd_iommu=off.
Kernel command line:
~~~~~~~~
root=/dev/sda3 amd_iommu=off video=uvesafb:mtrr:3,ywrap,1280x1024-32@75 splash=verbose,theme:livecd-2007.0 console=tty1
~~~~~~~~
My Linux distro is Gentoo:
Portage 2.1.11.9 (default/linux/amd64/10.0, gcc-4.5.4, glibc-2.15-r2, 3.4.9-gentoo x86_64)
System uname: Linux-3.4.9-gentoo-x86_64-AMD_Phenom-tm-_II_X6_1100T_Processor-with-gentoo-2.1
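Since the workarounds in this thread come down to kernel parameters (iommu=pt in one comment, amd_iommu=off in another), it can help to confirm which ones the running kernel actually received. A minimal Python sketch, assuming the command line is read from `/proc/cmdline` and consists of whitespace-separated `name` or `name=value` tokens; quoted values containing spaces are not handled, and the helper names `parse_cmdline` and `iommu_mode` are hypothetical.

```python
import os
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None.
    If a name appears twice, this sketch keeps the last occurrence."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")  # split on the first '=' only
        params[name] = value if sep else None
    return params


def iommu_mode(params: Dict[str, Optional[str]]) -> str:
    """Summarize only the two IOMMU parameters discussed in this report."""
    if params.get("amd_iommu") == "off":
        return "AMD IOMMU disabled (amd_iommu=off)"
    if params.get("iommu") == "pt":
        return "passthrough (iommu=pt)"
    return "default"


if __name__ == "__main__":
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print(iommu_mode(parse_cmdline(f.read())))
```

Splitting on the first `=` only keeps values such as `video=uvesafb:mtrr:3,ywrap,1280x1024-32@75` intact.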