Hi,

With the latest kernel, I notice this:

dmesg | grep vmwgfx
[ 2.959200] vmwgfx 0000:00:0f.0: vgaarb: deactivate vga console
[ 2.959764] vmwgfx 0000:00:0f.0: BAR 1: can't reserve [mem 0xf0000000-0xf7ffffff pref]
[ 2.959766] vmwgfx: probe of 0000:00:0f.0 failed with error -16

lspci -s 0000:00:0f.0 -nnvv
00:0f.0 VGA compatible controller [0300]: VMware SVGA II Adapter [15ad:0405] (prog-if 00 [VGA controller])
	Subsystem: VMware SVGA II Adapter [15ad:0405]
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 16
	Region 0: I/O ports at 2140 [size=16]
	Region 1: Memory at f0000000 (32-bit, prefetchable) [size=128M]
	Region 2: Memory at fb800000 (32-bit, non-prefetchable) [size=8M]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [40] Vendor Specific Information: Len=00 <?>
	Capabilities: [44] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel modules: vmwgfx

Driver version: #define VMWGFX_DRIVER_DATE "20211206"
Host: Windows 11 & VMware Workstation 16 Pro 16.2.0 build-18760230
Guest: Debian 11
Just to be sure: does it work with older kernel versions, like Linux 5.16?
Yes, it's working correctly.
It's a little hard to tell without the full log, but this looks like the PCI reservation bug that was fixed by:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/video/fbdev/core/fbmem.c?id=27599aacbaefcbf2af7b06b0029459bbf682000d

It should go in through the drm-misc tree, drm-misc-next branch.
(In reply to Zack Rusin from comment #3)
> It's a little hard to tell without the full log but this looks like the pci
> reservation bug that was fixed by:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/video/fbdev/core/fbmem.c?id=27599aacbaefcbf2af7b06b0029459bbf682000d

thx

According to https://patchwork.freedesktop.org/series/99243/ this seems to be a patch from a series. Am I right in assuming the patch you specified is enough to fix this (assuming that this bug is triggered by the "pci reservation bug")?
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #4)
> (In reply to Zack Rusin from comment #3)
> > It's a little hard to tell without the full log but this looks like the pci
> > reservation bug that was fixed by:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/video/fbdev/core/fbmem.c?id=27599aacbaefcbf2af7b06b0029459bbf682000d
>
> thx
>
> According to https://patchwork.freedesktop.org/series/99243/ this seems to
> be a patch from a series. Am I right in assuming the patch you specified is
> enough to fix this (assuming that this bug is triggered by the "pci
> reservation bug")?

Yes, that's correct. Thomas and Javier were doing more work in those areas, so there might be more related changes, but that one specific commit is enough to get platform fb drivers to release PCI resources and allow DRM drivers like vmwgfx to load correctly.
(In reply to sander44 from comment #2)
> Yes, it's working correctly.

@sander44: the patch that resolves this actually fixes an issue that already existed in 5.11. Is it possible that your 5.17-rc kernel was built from a configuration similar to the one of the older kernel that was working (see https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/plain/Documentation/admin-guide/reporting-regressions.rst for an explanation)? And was it actually 5.16, or an even older kernel?

Side note: /me wonders why the fix for this issue wasn't merged this cycle, as it was approved weeks ago...
Hi Thorsten Leemhuis,

I will try to build 5.17 mainline today to see if it reproduces. I will attach more logs if it does.
(In reply to sander44 from comment #7)
>
> I will try today to make a compilation with 5.17 mainline to see if it
> reproduces.

It likely will, afaics.

> I will attach more logs if it reproduces.

No need. Try to apply
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/video/fbdev/core/fbmem.c?id=27599aacbaefcbf2af7b06b0029459bbf682000d
on top of 5.17. It will be applied soon and will likely get backported in roughly two weeks. Further logs likely won't change much.
I booted the system with 5.17, and it seems to work for me now. However, I notice this:

[ 3.415301] ------------[ cut here ]------------
[ 3.415304] refcount_t: addition on 0; use-after-free.
[ 3.415310] WARNING: CPU: 1 PID: 713 at lib/refcount.c:25 refcount_warn_saturate+0x9b/0x150
[ 3.415316] Modules linked in: qrtr vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common nls_iso8859_1 vmw_balloon crct10dif_pclmul ghash_clmulni_intel aesni_intel snd_ens1371 crypto_simd cryptd snd_ac97_codec gameport snd_rawmidi snd_seq_device snd_pcsp ac97_bus joydev input_leds snd_pcm snd_timer serio_raw snd efi_pstore soundcore vmw_vmci mac_hid ipmi_devintf ipmi_msghandler msr parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbmouse usbhid hid vmwgfx drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core crc32_pclmul mptspi mptscsih mptbase psmouse drm e1000 scsi_transport_spi i2c_piix4 pata_acpi
[ 3.415348] CPU: 1 PID: 713 Comm: Xorg Not tainted 5.17.0-mainline-vanilla-lowlatency #1
[ 3.415350] Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.18452719.B64.2108091906 08/09/2021
[ 3.415351] RIP: 0010:refcount_warn_saturate+0x9b/0x150
[ 3.415353] Code: c9 c3 0f b6 1d a5 6f be 01 80 fb 01 0f 87 5e c3 6c 00 83 e3 01 75 e5 48 c7 c7 20 dd e1 84 c6 05 89 6f be 01 01 e8 77 de 68 00 <0f> 0b eb ce 0f b6 1d 7b 6f be 01 80 fb 01 0f 87 1e c3 6c 00 83 e3
[ 3.415354] RSP: 0018:ffffb06840d5bbf8 EFLAGS: 00010282
[ 3.415356] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
[ 3.415357] RDX: ffffa08875e60988 RSI: 0000000000000001 RDI: ffffa08875e60980
[ 3.415357] RBP: ffffb06840d5bc00 R08: 0000000000000003 R09: fffffffffff1b468
[ 3.415358] R10: 000000000000002c R11: 0000000000000001 R12: ffffa0874f206800
[ 3.415358] R13: ffffa0875066fe00 R14: ffffa0875066fe00 R15: ffffa0875066fe00
[ 3.415359] FS: 00007f902f498ec0(0000) GS:ffffa08875e40000(0000) knlGS:0000000000000000
[ 3.415360] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.415361] CR2: 000055a6d3823798 CR3: 00000001100ac000 CR4: 0000000000750ee0
[ 3.415379] PKRU: 55555554
[ 3.415380] Call Trace:
[ 3.415381] <TASK>
[ 3.415384] drm_gem_handle_create_tail+0x197/0x1a0 [drm]
[ 3.415398] drm_gem_handle_create+0x36/0x40 [drm]
[ 3.415408] vmw_gb_surface_reference_internal+0x9b/0x1d0 [vmwgfx]
[ 3.415417] ? vmw_gb_surface_reference_ioctl+0xa0/0xa0 [vmwgfx]
[ 3.415423] vmw_gb_surface_reference_ext_ioctl+0x14/0x20 [vmwgfx]
[ 3.415428] drm_ioctl_kernel+0xb7/0x150 [drm]
[ 3.415439] drm_ioctl+0x264/0x4b0 [drm]
[ 3.415448] ? vmw_gb_surface_reference_ioctl+0xa0/0xa0 [vmwgfx]
[ 3.415454] vmw_generic_ioctl+0xc0/0x180 [vmwgfx]
[ 3.415460] vmw_unlocked_ioctl+0x15/0x20 [vmwgfx]
[ 3.415465] __x64_sys_ioctl+0x91/0xc0
[ 3.415468] do_syscall_64+0x5c/0xc0
[ 3.415471] ? syscall_exit_to_user_mode+0x27/0x50
[ 3.415472] ? do_syscall_64+0x69/0xc0
[ 3.415474] ? syscall_exit_to_user_mode+0x27/0x50
[ 3.415475] ? do_syscall_64+0x69/0xc0
[ 3.415476] ? asm_exc_page_fault+0x8/0x30
[ 3.415478] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3.415480] RIP: 0033:0x7f902f90e397
[ 3.415482] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
[ 3.415483] RSP: 002b:00007ffccdb90b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 3.415484] RAX: ffffffffffffffda RBX: 00007ffccdb90b90 RCX: 00007f902f90e397
[ 3.415484] RDX: 00007ffccdb90b90 RSI: 00000000c060645c RDI: 0000000000000013
[ 3.415485] RBP: 00000000c060645c R08: 0000000000000013 R09: 00007f902f9ecc00
[ 3.415485] R10: 000055a6d381c540 R11: 0000000000000246 R12: 0000000000000000
[ 3.415486] R13: 0000000000000013 R14: 000055a6d353c110 R15: 00007ffccdb90c68
[ 3.415488] </TASK>
[ 3.415489] ---[ end trace 0000000000000000 ]---
Created attachment 300603 (dmesg)
Created attachment 300604 (journalctl)
Created attachment 300605 (lscpu)
Created attachment 300606 (lspci)
The new logs suggest you're just using efifb; it's sysfb that's broken without the above patch. Apart from a one-off bug, efifb should be fine now.

The GEM warning is unrelated. It looks like some userspace app is trying to reference something that hasn't been initialized as a surface. I think I fixed something like that recently. If you have a moment, could you try to reproduce it on drm-tip (https://cgit.freedesktop.org/drm-tip), or, second best, drm-misc-next (https://cgit.freedesktop.org/drm/drm-misc)? I'd be very interested to know which userspace app triggers this so I can fix it (probably in a separate bug, though).
I'm trying to get a "headful" VM working in qemu / Proxmox, and I'm having trouble with both qxl and vmwgfx.

The qxl issue may or may not be related; it also hinges on a memory range mapping being denied: https://bugs.gentoo.org/829759#c7

The qxl kernel module correctly takes over the console during boot; "only" the X.org qxl_drv.so fails to initialize, ultimately because it can't mmap() what looks to me like the fb region.

I gave up on qxl and tried vmwgfx next. X.org would segfault on me, the console is stuck on the bootloader (rEFInd) messages, and I found the same kernel error messages mentioned above, which brought me here:

pci 0000:00:01.0: [15ad:0405] type 00 class 0x030000
pci 0000:00:01.0: reg 0x10: [io 0xd320-0xd32f]
pci 0000:00:01.0: reg 0x14: [mem 0xc0000000-0xc3ffffff pref]
pci 0000:00:01.0: reg 0x18: [mem 0xc5240000-0xc524ffff pref]
pci 0000:00:01.0: reg 0x30: [mem 0xffff0000-0xffffffff pref]
pci 0000:00:01.0: BAR 1: assigned to efifb
pci 0000:00:01.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[...]
pci 0000:00:01.0: vgaarb: setting as boot VGA device
pci 0000:00:01.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
pci 0000:00:01.0: vgaarb: bridge control possible
vgaarb: loaded
[...]
pci 0000:00:01.0: can't claim BAR 0 [io 0xd320-0xd32f]: address conflict with PCI Bus 0000:01 [io 0xd000-0xdfff]
[...]
pci 0000:00:01.0: BAR 0: assigned [io 0x1420-0x142f]
[...]
pci_bus 0000:01: resource 0 [io 0xd000-0xdfff]
pci_bus 0000:01: resource 1 [mem 0xc5000000-0xc51fffff]
pci_bus 0000:01: resource 2 [mem 0x800100000-0x8002fffff 64bit pref]
[...]
vmwgfx 0000:00:01.0: vgaarb: deactivate vga console
[TTM] Zone kernel: Available graphics memory: 11974078 KiB
[TTM] Zone dma32: Available graphics memory: 2097152 KiB
vmwgfx 0000:00:01.0: BAR 1: can't reserve [mem 0xc0000000-0xc3ffffff pref]
[TTM] Zone kernel: Used memory at exit: 0 KiB
[TTM] Zone dma32: Used memory at exit: 0 KiB
vmwgfx: probe of 0000:00:01.0 failed with error -16

Applying just the patch mentioned in comment #8 does NOT change this at all for me.

Then I applied the whole series of 5 patches it belongs to: https://patchwork.freedesktop.org/series/99243/#rev2

This still gives exactly the same kernel log messages, and I still don't have a console on vmwgfx, but X.org starts correctly now.
Unfortunate correction / addition: with the patches applied, X.org works only SOMETIMES. I just restarted X while configuring the display manager, and it broke again:

[ 800.506] (EE) Backtrace:
[ 800.506] (EE) 0: /usr/bin/X (xorg_backtrace+0x4d) [0x556d7e16a9f2]
[ 800.507] (EE) 1: /usr/bin/X (0x556d7e03b000+0x133206) [0x556d7e16e206]
[ 800.507] (EE) 2: /lib64/libc.so.6 (0x7f6bfb380000+0x3db40) [0x7f6bfb3bdb40]
[ 800.507] (EE) 3: /usr/lib64/xorg/modules/drivers/vmware_drv.so (0x7f6bfac67000+0xa52a) [0x7f6bfac7152a]
[ 800.507] (EE) 4: /usr/bin/X (0x556d7e03b000+0x151ff7) [0x556d7e18cff7]
[ 800.507] (EE) 5: /usr/bin/X (xf86VTEnter+0x76) [0x556d7e1859f7]
[ 800.507] (EE) 6: /usr/bin/X (WakeupHandler+0xa7) [0x556d7e0b01d2]
[ 800.507] (EE) 7: /usr/bin/X (WaitForSomething+0x190) [0x556d7e16860b]
[ 800.507] (EE) 8: /usr/bin/X (0x556d7e03b000+0x70bb1) [0x556d7e0abbb1]
[ 800.507] (EE) 9: /usr/bin/X (0x556d7e03b000+0x7479a) [0x556d7e0af79a]
[ 800.507] (EE) 10: /lib64/libc.so.6 (0x7f6bfb380000+0x291ca) [0x7f6bfb3a91ca]
[ 800.507] (EE) 11: /lib64/libc.so.6 (__libc_start_main+0x78) [0x7f6bfb3a9278]
[ 800.507] (EE) 12: /usr/bin/X (_start+0x21) [0x556d7e075a41]
[ 800.507] (EE)
[ 800.507] (EE) Segmentation fault at address 0x0
(In reply to Joachim Breuer from comment #15)
> I'm trying to get a "headful" VM working in qemu / Proxmox, and I'm having
> trouble with both qxl and vmwgfx.
> [...]
> vmwgfx: probe of 0000:00:01.0 failed with error -16
>
> Applying just the patch mentioned in comment #8 does NOT change this at all
> for me.

This looks like a different bug. What virtualization platform is this on? It's hard to tell without the full log, but it looks like the kernel has to re-enumerate PCI devices due to a BAR range conflict, and the SVGA device doesn't acknowledge the new ranges. We fixed a bug related to this in VMware's products. I'm guessing that if you remove the PCI devices that cause the BAR range conflict, the VM will work again.
Hi Zack,

(In reply to Zack Rusin from comment #17)
> (In reply to Joachim Breuer from comment #15)
> > I'm trying to get a "headful" VM working in qemu / Proxmox, and I'm having
> > trouble with both qxl and vmwgfx.
>
> This looks like a different bug. What virtualization platform is this on?
> It's hard to tell without the full log but it looks like the kernel has to
> reenumarate pci devices due to bar range conflict and the svga device
> doesn't acknowledge the new ranges. We fixed a bug related to this in
> VMware's products. I'm guessing that if you remove the PCI devices that
> causes the BAR range conflict the vm will be working again.

This is on/in Proxmox VE 6.3-6, i.e. their variant/fork/whatever it is of qemu/kvm.

In the meantime I can say that X also crashes with qxl and the whole kernel patch series applied, i.e. it fails differently. Without the kernel patch series, qxl does not initialize:

    int fd = open("/sys/bus/pci/devices/0000:00:01.0/resource0", O_RDWR | O_CLOEXEC);
    void *mem = mmap(NULL, 0x20000000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

yields MAP_FAILED with errno == EINVAL. (This is basically what the qxl X driver does to obtain the framebuffer region.)

With the kernel patch series applied, that mmap() works as expected, and the X server crashes quite a bit later, within xf86InitViewport().

As a data point, it would seem that qxl requires the patch series fix just like vmwgfx does, although for both that's not enough to get a working X display for me. Emulated "Standard VGA" works on the same VM.

This VM is suitable for testing, so I'd be happy to try things out. I've seen indications that "something changed" between X.org 1.20 and the 21.1.3 I'm currently running, so I'll dig into that first.
Forgot to mention: The "patch series required" issue affects me on released kernel version 5.15.32.