Created attachment 292143 [details]
VM Fail Log

My primary Windows 10 VM uses GPU/SATA/USB passthrough, and it appears a regression has been introduced in kernel 5.9. The VM will not start with either rc1 or rc2 because of the "VFIO_MAP_DMA failed: Cannot allocate memory" error on all passthrough devices. I'm running an R7 3700X CPU, an ASUS TUF Gaming X570-Plus motherboard, 16GB of PC3200 DDR4, and two RX 580 GPUs (one dedicated to VM passthrough). I've attached the relevant log excerpt to this report.

By the way, the log only shows one device failing because QEMU/KVM exits on the first error. But I switched around all the VM devices so that each one was the first encountered, and each failed in the same way, so it's not device related.

The VM works fantastically with kernels 5.8.3 and 5.4.60.
There's another similar report here: https://lore.kernel.org/kvm/6d0a5da6-0deb-17c5-f8f5-f8113437c2d6@linux.ibm.com/

I don't seem to be able to reproduce this on EPYC. Is there any chance you could bisect it?
Oh, that's interesting. Well, at least we know it doesn't have anything to do with pinning, since my VM is pinless :) In any case, I'll go ahead and bisect it and see if I can identify the bad commit.
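For reference, my rough plan is the usual bisect between the last known-good and first known-bad tags. A sketch (assuming a plain mainline clone, with v5.8 and v5.9-rc1 as the good/bad points):

cd linux
git bisect start
git bisect bad v5.9-rc1
git bisect good v5.8
# at each step: build and install the kernel, boot it, try to start the VM,
# then mark the result and let git pick the next commit to test
git bisect good    # or: git bisect bad
# when finished:
git bisect reset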
Unfortunately the bisect failed, and in a very odd way. I've bisected the kernel numerous times over the decades, but this time it didn't work correctly from the start because of module directories it wanted to delete that weren't there.

To make a long story short, after the initial compile of 5.9-rc1 I did the normal bisect start and good/bad version definition. But the first bisect build failed during the final phase of the modules process, with errors saying it couldn't delete "pkg/linux-bisect/usr/lib/modules/5.9.0-rc1-1-bisect/source" or "pkg/linux-bisect/usr/lib/modules/5.9.0-rc1-1-bisect/build". And when I looked, they didn't exist. I spent a few hours trying numerous things but could never get the bisect to work as expected. In the end I timed the manual creation of those directories as the modules process was completing, but then the build complained they were directories. So I tried again, this time touching files instead of creating directories, and the build completed. Perplexed but undeterred, I installed and ran the bisected kernel, the VM worked, and I marked the bisect as good.

When compiling the next bisect step the same thing happened, and I applied the same workaround. However, this time when I installed and ran the kernel my VM seemed to boot but actually didn't. Neither the QXL nor the passthrough GPU display came on, and I couldn't shut it down; I had to power it off. When I rebooted with my working 5.8.3 kernel I was surprised to find that my entire VM disk had been completely erased. There were no partitions at all; it was just blank. Of course I made a backup before doing all this, so it was easy to restore, but it's the first time I've ever seen anything like it. The disk is attached to a passthrough Phison NVMe controller, so I assume there was some kind of different, silent VFIO error that wiped out the disk.

In summary, I have no idea what's going on. Sometimes bisect works and sometimes it doesn't, and the kernel is the most difficult and dangerous thing to bisect, but I've never seen actual process errors like this before. Compilation errors, yes, but not missing source or package files and directories. I'm hoping, and assuming, this is some kind of pilot error on my part. If so, and someone knows what it is, just tell me and I'll give it another try. By the way, I'm running Arch with all the latest updates.
Oh yeah, as I assumed, it was pilot error. After I finished my other tasks for the day and had a few minutes to concentrate on the bisect output, I realized I probably had to compile the exact initial version for the bisect to work on Arch. And indeed, once I created a custom PKGBUILD the first bisect compilation completed without error. It's too late to continue this evening, but I don't have any tasks scheduled for the first part of the day tomorrow, so I'll concentrate solely on the bisect and hopefully get a bit further this time.
Hi, it's me, Niklas, from the KVM mailing list discussion, and yes, this is a very old pre-IBM, pre-any-work Bugzilla account :D

I too did a bisect yesterday and also encountered a few commits that had KVM in a very weird state where not even the UEFI in the VM would boot; funnily enough, a BIOS-based FreeBSD VM did still boot. Anyway, my bisect was successful, and reverting the found commit makes things work even on v5.9-rc2. That said, it is quite a strange result, but I guess it makes sense as that commit also deals with locked/pinned memory. I'm assuming this might use the same accounting mechanism?

f74441e6311a28f0ee89b9c8e296a33730f812fc is the first bad commit
commit f74441e6311a28f0ee89b9c8e296a33730f812fc
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Aug 5 13:00:44 2020 -0600

    io_uring: account locked memory before potential error case

    The tear down path will always unaccount the memory, so ensure that we
    have accounted it before hitting any of them.

    Reported-by: Tomáš Chaloupka <chalucha@gmail.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

 fs/io_uring.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

I've added Jens to the Bugzilla CC list; not sure if he'll see that, though.
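In case anyone wants to double-check this on their own machine, this is roughly how the revert can be tested (just a sketch; it assumes a local mainline clone and that the commit still reverts cleanly on top of v5.9-rc2):

cd linux
git checkout v5.9-rc2
git revert f74441e6311a28f0ee89b9c8e296a33730f812fc
# then rebuild and boot that kernel as usual and retry the VM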
(In reply to muncrief from comment #4)

I'm attaching the patch that should fix this. muncrief, I like to provide proper attribution in patches; would you be willing to share your name and email so I can add them to the patch? If you prefer not to, that's totally fine as well; I just wanted to give you the option.

Attaching the patch after this comment.
Created attachment 292167 [details] Fix sqo_mm accounting
(In reply to Jens Axboe from comment #6)
> (In reply to muncrief from comment #4)
>
> I'm attaching the patch that should fix this. muncrief, I like to provide
> proper attribution in patches; would you be willing to share your name and
> email so I can add them to the patch? If you prefer not to, that's totally
> fine as well; I just wanted to give you the option.
>
> Attaching the patch after this comment.

Awesome, Jens! Thank you for figuring this thing out. I'll try the patch as soon as I'm done with breakfast. And sharing my name and email is fine; I changed my account to my full name (Robert M. Muncrief).

By the way, for future reference, was my assumption correct that I have to compile the exact initial kernel version before starting the bisect? I switched to Manjaro three or four years ago, and then Arch about two years ago, but I don't recall having to do it that way before. But then again, I'm not sure I've ever bisected the kernel on Arch; it may just have been on Manjaro and Xubuntu. And hey, don't laugh! I'm old! And my memory sure isn't what it used to be ... :)
Hi Robert, git does not know what you compiled, so you could just do "git bisect start; git bisect good v5.8; git bisect bad v5.9-rc1". With that said, it is of course best to always compile the versions you tell git are (not) working.

I'm a fellow Arch Linux user (on all my private machines) and actually suspect its current QEMU and other package versions were necessary to expose this bug, and are the reason Alex could not reproduce this.

I did not do the git bisect with PKGBUILDs though; instead I have a custom systemd-boot entry, and in the .config I set LOCALVERSION="-niklas". Then I used the following commands:

cd linux
zcat /proc/config.gz > .config   # once, to get the Arch config
make oldconfig
make -j 24
sudo make modules_install -j INSTALL_MOD_STRIP=1
sudo cp arch/x86_64/boot/bzImage /boot/vmlinuz-linux-niklas
sudo mkinitcpio -p linux-niklas

The last part is Arch specific; on other distros there is a special installkernel script that does the copy to /boot, rebuilds the initramfs, and also creates bootloader entries. Also, only add the strip flag if you don't need debug symbols in modules.

The manual cp/modules_install of course means I have to delete the /usr/lib/modules/.. folders manually, but they all have "niklas" in the name so that's easy enough ;-)
> I'm a fellow Arch Linux user (on all my private machines) and actually
> suspect its current QEMU and other package versions were necessary to
> expose this bug, and are the reason Alex could not reproduce this.

Newer qemu versions use io_uring for faster IO, which is why you'd see it. If you're not using io_uring at all, you would not trigger the imbalance.
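For illustration (just a sketch, not necessarily how the reporter's VM is configured; it assumes QEMU >= 5.0 built with io_uring support, and the disk path is a placeholder), one explicit way to make QEMU set up an io_uring instance for a drive is:

qemu-system-x86_64 -m 2048 \
    -drive file=win10.qcow2,if=virtio,cache=none,aio=io_uring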
Fantastic work, Jens! I just tested the patch and my VM ran perfectly, with no error or failure messages in dmesg.
Thank you for the information, Niklas. I could swear I'd bisected the kernel at least once on Arch, and I know I did a few times on Manjaro, and I always started by compiling the bad version. But I must at least be wrong about Arch, because if I'd had to do anything special I would have written it down in my install notes. In any case, I've added your info to my notes, and I'll also try to pay more attention to what I'm doing next time.
Thanks everyone, fix is queued up: https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=6b7898eb180df12767933466b7855b23103ad489