Bug 209025 - The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
Summary: The "VFIO_MAP_DMA failed: Cannot allocate memory" bug is back
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: x86-64 Linux
Importance: P1 high
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-24 16:35 UTC by Robert M. Muncrief
Modified: 2020-08-25 18:06 UTC
CC List: 3 users

See Also:
Kernel Version: 5.9 rc1 and rc2
Tree: Mainline
Regression: No


Attachments
VM Fail Log (6.18 KB, text/plain)
2020-08-24 16:35 UTC, Robert M. Muncrief
Fix sqo_mm accounting (1.63 KB, patch)
2020-08-25 14:32 UTC, Jens Axboe

Description Robert M. Muncrief 2020-08-24 16:35:06 UTC
Created attachment 292143
VM Fail Log

My primary Windows 10 VM uses GPU/SATA/USB passthrough, and it appears a regression has been introduced in kernel 5.9. The VM will not start with either rc1 or rc2 because of the "VFIO_MAP_DMA failed: Cannot allocate memory" error on all passthrough devices.

I'm running an R7 3700X CPU, ASUS TUF Gaming X570-Plus MB, 16GB PC3200 DDR4, and two RX 580 GPUs (one dedicated to the VM passthrough). I've attached the relevant log excerpt to this report.

By the way, the log only shows one device failing because QEMU/KVM exits on the first error. But I switched around all VM devices so they were the first encountered and each one failed in the same way, so it's not device related. 

The VM works fantastically with kernels 5.8.3 and 5.4.60.
Comment 1 Alex Williamson 2020-08-24 20:43:36 UTC
There's another similar report here:

https://lore.kernel.org/kvm/6d0a5da6-0deb-17c5-f8f5-f8113437c2d6@linux.ibm.com/

I don't seem to be able to reproduce on EPYC.  Is there any chance you could bisect it?
Comment 2 Robert M. Muncrief 2020-08-24 20:57:47 UTC
Oh, that's interesting. Well, at least we know it doesn't have anything to do with pinning since my VM is pinless :)

In any case I'll go ahead and bisect it and see if I can identify the bad commit.
Comment 3 Robert M. Muncrief 2020-08-25 00:53:13 UTC
Unfortunately bisect failed, and in a very odd way. I've bisected the kernel numerous times over the decades, but this time it didn't work correctly from the start because of module directories it wanted to delete that weren't there.

To make a long story short, after the initial compile of 5.9-rc1 I did the normal bisect start and good/bad version definition. But when making the first bisect it failed during the final phase of the modules process with errors saying it couldn't delete "pkg/linux-bisect/usr/lib/modules/5.9.0-rc1-1-bisect/source" or "pkg/linux-bisect/usr/lib/modules/5.9.0-rc1-1-bisect/build". And when I looked they didn't exist.

So I spent a few hours trying numerous things, but could never get bisect to work as expected. In the end I just timed the manual creation of the directories correctly as the modules process was completing, but then bisect complained they were directories. So I tried again but this time just touched to make files instead of directories, and bisect completed.

Perplexed but undeterred, I installed and ran the bisected kernel, the VM worked, and I marked the bisect as good. But when compiling the next bisect the same thing happened, and I did the same thing to fix it. However, this time when I installed and ran the kernel my VM seemed to boot, but actually didn't. Neither the QXL nor the passthrough GPU display came on, and I couldn't shut it down. I just had to do a power off.

So I rebooted with my working 5.8.3 kernel and was surprised that my entire VM disk was completely erased. There were no partitions at all, it was just blank. Of course I made a backup before doing all this so it was easy to restore, but it's the first time I've ever seen anything like it.

In any case, the disk is attached to a passthrough Phison NVME controller, so I assume there was some kind of different, silent, VFIO error that wiped out the disk.

In summary, I have no idea what's going on. Of course sometimes bisect works and sometimes it doesn't, and the kernel is the most difficult and dangerous to bisect, but I've never seen actual process errors like this before. Compilation errors yes, but not missing source or package files and directories.

I'm hoping, and assuming, this is some kind of pilot error on my part. If so, and someone knows what it is, just tell me what it is and I'll give it another try. By the way, I'm running Arch with all the latest updates.
Comment 4 Robert M. Muncrief 2020-08-25 03:14:55 UTC
Oh yeah, as I assumed it was pilot error. After I finished my other tasks for the day and had a few minutes to concentrate on the bisect output I realized I probably had to compile the exact initial version for bisect to work in Arch. And indeed once I created a custom PKGBUILD the first bisect compilation completed without error.

It's too late to continue this evening, but I don't have any tasks scheduled for the first part of the day tomorrow so I'll concentrate solely on the bisect and hopefully get a bit further this time.
Comment 5 Niklas Schnelle 2020-08-25 07:26:26 UTC
Hi,

it's me, Niklas, from the KVM mailing list discussion, and yes,
this is a very old pre-IBM, pre-any-work Bugzilla account :D

I too did a bisect yesterday and also
encountered a few commits that had KVM in a very weird state
where not even the UEFI in the VM would boot, funnily enough
a BIOS based FreeBSD VM did still boot.

Anyway my bisect was successful and reverting the found
commit makes things work even on v5.9-rc2.

That said it is quite a strange result but I guess it makes
sense as that also deals with locked/pinned memory.
I'm assuming this might use the same accounting mechanism?

f74441e6311a28f0ee89b9c8e296a33730f812fc is the first bad commit
commit f74441e6311a28f0ee89b9c8e296a33730f812fc
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Aug 5 13:00:44 2020 -0600

    io_uring: account locked memory before potential error case

    The tear down path will always unaccount the memory, so ensure that we
    have accounted it before hitting any of them.

    Reported-by: Tomáš Chaloupka <chalucha@gmail.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

 fs/io_uring.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

I've added Jens to the Bugzilla CC list, though I'm not sure
he'll see that.
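
The check described in this comment, reverting the bisected commit on top of v5.9-rc2, could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the procedure Niklas actually used: the `run` wrapper, the `DRY_RUN` variable, and the `revert_suspect` function are hypothetical helpers introduced here. Only the tag and the commit hash come from the bisect output above.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Check out a tag as a detached HEAD and revert one commit on top of it.
# $1 = tag to start from, $2 = commit to revert.
revert_suspect() {
    run git checkout --detach "$1" &&
    run git revert --no-edit "$2"
}

# Example (run inside a kernel git tree):
#   revert_suspect v5.9-rc2 f74441e6311a28f0ee89b9c8e296a33730f812fc
```

Setting `DRY_RUN=1` first prints the two git commands instead of executing them, which is a cheap way to double-check the tag and hash before touching the tree.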
Comment 6 Jens Axboe 2020-08-25 14:32:04 UTC
(In reply to muncrief from comment #4)

I'm attaching the patch that should fix this. muncrief, I like to provide proper attribution in patches, would you be willing to share your name and email so I can add it to the patch? If you prefer not to that's totally fine as well, just wanted to give you the option.

Attaching the patch after this comment.
Comment 7 Jens Axboe 2020-08-25 14:32:48 UTC
Created attachment 292167
Fix sqo_mm accounting
Comment 8 Robert M. Muncrief 2020-08-25 16:31:53 UTC
(In reply to Jens Axboe from comment #6)
> (In reply to muncrief from comment #4)
> 
> I'm attaching the patch that should fix this. muncrief, I like to provide
> proper attribution in patches, would you be willing to share your name and
> email so I can add it to the patch? If you prefer not to that's totally fine
> as well, just wanted to give you the option.
> 
> Attaching the patch after this comment.

Awesome Jens! Thank you for figuring this thing out. I'll try the patch as soon as I'm done with breakfast. And sharing my name and email is fine. I changed my account to my full name (Robert M. Muncrief).

By the way, for future reference was my assumption that I have to compile the exact initial kernel version before starting the bisect correct? I switched to Manjaro three or four years ago, and then Arch about two years ago, but I don't recall having to do it that way before. But then again I'm not sure if I've ever bisected the kernel on Arch, it may just have been on Manjaro and Xubuntu.

And hey, don't laugh! I'm old! And my memory sure isn't what it used to be ... :)
Comment 9 Niklas Schnelle 2020-08-25 17:09:05 UTC
Hi Robert,

git does not know what you compiled, so you could just
do "git bisect start; git bisect good v5.8; git bisect bad v5.9-rc1".
That said, it is of course best to always compile the versions
you tell git are (not) working.

I'm a fellow Arch Linux user (on all my private machines) and actually suspect its current QEMU and other package versions were necessary to expose this bug
and are the reason Alex could not reproduce this.

I did not do the git bisect with PKGBUILDs though. Instead I
have a custom systemd-boot entry and set CONFIG_LOCALVERSION="-niklas" in the .config.
Then I used the following commands:

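cd linux
zcat /proc/config.gz > .config  # once, to get the running Arch kernel's config
make oldconfig                  # update the config for the checked-out tree
make -j 24                      # build the kernel and modules
sudo make modules_install -j INSTALL_MOD_STRIP=1  # install stripped modules
sudo cp arch/x86_64/boot/bzImage /boot/vmlinuz-linux-niklas  # install the kernel image
sudo mkinitcpio -p linux-niklas  # rebuild the initramfs from the linux-niklas preset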

The last part is Arch-specific. On other distros there is a special installkernel script that copies the kernel to /boot, rebuilds the initramfs, and creates bootloader entries.
Also, only add the strip flag if you don't need debug symbols
in modules.

The manual cp/modules_install of course means I have
to delete the /usr/lib/modules/.. folders manually but they all have "niklas" in the name so that's easy enough ;-)
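
The bisect bookkeeping around those build commands can be sketched as below. This is an illustration under stated assumptions rather than anyone's actual script: `run`, `DRY_RUN`, `start_bisect`, and `mark_result` are hypothetical names introduced here. The revisions v5.8 and v5.9-rc1 are the ones mentioned in this comment.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start a bisect between a known-good and a known-bad revision.
# $1 = good revision, $2 = bad revision.
start_bisect() {
    run git bisect start &&
    run git bisect good "$1" &&
    run git bisect bad "$2"
}

# Record the verdict for the currently checked-out commit after it has been
# built, booted, and tested. $1 must be "good", "bad", or "skip".
mark_result() {
    case "$1" in
        good|bad|skip) run git bisect "$1" ;;
        *) echo "usage: mark_result good|bad|skip" >&2; return 2 ;;
    esac
}

# Example (run inside a kernel git tree):
#   start_bisect v5.8 v5.9-rc1
#   ... build, install, reboot, start the VM ...
#   mark_result good    # or: bad, skip
```

`git bisect skip` is included because both bisects in this thread hit intermediate commits that could not be tested meaningfully. Skipping such a commit is usually safer than guessing good or bad.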
Comment 10 Jens Axboe 2020-08-25 17:13:19 UTC
> I'm a fellow Arch Linux user (on all my private machines) and actually
> suspect
> its current QEMU and other package versions were necessary to expose this bug
> and are the reason Alex could not reproduce this.

Newer QEMU versions use io_uring for faster IO, which is why you'd see it. If you're not using io_uring at all, you would not trigger the imbalance.
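
To check whether a particular QEMU process is in the situation described here, one could look at its open io_uring instances and its locked-memory figures. The sketch below rests on assumptions about the Linux /proc layout: that io_uring file descriptors show up as `anon_inode:[io_uring]` links, that `VmLck` in /proc/<pid>/status reports locked memory in kB, and that /proc/<pid>/limits carries a "Max locked memory" row. It also assumes, as Niklas guessed in comment 5, that VFIO DMA mappings are charged against the same locked-memory limit. The function names and the PID are hypothetical.

```shell
#!/bin/sh
# Count io_uring instances in `ls -l /proc/<pid>/fd` output read from stdin.
# Assumption: io_uring file descriptors appear as "anon_inode:[io_uring]".
count_io_uring_fds() {
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c 'anon_inode:\[io_uring\]' || true
}

# Print the VmLck value (locked memory, in kB) from a /proc/<pid>/status file.
locked_kb() {
    awk '/^VmLck:/ { print $2 }' "$1"
}

# Print the soft "Max locked memory" limit from a /proc/<pid>/limits file.
memlock_soft_limit() {
    awk '/^Max locked memory/ { print $4 }' "$1"
}

# Example, for a QEMU process with PID 1234 (hypothetical):
#   sudo ls -l /proc/1234/fd | count_io_uring_fds
#   locked_kb /proc/1234/status
#   memlock_soft_limit /proc/1234/limits
```

Comparing these values on a good and a bad kernel may show where the two diverge. How the imbalance shows up in these counters is not stated in this thread, so treat the output as a hint rather than a diagnosis.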
Comment 11 Robert M. Muncrief 2020-08-25 17:47:00 UTC
Fantastic work Jens! I just tested this patch and my VM ran perfectly, and there were zero dmesg error or fail messages.
Comment 12 Robert M. Muncrief 2020-08-25 17:55:34 UTC
Thank you for the information Niklas. I could swear I'd bisected the kernel at least once on Arch, and know I did a few times on Manjaro, and I always started by compiling the bad version. But I must at least be wrong about Arch, because if I'd done anything special I would have written it down in my install notes.

In any case I've written your info in my notes and I'll also try to pay more attention to what I'm doing next time.
