Created attachment 303605 [details]
After the 6.1.5 kernel update my VM locks up at boot, and dmesg reports "BUG: kernel NULL pointer dereference, address: 0000000000000005" (more details in the attachment).
I have a few drives attached to the VM with io_uring:
<driver name="qemu" type="raw" cache="none" io="io_uring" discard="unmap" detect_zeroes="off"/>
The problem appeared after updating from 6.1.4 to 6.1.5; rolling back to 6.1.4 solves it.
If you are building your own kernels, cherry-pick commit 613b14884b8595e20b9fac4126bf627313827fbe and it should work fine. This fix was missed in the backport:
axboe@m1max ~/gi/linux-block (6.1-stable)> git cherry-pick 613b14884b8595e20b9fac4126bf627313827fbe
[6.1-stable fecb066b9366] block: handle bio_split_to_limits() NULL return
Date: Wed Jan 4 08:51:19 2023 -0700
8 files changed, 19 insertions(+), 2 deletions(-)
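For anyone unsure whether their stable kernel needs the fix: per the above, only 6.1.5 and 6.1.6 shipped without the backport, and 6.1.7+ has it again. A minimal shell sketch of a triage check — the `is_affected` helper is hypothetical (not kernel tooling), and the suffix matching assumes distro-style version strings like `6.1.6-arch1`:

```shell
# Hypothetical triage helper: only 6.1.5 and 6.1.6 lack the backported
# fix (commit 613b14884b85); 6.1.4 and earlier, and 6.1.7+, are fine.
is_affected() {
    case "$1" in
        6.1.5|6.1.5-*|6.1.6|6.1.6-*) echo affected ;;
        *) echo ok ;;
    esac
}

# Check the running kernel:
is_affected "$(uname -r)"
```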
I've been running into the same issue as well. It definitely affects the 6.1.x series from 6.1.5 onward (up through and including 6.1.12). Downgrading to 6.1.4 makes everything work fine; anything past that is broken. I'm running 6.1.12 now, and if I disable io_uring in my VM config it also works fine, so it seems likely to be related to this issue (unfortunately, I only just came across this report and don't believe I have the logs anymore).
Does anybody know if this issue was fixed in kernel 6.2 release?
Huh, this really should be fixed in 6.1.12, see the comment just above yours. I'm OOO until tomorrow and will take a look then. But 6.1.7 and newer have the patch, so only 6.1.5/6.1.6 should be affected for the original report.
Darrin, please attach your oops here on 6.1.12, please.
(In reply to Jens Axboe from comment #5)
> Darrin, please attach your oops here on 6.1.12, please.
Let me go back through and see if I have the logs from that time. If not, I should be able to reproduce the issue again (not sure if the oops was the same or if it's a different issue, but the timing of the breakage was the same, and reverting to 6.1.4 and/or disabling io_uring in the VM config fixes it). I'll go back and re-test shortly when I'm done with what I'm working on.
Just re-tested this. It's pretty frustrating, as I don't see any dmesg/kernel errors related to this, no errors in the libvirtd/qemu logs, etc. From a log perspective, everything looks like it's working fine. However, something has definitely been broken since 6.1.5+. The behavior is slightly different from the original post: the Windows 10 VM boots fine, but when I try to access something on the volume that's using io_uring, that application hangs and I can't kill it (access denied error via Task Manager). If I try to shut down the VM, it hangs on shutdown and I end up having to force it off.
Relevant storage config for that specific volume (data storage volume on an NVMe via LVM) is as follows:
<iothread id="3" thread_pool_min="4" thread_pool_max="16"/>
<iothreadpin iothread="3" cpuset="6,14"/>
<disk type="block" device="disk">
<driver name="qemu" type="raw" cache="none" io="io_uring"/>
<target dev="sdc" bus="scsi"/>
<address type="drive" controller="2" bus="0" target="0" unit="1"/>
<controller type="scsi" index="2" model="virtio-scsi">
<driver queues="4" iothread="3"/>
<address type="pci" domain="0x0000" bus="0x0d" slot="0x00" function="0x0"/>
There are two other volumes (boot + data), but they don't appear to be affected by the issue, as they're currently using io=threads due to io=io_uring breaking in the past and causing problems (especially on the boot volume).
If I change the I/O setting on it from io_uring to threads, everything works fine on kernel 6.1.12. If I leave the I/O setting as io_uring and revert to kernel 6.1.4, everything also works fine.
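For anyone else hitting this, the workaround is flipping the driver's io attribute from io_uring to threads. A hedged sketch of the edit, shown against an inline copy of the disk stanza since everyone's full XML differs — on a real setup the change should go through `virsh edit <domain>` (or virt-manager) so libvirt revalidates the XML:

```shell
# Flip io="io_uring" to io="threads" on the qemu driver line.
# Inline copy of the stanza for illustration; edit the real domain
# via `virsh edit <domain>` rather than sed-ing files under /etc.
xml='<disk type="block" device="disk">
  <driver name="qemu" type="raw" cache="none" io="io_uring"/>
  <target dev="sdc" bus="scsi"/>
</disk>'

printf '%s\n' "$xml" | sed 's/io="io_uring"/io="threads"/'
```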
[root@experior ~]# uname -r
[root@experior ~]# libvirtd --version
libvirtd (libvirt) 9.0.0
[root@experior ~]# qemu-system-x86_64 --version
QEMU emulator version 7.2.0
IIRC there were multiple io_uring-related commits in kernel 6.1.5, but since I'm not getting any errors in the logs, there isn't much to work with to narrow down what's causing the breakage.
I saw 6.1.13 was released yesterday or today, but I doubt it will make it into the Arch repos, as 6.2 is already in the testing repo. I was going to hold off for a bit before trying 6.2, but I might see if the issue is resolved there once it lands in the standard repo.
How is your dm device setup on top of the nvme? I'm going to try and see if I can reproduce this.
[root@experior ~]# lshw -c storage
description: NVMe device
product: Samsung SSD 970 EVO Plus 2TB
vendor: Samsung Electronics Co Ltd
physical id: 0
bus info: pci@0000:01:00.0
logical name: /dev/nvme0
width: 64 bits
capabilities: nvme pm msi pciexpress msix nvm_express bus_master cap_list
configuration: driver=nvme latency=0 nqn=nqn.2014.08.org.nvmexpress:144d144dS59CNM0R905753H Samsung SSD 970 EVO Plus 2TB state=live
resources: irq:98 memory:fcf00000-fcf03fff
[root@experior ~]# parted /dev/nvme0n1 print
Model: Samsung SSD 970 EVO Plus 2TB (nvme)
Disk /dev/nvme0n1: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name     Flags
 1      1049kB  1000GB  1000GB  btrfs        primary
 2      1000GB  2000GB  1000GB               primary
[root@experior ~]# lsblk -t /dev/nvme0n1
NAME                      ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
nvme0n1                           0    512      0     512     512    0 none     1023 128    0B
├─nvme0n1p1                       0    512      0     512     512    0 none     1023 128    0B
└─nvme0n1p2                       0    512      0     512     512    0 none     1023 128    0B
  └─vg_VMstore-lv_Win10VM         0    512      0     512     512    0           128 128    0B
[root@experior ~]# pvs
  PV             VG         Fmt  Attr PSize   PFree
  /dev/nvme0n1p2 vg_VMstore lvm2 a--  931.50g    0
[root@experior ~]# vgs
  VG         #PV #LV #SN Attr   VSize   VFree
  vg_VMstore   1   1   0 wz--n- 931.50g    0
[root@experior ~]# lvs
  LV         VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lv_Win10VM vg_VMstore -wi-ao---- 931.50g
[root@experior ~]# ls -alh /dev/disk/by-id | grep nvme
lrwxrwxrwx 1 root root 15 Feb 23 02:09 lvm-pv-uuid-TKF2kb-FUiD-elg1-3Oew-cCtc-xP2V-EjBesY -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 13 Feb 23 02:09 nvme-eui.0025385911902877 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Feb 23 02:09 nvme-eui.0025385911902877-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Feb 23 02:09 nvme-eui.0025385911902877-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 13 Feb 23 02:09 nvme-Samsung_SSD_970_EVO_Plus_2TB_S59CNM0R905753H -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Feb 23 02:09 nvme-Samsung_SSD_970_EVO_Plus_2TB_S59CNM0R905753H-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Feb 23 02:09 nvme-Samsung_SSD_970_EVO_Plus_2TB_S59CNM0R905753H-part2 -> ../../nvme0n1p2
If you need any other info to attempt to reproduce, please do not hesitate to let me know. Thanks for looking into this.
As a side note, I might do some other testing if/when I have time, as I have recent backups of the other volume images (the primary boot volume is qcow2-backed on SSDs (BTRFS RAID-1) and the secondary data volume is qcow2-backed on HDDs (ZFS RAID-10)). Those broke with io_uring a while back as well (I believe when I transitioned from Ubuntu 20.04 to Arch, IIRC) and I haven't re-tested them since.
Tried many things, cannot seem to reproduce it. Can you grab:
grep . /sys/block/nvme0n1/queue/*
grep . /sys/block/vg_VMstore-lv_Win10VM/queue/*
output for me? And the lv_Win10VM, is that using btrfs? Or some other fs?
It might also help if you can provide a full VM config that I can just run here. If I need to load up a Windows 10 image to reproduce, I'll be happy to do that. The key is that it includes all the config I need and the command you run to boot it, so we avoid any differences there.
Oh, and I'm assuming vg_VMstore-lv_Win10VM is just linear? Here's how I setup my test:
dmsetup create test-linear --table '0 41943040 linear /dev/nvme1n1 0'
[root@experior ~]# grep . /sys/block/nvme0n1/queue/*
/sys/block/nvme0n1/queue/scheduler:[none] mq-deadline kyber bfq
[root@experior ~]# grep . /sys/block/vg_VMstore-lv_Win10VM/queue/*
grep: /sys/block/vg_VMstore-lv_Win10VM/queue/*: No such file or directory
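That second grep fails because device-mapper devices appear under /sys/block by their dm-N names, not their vg-lv mapper name; the mapper name lives in /sys/block/dm-N/dm/name. A sketch of resolving it, simulated against a temp directory so it's self-contained (the real paths need a live LV — set SYS=/sys/block on the actual host, and the `find_dm` helper is mine, not standard tooling):

```shell
# Simulated /sys/block layout so the sketch runs anywhere; on the
# real host use SYS=/sys/block instead of the temp directory below.
SYS=$(mktemp -d)
mkdir -p "$SYS/dm-0/dm" "$SYS/dm-0/queue"
echo vg_VMstore-lv_Win10VM > "$SYS/dm-0/dm/name"
echo '[none] mq-deadline'  > "$SYS/dm-0/queue/scheduler"

# Find the dm-N node whose dm/name matches the mapper name, then dump
# its queue attributes (what the requested grep was after).
find_dm() {
    for n in "$SYS"/dm-*/dm/name; do
        [ -e "$n" ] || continue
        if [ "$(cat "$n")" = "$1" ]; then
            dirname "$(dirname "$n")"
            return 0
        fi
    done
    return 1
}

dev=$(find_dm vg_VMstore-lv_Win10VM)
grep . "$dev"/queue/*
```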
Will respond re questions in next post, and will submit full VM config in separate follow-up post shortly as well.
As a quick side question, does Bugzilla support code blocks at all?
Re the LVM volume: it's on top of a partition, with no filesystem on the Linux side (it's just passed raw to Win10, where it's formatted NTFS).
Original commands to setup were as follows:
parted /dev/nvme0n1 mklabel gpt
parted -a optimal /dev/nvme0n1 mkpart primary 0% 50%
parted -a optimal /dev/nvme0n1 mkpart primary 50% 100%
vgcreate vg_VMstore /dev/nvme0n1p2
lvcreate -l +100%FREE vg_VMstore -n lv_Win10VM
Created attachment 303775 [details]
VM config attached (currently has "io=threads" instead of "io=io_uring" configured for sdc, as mentioned previously, due to the issues).
If you have any other questions or need any additional details regarding the rest of the config (e.g. host hardware specs, storage setup for the other two volumes, etc.), please do not hesitate to let me know.
Also, just as a heads-up re versions, liburing version is as follows (as I forgot to include this previously):
c0de@experior:~$ yay -Q | grep liburing
And please include instructions on how to launch it too. I don’t generally use libvirt or xml for my QEMU configurations.
> On Feb 23, 2023, at 2:07 PM, firstname.lastname@example.org wrote:
> --- Comment #14 from Darrin Burger (email@example.com) ---
> Created attachment 303775 [details]
> --> https://bugzilla.kernel.org/attachment.cgi?id=303775&action=edit
> VM config attached (currently have "io=threads" instead of "io=io_uring"
> configured for sdc as mentioned previously due to issues).
Just out of curiosity, are you testing on Arch or a different distro? On Arch I'm using the qemu-full package, but I usually create and start/stop VMs via virt-manager, apart from custom parameters that need to be adjusted manually in the XML.
c0de@experior:~$ yay -Q | grep qemu-full
c0de@experior:~$ yay -Q | grep virt-manager
I believe the libvirt/QEMU logs show the full command line, so I'll see if I can grab that and attach it in a follow-up post, although I suspect you'll need to edit some of the devices/paths/etc., since they're probably different on your system than on mine. Also, if you need to know exactly how the other two storage volumes are set up, I can dig back through my notes and put together the process and relevant settings, given that one is backed on BTRFS on SSDs and the other on ZFS on HDDs.
Created attachment 303776 [details]
As a quick note, the "MainDesktop1-Win10.xml" file would reside in the /etc/libvirt/qemu directory.
Created attachment 303777 [details]
Also, while it's probably irrelevant to the current issue, I also have CPU isolation configured via "/etc/libvirt/hooks/qemu", contents are in attachment.
Just as a quick follow-up: I'm still seeing the same issue/behavior on kernel 6.2.2 as well.
Sorry, I've been slow on this, and I'm OOO this week. I'll get back to it on Monday so we can get to the bottom of wtf is going on here.
Jens, thank you for the update. All good, I appreciate the assistance with this. Once you have a chance to dig further into the issue, please do not hesitate to reach out if you need any additional information.