Bug 208875 - btrfs checksum errors in guest when virtio device io=io_uring
Summary: btrfs checksum errors in guest when virtio device io=io_uring
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: AIO
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Badari Pulavarty
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-11 16:08 UTC by Chris Murphy
Modified: 2021-11-03 18:49 UTC
CC List: 2 users

See Also:
Kernel Version: 5.8.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (79.50 KB, text/plain)
2020-08-11 16:09 UTC, Chris Murphy
virsh dumpxml (8.66 KB, text/xml)
2020-08-12 00:16 UTC, Chris Murphy
bcc biolatency -DT 15 (165.88 KB, text/plain)
2020-08-12 02:36 UTC, Chris Murphy
journal, full, wall clock time (773.91 KB, text/plain)
2020-08-12 02:37 UTC, Chris Murphy

Description Chris Murphy 2020-08-11 16:08:47 UTC
Host:
5.8.0-1.fc33.x86_64
qemu-kvm-5.0.0-5.fc33.x86_64

Backing file on host
# truncate -s 100g f33default.raw
# lsattr 
---------------C---- ./f33default.raw
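
The C attribute above is nodatacow. For reference, chattr +C only takes effect while a file is still empty, so a backing file like this would typically be created as:

# touch f33default.raw
# chattr +C f33default.raw
# truncate -s 100g f33default.raw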


Guest:
5.8.0-1.fc33.x86_64

Normally this installs just fine, but not when I use io='io_uring' for the virtio-blk device used as the installation target. Inside the guest VM, late in the installation, performance falls off a cliff (everything becomes extremely slow) and then btrfs spews checksum errors. Because virtio-blk reports as rotational, mkfs.btrfs defaults to DUP metadata. Curiously, all of the checksum errors are on the mirror 1 copy only, and there are no fixups, presumably because the mirror 2 copies are OK.
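
For context, whether the guest sees the disk as rotational (and hence why mkfs.btrfs chose DUP metadata) can be checked inside the guest with:

# cat /sys/block/vda/queue/rotational
# btrfs filesystem df /

The first prints 1 for rotational; the second lists the profile in use, e.g. a "Metadata, DUP" line.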
Comment 1 Chris Murphy 2020-08-11 16:09:59 UTC
Created attachment 290833 [details]
dmesg
Comment 2 Chris Murphy 2020-08-12 00:13:42 UTC
# ps aux | grep qemu
qemu       16619  2.4 40.8 4276924 3293748 ?     Sl   15:05   4:39 /usr/bin/qemu-system-x86_64 -name guest=uefivm,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-uefivm/master-key.aes -blockdev {"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"} -blockdev {"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/uefivm_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"} -machine pc-q35-5.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format -cpu SandyBridge-IBRS,vme=on,vmx=on,pdcm=on,pcid=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaveopt=on,ibpb=on,amd-stibp=on,amd-ssbd=on,pschange-mc-no=on -m 3072 -overcommit mem-lock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 76270619-0043-4c0b-bac6-29805be87ddf -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=35,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 -device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 -blockdev {"driver":"file","filename":"/var/lib/libvirt/images/f33default.raw","aio":"io_uring","node-name":"libvirt-2-storage","cache":{"direct":false,"no-flush":true},"auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","detect-zeroes":"unmap","cache":{"direct":false,"no-flush":true},"driver":"raw","file":"libvirt-2-storage"} -device virtio-blk-pci,bus=pci.4,addr=0x0,drive=libvirt-2-format,id=virtio-disk0,bootindex=2,write-cache=on -blockdev {"driver":"file","filename":"/var/lib/libvirt/images/Fedora-Workstation-Live-x86_64-Rawhide-20200809.n.0.iso","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-1-format","read-only":true,"driver":"raw","file":"libvirt-1-storage"} -device ide-cd,bus=ide.0,drive=libvirt-1-format,id=sata0-0-0,bootindex=1 -netdev tap,fd=37,id=hostnet0,vhost=on,vhostfd=38 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:9d:96:34,bus=pci.1,addr=0x0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=39,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev 
spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0,bus=usb.0,port=1 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 -device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 -object rng-random,id=objrng0,filename=/dev/urandom -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
Comment 3 Chris Murphy 2020-08-12 00:16:27 UTC
Created attachment 290835 [details]
virsh dumpxml
Comment 4 Chris Murphy 2020-08-12 01:43:02 UTC
https://bugzilla.kernel.org/show_bug.cgi?id=208827
Comment 5 Chris Murphy 2020-08-12 02:32:00 UTC
I ran the bcc-tools biolatency during an installation.
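
For reference, this is the invocation behind the attached histograms (path as packaged in Fedora's bcc-tools):

# /usr/share/bcc/tools/biolatency -DT 15

-D prints a separate histogram per disk, -T timestamps each interval, and 15 is the reporting interval in seconds.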

The /dev/vda device with io=io_uring enabled seems to have lower latency out of the gate, and I'm also not seeing the bimodal latencies that I do with io=threads.

But immediately following the rsync phase, the installed system is put in a chroot and post-install work happens, including dracut running, and the latency becomes absolutely stratospheric: 33-second latencies.

Example:

disk = b'vda'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 28       |****                                    |
        64 -> 127        : 239      |****************************************|
       128 -> 255        : 45       |*******                                 |
       256 -> 511        : 43       |*******                                 |
       512 -> 1023       : 98       |****************                        |
      1024 -> 2047       : 57       |*********                               |
      2048 -> 4095       : 12       |**                                      |
      4096 -> 8191       : 9        |*                                       |
      8192 -> 16383      : 0        |                                        |
     16384 -> 32767      : 0        |                                        |
     32768 -> 65535      : 0        |                                        |
     65536 -> 131071     : 0        |                                        |
    131072 -> 262143     : 0        |                                        |
    262144 -> 524287     : 0        |                                        |
    524288 -> 1048575    : 2        |                                        |
   1048576 -> 2097151    : 0        |                                        |
   2097152 -> 4194303    : 0        |                                        |
   4194304 -> 8388607    : 0        |                                        |
   8388608 -> 16777215   : 0        |                                        |
  16777216 -> 33554431   : 1        |                                        |


Another odd duck?

Aug 11 17:05:32 localhost-live chronyd[1285]: System clock was stepped by -14405.762210 seconds
...
Aug 11 22:08:35 localhost-live chronyd[1285]: System clock wrong by -6.721561 seconds, adjustment started

Ignore the five-hour gap; I was just lazy and only started the install five hours after starting the VM. The odd duck is that the system clock is somehow wrong by ~6 seconds, as if the VM in its entirety stalled and its own clock got stuck during the delay.

Attaching the journal and the matching biolatency output; the times line up, so you can see that the skyrocketing latency coincides with the chroot and dracut starting. The rsync (and the bulk of the installation) was OK. Weird.

The installation never recovers.
Comment 6 Chris Murphy 2020-08-12 02:36:45 UTC
Created attachment 290837 [details]
bcc biolatency -DT 15
Comment 7 Chris Murphy 2020-08-12 02:37:08 UTC
Created attachment 290839 [details]
journal, full, wall clock time
Comment 8 Chris Murphy 2020-08-12 02:41:06 UTC
$ cat /sys/block/vda/queue/scheduler 
[mq-deadline] kyber bfq none


The device the raw backing file sits on uses the same scheduler. The stack is raw file + nodatacow > btrfs > SSD, with no other layers.
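
If the scheduler were a suspect, it could be ruled out by switching the guest device to none:

# echo none > /sys/block/vda/queue/scheduler
# cat /sys/block/vda/queue/scheduler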
Comment 9 Chris Murphy 2020-08-12 02:56:15 UTC
Using bcc-tools btrfsdist 15 on the host, I'm not seeing any latency changes during the installation.
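
For reference, btrfsdist summarizes btrfs read/write/open/fsync latency as histograms; as packaged in Fedora's bcc-tools the invocation is:

# /usr/share/bcc/tools/btrfsdist 15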
Comment 10 Josef Bacik 2020-08-13 14:06:19 UTC
Reproduced the problems, my scripts are here

https://github.com/josefbacik/virt-scripts.git

Just need to make an image file; the dd below writes a single 1 MiB block at a ~50 GB offset, producing a sparse ~50 GB file:

dd if=/dev/zero of=file.img bs=1M count=1 seek=50000

then run

./fedora-install file.img

You need to have virt-install installed, along with all the usual libvirt stuff and virt-viewer as well.  This will do the Fedora install over the internet and do all the setup you need.  Once it reboots into a prompt you'll need to shut down the guest, and then do

virsh edit fedora-rawhide

Way down you'll see something like

<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/path/to/file.img'/>

You'll need to add io='io_uring' to the driver section, so it looks like

<driver name='qemu' type='raw' io='io_uring'/>
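
For context, the complete disk element then looks roughly like this (the target line below assumes a typical virtio setup and may differ in your XML):

<disk type='file' device='disk'>
  <driver name='qemu' type='raw' io='io_uring'/>
  <source file='/path/to/file.img'/>
  <target dev='vda' bus='virtio'/>
</disk>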

then you can re-launch the VM and log in as root (password: redhat).  From there you can just do

sh jens-broke-it.sh

and wait for it to fail.  It doesn't take very long; it'll break out and you'll see a series of checksum errors.
Comment 11 Chris Murphy 2020-08-15 18:46:27 UTC
I'm curious why the problem doesn't happen until the chroot. It's 100% fine during rsync and then blows up at the chroot+dracut stage.

But I just read this on Fedora devel@ (not io_uring- or btrfs-related as far as I know) about mock with nspawn chroots.

"It does! Running mock with --isolation=simple works around the issue.
Looks like the glibc 2.32.9000 snapshot broke systemd-nspawn based
chroots with this change:
- Linux: Use faccessat2 to implement faccessat (bug 18683)"


https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/YS6MPDLXP5BENHHHA43EW4YLI5UG435D/

https://sourceware.org/bugzilla/show_bug.cgi?id=18683#c3
Comment 12 David Sterba 2021-11-03 18:49:20 UTC
According to https://bugzilla.kernel.org/show_bug.cgi?id=208827 it got fixed, closing.
