Bug 198195
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | RSM: set CR3.PCID only after CR4.PCIDE (exposed by guest 10af6235e0d3) | | |
| Product | Virtualization | Reporter | Laszlo Ersek (laszlo.ersek) |
| Component | kvm | Assignee | Paolo Bonzini (bonzini) |
| Status | RESOLVED CODE_FIX | | |
| Severity | normal | CC | bonzini, lenny.szubowicz, luto |
| Priority | P1 | | |
| Hardware | x86-64 | | |
| OS | Linux | | |
| Kernel Version | v4.14.5 | Subsystem | |
| Regression | No | Bisected commit-id | |
| Attachments | tentative patch | | |
Description
Laszlo Ersek
2017-12-18 20:53:10 UTC
* OVMF build commands:

  (1) perform the steps described in
      "CryptoPkg/Library/OpensslLib/OpenSSL-HOWTO.txt"

  (2) run:

      $ source edksetup.sh
      $ build \
          -a IA32 \
          -a X64 \
          -p OvmfPkg/OvmfPkgIa32X64.dsc \
          -D SMM_REQUIRE \
          -D SECURE_BOOT_ENABLE \
          -t GCC48 \
          -b NOOPT \
          -D HTTP_BOOT_ENABLE

* QEMU command line (generated by libvirt):

  /opt/qemu-installed/bin/qemu-system-x86_64 \
    -name guest=ovmf.fedora.q35,debug-threads=on \
    -S \
    -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-37-ovmf.fedora.q35/master-key.aes \
    -machine pc-q35-2.11,accel=kvm,usb=off,smm=on,dump-guest-core=off \
    -cpu Haswell-noTSX,vmx=on \
    -global driver=cfi.pflash01,property=secure,value=on \
    -drive file=/home/virt-images/OVMF_CODE.4m.3264.fd,if=pflash,format=raw,unit=0,readonly=on \
    -drive file=/var/lib/libvirt/qemu/nvram/ovmf.fedora.q35_VARS.fd,if=pflash,format=raw,unit=1 \
    -m 5120 \
    -realtime mlock=off \
    -smp 4,sockets=1,cores=2,threads=2 \
    -uuid a51c0e4c-93b1-4485-811e-ea9727eb748c \
    -no-user-config \
    -nodefaults \
    -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-37-ovmf.fedora.q35/monitor.sock,server,nowait \
    -mon chardev=charmonitor,id=monitor,mode=control \
    -rtc base=utc \
    -no-shutdown \
    -global ICH9-LPC.disable_s3=0 \
    -global ICH9-LPC.disable_s4=1 \
    -boot menu=off,strict=on \
    -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e \
    -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 \
    -device pcie-root-port,port=0x10,chassis=3,id=pci.3,bus=pcie.0,multifunction=on,addr=0x3 \
    -device pcie-root-port,port=0x11,chassis=4,id=pci.4,bus=pcie.0,multifunction=on,addr=0x3.0x1 \
    -device ich9-usb-ehci1,id=usb,bus=pci.2,addr=0x2.0x7 \
    -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.2,multifunction=on,addr=0x2 \
    -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.2,addr=0x2.0x1 \
    -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.2,addr=0x2.0x2 \
    -device qemu-xhci,p2=15,p3=15,id=usb1,bus=pcie.0,multifunction=on,addr=0x2 \
    -device qemu-xhci,p2=15,p3=15,id=usb2,bus=pcie.0,addr=0x2.0x1 \
    -device virtio-scsi-pci,id=scsi0,bus=pci.2,addr=0x5 \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x6 \
    -drive file=/mnt/data/virt-images-big/ovmf.fedora.q35.img,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=writeback,discard=unmap,werror=enospc \
    -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
    -drive file=/usr/share/OVMF/UefiShell.iso,format=raw,if=none,media=cdrom,id=drive-sata0-0-0,readonly=on,cache=writeback \
    -device ide-cd,bus=ide.0,drive=drive-sata0-0-0,id=sata0-0-0 \
    -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:29:80:ae,bus=pci.3,addr=0x0,rombar=0 \
    -chardev pty,id=charserial0 \
    -device isa-serial,chardev=charserial0,id=serial0 \
    -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-37-ovmf.fedora.q35/org.qemu.guest_agent.0,server,nowait \
    -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
    -chardev spicevmc,id=charchannel1,name=vdagent \
    -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 \
    -device usb-tablet,id=input0,bus=usb.0,port=2 \
    -spice port=5900,addr=127.0.0.1,disable-ticketing,streaming-video=off,seamless-migration=on \
    -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x4 \
    -object rng-random,id=objrng0,filename=/dev/urandom \
    -device virtio-rng-pci,rng=objrng0,id=rng0,max-bytes=1048576,period=1000,bus=pci.2,addr=0x3 \
    -global isa-debugcon.iobase=0x402 \
    -debugcon file:/tmp/ovmf.fedora.q35.log \
    -global pcie-root-port.io-reserve=0 \
    -global pcie-root-port.pref64-reserve=4G \
    -s \
    -msg timestamp=on

Is this reproducible using the copy of OVMF that Fedora distributes? If not, can one of you attach or otherwise send me an OVMF binary that reproduces it?

Yes, it reproduces with Fedora's most recent OVMF build (minimally -- I didn't check other OVMF builds from Fedora):

  edk2-20171011git92d07e4-2.fc28
  https://koji.fedoraproject.org/koji/buildinfo?buildID=1000604

Please install the sub-package

  edk2-ovmf-20171011git92d07e4-2.fc28.noarch.rpm
  https://kojipkgs.fedoraproject.org//packages/edk2/20171011git92d07e4/2.fc28/noarch/edk2-ovmf-20171011git92d07e4-2.fc28.noarch.rpm

and from it, please use the firmware binary

  /usr/share/edk2/ovmf/OVMF_CODE.secboot.fd

(sha256sum: ae8c9dd92eac5aa5e16e72ab85b89d2f6f911ada0aa5e5812396509945fb2dc3).

Thanks!

After re-reading the message of kernel commit 10af6235e0d3, I've now figured I should try booting the same (problematic) guest kernel with the "nopcid" cmdline parameter added. Indeed, with "nopcid" specified, the guest boots fine.

... Is there perhaps a problem with KVM's handling (or emulation?) of PCIDs?

Debugging results so far:

I added a bunch of printk calls, and we're dying some time after the first call to efi_call_virt(get_next_variable, ...) in efivar_init() when called by efivar_ssdt_load().

On inspection of the code, we seem to turn off interrupts *after* arch_efi_call_virt_setup(), which makes no sense to me. We do *not* want to take an interrupt while running with the funny EFI CR3. But fixing that doesn't seem to help.

I can reproduce the problem with two cpus but not with only one cpu.

I'm rather mystified as to what's actually happening, since efi->systab appears to be a bogus pointer. Maybe efi->systab only points anywhere when we're using EFI's CR3?

I can easily imagine that something explodes if we enter SMM with CR3[11:0] != 0, but I'm skeptical that this is the actual problem we're seeing.

I'm going to go out on a limb and suggest that the problem is that either QEMU is emulating SMM incorrectly when a non-trapping CPU is forced into SMM and PCID is in use or that OVMF has a bug. No one has reported this problem on real hardware...

Paolo, I think that rsm_load_state_64() may be buggy. It loads CR3, then CR4 & ~PCIDE, then CR0, then CR4. But, if the emulation logic works like a real CPU, setting CR4.PCIDE is bogus if CR3[11:0] != 0. If the emulation does *not* work like a real CPU but CR3 is parsed at load time, then we might have a bogus CR3 state when all is said and done.

I would set CR3 to zero, then load CR4 & ~PCIDE, then CR0, then full CR4, and *then* CR3.

(This may be busted on 32-bit PAE, too. Loading CR3 has side effects that depend on CR4.)

FWIW, I confirm that using an OVMF build that does not have -D SMM_REQUIRE, the same guest boots fine (with PCID enabled, as it is by default).

This can be checked e.g. by booting "OVMF_CODE.fd" instead of "OVMF_CODE.secboot.fd", from comment 3. For this to work, the QEMU cmdline option

  -global driver=cfi.pflash01,property=secure,value=on

has to be removed as well.

Thanks!

Created attachment 261263
tentative patch

The right fix is probably to add an emulator callback that sets all the special registers (like KVM_SET_SREGS would do), but the attached patch should do it. Untested, will try it of course before posting it formally to LKML and kvm@vger.kernel.org.

Lowering importance to "normal" -- symptoms are only present when using SMM, and even in that case, it can be worked around on both the host side ("-cpu MODEL,pcid=off" on the QEMU cmdline) and the guest side ("nopcid" kernel param).

(In reply to Paolo Bonzini from comment #9)
> Created attachment 261263
> tentative patch
>
> The right fix is probably to add an emulator callback that sets all the
> special registers (like KVM_SET_SREGS would do), but the attached patch
> should do it. Untested, will try it of course before posting it formally to
> LKML and kvm@vger.kernel.org.

I locally rebuilt the kvm and kvm-intel modules only, with this patch applied, for my RHEL-7.4.z (3.10.0-693.11.1.el7.x86_64) host kernel.
The patch applies cleanly:

> patching file arch/x86/kvm/emulate.c
> Hunk #1 succeeded at 2419 (offset 15 lines).
> Hunk #2 succeeded at 2452 (offset 15 lines).
> Hunk #3 succeeded at 2468 (offset 15 lines).
> Hunk #4 succeeded at 2514 (offset 15 lines).
> Hunk #5 succeeded at 2538 (offset 15 lines).
> Hunk #6 succeeded at 2566 (offset 15 lines).

There's one line with superfluous whitespace (right above the comment "In order to later set CR4.PCIDE..."):

> warning: 1 line adds whitespace errors.

The patch works; the guest boots fine. I also repeated my Linux guest tests from <https://github.com/tianocore/tianocore.github.io/wiki/Testing-SMM-with-QEMU,-KVM-and-libvirt#tests-to-perform-in-the-installed-guest-fedora-26-guest>.

When you post the patch to LKML and kvm@vger.kernel.org, please add:

Tested-by: Laszlo Ersek <lersek@redhat.com>

Thank you both!

Updating the BZ metadata to reflect that this report ultimately concerns the host side.

Paolo's patch is on the lists:

[PATCH] kvm: x86: fix RSM when PCID is non-zero
http://mid.mail-archive.com/1513857398-47244-1-git-send-email-pbonzini@redhat.com

Fixed in Paolo's commit fae1a3e775cc ("kvm: x86: fix RSM when PCID is non-zero", 2017-12-21). Part of v4.15-rc5.

Thanks everyone!

Greg Kroah-Hartman queued the patch for the following stable trees: 4.4, 4.9, 4.14.
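
For readers following the ordering argument in this thread, here is a toy Python model of the two control-register load sequences. It is only a sketch: `ToyCpu`, `GeneralProtectionFault`, and both `rsm_load_*` helpers are hypothetical names invented for illustration, not KVM code and not the attached patch. The model encodes a single architectural rule (setting CR4.PCIDE from 0 to 1 faults when CR3[11:0] is non-zero) and assumes the emulator enforces it the way a real CPU would. The "proposed" order follows the wording in the thread literally: CR3 zeroed first, then CR4 without PCIDE, then CR0, then full CR4, then the real CR3.

```python
CR4_PCIDE = 1 << 17   # bit position of CR4.PCIDE
PCID_MASK = 0xFFF     # CR3[11:0] carries the PCID once CR4.PCIDE is set


class GeneralProtectionFault(Exception):
    """Stand-in for the #GP a CPU raises on an invalid control-register load."""


class ToyCpu:
    """Hypothetical model that tracks only CR0, CR3 and CR4."""

    def __init__(self):
        self.cr0 = 0
        self.cr3 = 0
        self.cr4 = 0

    def set_cr0(self, value):
        self.cr0 = value

    def set_cr3(self, value):
        self.cr3 = value

    def set_cr4(self, value):
        # The only rule modelled: PCIDE may go 0 -> 1 only if CR3[11:0] == 0.
        enabling_pcide = bool(value & CR4_PCIDE) and not (self.cr4 & CR4_PCIDE)
        if enabling_pcide and (self.cr3 & PCID_MASK):
            raise GeneralProtectionFault("CR4.PCIDE 0->1 with CR3[11:0] != 0")
        self.cr4 = value


def rsm_load_original_order(cpu, cr0, cr3, cr4):
    """Order described in the thread: CR3, CR4 & ~PCIDE, CR0, CR4."""
    cpu.set_cr3(cr3)
    cpu.set_cr4(cr4 & ~CR4_PCIDE)
    cpu.set_cr0(cr0)
    cpu.set_cr4(cr4)


def rsm_load_proposed_order(cpu, cr0, cr3, cr4):
    """Order proposed in the thread: CR3 = 0, CR4 & ~PCIDE, CR0, CR4, CR3."""
    cpu.set_cr3(0)
    cpu.set_cr4(cr4 & ~CR4_PCIDE)
    cpu.set_cr0(cr0)
    cpu.set_cr4(cr4)
    cpu.set_cr3(cr3)
```

With a saved state whose CR3 carries a non-zero PCID (for example an illustrative `cr3 = 0x12345001` with `CR4_PCIDE` set in `cr4`), the original order raises `GeneralProtectionFault` in this model, while the proposed order completes and leaves CR3 and CR4 at their saved values. With PCID 0 both orders succeed, which is consistent with the report that "nopcid" hides the problem.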