Bug 47451

Summary: need to re-load driver in guest to make a hot-plug VF work
Product: Virtualization Reporter: Jay Ren (yongjie.ren)
Component: kvm Assignee: virtualization_kvm
Status: CLOSED UNREPRODUCIBLE    
Severity: normal CC: alex.williamson
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.5.0 Subsystem:
Regression: Yes Bisected commit-id:

Description Jay Ren 2012-09-13 09:21:40 UTC
Environment:
------------
Host OS (ia32/ia32e/IA64):ia32e
Guest OS (ia32/ia32e/IA64):ia32e
Guest OS Type (Linux/Windows):Linux (RHEL6u3)
kvm.git Commit:37e41afa97307a3e54b200a5c9179ada1632a844(master branch)
qemu-kvm Commit:28c3a9b197900c88f27b14f8862a7a15c00dc7f0(master branch)
Host Kernel Version:3.5.0-rc6  (Also exists in 3.6.0-rc3)
Hardware:Romley-EP (SandyBridge system)


Bug detailed description:
--------------------------
After hot-plugging a VF into a Linux guest (e.g. RHEL6.3) from the qemu monitor, the VF does not work in the guest by default. I need to remove the VF driver (e.g. igbvf, ixgbevf) and probe it again; only then does the VF work in the guest.
NIC: Intel 82599 NIC, Intel 82576 NIC

With an older kernel, the VF driver does not need to be reloaded after hot-plug.
This is a kernel regression. (The commits below are from the kvm.git and qemu-kvm.git trees.)
kvm      + qemu-kvm =result
37e41afa + 28c3a9b1 =bad
322728e5 + 28c3a9b1 =good

Note:
1. When a VF is assigned on the qemu-kvm command line (not hot-plugged), the VF works fine after boot-up.
2. It's easier to reproduce this in a guest with 512/1024MB memory and 1/2 vCPUs.
3. It can't always be reproduced with 2048MB and 2 vCPUs. (Not very stable.)

Reproduce steps:
----------------
1. Start up a host with kvm.
2. qemu-system-x86_64 -m 512 -smp 2 -net none -hda /root/rhel6u3.img
3. Switch to the qemu monitor (Ctrl+Alt+2).
4. device_add pci-assign,host=02:10.0,id=mynic   (02:10.0 is the VF's BDF number.)
5. Switch back to the guest (Ctrl+Alt+1).
6. Check the network of the VF. (It doesn't work.)
7. Remove the VF driver in the guest ('rmmod igbvf').
8. Re-probe the VF driver in the guest ('modprobe igbvf').
9. Check the network of the VF. (It should work this time.)
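
For reference, the guest-side workaround in steps 7 and 8 could be wrapped in a small shell helper along these lines. This is only a sketch: the function name reload_vf_driver and the DRY_RUN variable are hypothetical, and igbvf is assumed as the default driver (pass ixgbevf for an 82599 VF).

```shell
#!/bin/sh
# Sketch of the guest-side workaround (steps 7 and 8).
# reload_vf_driver and DRY_RUN are hypothetical names, not part of any tool.
# With DRY_RUN=1 the commands are only printed; otherwise they run (needs root).
reload_vf_driver() {
    driver="${1:-igbvf}"    # assumed default; use ixgbevf for 82599 VFs
    for cmd in "rmmod $driver" "modprobe $driver"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            # Stop at the first failing command (e.g. rmmod on a busy module).
            $cmd || return 1
        fi
    done
}
```

Run inside the guest as root, e.g. `reload_vf_driver igbvf`, then check the VF's network again as in step 9.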


Current result:
----------------
The VF does not work in the guest by default; the VF driver has to be reloaded in the guest first.

Expected result:
----------------
VF works well in the guest by default after hot-plug.
Comment 1 Alex Williamson 2012-09-14 16:24:11 UTC
The steps to reproduce don't indicate what you've done in the host to prepare the device.  Is igbvf loaded in the host?  Is the vf attached to igbvf or pci-stub or nothing?  Can we narrow down the kvm.git commit range at all?  The one provided is over 12k commits covering v3.4-rc3 to v3.5-rc6.  Thanks
Comment 2 Jay Ren 2012-09-20 03:36:31 UTC
(In reply to comment #1)
> The steps to reproduce don't indicate what you've done in the host to prepare
> the device.  Is igbvf loaded in the host?  
No, I don't load igbvf driver in host.

>Is the vf attached to igbvf or pci-stub or nothing?  
VF is attached to 'pci-stub' in host.
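
For anyone reproducing this, attaching a VF to pci-stub on the host is commonly done through sysfs. The helper below is a sketch that only prints the usual writes; the function name bind_vf_to_pci_stub is hypothetical, and the vendor/device ID pair is an assumption that should be read from 'lspci -n' on the actual host rather than taken from the example.

```shell
#!/bin/sh
# Sketch: print the sysfs writes commonly used to hand a VF to pci-stub.
# bind_vf_to_pci_stub is a hypothetical helper; it only prints the commands,
# which would then be run as root on the host.
bind_vf_to_pci_stub() {
    bdf="$1"    # full PCI address of the VF, e.g. 0000:02:10.0
    ids="$2"    # "vendor device" in hex, space separated (check 'lspci -n')
    echo "modprobe pci-stub"
    # Tell pci-stub to accept this vendor/device ID.
    echo "echo '$ids' > /sys/bus/pci/drivers/pci-stub/new_id"
    # Only needed if another driver currently owns the VF; the path does not
    # exist when no driver is bound.
    echo "echo '$bdf' > /sys/bus/pci/devices/$bdf/driver/unbind"
    echo "echo '$bdf' > /sys/bus/pci/drivers/pci-stub/bind"
}
```

For example, `bind_vf_to_pci_stub 0000:02:10.0 "8086 10ca"` prints the four commands for the VF used in this report, assuming 8086 10ca is the ID that lspci reports for it.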

>Can we narrow down the kvm.git commit range at all?  The
> one provided is over 12k commits covering v3.4-rc3 to v3.5-rc6.  Thanks
I did more testing.
Do you remember the bug #43328 ( VT-d/SR-IOV totally doesn't work in guest)?
With just your fix commit for that bug applied, I hit this hot-plug issue.
Is there a chance your patch fixed one bug but introduced another one? :)

commit a76beb14123a69ca080f5a5425e28b786d62318d
Author: Alex Williamson <alex.williamson@redhat.com>
Date: Mon Jul 9 10:53:22 2012 -0600

    KVM: Fix device assignment threaded irq handler
Comment 3 Alex Williamson 2012-09-20 04:36:50 UTC
(In reply to comment #2)
> (In reply to comment #1)
> >Can we narrow down the kvm.git commit range at all?  The
> > one provided is over 12k commits covering v3.4-rc3 to v3.5-rc6.  Thanks
> I did more testing.
> Do you remember the bug #43328 ( VT-d/SR-IOV totally doesn't work in guest)?
> Just use your fix commit for that bug, I'll meet this hot-plug issue.
> Is there a chance your patch fixed one bug but introduced another one? :)
> 
> commit a76beb14123a69ca080f5a5425e28b786d62318d
> Author: Alex Williamson <alex.williamson@redhat.com>
> Date: Mon Jul 9 10:53:22 2012 -0600
> 
>     KVM: Fix device assignment threaded irq handler

Thanks for narrowing it down.  It looks like perhaps that patch was ineffective at trying to keep us out of using IRQF_ONESHOT due to irq_setup_forced_threading() re-enabling it.  Does the problem go away if you change the two calls to request_threaded_irq() in that commit to use IRQF_NO_THREAD for the flag value in place of 0?
Comment 4 Jay Ren 2012-09-28 06:07:50 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > >Can we narrow down the kvm.git commit range at all?  The
> > > one provided is over 12k commits covering v3.4-rc3 to v3.5-rc6.  Thanks
> > I did more testing.
> > Do you remember the bug #43328 ( VT-d/SR-IOV totally doesn't work in
> guest)?
> > Just use your fix commit for that bug, I'll meet this hot-plug issue.
> > Is there a chance your patch fixed one bug but introduced another one? :)
> > 
> > commit a76beb14123a69ca080f5a5425e28b786d62318d
> > Author: Alex Williamson <alex.williamson@redhat.com>
> > Date: Mon Jul 9 10:53:22 2012 -0600
> > 
> >     KVM: Fix device assignment threaded irq handler
> 
> Thanks for narrowing it down.  It looks like perhaps that patch was
> ineffective at trying to keep us out of using IRQF_ONESHOT due to
> irq_setup_forced_threading() re-enabling it.  Does the problem go away if you
> change the two calls to request_threaded_irq() in that commit to use
> IRQF_NO_THREAD for the flag value in place of 0?

No, replacing the flag value with 'IRQF_NO_THREAD' doesn't make PCIe NIC hot-plug work.
Can you try with your commit "a76beb14123a6"?
BTW, this bug does not reproduce reliably. Starting a RHEL6.x guest with the '-m 512 -smp 2' options on the qemu-kvm command line makes it very easy to reproduce.
Comment 5 Jay Ren 2012-12-10 03:00:01 UTC
I re-tested this bug against the kvm.git next branch (commit:e6c7d321, kernel 3.7) and the qemu-kvm.git master branch (commit:4d9367b7). I can't reproduce it now. It was presumably fixed by some patch, but I don't know which one. As this bug does not reproduce reliably, it's hard for me to bisect and find the exact fix.
Closing this bug as unreproducible.