Description:
------------
When running KVM guests with virtio network interfaces, the guest will
(under some as yet unidentified circumstances) stop receiving packets.
A tcpdump on the affected interface shows only ARP requests being sent
by the server and going unanswered.

Possible workarounds:
Temporary:
- restart the network interface in the guest
Permanent:
- use the e1000 network driver in place of the virtio_net driver

How to reproduce:
-----------------
1) start KVM guest:

qemu-kvm -nodefaults -name vps -chroot /chroot -runas kvm \
  -pidfile /var/run/kvm/vps.pid -vnc 1.2.3.4:0 -vga std --full-screen \
  -smp 2 -m 1g -cpu host -mem-path /hugepages -mem-prealloc \
  -kvm-shadow-memory 1g -enable-kvm -daemonize \
  -rtc base=localtime,clock=host,driftfix=none -balloon virtio \
  -net nic,model=virtio,vlan=0,macaddr=52:54:00:33:22:11 \
  -net bridge,br=br0,vlan=0 \
  -drive aio=native,index=0,media=disk,cache=none,if=virtio,file=vps.img \
  -boot order=c,menu=off &

2) generate heavy traffic (after a few minutes the virtio network card crashes):

from host to guest:
screen ping 10.8.7.2 -s 65507
screen iperf -s -i 1 -f M

from guest to host:
screen ping 10.8.7.1 -s 65507
screen iperf -c 10.8.7.2 -i 1 -f M

3) restart the guest network interface and repeat step 2) until the network crashes again

The above situation occurs on (my tests):
host kernel 3.3+
qemu-kvm 1.0+
any guest Linux 2.6+ - 3.3+ kernel
any guest Windows 7+

The same situation is described by other users on some forums:
http://bugs.centos.org/view.php?id=5526
http://serverfault.com/questions/362038/qemu-kvm-virtual-machine-virtio-network-freeze-under-load
(possible, not tested old patch)

Please fix the above crashing of the virtio_net driver.

Thank you for your time.
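To make "stops receiving packets" measurable during step 2), the ping results can be reduced to a list of reply/no-reply flags and scanned for the first sustained run of losses. This is only a sketch: the function name `first_stall` and the default threshold of 5 consecutive lost probes are assumptions for illustration, not something taken from the report.

```python
from typing import Optional, Sequence


def first_stall(replies: Sequence[bool], threshold: int = 5) -> Optional[int]:
    """Return the index where the first run of `threshold` consecutive
    lost probes begins, or None if no such run exists.

    `replies[i]` is True when probe i was answered, False when it was lost.
    """
    run = 0  # length of the current run of lost probes
    for i, answered in enumerate(replies):
        if answered:
            run = 0
        else:
            run += 1
            if run == threshold:
                return i - threshold + 1
    return None


if __name__ == "__main__":
    # One isolated loss is ignored; the stall starts at probe 4.
    samples = [True, True, False, True, False, False, False]
    print(first_stall(samples, threshold=3))
```

A single dropped packet under load is normal, which is why the sketch only reports a stall after several losses in a row.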
These sites also describe similar problems:
https://bugzilla.redhat.com/show_bug.cgi?id=520119
http://www.mail-archive.com/scientific-linux-users@listserv.fnal.gov/msg10661.html

Let me know how I can help in solving the above issue.

Thank you for your time.
Rusty, could you have a look at this, please? Thank you for your time.
Hello, I am pretty sure I read about a very similar bug report recently, dunno if that was on kvm or qemu mailing list, or in a bugzilla entry like this. The guy said that there was a reproducer: running a network test between two affected virtual machines. Maybe that was netperf, not sure. Would reproduce the bug in a short time. He said kernel 3.0 (GUEST!) was the first he tested to have this problem, while much older guest kernels did not have this problem. I suggest Steve tries this reproducer, and tell if it works in his case. If that works, debugging would be much faster.
Thank you for the information.

On the host I used the latest versions of:
------------------------------------------
kernel from: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
qemu-kvm from: git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
seabios from: git://git.seabios.org/seabios.git

compiler, libs & other stuff used:
gcc-4.7.0 (feb 2012)
glibc-2.14.1-r2
bridge-utils-1.5
iproute2-3.1.0
ethtool-3.2

On the guest I can run the following:
-------------------------------------
1) test case one - kernel before v3.0 (v2.6.39)
- generate heavy traffic between host and guest
- make a short analysis
- create a report & append it to this ticket

2) test case two - kernel v3.0
- generate heavy traffic between host and guest
- make a short analysis
- create a report & append it to this ticket

3) test case three - latest kernel v3.3-rc5+
- generate heavy traffic between host and guest
- make a short analysis
- create a report & append it to this ticket

In all test cases the host system will be the same (kernel and other stuff).
The described test cases will be run today, 1 March 2012, in the evening.

I would like to help solve the above problem ASAP.
Let me know how I can help at any time.

Thank you for your time.
Results of my test:
===================

In all test cases the host configuration is the same:
-----------------------------------------------------
kernel: latest 3.3-rc5+
compiler: gcc (4.7.0-pre9999 20120225 rev. 184573)
glibc-2.14.1-r2
bridge-utils-1.5
iproute2-3.1.0
ethtool-3.2
other system stuff: all latest from their git repos
util-linux, net-tools, kmod, udev, seabios, qemu-kvm

In all test cases the guest configuration, except for the kernel, is the same:
------------------------------------------------------------------------------
compiler: gcc 4.6.2
glibc-2.14.1-r2
iproute2-3.1.0
ethtool-3.2
other system stuff: all latest from their git repos
util-linux, net-tools, kmod, udev

1) guest kernel v2.6.39
- works fine for more than 5min

2) guest kernel v3.0
- virtio network goes down in less than 1min of test

3) guest kernel latest v3.3-rc5+
- virtio network goes down in less than 1min of test

I continued by checking the kernel tags after v2.6.39:

4) guest kernel v3.0-rc1
- works fine for more than 5min

5) guest kernel v3.0-rc5
- virtio network goes down in less than 1min of test

6) guest kernel v3.0-rc3
- works fine for more than 5min

I had a short look at v3.0-rc5; the problem could be somewhere in this tag.
Later I would like to locate the problematic commit with git bisect, and afterwards I will report my results.
My results:
===========

git bisect log:
---------------
git bisect start
# good: [2c53b436a30867eb6b47dd7bab23ba638d1fb0d2] Linux 3.0-rc3
git bisect good 2c53b436a30867eb6b47dd7bab23ba638d1fb0d2
# bad: [b0af8dfdd67699e25083478c63eedef2e72ebd85] Linux 3.0-rc5
git bisect bad b0af8dfdd67699e25083478c63eedef2e72ebd85
# bad: [c01ad4081939f91ebd7277e8e731fd90ceb3e632] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect bad c01ad4081939f91ebd7277e8e731fd90ceb3e632
# bad: [f4ef084226f82ca923bf0a2658bb2876bd215ec1] Merge branch 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-3.x
git bisect bad f4ef084226f82ca923bf0a2658bb2876bd215ec1
# bad: [bd5dc17be87b3a3073d50b23802647db3ae3fa8e] uts: make default hostname configurable, rather than always using "(none)"
git bisect bad bd5dc17be87b3a3073d50b23802647db3ae3fa8e
# bad: [f39e8409955fad210a9a7169cc53c4c18daaef3a] Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
git bisect bad f39e8409955fad210a9a7169cc53c4c18daaef3a
# bad: [81eb3dd8438802138ac9ce12428632f35562c060] Merge branch 'for-linus' of git://neil.brown.name/md
git bisect bad 81eb3dd8438802138ac9ce12428632f35562c060
# good: [9864c0053d3da4c5731ac8a6c4835179310bd40a] md: Using poll /proc/mdstat can monitor the events of adding a spare disks
git bisect good 9864c0053d3da4c5731ac8a6c4835179310bd40a
# bad: [9b2dc8b665932a8e681a7ab3237f60475e75e161] md/raid5: fix raid5_set_bi_hw_segments
git bisect bad 9b2dc8b665932a8e681a7ab3237f60475e75e161
# bad: [27d5ea04d08bea37bf651090e5f3c573d2390df8] md/bitmap: use proper accessor macro
git bisect bad 27d5ea04d08bea37bf651090e5f3c573d2390df8
# bad: [01393f3d5836b7d62e925e6f4658a7eb22b83a11] md: check ->hot_remove_disk when removing disk
git bisect bad 01393f3d5836b7d62e925e6f4658a7eb22b83a11

git bisect message:
-------------------
01393f3d5836b7d62e925e6f4658a7eb22b83a11 is the first bad commit
commit 01393f3d5836b7d62e925e6f4658a7eb22b83a11
Author: Namhyung Kim <namhyung@gmail.com>
Date: Thu Jun 9 11:42:54 2011 +1000

    md: check ->hot_remove_disk when removing disk

    Check pers->hot_remove_disk instead of pers->hot_add_disk in slot_store()
    during disk removal. The linear personality only has ->hot_add_disk and
    no ->hot_remove_disk, so that removing disk in the array resulted to
    following kernel bug:

    $ sudo mdadm --create /dev/md0 --level=linear --raid-devices=4 /dev/loop[0-3]
    $ echo none | sudo tee /sys/block/md0/md/dev-loop2/slot
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [< (null)>] (null)
    PGD c9f5d067 PUD 8575a067 PMD 0
    Oops: 0010 [#1] SMP
    CPU 2
    Modules linked in: linear loop bridge stp llc kvm_intel kvm asus_atk0110 sr_mod cdrom sg

    Pid: 10450, comm: tee Not tainted 3.0.0-rc1-leonard+ #173 System manufacturer System Product Name/P5G41TD-M PRO
    RIP: 0010:[<0000000000000000>] [< (null)>] (null)
    RSP: 0018:ffff880085757df0 EFLAGS: 00010282
    RAX: ffffffffa00168e0 RBX: ffff8800d1431800 RCX: 000000000000006e
    RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff88008543c000
    RBP: ffff880085757e48 R08: 0000000000000002 R09: 000000000000000a
    R10: 0000000000000000 R11: ffff88008543c2e0 R12: 00000000ffffffff
    R13: ffff8800b4641000 R14: 0000000000000005 R15: 0000000000000000
    FS: 00007fe8c9e05700(0000) GS:ffff88011fa00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 00000000b4502000 CR4: 00000000000406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process tee (pid: 10450, threadinfo ffff880085756000, task ffff8800c9f08000)
    Stack:
     ffffffff8138496a ffff8800b4641000 ffff88008543c268 0000000000000000
     ffff8800b4641000 ffff88008543c000 ffff8800d1431868 ffffffff81a78a90
     ffff8800b4641000 ffff88008543c000 ffff8800d1431800 ffff880085757e98
    Call Trace:
     [<ffffffff8138496a>] ? slot_store+0xaa/0x265
     [<ffffffff81384bae>] rdev_attr_store+0x89/0xa8
     [<ffffffff8115a96a>] sysfs_write_file+0x108/0x144
     [<ffffffff81106b87>] vfs_write+0xb1/0x10d
     [<ffffffff8106e6c0>] ? trace_hardirqs_on_caller+0x111/0x135
     [<ffffffff81106cac>] sys_write+0x4d/0x77
     [<ffffffff814fe702>] system_call_fastpath+0x16/0x1b
    Code: Bad RIP value.
    RIP [< (null)>] (null)
     RSP <ffff880085757df0>
    CR2: 0000000000000000
    ---[ end trace ba5fc64319a826fb ]---

    Signed-off-by: Namhyung Kim <namhyung@gmail.com>
    Cc: stable@kernel.org
    Signed-off-by: NeilBrown <neilb@suse.de>

:040000 040000 e3d19f53113f5bb5faef422958607e4f0131d235 eab9913143e1c027df693bf1fa57475da77bd36e M drivers

How could I help now ?

Thank you for your time.
Git bisect result is not network related? Highly unlikely! The "bug reproducer" technique you are using might not be reliable (sorry if that was the one I suggested). Try to make longer tests...? I suggest you retest the bisect points which you marked good, some of them might instead be bad. Thanks for your time in finding this bug: this is a very important bug!
Thinking again, this could also be a problem of qemu vs qemu-kvm merges in git. Have a look at these: http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/82851 http://lists.gnu.org/archive/html/qemu-devel/2012-02/msg04047.html and all subsequent posts in this thread Git-bisecting in qemu and kvm is becoming really not straightforward now that they merge from one another, you find a lot of interleaved good and bad commits. Git bisect is for bisecting a situation in which all commits before something are good, and all commits after something are bad. It doesn't seem to be the case anymore with qemu and qemu-kvm.
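To illustrate why a single unreliable test result can send a bisect to an unrelated commit, here is a small simulation. It is a sketch only: `bisect_first_bad` is a hypothetical helper that mimics the binary search git bisect performs on a linear history, and the commit numbers are made up.

```python
from typing import Callable


def bisect_first_bad(n: int, is_bad: Callable[[int], bool]) -> int:
    """Binary search over commits 0..n-1, git-bisect style.

    Commit 0 is assumed good and commit n-1 is assumed bad; the function
    returns the index that the search reports as the first bad commit.
    """
    good, bad = 0, n - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


if __name__ == "__main__":
    # Ground truth for the simulation: every commit from 70 onwards has the bug.
    def reliable(commit: int) -> bool:
        return commit >= 70

    # Same history, but the test run on commit 74 was too short to trigger
    # the bug, so that commit is wrongly reported as good.
    def too_short(commit: int) -> bool:
        return commit >= 70 and commit != 74

    print(bisect_first_bad(100, reliable))   # reports commit 70
    print(bisect_first_bad(100, too_short))  # reports commit 75, which is wrong
```

One false "good" is enough to move the reported commit away from the real culprit, which is why retesting the points marked good with a longer run is worthwhile.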
Thank you for the information.

Now it works fine with the latest kernel 3.3-rc5+ on the guest machine.

It seems to be a leak in virtio balloon.

Probably it was a bug in this old commit:
author: Amit Shah <amit.shah@redhat.com>
committer: Rusty Russell <rusty@rustcorp.com.au>
e562966dbaf49e7804097cd991e5d3a8934fc148

and it was fixed by this commit:
author: Amit Shah <amit.shah@redhat.com>
committer: Rusty Russell <rusty@rustcorp.com.au>
4eb05d562ea1ea34ff607aa877aefbf05b21c140

Please apply the above fix commit to all older kernel versions.

Thank you for your time.
Results of longer tests: the above problem still occurs, but only after a longer time, more than 10 minutes. I will try to locate the problem with a new git bisect. Thank you for your time.
I found the bad commit.

git bisect log:
---------------
git bisect start
# bad: [550cf00dbc8ee402bef71628cb71246493dd4500] Merge tag 'mmc-fixes-for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
git bisect bad 550cf00dbc8ee402bef71628cb71246493dd4500
# good: [61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf] Linux 2.6.39
git bisect good 61c4f2c81c61f73549928dfd9f3e8f26aa36a8cf
# bad: [8a9ea3237e7eb5c25f09e429ad242ae5a3d5ea22] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad 8a9ea3237e7eb5c25f09e429ad242ae5a3d5ea22
# bad: [95a943c162d74b20d869917bdf5df11293c35b63] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
git bisect bad 95a943c162d74b20d869917bdf5df11293c35b63
# good: [98b98d316349e9a028e632629fe813d07fa5afdd] Merge branch 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
git bisect good 98b98d316349e9a028e632629fe813d07fa5afdd
# bad: [1f6e44a6dc21a5d2abb068063acbbf64f8cee548] pxa168_eth: enable transmit time stamping.
git bisect bad 1f6e44a6dc21a5d2abb068063acbbf64f8cee548
# good: [19de85ef574c3a2182e3ccad9581805052f14946] bitops: add #ifndef for each of find bitops
git bisect good 19de85ef574c3a2182e3ccad9581805052f14946
# good: [c320afe965bf3f857249d223801d8f2fc95615c2] Blackfin: debug-mmrs: include RSI_PID[4567] MMRs
git bisect good c320afe965bf3f857249d223801d8f2fc95615c2
# bad: [23c79d31a3dd2602ee1a5ff31303b2d7a2d3c159] Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
git bisect bad 23c79d31a3dd2602ee1a5ff31303b2d7a2d3c159
# good: [cd1acdf1723d71b28175f95b04305f1cc74ce363] Merge branch 'pnfs-submit' of git://git.open-osd.org/linux-open-osd
git bisect good cd1acdf1723d71b28175f95b04305f1cc74ce363
# bad: [cd4ecf877a4d629c38571405fd649077c12dec50] Merge branch 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
git bisect bad cd4ecf877a4d629c38571405fd649077c12dec50
# bad: [5c6cce92bc8aee751aafe82c5d9caf7553226a3d] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
git bisect bad 5c6cce92bc8aee751aafe82c5d9caf7553226a3d
# bad: [8ea8cf89e19aeb596b818ee5f2bec8a8b0586b60] vhost: support event index
git bisect bad 8ea8cf89e19aeb596b818ee5f2bec8a8b0586b60
# good: [bc805a03c26e1e25171bc627c6264553d27f746c] lguest: fix up compilation after move
git bisect good bc805a03c26e1e25171bc627c6264553d27f746c
# good: [bf50e69f63d21091e525185c3ae761412be0ba72] virtio balloon: kill tell-host-first logic
git bisect good bf50e69f63d21091e525185c3ae761412be0ba72
# good: [770b31a85e000b0194974922f238a30ade4246b6] virtio: event index interface
git bisect good 770b31a85e000b0194974922f238a30ade4246b6
# bad: [a5c262c5fd83ece01bd649fb08416c501d4c59d7] virtio_ring: support event idx feature
git bisect bad a5c262c5fd83ece01bd649fb08416c501d4c59d7
# good: [bf7035bf20563a6cadcb9e870406e7b21daf5e30] virtio ring: inline function to check for events
git bisect good bf7035bf20563a6cadcb9e870406e7b21daf5e30

git bisect message:
===================
a5c262c5fd83ece01bd649fb08416c501d4c59d7 is the first bad commit
commit a5c262c5fd83ece01bd649fb08416c501d4c59d7
Author: Michael S. Tsirkin <mst@redhat.com>
Date: Fri May 20 02:10:44 2011 +0300

    virtio_ring: support event idx feature

    Support for the new event idx feature:
    1. When enabling interrupts, publish the current avail index
       value to the host to get interrupts on the next update.
    2. Use the new avail_event feature to reduce the number
       of exits from the guest.

    Simple test with the simulator:

    [virtio]# time ./virtio_test
    spurious wakeus: 0x7

    real 0m0.169s
    user 0m0.140s
    sys 0m0.019s
    [virtio]# time ./virtio_test --no-event-idx
    spurious wakeus: 0x11

    real 0m0.649s
    user 0m0.295s
    sys 0m0.335s

    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

:040000 040000 933903414419858cf7402aa3fb8c3f675d6ab7cc 0ed603da4671eef88e0702e6438e903b56688b62 M drivers

I found a possible bug in include/linux/virtio_ring.h:
======================================================

virtio: event index interface
author Michael S. Tsirkin <mst@redhat.com>
Thu, 19 May 2011 23:10:17 +0000 (02:10 +0300)
committer Rusty Russell <rusty@rustcorp.com.au>
Mon, 30 May 2011 01:44:14 +0000 (10:44 +0930)
commit 770b31a85e000b0194974922f238a30ade4246b6
tree eed81e23f3116858b49af76bcc5831c38662de96
parent a1b383870a28cfbd1657d4922c0fafc634a62ebd

    virtio: event index interface

    Define a new feature bit for the guest and host to utilize
    an event index (like Xen) instead if a flag bit to enable/disable
    interrupts and kicks.

    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Proposal for fix - wrong order of elements in structs:
------------------------------------------------------

 struct vring_desc {
     /* Address (guest-physical). */
@@ -106,6 +112,7 @@ struct vring {
  *    __u16 avail_flags;
  *    __u16 avail_idx;
+ *    __u16 used_event_idx;
  *    __u16 available[num];
- *    __u16 used_event_idx;
  *
  *    // Padding to the next align boundary.
  *    char pad[];
@@ -114,8 +121,14 @@ struct vring {
  *    __u16 used_flags;
  *    __u16 used_idx;
+ *    __u16 avail_event_idx;
  *    struct vring_used_elem used[num];
- *    __u16 avail_event_idx;
  * };

Also double check the macros:
vring_used_event
vring_avail_event

Please fix this issue.

Thank you for your time.
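For readers following the event index discussion, the notification decision can be sketched in a few lines. This is a Python model of the 16-bit check as I understand it from the vring_need_event() helper in include/linux/virtio_ring.h; the Python function and its argument names are only an illustration, not kernel code.

```python
MASK16 = 0xFFFF  # the ring indices are free-running 16-bit counters


def vring_need_event(event_idx: int, new_idx: int, old_idx: int) -> bool:
    """Model of the event index check with 16-bit wrap-around.

    The other side asked to be notified once the index moves past
    `event_idx`. After moving the index from `old_idx` to `new_idx`,
    a notification is needed only if `event_idx` lies in [old_idx, new_idx).
    """
    return ((new_idx - event_idx - 1) & MASK16) < ((new_idx - old_idx) & MASK16)


if __name__ == "__main__":
    # Index moved 10 -> 12: entries 10 and 11 were just processed.
    print(vring_need_event(10, 12, 10))  # True, 10 is inside the window
    print(vring_need_event(12, 12, 10))  # False, 12 has not been reached yet
    # The same logic has to hold across the 16-bit wrap (65535 -> 1).
    print(vring_need_event(0, 1, 65535))  # True
```

If a notification that should have been sent is suppressed, the other side keeps waiting for an interrupt or kick that never arrives, which would look like the stall described in this report.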
(In reply to comment #11)
> I found bad commit.
> [...]
> Proposal for fix - wrong order of elements in structs:
> ------------------------------------------------------
> [...]
> Please fix this issue.
>
> Thank you for your time.

Hi Steve, does this patch fix your issue? The patch does not look correct, as the event index fields are supposed to sit at the end of the rings; see the virtio spec:
http://ozlabs.org/~rusty/virtio-spec/virtio-0.9.4.pdf

Also, please make sure the tree you used contains the fixes from commit a72caae21803b74e04e2afda5e035f149d4ea118 and commit 4dbc5d9f4f791df8a5879f4a655f517adc7f56d1.
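The placement of the two event index fields can be made concrete by computing byte offsets for a ring. The following is a sketch under stated assumptions: 16-byte descriptors, 8-byte used elements, and the used ring starting at the next `align` boundary, which is how I read the legacy vring layout; the function name and dictionary keys are invented for illustration.

```python
from typing import Dict

DESC_SIZE = 16       # one vring descriptor: u64 addr, u32 len, u16 flags, u16 next
USED_ELEM_SIZE = 8   # one used element: u32 id, u32 len
U16 = 2


def vring_layout(num: int, align: int = 4096) -> Dict[str, int]:
    """Byte offsets of the main vring fields for a ring with `num` entries."""
    avail = DESC_SIZE * num                  # avail flags, then avail idx
    avail_ring = avail + 2 * U16             # avail ring[num]
    used_event = avail_ring + U16 * num      # trailing field AFTER avail ring
    avail_end = used_event + U16
    used = (avail_end + align - 1) & ~(align - 1)  # pad to the align boundary
    used_ring = used + 2 * U16               # used flags, used idx, then ring
    avail_event = used_ring + USED_ELEM_SIZE * num  # trailing field AFTER used ring
    return {
        "avail": avail,
        "avail_ring": avail_ring,
        "used_event": used_event,
        "used": used,
        "used_ring": used_ring,
        "avail_event": avail_event,
        "total": avail_event + U16,
    }


if __name__ == "__main__":
    for name, offset in vring_layout(256).items():
        print(f"{name:12s} {offset}")
```

In this layout both fields come after their ring array, so moving them in front of the array, as the proposed diff does, would shift every ring entry by two bytes relative to what the host expects.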
Hello.

I started testing from the latest master branch, v3.3-rc6+, on both host and guest. During all tests the host had the same kernel & other stuff; on the guest I changed only the kernel version while bisecting.

I did not change any code. My proposal is only a hint and could be wrong. I hope I have provided sufficient information to locate the bug in the code.

Answer to your question about containing the fixes:
---------------------------------------------------
git branch --contains=a72caae21803b74e04e2afda5e035f149d4ea118
* master

git branch --contains=4dbc5d9f4f791df8a5879f4a655f517adc7f56d1
* master

Let me know how I can help (when needed) to fix this issue as soon as possible.

Thank you for your time.
Could someone please have a look at this issue? Thank you for your time.
(In reply to comment #14)
> Please could someone have a look on this issue ?
>
> Thank you for your time.

Hello Steve:

According to your bisect result, the issue may be in the event index handling in qemu. Judging by your qemu-kvm command line you are not using vhost, so could you please check whether the 3.3-rc6 guest works with the vhost backend (-netdev tap,vhost=on,id=XXX -device virtio-net-pci,netdev=XXX ...)? If it works, could you please then try the 3.3-rc6 guest without vhost and with the event index feature disabled via -device virtio-net-pci,event_idx=off?

Thanks
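The two test configurations requested above can be written down as argument lists. This is a sketch only: it uses just the option strings quoted in this thread, while the helper name `virtio_net_args` and the netdev id `nic01` are assumptions for illustration. Any further tap options (ifname, scripts and so on) would need to be added for a real setup.

```python
from typing import List


def virtio_net_args(netdev_id: str, vhost: bool, event_idx: bool = True) -> List[str]:
    """Build the -netdev/-device argument pair for one test configuration."""
    netdev = f"tap,id={netdev_id}"
    if vhost:
        netdev += ",vhost=on"  # use the in-kernel vhost backend
    device = f"virtio-net-pci,netdev={netdev_id}"
    if not event_idx:
        device += ",event_idx=off"  # disable the event index feature
    return ["-netdev", netdev, "-device", device]


if __name__ == "__main__":
    # Test A: vhost backend, event index left at its default.
    print(" ".join(virtio_net_args("nic01", vhost=True)))
    # Test B: no vhost, event index disabled.
    print(" ".join(virtio_net_args("nic01", vhost=False, event_idx=False)))
```

Comparing the results of A and B should show whether the stall is tied to the userspace event index path or also appears with vhost.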
Thank you for the information.

Today I tested v3.3-rc6+ as of this morning: the same version on host and guest, plus the latest seabios, qemu-kvm, net-tools, util-linux, kmod and udev.

It has now been working correctly for a few hours under heavy network traffic in both directions in parallel, without any network issue.

Below are my arguments for the tested guest running on the test host:
----------------------------------------------------------------------
${VPS} -nodefaults -name ${VPS_NAME} -chroot ${VPS_CHROOT} \
  -runas ${VPS_USER} -pidfile ${VPS_PIDFILE} -vnc ${VPS_WAN}:0 -vga std \
  --full-screen -smp ${VPS_CPU} -m ${VPS_MEM} -cpu host \
  -mem-path /hugepages -mem-prealloc -enable-kvm -daemonize \
  -rtc base=localtime,clock=host,driftfix=none -balloon ${VPS_BALLOON} \
  -netdev tap,id=nic01,ifname=${VPS_NET_IF_NAME},script=/etc/kvm/kvm-ifup,downscript=/etc/kvm/kvm/ifdown,vnet_hdr=on,vhost=on,vhostforce=off \
  -device virtio-net-pci,netdev=nic01,mac=${VPS_MAC} \
  -drive aio=native,index=0,media=disk,cache=none,if=${VPS_DRIVE_IF_TYPE},file=${VPS_IMG} \
  -boot order=c,menu=off &

lsmod on guest (lsmod | grep virtio):
virtio_net             12477  0
virtio_balloon          4216  0
virtio_blk              6313  3
virtio_pci              6983  0
virtio                  3282  4 virtio_pci,virtio_blk,virtio_balloon,virtio_net
virtio_ring             4154  4 virtio_pci,virtio_blk,virtio_balloon,virtio_net

lspci on guest (lspci):
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Device 1234:1111
00:03.0 RAM memory: Red Hat, Inc Virtio memory balloon
00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
00:05.0 SCSI storage controller: Red Hat, Inc Virtio block device

I will try to test older guest kernels with the above setup.

Thank you for your time & cooperation.
Have a nice day.
Steve, thank you for finding this bug.

Is there a chance you could test without vhost but with event_idx=off?

This is the XML line in case you use libvirt:

<interface .......
  <driver name="qemu" event_idx="off"/>
</interface>

That would be an effective workaround until we get a fixed qemu.

Thanks for your time.
Hi all: I suspect the issue was fixed by this commit: http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commitdiff;h=4b727361f0bc7ee7378298941066d8aa15023ffb;hp=e1ac50f64691de9a095ac5d73cb8ac73d3d17dba
Hi, I have tested under Ubuntu 11.10 Server with Kernel 3.0.0-12 only with vhost on and it works well so far.