Bug 38672

Summary: KVM guest boot crashed
Product: Virtualization Reporter: Steve (stefan.bosak)
Component: kvm Assignee: virtualization_kvm
Status: CLOSED INVALID    
Severity: high CC: avi, florian, maciej.rutecki, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.0.0-rc5+ Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 36912    

Description Steve 2011-07-02 06:56:16 UTC
A Windows Server 2008 R2 KVM guest crashed during the boot process.
This also occurs on other, Linux-based guests.

Bug is in: git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
not in: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git.

A qemu-kvm checkout from 9 days ago works well.

The problem could be in one of these commits:

author	Jan Kiszka <jan.kiszka@siemens.com>	
	Mon, 27 Jun 2011 10:23:35 +0000 (12:23 +0200)
committer	Avi Kivity <avi@redhat.com>	
	Tue, 28 Jun 2011 08:20:08 +0000 (11:20 +0300)
commit	bcd4f22796ebda2934a980060ea704ebedb46173
tree	f16fe58d2d4120c7b94f23ddef3afb61beb30dfc
parent	59539c913383fdd3350681301b44f02fa7ee2757


author	Jan Kiszka <jan.kiszka@siemens.com>	
	Mon, 27 Jun 2011 10:22:28 +0000 (12:22 +0200)
committer	Avi Kivity <avi@redhat.com>	
	Tue, 28 Jun 2011 08:18:58 +0000 (11:18 +0300)
commit	59539c913383fdd3350681301b44f02fa7ee2757
tree	bfdf23a13004d08d04589d02d3c4c754da3dd076
parent	b7496707af10ce2827d0803c9e46ca8ddc543716
Comment 1 Anonymous Emailer 2011-07-02 08:24:18 UTC
Reply-To: jan.kiszka@web.de

On 2011-07-02 08:56, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=38672
> 
>            Summary: KVM guest boot crashed
>            Product: Virtualization
>            Version: unspecified
>     Kernel Version: 3.0.0-rc5+
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: kvm
>         AssignedTo: virtualization_kvm@kernel-bugs.osdl.org
>         ReportedBy: stefan.bosak@gmail.com
>         Regression: Yes
> 
> 
> Windows Server 2008 R2 KVM guest crashed during boot process.
> This situation also occur on other linux based guests.

What other Linux guests precisely? None of the Windows and Linux guests I
have around expose this problem.

What is your qemu command line?

> 
> Bug is in: git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
> not in: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git.
> 
> Qemu-kvm repository checkouted before 9 days works well.
> 
> Problem could be in this:
> 
> author    Jan Kiszka <jan.kiszka@siemens.com>    
>     Mon, 27 Jun 2011 10:23:35 +0000 (12:23 +0200)
> committer    Avi Kivity <avi@redhat.com>    
>     Tue, 28 Jun 2011 08:20:08 +0000 (11:20 +0300)
> commit    bcd4f22796ebda2934a980060ea704ebedb46173
> tree    f16fe58d2d4120c7b94f23ddef3afb61beb30dfc    tree | snapshot
> parent    59539c913383fdd3350681301b44f02fa7ee2757    commit | diff
> 
> 
> author    Jan Kiszka <jan.kiszka@siemens.com>    
>     Mon, 27 Jun 2011 10:22:28 +0000 (12:22 +0200)
> committer    Avi Kivity <avi@redhat.com>    
>     Tue, 28 Jun 2011 08:18:58 +0000 (11:18 +0300)
> commit    59539c913383fdd3350681301b44f02fa7ee2757
> tree    bfdf23a13004d08d04589d02d3c4c754da3dd076    tree | snapshot
> parent    b7496707af10ce2827d0803c9e46ca8ddc543716    commit | diff
> 

Can you bisect which change precisely introduced the regression?

Thanks,
Jan
Comment 2 Steve 2011-07-02 09:04:56 UTC
(In reply to comment #1)
> Reply-To: jan.kiszka@web.de
> 
> On 2011-07-02 08:56, bugzilla-daemon@bugzilla.kernel.org wrote:
> > Windows Server 2008 R2 KVM guest crashed during boot process.
> > This situation also occur on other linux based guests.
> 
> What other Linux guests precisely? None of the Windows and Linux guest I
> have around expose this problem.
I have several guest types on the same server:
MS Windows Server 2008 R2 (latest updates)
Ubuntu 11.04 (2.6.38-10-virtual)
Debian wheezy/sid (2.6.38-rc4-git4-vs2.3.0.37-rc4)
Gentoo (2.6.38-gentoo-r7)
> 
> What is your qemu command line?

Example for one guest (Gentoo Linux):

/usr/bin/qemu-system-x86_64 --enable-kvm -name vps-25-gentoo -chroot /vservers1 -runas kvm -pidfile /var/run/kvm/vps-25-gentoo.pid -vnc a.b.c.d:0 -vga std --full-screen -smp 2 -m 12G -cpu host -mem-path /hugepages -mem-prealloc -kvm-shadow-memory 12G -daemonize -tdf -localtime -balloon virtio -net nic,model=virtio,vlan=0,macaddr=XX:XX:XX:XX:XX:XX -net tap,vhost=on,vlan=0,ifname=qtap0,script=/etc/kvm/kvm-ifup,downscript=/etc/kvm/kvm-ifdown -drive aio=native,index=0,media=disk,cache=writeback,if=virtio,boot=on,file=/vservers1/vps-25-gentoo.img -boot c

> 
> Can you bisect which change precisely introduced the regression?
> 
Yes, of course, I'm working on this now.
> Thanks,
> Jan

Thank you for your time.
Comment 3 Steve 2011-07-03 00:48:13 UTC
Here is the result:

6506e4f995967b1a48cc34418c77b318df92ce35 is the first bad commit
commit 6506e4f995967b1a48cc34418c77b318df92ce35
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Thu May 19 18:35:44 2011 +0100

    xen: remove xen_map_block and xen_unmap_block
    
    Replace xen_map_block with qemu_map_cache with the appropriate locking
    and size parameters.
    Replace xen_unmap_block with qemu_invalidate_entry.
    
    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Signed-off-by: Alexander Graf <agraf@suse.de>

:100644 100644 01f33bb2bca8ca69ffa03ca5170d1fce3ffd2fb4 e11c1dd97a62669255a35d1628f24fc4adf538fb M      exec.c
:100644 100644 60f712b229b63f63f2fe9e8bf5c867cf4f031d71 8a2380a151978a8735c797529be871b338958b05 M      xen-mapcache-stub.c
:100644 100644 57fe24de86b372775b2a0d4d7537f231626d594e fac47cd9be72bf1201f21745498625fec44c4515 M      xen-mapcache.c
:100644 100644 b89b8f9653a5f58e0ea710ae0db095a7355c9eb6 6216cc3be7eb68d6c53d21c96a950abcc565a1ba M      xen-mapcache.h

Could you please look at it?

Thank you for your time.
Comment 4 Steve 2011-07-03 01:00:44 UTC
To reproduce this, you need a KVM guest with more than 4 GB of memory.
Comment 5 Steve 2011-07-05 23:00:47 UTC
I tested the reported bug on more servers, also with kernel 3.0.0-rc6+: same result.

git bisect result:

6506e4f995967b1a48cc34418c77b318df92ce35 is the first bad commit
commit 6506e4f995967b1a48cc34418c77b318df92ce35
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Thu May 19 18:35:44 2011 +0100

    xen: remove xen_map_block and xen_unmap_block
    
    Replace xen_map_block with qemu_map_cache with the appropriate locking
    and size parameters.
    Replace xen_unmap_block with qemu_invalidate_entry.
    
    Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
    Signed-off-by: Alexander Graf <agraf@suse.de>

:100644 100644 01f33bb2bca8ca69ffa03ca5170d1fce3ffd2fb4 e11c1dd97a62669255a35d1628f24fc4adf538fb M      exec.c
:100644 100644 60f712b229b63f63f2fe9e8bf5c867cf4f031d71 8a2380a151978a8735c797529be871b338958b05 M      xen-mapcache-stub.c
:100644 100644 57fe24de86b372775b2a0d4d7537f231626d594e fac47cd9be72bf1201f21745498625fec44c4515 M      xen-mapcache.c
:100644 100644 b89b8f9653a5f58e0ea710ae0db095a7355c9eb6 6216cc3be7eb68d6c53d21c96a950abcc565a1ba M      xen-mapcache.h

complete git bisect log:

git bisect start
# good: [d32e8d0b8d9e0ef7cf7ab2e74548982972789dfc] Merge commit 'v0.14.1' into stable-0.14
git bisect good d32e8d0b8d9e0ef7cf7ab2e74548982972789dfc
# bad: [525e3df73e40290e95743d4c8f8b64d8d9cbe021] Merge branch 'master' of git://git.qemu.org/qemu into next
git bisect bad 525e3df73e40290e95743d4c8f8b64d8d9cbe021
# good: [2d2339f995d7176dcb2de10d162aed323a1ffbf3] Merge commit 'f487d6278f75f84378833b8c3a67443346d639dc' into upstream-merge
git bisect good 2d2339f995d7176dcb2de10d162aed323a1ffbf3
# good: [0e192fae3c79e7d2830f8b1fa694cd8e128084cf] Update version for 0.14.0-rc0
git bisect good 0e192fae3c79e7d2830f8b1fa694cd8e128084cf
# good: [075360945860ad9bdd491921954b383bf762b0e5] spice: don't call displaystate callbacks from spice server context.
git bisect good 075360945860ad9bdd491921954b383bf762b0e5
# good: [9047c0b40654ce3578c148f6754f878218569252] usb-ehci: move device/vendor/class id to qdev
git bisect good 9047c0b40654ce3578c148f6754f878218569252
# bad: [75ef849696830fc2ddeff8bb90eea5887ff50df6] esp: correctly fill bus id with requested lun
git bisect bad 75ef849696830fc2ddeff8bb90eea5887ff50df6
# bad: [d6034a3a61235042a0d79dcc1dfed0fbf461fb18] Merge remote-tracking branch 'qemu-kvm/uq/master' into staging
git bisect bad d6034a3a61235042a0d79dcc1dfed0fbf461fb18
# good: [b45a9b185120a10455859341d8035cce9b441fc8] Merge remote-tracking branch 'qemu-kvm/uq/master' into staging
git bisect good b45a9b185120a10455859341d8035cce9b441fc8
# bad: [ebed85058b6e89a5202112e9aa2abab3aa3804c3] xen: only track the linear framebuffer
git bisect bad ebed85058b6e89a5202112e9aa2abab3aa3804c3
# good: [22e1e729600dad1639329185614d094243409359] Merge branch 'cocoa-for-upstream' of git://repo.or.cz/qemu/afaerber
git bisect good 22e1e729600dad1639329185614d094243409359
# good: [c13390cd384a9564e6dded127d01ef0627b6b1c5] xen: fix qemu_map_cache with size != MCACHE_BUCKET_SIZE
git bisect good c13390cd384a9564e6dded127d01ef0627b6b1c5
# bad: [38bee5dc94ee355640b030d28f311b03ee2f13d1] exec.c: refactor cpu_physical_memory_map
git bisect bad 38bee5dc94ee355640b030d28f311b03ee2f13d1
# bad: [6506e4f995967b1a48cc34418c77b318df92ce35] xen: remove xen_map_block and xen_unmap_block
git bisect bad 6506e4f995967b1a48cc34418c77b318df92ce35
# good: [cd306087e5a9ea4091071a0a41c0ea99fac60ab0] xen: remove qemu_map_cache_unlock
git bisect good cd306087e5a9ea4091071a0a41c0ea99fac60ab0

Could you please look at it?

Thank you for your time.
Comment 6 Steve 2011-07-05 23:44:45 UTC
Here is the bug (xen-mapcache.c, in qemu_map_cache_init()):

    mapcache->entry = qemu_mallocz(size);

should be:

    mapcache->entry = qemu_mallocz(size * sizeof(MapCacheEntry));
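
For illustration only, here is a minimal standalone sketch of the allocation pattern the proposed change relies on. It assumes that `size` is an element count rather than a byte count; if `size` is already a byte count at that point in xen-mapcache.c, the multiplication would only over-allocate. The struct layout and the helper names `entries_bytes` and `alloc_entries` are hypothetical stand-ins, not qemu code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in; the real MapCacheEntry in xen-mapcache.c differs. */
typedef struct MapCacheEntry {
    unsigned long paddr_index;
    unsigned char *vaddr_base;
    struct MapCacheEntry *next;
} MapCacheEntry;

/* Bytes needed for an array of nr_entries elements (assumed helper). */
static size_t entries_bytes(size_t nr_entries)
{
    /* Guard against overflow of the count * element-size product. */
    assert(nr_entries <= SIZE_MAX / sizeof(MapCacheEntry));
    return nr_entries * sizeof(MapCacheEntry);
}

/* Allocate a zeroed array of nr_entries elements (assumed helper).
 * calloc takes the count and the element size separately, so the
 * element size cannot be forgotten. Returns NULL on failure. */
static MapCacheEntry *alloc_entries(size_t nr_entries)
{
    return calloc(nr_entries, sizeof(MapCacheEntry));
}
```

With an allocation of only `nr_entries` bytes, indexing `entry[nr_entries - 1]` would write past the end of the buffer; with `entries_bytes(nr_entries)` bytes it stays inside it.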

Could somebody commit this fix?

Thank you for your time.
Comment 7 Steve 2011-07-05 23:54:31 UTC
After applying the simple fix above, all tested guests started and ran correctly.
Comment 8 Steve 2011-07-06 11:45:19 UTC
Could someone commit the fix from comment #6, which solves the reported bug?

Thank you for your time.
Comment 9 Avi Kivity 2011-07-06 16:39:39 UTC
Please post the fix on qemu-devel@nongnu.org, with a signed-off-by line.
Comment 10 Rafael J. Wysocki 2011-07-10 10:26:41 UTC
First-Bad-Commit : 6506e4f995967b1a48cc34418c77b318df92ce35
Comment 11 Rafael J. Wysocki 2011-07-10 11:07:58 UTC
Sorry, this appears to be a qemu bug.  Closing.