Bug 56971 - [nested virt] L1 CPU Stuck when booting a L2 guest
Summary: [nested virt] L1 CPU Stuck when booting a L2 guest
Status: CLOSED CODE_FIX
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: All Linux
Importance: P1 normal
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-04-22 14:32 UTC by Jay Ren
Modified: 2013-05-29 07:46 UTC

See Also:
Kernel Version: 3.9.0-rc3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
call trace in L1 guest (138.32 KB, text/plain)
2013-04-22 14:32 UTC, Jay Ren

Description Jay Ren 2013-04-22 14:32:40 UTC
Created attachment 99651 [details]
call trace in L1 guest 

Environment:
------------
Host OS (ia32/ia32e/IA64): ia32e
Guest OS (ia32/ia32e/IA64): ia32e
Guest OS Type (Linux/Windows): Linux
kvm.git next Commit: c0d1c770c05ac7051df86914f9627b68f29c1d67
qemu-kvm.git Commit: 8912bdea01e8671e59fe0287314379be9c1f40ec
Host Kernel Version: 3.9.0-rc3
Hardware: Sandy Bridge-EP


Bug detailed description:
--------------------------
Create an L1 guest with "-cpu host", then create an L2 guest inside it. We found a CPU stuck in the L1 guest while booting the L2 guest.
This appears to be a regression.

kvm(next)+ qemu-kvm   =  result
188424ba + 8912bdea   = good
c0d1c770 + 8912bdea   = bad 


Reproduce steps:
----------------
1. Create the L1 guest:
qemu-system-x86_64 --enable-kvm -m 10240 -smp 8 -net
nic,macaddr=00:12:45:67:2B:1C -net tap,script=/etc/kvm/qemu-ifup
nested-kvm-rhel6u4.qcow -cpu host
2. Create the L2 guest (inside L1):
qemu-system-x86_64 --enable-kvm -m 1024 -smp 2 -net none rhel6u4.img
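Before running the steps above, it can help to confirm that the host actually exposes nested VMX. A minimal sketch, assuming an Intel host; the path is the standard kvm_intel module-parameter location, and on machines without KVM loaded the script just reports that fact:

```shell
#!/bin/sh
# Sanity-check that nested VMX is available before attempting the repro.
nested=/sys/module/kvm_intel/parameters/nested
if [ -r "$nested" ]; then
    echo "nested VMX: $(cat "$nested")"   # "Y" or "1" when enabled
else
    echo "kvm_intel not loaded (or no nested parameter)"
fi
# The vmx CPU flag must also be visible to the kernel doing the L1 run:
grep -qw vmx /proc/cpuinfo 2>/dev/null \
    && echo "CPU exposes VMX" \
    || echo "no VMX flag visible"
```

If nested VMX is off, `modprobe kvm_intel nested=1` (after unloading the module) is the usual way to enable it before retrying step 1.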


Current result:
----------------
L1 shows a call trace (CPU stuck)

Expected result:
----------------
L1 and L2 both boot and run normally

Basic root-causing log: 
----------------------


[  312.158002] BUG: soft lockup - CPU#7 stuck for 22s! [qemu-system-x86:2353]
[  312.158002] Modules linked in: bridge nfsv3 nfs_acl nfsv4 auth_rpcgss nfs fuse fscache dns_resolver lockd sunrpc 8021q garp stp llc binfmt_misc uinput ppdev parport_pc parport kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode pcspkr e1000 cirrus ttm drm_kms_helper drm i2c_piix4 i2c_core floppy(F)
[  312.158002] CPU 7
[  312.158002] Pid: 2353, comm: qemu-system-x86 Tainted: GF            3.8.5 #1 Bochs Bochs
[  312.158002] RIP: 0010:[<ffffffff810c0982>]  [<ffffffff810c0982>] smp_call_function_many+0x202/0x270
[  312.158002] RSP: 0018:ffff88027a21dce8  EFLAGS: 00000202
[  312.158002] RAX: 0000000000000008 RBX: 000800008ede8700 RCX: 0000000000000004
[  312.158002] RDX: 0000000000000004 RSI: 0000000000000080 RDI: 0000000000000292
[  312.158002] RBP: ffff88027a21dd38 R08: ffff88029fdd4890 R09: 0000000000000080
[  312.158002] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000292
[  312.158002] R13: 0000000000000001 R14: ffffea0009e34640 R15: 0040000000080008
[  312.158002] FS:  00007ff26ef41700(0000) GS:ffff88029fdc0000(0000) knlGS:0000000000000000
[  312.158002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  312.158002] CR2: 00007ff271248000 CR3: 00000002902a7000 CR4: 00000000000427e0
[  312.158002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  312.158002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  312.158002] Process qemu-system-x86 (pid: 2353, threadinfo ffff88027a21c000, task ffff88027d368000)
[  312.158002] Stack:
[  312.158002]  ffff88027d368000 01ff88027d368000 ffff88027a21dd48 ffffffff8104f110
[  312.158002]  00007ff23fffffff ffff88028ede89c0 ffff88028ede8700 00007ff267dff000
[  312.158002]  ffff88027a21de18 00007ff268000000 ffff88027a21dd68 ffffffff8104ed5e
[  312.158002] Call Trace:
[  312.158002]  [<ffffffff8104f110>] ? flush_tlb_mm_range+0x250/0x250
[  312.158002]  [<ffffffff8104ed5e>] native_flush_tlb_others+0x2e/0x30
[  312.158002]  [<ffffffff8104ef30>] flush_tlb_mm_range+0x70/0x250
[  312.158002]  [<ffffffff8115d822>] tlb_flush_mmu+0xa2/0xb0
[  312.158002]  [<ffffffff8115e15c>] tlb_finish_mmu+0x1c/0x50
[  312.158002]  [<ffffffff8116552a>] unmap_region+0xea/0x110
[  312.158002]  [<ffffffff811672c2>] ? __split_vma+0x1e2/0x230
[  312.158002]  [<ffffffff81167c14>] do_munmap+0x274/0x3a0
[  312.158002]  [<ffffffff81167d91>] vm_munmap+0x51/0x80
[  312.158002]  [<ffffffff81167dec>] sys_munmap+0x2c/0x40
[  312.158002]  [<ffffffff81653999>] system_call_fastpath+0x16/0x1b
[  312.158002] Code: a6 58 00 0f ae f0 4c 89 e7 ff 15 e2 08 b6 00 80 7d bf 00 0f 84 89 fe ff ff f6 43 20 01 0f 84 7f fe ff ff 66 0f 1f 44 00 00 f3 90 <f6> 43 20 01 75 f8 e9 6c fe ff ff 0f 1f 00 4c 89 ea 4c 89 f6 44
Comment 1 Jay Ren 2013-04-27 06:02:59 UTC
CCed Jan Kiszka.
I did some bisection and found that the following commit introduced this bug.
The bug still exists on the latest kvm.git next branch.

commit 5f3d5799974b89100268ba813cec8db7bd0693fb
Author: Jan Kiszka <jan.kiszka@siemens.com>
Date:   Sun Apr 14 12:12:46 2013 +0200

    KVM: nVMX: Rework event injection and recovery
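For reference, the bisection workflow behind the good/bad table above can be sketched as follows. This is a self-contained toy illustration, not the actual kvm.git run: a throwaway repo with five commits stands in for the kernel tree between 188424ba (good) and c0d1c770 (bad), and a grep on a file stands in for the real per-step test (building the kernel, booting L1 with "-cpu host", and watching for the soft lockup while booting L2).

```shell
#!/bin/sh
# Toy git-bisect walkthrough: five commits, with the "regression"
# entering at "rev 4". HEAD is known bad, HEAD~4 is known good.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
for i in 1 2 3 4 5; do
    echo "rev $i" > file
    git add file
    git -c user.email=t@example.com -c user.name=tester commit -qm "rev $i"
done
git bisect start HEAD HEAD~4 >/dev/null
out=""
for step in 1 2 3 4 5; do
    # Stand-in for the real test (boot L2, watch for the soft lockup):
    if grep -q "rev [45]" file; then
        out=$(git bisect bad)
    else
        out=$(git bisect good)
    fi
    case "$out" in *"is the first bad commit"*) break ;; esac
done
echo "$out" | head -n1   # the "<sha> is the first bad commit" line
```

With only three untested commits between the endpoints, bisect converges in two test runs here; the real kvm.git range naturally takes more steps, each requiring a full build-and-boot cycle.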


Best Regards,
     Yongjie (Jay)


Comment 2 Jan Kiszka 2013-04-27 11:03:27 UTC
On 2013-04-27 08:02, Ren, Yongjie wrote:
> CCed Jan Kiszka.
> I did some bisection and found the following commit introduced this bug.
> The bug still exists in the latest kvm.git next branch.
> 
> commit 5f3d5799974b89100268ba813cec8db7bd0693fb
> Author: Jan Kiszka <jan.kiszka@siemens.com>
> Date:   Sun Apr 14 12:12:46 2013 +0200
> 
>     KVM: nVMX: Rework event injection and recovery
> 
> 

I've reproduced some lock-up of L1 that starts to show with my commit.
Debugging...

Thanks for reporting,
Jan
Comment 3 Jay Ren 2013-05-29 07:45:41 UTC
It has been fixed by Jan Kiszka in upstream KVM now.
Comment 4 Jay Ren 2013-05-29 07:46:13 UTC
Let me close it.
