Created attachment 306485 [details] WARNING/BUG and Call Trace info in dmesg in an regular linux-next guest kernel regression test setup, recently hit following BUG and WARNING, likely related to x86/fpu. --Test Setup-- KVM+QEMU environment to boot normal VM with latest linux-next guest kernel --Kconfig of Guest Kernel Compile-- refer to attachment --Error Kernel Log-- (full log refer to attachment) [ 1.958885] ------------[ cut here ]------------ [ 1.958928] WARNING: CPU: 0 PID: 218 at arch/x86/kernel/fpu/core.c:57 x86_task_fpu+0x17/0x20 [ 1.959004] Modules linked in: [ 1.959034] CPU: 0 PID: 218 Comm: rpcbind Not tainted 6.10.0-rc4-00234-g859e6ded5e41 #1 [ 1.959092] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022 [ 1.959147] RIP: 0010:x86_task_fpu+0x17/0x20 [ 1.959185] Code: 40 01 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 f6 47 2e 20 48 8d 87 00 25 00 00 75 05 c3 cc cc cc cc <0f> 0b 31 c0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 [ 1.959303] RSP: 0000:ffa00000009cfe08 EFLAGS: 00010202 [ 1.959340] RAX: ff1100000bcad480 RBX: 0000000000000008 RCX: 0000000000244000 [ 1.959394] RDX: 0000000000242000 RSI: ffffffff811368bd RDI: ff1100000bcaaf80 [ 1.959447] RBP: ff1100000bcaaf80 R08: 0000000000000000 R09: 0000000000000000 [ 1.959501] R10: ff11000001835a00 R11: 0000000000000300 R12: ff1100003d231240 [ 1.959555] R13: 0000000000000007 R14: 0000000000000000 R15: ff1100003d2312b8 [ 1.959611] FS: 00007fa0908c7dc0(0000) GS:ff1100003d200000(0000) knlGS:0000000000000000 [ 1.959666] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.959715] CR2: 00007f938c6c0a56 CR3: 000000000c926006 CR4: 0000000000771ef0 [ 1.959772] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.959826] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 1.959886] PKRU: 55555554 [ 1.959908] Call Trace: [ 1.959931] <TASK> [ 1.959953] ? __warn+0x80/0x120 [ 1.959993] ? x86_task_fpu+0x17/0x20 [ 1.960024] ? report_bug+0x1c3/0x1d0 [ 1.960058] ? handle_bug+0x3c/0x70 [ 1.960093] ? exc_invalid_op+0x14/0x70 [ 1.960123] ? asm_exc_invalid_op+0x16/0x20 [ 1.960163] ? free_task+0x2d/0x70 [ 1.960197] ? x86_task_fpu+0x17/0x20 [ 1.960228] arch_release_task_struct+0x27/0x30 [ 1.960272] free_task+0x35/0x70 [ 1.960303] rcu_do_batch+0x1a5/0x460 [ 1.960343] ? rcu_do_batch+0x13b/0x460 [ 1.960374] ? timerqueue_add+0x9b/0xc0 [ 1.960409] rcu_core+0x148/0x300 [ 1.960441] handle_softirqs+0xfa/0x2f0 [ 1.960481] irq_exit_rcu+0x7a/0xc0 [ 1.960513] sysvec_apic_timer_interrupt+0x53/0xd0 [ 1.960552] asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 1.960591] RIP: 0033:0x7fa090d7c2c1 [ 1.960628] Code: 83 c4 01 48 89 c2 0f b7 04 41 f6 c4 20 75 ea 84 d2 74 32 4c 89 e2 eb 12 90 48 0f be 42 01 48 83 c2 01 84 c0 74 73 0f b7 04 41 <f6> c4 20 74 ea 49 39 d4 72 55 0f b6 02 49 89 d4 84 c0 74 07 c6 02 [ 1.960745] RSP: 002b:00007ffe6875df10 EFLAGS: 00000246 [ 1.960783] RAX: 000000000000c608 RBX: 00007ffe6875e0f0 RCX: 00007fa090d9f3c0 [ 1.960837] RDX: 00007ffe6875e5da RSI: 00007ffe6875e600 RDI: 1999999999999999 [ 1.960893] RBP: 00007ffe6875e5d0 R08: ffffffffffffff00 R09: 0000000000000000 [ 1.960945] R10: 00007fa090d9eac0 R11: 00007fa090d9f3c0 R12: 00007ffe6875e5da [ 1.960999] R13: 00007ffe6875e9c0 R14: 00007ffe6875e600 R15: 00007ffe6875e5c0 [ 1.961054] </TASK> [ 1.961075] ---[ end trace 0000000000000000 ]--- [ 1.961114] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 1.961158] #PF: supervisor read access in kernel mode [ 1.961196] #PF: error_code(0x0000) - not-present page [ 1.961235] PGD bff8067 P4D 0 [ 1.961266] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI [ 1.961304] CPU: 0 PID: 218 Comm: rpcbind Tainted: G W 6.10.0-rc4-00234-g859e6ded5e41 #1 [ 1.961366] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022 [ 1.961412] RIP: 0010:fpstate_free+0x5/0x30 [ 1.961441] Code: 41 5c 41 5d 41 5e c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <4c> 8b 47 10 4d 85 c0 74 11 48 83 c7 40 49 39 f8 74 08 4c 89 c7 e9 [ 1.961552] RSP: 0000:ffa00000009cfe10 EFLAGS: 00010246 [ 1.961589] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000244000 [ 1.961646] RDX: 0000000000242000 RSI: ffffffff811368bd RDI: 0000000000000000 [ 1.961700] RBP: ff1100000bcaaf80 R08: 0000000000000000 R09: 0000000000000000 [ 1.961755] R10: ff11000001835a00 R11: 0000000000000300 R12: ff1100003d231240 [ 1.961802] R13: 0000000000000007 R14: 0000000000000000 R15: ff1100003d2312b8 [ 1.961849] FS: 00007fa0908c7dc0(0000) GS:ff1100003d200000(0000) knlGS:0000000000000000 [ 1.961911] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.961957] CR2: 0000000000000010 CR3: 000000000c926006 CR4: 0000000000771ef0 [ 1.962005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.962056] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 1.962102] PKRU: 55555554 [ 1.962118] Call Trace: [ 1.962135] <TASK> [ 1.962155] ? __die+0x20/0x70 [ 1.962186] ? page_fault_oops+0x80/0x150 [ 1.962216] ? do_user_addr_fault+0x5f/0x680 [ 1.962254] ? kvm_read_and_reset_apf_flags+0x45/0x60 [ 1.962292] ? exc_page_fault+0x64/0x140 [ 1.962322] ? asm_exc_page_fault+0x22/0x30 [ 1.962352] ? free_task+0x2d/0x70 [ 1.962383] ? fpstate_free+0x5/0x30 [ 1.962415] free_task+0x35/0x70 [ 1.962446] rcu_do_batch+0x1a5/0x460 [ 1.962478] ? rcu_do_batch+0x13b/0x460 [ 1.962510] ? timerqueue_add+0x9b/0xc0 [ 1.962540] rcu_core+0x148/0x300 [ 1.962568] handle_softirqs+0xfa/0x2f0 [ 1.962598] irq_exit_rcu+0x7a/0xc0 [ 1.962632] sysvec_apic_timer_interrupt+0x53/0xd0 [ 1.962670] asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 1.962709] RIP: 0033:0x7fa090d7c2c1 [ 1.962740] Code: 83 c4 01 48 89 c2 0f b7 04 41 f6 c4 20 75 ea 84 d2 74 32 4c 89 e2 eb 12 90 48 0f be 42 01 48 83 c2 01 84 c0 74 73 0f b7 04 41 <f6> c4 20 74 ea 49 39 d4 72 55 0f b6 02 49 89 d4 84 c0 74 07 c6 02 [ 1.962862] RSP: 002b:00007ffe6875df10 EFLAGS: 00000246 [ 1.962899] RAX: 000000000000c608 RBX: 00007ffe6875e0f0 RCX: 00007fa090d9f3c0 [ 1.962946] RDX: 00007ffe6875e5da RSI: 00007ffe6875e600 RDI: 1999999999999999 [ 1.962992] RBP: 00007ffe6875e5d0 R08: ffffffffffffff00 R09: 0000000000000000 [ 1.963045] R10: 00007fa090d9eac0 R11: 00007fa090d9f3c0 R12: 00007ffe6875e5da [ 1.963090] R13: 00007ffe6875e9c0 R14: 00007ffe6875e600 R15: 00007ffe6875e5c0 [ 1.963137] </TASK> [ 1.963158] Modules linked in: [ 1.963189] CR2: 0000000000000010 [ 1.963220] ---[ end trace 0000000000000000 ]--- [ 1.967997] RIP: 0010:fpstate_free+0x5/0x30 [ 1.968033] Code: 41 5c 41 5d 41 5e c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <4c> 8b 47 10 4d 85 c0 74 11 48 83 c7 40 49 39 f8 74 08 4c 89 c7 e9 [ 1.968146] RSP: 0000:ffa00000009cfe10 EFLAGS: 00010246 [ 1.968183] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000244000 [ 1.968232] RDX: 0000000000242000 RSI: ffffffff811368bd RDI: 0000000000000000 [ 1.968287] RBP: ff1100000bcaaf80 R08: 0000000000000000 R09: 0000000000000000 [ 1.968335] R10: ff11000001835a00 R11: 0000000000000300 R12: ff1100003d231240 [ 1.968389] R13: 0000000000000007 R14: 0000000000000000 R15: ff1100003d2312b8 [ 1.968436] FS: 00007fa0908c7dc0(0000) GS:ff1100003d200000(0000) knlGS:0000000000000000 [ 1.968490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.968535] CR2: 0000000000000010 CR3: 000000000c926006 CR4: 0000000000771ef0 [ 1.968587] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1.968641] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 1.968688] PKRU: 55555554 [ 1.968705] Kernel panic - not syncing: Fatal exception in interrupt [ 1.968769] Kernel Offset: disabled [ 1.973595] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- --Notes on Issue Observed-- a. issue seen since linux-next-6.10-rc3-240611 b. issue reproduced on https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master branch c. issue hit around 90% of VM booting cycles, not 100% reproducible d. git bisect points to https://lore.kernel.org/all/20240605083557.2051480-4-mingo@kernel.org as bad commit e. attachment info: e1: kconfig.config -> guest kernel kconfig e2: vm_boot_null_pointer_panic_and_fpu_warning.log -> full guest kernel booting log when issue hit e3: vm_boot_pass.log -> full guest kernel booting log when no issue hit
Created attachment 306486 [details] guest kernel kconfig
Created attachment 306487 [details] vm_boot_pass.log
Created attachment 306488 [details] Call_Trace_decoded.log
Quick update: issue won't be reproduced if disable kconfig option CONFIG_X86_DEBUG_FPU, hopefully it could narrow down the regression issue to CONFIG_X86_DEBUG_FPU related code.
On Mon, Jun 24, 2024 at 06:43:54AM +0000, bugzilla-daemon@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=218980 > > Bug ID: 218980 > Summary: [VM boot] Guest Kernel hit BUG: kernel NULL pointer > dereference, address: 0000000000000010 and WARNING: > CPU: 0 PID: 218 at arch/x86/kernel/fpu/core.c:57 > x86_task_fpu+0x17/0x20 > Product: Virtualization > Version: unspecified > Hardware: All > OS: Linux > Status: NEW > Severity: normal > Priority: P3 > Component: kvm > Assignee: virtualization_kvm@kernel-bugs.osdl.org > Reporter: hongyu.ning@intel.com > Regression: No > > Created attachment 306485 [details] > --> https://bugzilla.kernel.org/attachment.cgi?id=306485&action=edit > WARNING/BUG and Call Trace info in dmesg > > in an regular linux-next guest kernel regression test setup, recently hit > following BUG and WARNING, likely related to x86/fpu. > [ 1.962383] ? fpstate_free+0x5/0x30 Yeah we run into the same thing on *all* boots on linux-next on kdevops as well, Cc'ing kdevops list so folks are aware linux-next is broken right now. [ 16.785349] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 16.785353] #PF: supervisor read access in kernel mode [ OK ] Found device[ 16.785354] #PF: error_code(0x0000) - not-present page dev-disk-by\x2dlabel-…evice - QEMU NVMe Ctrl sparsefiles. [ 16.785356] PGD 0 P4D 0 [ 16.785358] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI [ 16.785361] CPU: 1 UID: 0 PID: 528 Comm: modprobe Tainted: G W 6.10.0-rc5-next-20240628+ #8 [ 16.785365] Tainted: [W]=WARN [ 16.785366] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 16.785367] RIP: 0010:fpstate_free+0x5/0x30 [ 16.785373] Code: 41 5c 41 5d 41 5e c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <48> 8b 47 10 48 85 c0 74 11 48 83 c7 40 48 39 f8 74 08 48 89 c7 e9 [ 16.785374] RSP: 0000:ffffb4dd80673e48 EFLAGS: 00010246 [ 16.785376] RAX: 0000000000000000 RBX: ffff8eca5fdd0000 RCX: 00000000801c0012 [ 16.785378] RDX: ffff8eca54bef500 RSI: ffffffff8aa9b92d RDI: 0000000000000000 [ 16.785379] RBP: ffff8ecabbc72840 R08: ffff8eca54bed100 R09: 00000000801c0012 [ 16.785380] R10: 00000000801c0012 R11: 0000000000000001 R12: ffff8eca605dc800 [ 16.785381] R13: 0000000000030bc8 R14: ffff8ecabbc728b8 R15: 0000000000000004 [ 16.785382] FS: 00007f26f73a35c0(0000) GS:ffff8ecabbc40000(0000) knlGS:0000000000000000 [ 16.785383] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.785385] CR2: 0000000000000010 CR3: 00000001175b6006 CR4: 0000000000770ef0 [ 16.785389] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.785390] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 16.785391] PKRU: 55555554 [ 16.785392] Call Trace: [ 16.785394] <TASK> [ 16.785396] ? __die+0x1f/0x60 [ 16.785401] ? page_fault_oops+0x158/0x460 [ 16.785406] ? x86_task_fpu+0x17/0x20 [ 16.785410] ? do_user_addr_fault+0x63/0x6c0 [ 16.785413] ? exc_page_fault+0x79/0x190 [ 16.785417] ? asm_exc_page_fault+0x22/0x30 [ 16.785420] ? free_task+0x2d/0x70 [ 16.785424] ? fpstate_free+0x5/0x30 [ 16.785427] ? arch_release_task_struct+0x27/0x30 [ 16.785429] free_task+0x35/0x70 [ 16.785432] rcu_core+0x499/0x7d0 [ 16.785436] ? rcu_core+0x434/0x7d0 [ 16.785440] handle_softirqs+0xf9/0x300 [ 16.785444] __irq_exit_rcu+0x6e/0xc0 [ 16.785446] sysvec_apic_timer_interrupt+0x51/0xc0 [ 16.785450] asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 16.785452] RIP: 0033:0x7f26f74d0858
On Sun, Jun 30, 2024 at 03:21:10PM -0700, Luis Chamberlain wrote: > [ 16.785424] ? fpstate_free+0x5/0x30 Bisecting leads so far to next-20240619 as good and next-20240624 as bad. Luis
On Sun, Jun 30, 2024 at 03:47:57PM -0700, Luis Chamberlain wrote: > On Sun, Jun 30, 2024 at 03:21:10PM -0700, Luis Chamberlain wrote: > > [ 16.785424] ? fpstate_free+0x5/0x30 > > Bisecting leads so far to next-20240619 as good and next-20240624 as bad. Either way, this is now fixed on next-20240703. Luis
(In reply to Luis Chamberlain from comment #7) > On Sun, Jun 30, 2024 at 03:47:57PM -0700, Luis Chamberlain wrote: > > On Sun, Jun 30, 2024 at 03:21:10PM -0700, Luis Chamberlain wrote: > > > [ 16.785424] ? fpstate_free+0x5/0x30 > > > > Bisecting leads so far to next-20240619 as good and next-20240624 as bad. > > Either way, this is now fixed on next-20240703. > > Luis good to know, on my side, a new issue met and blocked further verify on original issue: https://bugzilla.kernel.org/show_bug.cgi?id=219034 since it's verified by Luis, let's mark it as resolved. -=Hongyu