Bug 212389

Summary: Kernel panic on VIA Nano L2200
Product: Tracing/Profiling
Reporter: 8vvbbqzo567a
Component: Kernel Perf
Assignee: Frederic Weisbecker (fweisbec)
Status: NEW
Severity: normal
CC: a.p.zijlstra, pmenzel+bugzilla.kernel.org, regressions
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.12.0-rc4 (since 5.8)
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: boot log

Description 8vvbbqzo567a 2021-03-22 17:55:26 UTC
Created attachment 295997
boot log

I ran into this issue after upgrading my Debian kernel from 4.19 to 5.10. Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=979765

It probably started with kernel 5.8, when the Zhaoxin PMU driver was added, although I have not tested that. The problem also occurs with upstream Linux kernels; I re-tested today with upstream 5.12.0-rc4.

No panic occurs when the kernel boot parameter 'initcall_blacklist=init_hw_perf_events' is used.
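To confirm that the workaround is actually active on a running system, the kernel command line can be inspected. The following is a minimal sketch, assuming the parameter is passed as a plain `initcall_blacklist=` token with a comma-separated list of function names; the helper names and the sample command line are hypothetical and only serve as illustration.

```python
def blacklisted_initcalls(cmdline):
    """Return the set of function names listed in initcall_blacklist=..."""
    names = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "initcall_blacklist" and sep:
            # The parameter takes a comma-separated list of initcall names.
            names.update(n for n in value.split(",") if n)
    return names


def perf_init_disabled(cmdline):
    """True if init_hw_perf_events is blacklisted on this command line."""
    return "init_hw_perf_events" in blacklisted_initcalls(cmdline)


# Hypothetical command line, for illustration only.
SAMPLE = "root=/dev/sda1 ro quiet initcall_blacklist=init_hw_perf_events"
print(perf_init_disabled(SAMPLE))

# On a live system the real command line could be read like this:
#   with open("/proc/cmdline") as f:
#       print(perf_init_disabled(f.read()))
```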

PMU init messages (which aren't present when the boot parameter is used):
[    0.911684] Performance Events:
[    0.911688] core: Welcome to zhaoxin pmu!
[    0.919145] core: Version check pass!
[    0.923144] ZXC events, zhaoxin PMU driver.
[    0.927146] ... version:                2
[    0.931144] ... bit width:              40
[    0.935144] ... generic registers:      3
[    0.939144] ... value mask:             000000ffffffffff
[    0.943144] ... max period:             0000007fffffffff
[    0.947144] ... fixed-purpose events:   3
[    0.951144] ... event mask:             0000000700000007
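
The masks printed by the driver are consistent with the reported bit width and counter counts. The sketch below re-derives them; it assumes that the maximum period is half the counter range and that the fixed-purpose counters occupy bits starting at 32 in the event mask (both assumptions match the log above, but are not taken from the driver source).

```python
# Values copied from the PMU init messages above.
BIT_WIDTH = 40
GENERIC_COUNTERS = 3
FIXED_COUNTERS = 3
FIXED_BASE_BIT = 32  # assumption: fixed counters start at bit 32


def value_mask(bit_width):
    # All counter bits set.
    return (1 << bit_width) - 1


def max_period(bit_width):
    # Assumption: half of the counter range, as the log suggests.
    return (1 << (bit_width - 1)) - 1


def event_mask(generic, fixed, fixed_base=FIXED_BASE_BIT):
    # Low bits for generic counters, bits from fixed_base for fixed ones.
    return ((1 << generic) - 1) | (((1 << fixed) - 1) << fixed_base)


print(f"value mask: {value_mask(BIT_WIDTH):016x}")
print(f"max period: {max_period(BIT_WIDTH):016x}")
print(f"event mask: {event_mask(GENERIC_COUNTERS, FIXED_COUNTERS):016x}")
```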

The panic message:
[   13.426551] general protection fault: 0000 [#1] SMP
[   13.426557] CPU: 0 PID: 340 Comm: sed Tainted: G            E     5.12.0-rc4 #3
[   13.426559] Hardware name: VIA Technologies Ltd. VX800 /VX800 , BIOS 6.00 PG 02/26/2009
[   13.426561] RIP: 0010:native_read_pmc+0x4/0x40
[   13.426564] Code: 29 f1 01 c8 48 29 ca 83 e0 f8 83 f8 08 72 16 83 e0 f8 31 c9 89 cf 83 c1 08 4c 8b 04 3a 4c 89 04 3e 39 c1 72 ef c3 41 54 89 f9 <0f> 33 66 66 66 66 90 48 c1 e2 20 49 89 d4 49 09 c4 4c 89 e0 41 5c
[   13.426567] RSP: 0000:ffffa786c0903c88 EFLAGS: 00010046
[   13.426572] RAX: 0000000000000021 RBX: fffffffc466ed740 RCX: 0000000040000001
[   13.426575] RDX: 0000000000000021 RSI: 0000000000000040 RDI: 0000000040000001
[   13.426577] RBP: ffff92cdc1194000 R08: 000000000000030a R09: 0000000000000000
[   13.426579] R10: 0000000000000000 R11: 0000000000000000 R12: ffff92cdc11941e0
[   13.426581] R13: 0000000000000018 R14: 0000000000000021 R15: ffff92ce35c119e0
[   13.426584] FS:  00007f51bac35800(0000) GS:ffff92ce35c00000(0000) knlGS:0000000000000000
[   13.426586] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.426588] CR2: 0000563baa3db048 CR3: 0000000001ca1000 CR4: 00000000000006f0
[   13.426590] Call Trace:
[   13.426591]  x86_perf_event_update+0x4a/0xa0
[   13.426593]  zhaoxin_pmu_handle_irq+0x13a/0x250
[   13.426595]  ? __alloc_pages_nodemask+0x16a/0x310
[   13.426597]  ? mem_cgroup_charge_statistics.constprop.0+0x21/0x50
[   13.426599]  ? __mod_memcg_lruvec_state+0x22/0xe0
[   13.426601]  ? page_add_new_anon_rmap+0x4e/0xf0
[   13.426603]  ? __handle_mm_fault+0xcd5/0x15d0
[   13.426604]  perf_event_nmi_handler+0x28/0x50
[   13.426606]  nmi_handle+0x58/0x100
[   13.426608]  default_do_nmi+0x42/0x130
[   13.426609]  exc_nmi+0x12f/0x150
[   13.426611]  asm_exc_nmi+0x76/0xbf
[   13.426613] RIP: 0033:0x563ba8e57af6
[   13.426615] Code: 00 00 00 00 49 8b 96 70 03 00 00 49 8b be e0 02 00 00 4d 8b 0c cb f3 41 0f 6f 4c 35 00 4c 89 0c cf 49 8b 7c 35 10 0f 11 0c 32 <48> 89 7c 32 10 49 8b 96 80 03 00 00 41 8b 3c 8a 89 3c 8a 48 85 c0
[   13.426618] RSP: 002b:00007ffede5c3680 EFLAGS: 00000246
[   13.426622] RAX: 0000000000000000 RBX: 0000000000000014 RCX: 0000000000000000
[   13.426624] RDX: 0000563baa3da760 RSI: 0000000000000000 RDI: 0000000000000008
[   13.426626] RBP: 0000000000000000 R08: 0000000000000014 R09: 000000000000010b
[   13.426629] R10: 0000563baa3d91d0 R11: 0000563baa3dbcb0 R12: 0000563baa3dbb60
[   13.426631] R13: 0000563baa3dbd60 R14: 0000563baa3d9810 R15: 0000563baa3d8040
[   13.426633] Modules linked in: via_rng(E) rng_core(E) serio_raw(E) evdev(E) pcspkr(E) button(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) crc16(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) hid(E) raid10(E) raid456(E) libcrc32c(E) crc32c_generic(E) async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E) async_tx(E) raid6_pq(E) raid1(E) raid0(E) linear(E) md_mod(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) usb_storage(E) uhci_hcd(E) ehci_pci(E) ehci_hcd(E) sata_sil(E) libata(E) r8169(E) realtek(E) mdio_devres(E) scsi_mod(E) i2c_viapro(E) usbcore(E) usb_common(E) libphy(E) fan(E)
[   13.705884] ---[ end trace 8b28db321ac1f33d ]---
[   13.705886] RIP: 0010:native_read_pmc+0x4/0x40
[   13.705889] Code: 29 f1 01 c8 48 29 ca 83 e0 f8 83 f8 08 72 16 83 e0 f8 31 c9 89 cf 83 c1 08 4c 8b 04 3a 4c 89 04 3e 39 c1 72 ef c3 41 54 89 f9 <0f> 33 66 66 66 66 90 48 c1 e2 20 49 89 d4 49 09 c4 4c 89 e0 41 5c
[   13.705892] RSP: 0000:ffffa786c0903c88 EFLAGS: 00010046
[   13.705895] RAX: 0000000000000021 RBX: fffffffc466ed740 RCX: 0000000040000001
[   13.705898] RDX: 0000000000000021 RSI: 0000000000000040 RDI: 0000000040000001
[   13.705900] RBP: ffff92cdc1194000 R08: 000000000000030a R09: 0000000000000000
[   13.705902] R10: 0000000000000000 R11: 0000000000000000 R12: ffff92cdc11941e0
[   13.705904] R13: 0000000000000018 R14: 0000000000000021 R15: ffff92ce35c119e0
[   13.705906] FS:  00007f51bac35800(0000) GS:ffff92ce35c00000(0000) knlGS:0000000000000000
[   13.705909] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   13.705911] CR2: 0000563baa3db048 CR3: 0000000001ca1000 CR4: 00000000000006f0
[   13.705913] Kernel panic - not syncing: Fatal exception in interrupt
[   13.705924] Kernel Offset: 0x31c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
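
Two details of the trace can be decoded mechanically: the bytes at the faulting RIP (marked `<..>` in the Code line) and the counter index in RCX. The sketch below does both. It assumes the Intel-style RDPMC encoding, in which bit 30 of ECX selects the fixed-function counters; whether the VIA Nano implements that encoding is not established here, so this is a reading of the trace, not a confirmed diagnosis. The helper names are hypothetical.

```python
FIXED_SELECT_BIT = 1 << 30  # assumption: Intel-style RDPMC index encoding


def decode_rdpmc_index(ecx):
    """Split an RDPMC index into (counter kind, counter number)."""
    ecx &= 0xFFFFFFFF
    kind = "fixed" if ecx & FIXED_SELECT_BIT else "generic"
    return kind, ecx & ~FIXED_SELECT_BIT & 0xFFFFFFFF


def faulting_bytes(code_line, count=2):
    """Return the first `count` bytes starting at the <..> marker."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [t.strip("<>") for t in tokens[start:start + count]]


# Tail of the Code line and the RCX value from the panic message above.
CODE_TAIL = "c3 41 54 89 f9 <0f> 33 66 66 66 66 90 48 c1 e2 20"
RCX = 0x0000000040000001

print(faulting_bytes(CODE_TAIL))  # 0f 33 is the RDPMC opcode
print(decode_rdpmc_index(RCX))
```

Under these assumptions, the fault occurs on an RDPMC instruction that requests fixed-function counter 1.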

CPU info:
# cat /proc/cpuinfo
processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 15
model name      : VIA Nano processor L2200@1600MHz
stepping        : 2
cpu MHz         : 1200.000
cache size      : 1024 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush acpi mmx fxsr sse sse2 ss tm pbe syscall nx fxsr_opt rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid pni monitor est tm2 ssse3 cx16 xtpr rng rng_en ace ace_en ace2 phe phe_en lahf_lm pti
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 3199.57
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

Comment 1 8vvbbqzo567a 2021-06-12 09:25:05 UTC
This patch fixes the issue:
https://lkml.org/lkml/2021/6/6/443

When it is merged, please also apply it to the LTS and stable kernels >= 5.8.

Comment 2 8vvbbqzo567a 2022-10-08 15:48:04 UTC
The patch from the previous comment still hasn't been applied as of 6.0.

Should I email the perf maintainers to get it merged?