Bug 150851 - general protection fault: 0000 [#1] SMP; native_read_pmc+0x7/0x40
Summary: general protection fault: 0000 [#1] SMP; native_read_pmc+0x7/0x40
Status: NEW
Alias: None
Product: Virtualization
Classification: Unclassified
Component: kvm
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: virtualization_kvm
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-01 04:04 UTC by JianhongYin
Modified: 2016-08-12 17:24 UTC
CC List: 3 users

See Also:
Kernel Version: 4.7.0
Subsystem:
Regression: No
Bisected commit-id:


Description JianhongYin 2016-08-01 04:04:52 UTC
[14654.868027] dracut Warning: Unmounted /oldroot. 
[14654.897505] dracut: Disassembling device-mapper devices 
Rebooting. 
[14654.932392] kvm: exiting hardware virtualization 
[14654.937114] general protection fault: 0000 [#1] SMP 
[14654.938084] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache coretemp kvm_intel kvm nfsd ipmi_ssif ipmi_devintf gpio_ich iTCO_wdt iTCO_vendor_support ibmpex ses i5000_edac irqbypass enclosure scsi_transport_sas ibmaem edac_core auth_rpcgss lpc_ich i2c_i801 shpchp ipmi_si pcspkr sg nfs_acl i5k_amb i2c_smbus ipmi_msghandler lockd acpi_cpufreq grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ixgbe drm e1000e mdio lpfc serio_raw i2c_core ata_piix dca libata ptp scsi_transport_fc bnx2 aacraid pps_core fjes dm_mirror dm_region_hash dm_log dm_mod 
[14654.938084] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0 #1 
[14654.938084] Hardware name: IBM IBM System x3650 -[7979AC1]-/System Planar, BIOS -[GGE142AUS-1.13]- 12/03/2008 
[14654.938084] task: ffffffff81c0d4c0 ti: ffffffff81c00000 task.ti: ffffffff81c00000 
[14654.938084] RIP: 0010:[<ffffffff810659a7>]  [<ffffffff810659a7>] native_read_pmc+0x7/0x40 
[14654.938084] RSP: 0018:ffff8808cfc03e18  EFLAGS: 00010087 
[14654.938084] RAX: 0000000000000001 RBX: ffff8808cfc0a380 RCX: 000000000000001e 
[14654.938084] RDX: 0000000000000000 RSI: 000000000013003c RDI: 000000000000001e 
[14654.938084] RBP: ffff8808cfc03e20 R08: 0000000000000000 R09: 0000000000000000 
[14654.938084] R10: 0000000000000201 R11: 0000000000000005 R12: ffffffff80000001 
[14654.938084] R13: ffff88017fc1e000 R14: ffff88017fc1e1c0 R15: 0000000000000018 
[14654.938084] FS:  0000000000000000(0000) GS:ffff8808cfc00000(0000) knlGS:0000000000000000 
[14654.938084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[14654.938084] CR2: 0000000000f0cde0 CR3: 000000089df17000 CR4: 00000000000006f0 
[14654.938084] Stack: 
[14654.938084]  ffff8808cfc0a380 ffff8808cfc03e58 ffffffff81005b15 ffff8808cfc0a380 
[14654.938084]  ffff88017fc1e000 0000000000000004 ffff8808cfc1d444 00000000000502aa 
[14654.938084]  ffff8808cfc03e80 ffffffff81005bc9 ffff88017fc1e000 ffff8808cfc0a380 
[14654.938084] Call Trace: 
[14654.938084]  <IRQ>  
[14654.938084]  [<ffffffff81005b15>] x86_perf_event_update+0x45/0xa0 
[14654.938084]  [<ffffffff81005bc9>] x86_pmu_stop+0x59/0xd0 
[14654.938084]  [<ffffffff81005c83>] x86_pmu_del+0x43/0x140 
[14654.938084]  [<ffffffff81188d76>] event_sched_out.isra.96+0xd6/0x310 
[14654.938084]  [<ffffffff8118918d>] __perf_remove_from_context+0x2d/0xb0 
[14654.938084]  [<ffffffff8118925d>] __perf_event_exit_context+0x4d/0x70 
[14654.938084]  [<ffffffff81189210>] ? __perf_remove_from_context+0xb0/0xb0 
[14654.938084]  [<ffffffff8111279b>] flush_smp_call_function_queue+0x7b/0x160 
[14654.938084]  [<ffffffff81113283>] genei[-- MARK -- Fri Jul 29 13:40:00 2016] 
[-- MARK -- Fri Jul 29 13:45:00 2016] 
Comment 1 Wanpeng Li 2016-08-01 06:25:28 UTC
How to reproduce it?
Comment 2 Huaitong Han 2016-08-01 10:43:21 UTC
I guess this has nothing to do with virtualization. Please post your CPU information from "cat /proc/cpuinfo | head -n 30".
Comment 3 JianhongYin 2016-08-02 08:49:32 UTC
(In reply to Wanpeng Li from comment #1)
> How to reproduce it?

I have not been able to reproduce it again yet.
Comment 4 JianhongYin 2016-08-02 09:00:49 UTC
(In reply to Huaitong Han from comment #2)
> I guess this has nothing to do with virtualization. Please post your CPU
> information from "cat /proc/cpuinfo | head -n 30".

The machine is in use by others, so I got its info from our internal management system:

System
------------------------------------------
Host Hypervisor 	(not virtualized)
Vendor 	IBM
Model 	System x3650 -[7979AC1]-
Serial Number 	KQHTLVV
MAC Address 	00:1A:64:C7:EC:08
Memory 	34029 MB
NUMA Nodes 	1

CPU
------------------------------------------
Vendor 	Intel Corp.
Model Name 	Xeon
Family 	6
Model 	23
Stepping 	6
Speed 	1992.0
Processors 	8
Cores 	8
Sockets 	2
Hyper 	False
Flags 	fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx x86-64 constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dtherm tpr_shadow vnmi flexpriority cpufreq
Arch(s) 	i386 x86_64
Comment 5 Mark Asselstine 2016-08-11 20:31:36 UTC
It is not clear whether this is failing in a guest or on the native host. Which is it?

Can you use gdb to confirm which instruction is failing?

I am no expert, but I have built the latest mainline kernel as of today and I believe the failure is in the 'rdpmc' instruction. The PCE bit (bit 8, i.e. the 9th bit) of CR4 appears to be 0 (CR4 is 00000000000006f0 in the register dump), so RDPMC can only be used in ring 0. If it is a guest that is failing, this might explain the oops.
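
To make the register reading above easier to check, here is a minimal Python sketch that decodes the CR4 and CS values printed in the oops. The helper names pce_enabled and privilege_level are illustrative only and are not kernel APIs; the only inputs are the two values from the register dump.

```python
# Values copied from the register dump in the oops.
CR4 = 0x00000000000006F0
CS = 0x0010

CR4_PCE = 1 << 8  # CR4.PCE: bit 8, i.e. the 9th bit counting from 1


def pce_enabled(cr4: int) -> bool:
    """Return True if CR4.PCE is set (RDPMC permitted at any privilege level)."""
    return bool(cr4 & CR4_PCE)


def privilege_level(cs: int) -> int:
    """Return the privilege level held in the low two bits of a CS selector."""
    return cs & 0x3


print("CR4.PCE set:", pce_enabled(CR4))
print("privilege level from CS:", privilege_level(CS))
```

With the values from the dump, this reports PCE clear and privilege level 0. If that reading is right, the fault was taken in ring 0, where a clear PCE bit on its own would not be expected to make RDPMC fault.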
Comment 6 Mark Asselstine 2016-08-11 20:36:52 UTC
If this is the guest and you are using "-cpu host" you can try "-cpu host,level=9" to disable PMU emulation (see the comments here https://bugs.launchpad.net/qemu/+bug/1037675)
Comment 7 Mark Asselstine 2016-08-11 21:46:50 UTC
RCX: 000000000000001e is suspicious. RCX selects the counter that RDPMC reads, and 0x1e (30) seems to be out of range.
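
As a rough sanity check of that suspicion, the sketch below uses a simplified model of the RDPMC index, in which bit 30 of ECX selects the fixed-function counters and the remaining low bits give the counter number. The counter counts (2 general-purpose, 3 fixed) are assumptions about this CPU generation, not values taken from the report, and rdpmc_index_in_range is a hypothetical helper, not a kernel function.

```python
RDPMC_FIXED_FLAG = 1 << 30  # bit 30 of ECX selects fixed-function counters


def rdpmc_index_in_range(ecx: int, num_gp: int = 2, num_fixed: int = 3) -> bool:
    """Check an RDPMC index against assumed counter counts.

    num_gp and num_fixed are assumptions for illustration; on a real
    system they would come from CPUID leaf 0xA.
    """
    if ecx & RDPMC_FIXED_FLAG:
        return (ecx & ~RDPMC_FIXED_FLAG) < num_fixed
    return ecx < num_gp


rcx = 0x1E  # RCX value from the register dump
print(hex(rcx), "in range:", rdpmc_index_in_range(rcx))
```

Under these assumptions, 0x1e matches neither a general-purpose nor a fixed counter index, which would be consistent with RDPMC raising a general protection fault.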
Comment 8 Mark Asselstine 2016-08-12 17:24:59 UTC
This looks very much like what is fixed in commit 65ea11ec6a82b1d44aba62b59e9eb20247e57c6e [x86/hweight: Don't clobber %rdi]

Can you try with the above applied?
