Bug 14521 - mce: Warning at boot - native_apic_write_dummy
Summary: mce: Warning at boot - native_apic_write_dummy
Status: CLOSED OBSOLETE
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: All Linux
Importance: P1 normal
Assignee: platform_i386
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-31 12:32 UTC by Torsten Krah
Modified: 2012-06-14 16:32 UTC
CC List: 8 users

See Also:
Kernel Version: 2.6.31
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg from failing boot (59.05 KB, text/plain)
2009-11-04 21:54 UTC, Torsten Krah
acpidump DSDT (98.37 KB, application/octet-stream)
2009-11-04 21:55 UTC, Torsten Krah
config (108.76 KB, text/plain)
2009-11-04 21:55 UTC, Torsten Krah
disableapic dmesg output (44.87 KB, text/plain)
2009-11-07 12:07 UTC, Torsten Krah
cpuinfo (460 bytes, text/plain)
2009-11-07 19:47 UTC, Torsten Krah
dmesg output for a Samsung P30 (CORONA) (46.94 KB, text/plain)
2009-12-02 15:30 UTC, another.invisible.woman
dmesg on Dell Inspiron 1100 (46.38 KB, application/octet-stream)
2009-12-12 02:15 UTC, kernel
dmesg with lapic on Dell Inspiron 1100 (44.78 KB, application/octet-stream)
2009-12-12 02:17 UTC, kernel

Description Torsten Krah 2009-10-31 12:32:27 UTC
Switching from 2.6.28 to 2.6.31 dmesg shows this:


[    0.004961] ------------[ cut here ]------------
[    0.005022] WARNING: at /build/buildd/linux-2.6.31/arch/x86/kernel/apic/apic.c:247 native_apic_write_dummy+0x33/0x40()
[    0.005083] Hardware name: CoronaR
[    0.005127] Modules linked in:
[    0.005205] Pid: 0, comm: swapper Not tainted 2.6.31-14-generic #48-Ubuntu
[    0.005255] Call Trace:
[    0.005309]  [<c014518d>] warn_slowpath_common+0x6d/0xa0
[    0.005358]  [<c011d7e3>] ? native_apic_write_dummy+0x33/0x40
[    0.005408]  [<c011d7e3>] ? native_apic_write_dummy+0x33/0x40
[    0.005458]  [<c01451d5>] warn_slowpath_null+0x15/0x20
[    0.005507]  [<c011d7e3>] native_apic_write_dummy+0x33/0x40
[    0.005560]  [<c011278c>] intel_init_thermal+0xac/0x1a0
[    0.005609]  [<c0111dbb>] mce_intel_feature_init+0xb/0x60
[    0.005658]  [<c010fcf0>] mce_cpu_features+0x10/0x40
[    0.005710]  [<c056ac3c>] mcheck_init+0x14a/0x188
[    0.005758]  [<c0569078>] ? init_hypervisor+0xb/0x2c
[    0.005807]  [<c0569030>] identify_cpu+0x20e/0x21d
[    0.005859]  [<c0795732>] identify_boot_cpu+0xd/0x23
[    0.005907]  [<c07958c8>] check_bugs+0xb/0xe9
[    0.005959]  [<c078e8c3>] start_kernel+0x2dc/0x2ec
[    0.006007]  [<c078e406>] ? unknown_bootoption+0x0/0x1ab
[    0.006056]  [<c078e07c>] i386_start_kernel+0x7c/0x83
[    0.006113] ---[ end trace a7919e7f17c0a725 ]---
Comment 1 Torsten Krah 2009-10-31 12:35:06 UTC
(In reply to comment #0)

I am using a Samsung P35 with the Ubuntu 9.10 distribution. Everything seems to work fine, except for #14293, where my sensors are no longer detected. I don't know whether the two are related; I guess not.
Comment 2 ykzhao 2009-11-03 01:54:33 UTC
Will you please attach the output of acpidump and your .config?
Thanks.
Comment 3 Len Brown 2009-11-03 02:41:18 UTC
Looks like the APIC is disabled, since native_apic_write_dummy()
is complaining.  The mystery is why intel_init_thermal() got this far --
as it should have hit native_apic_read_dummy() first...

Please supply the complete output from dmesg -s6400 from the failing boot.

BTW, this appears to have nothing to do with ACPI.
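
For readers unfamiliar with the pattern described in this comment, here is a minimal user-space sketch in C. It is not the kernel source. It assumes that the kernel swaps its APIC accessors for dummy versions once it decides not to use the APIC, and that the dummy write warns once if something still calls it. All names (apic_ops, apic_write_dummy, apic_disable_model, thermal_init_model, fake_regs) are hypothetical, and the real condition under which the kernel warns is not reproduced here.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative model only: these names are NOT the kernel's definitions. */
struct apic_ops {
    void (*write)(uint32_t reg, uint32_t val);
};

static int warn_count;        /* how many warnings were printed */
static uint32_t fake_regs[4]; /* stand-in for memory-mapped APIC registers */

/* Real write path: stores the value into the (fake) register file. */
static void apic_write_real(uint32_t reg, uint32_t val)
{
    fake_regs[reg % 4] = val;
}

/* Dummy write path: drops the write and warns only on the first call. */
static void apic_write_dummy(uint32_t reg, uint32_t val)
{
    static int warned;

    (void)reg;
    (void)val;
    if (!warned) {
        warned = 1;
        warn_count++;
        fprintf(stderr, "WARNING: APIC write while APIC is disabled\n");
    }
}

/* The accessor table starts out pointing at the real write path. */
static struct apic_ops apic = { .write = apic_write_real };

/* Swap in the dummy accessor, modelling a kernel that gave up on the APIC. */
static void apic_disable_model(void)
{
    apic.write = apic_write_dummy;
}

/* A caller that programs a register without checking APIC state first. */
static void thermal_init_model(void)
{
    apic.write(3, 0xfa);
}
```

In this model, calling thermal_init_model() after apic_disable_model() prints the warning once and leaves fake_regs untouched, which is the shape of the trace in the description: a thermal init path reaching the dummy write.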
Comment 4 Torsten Krah 2009-11-04 21:54:57 UTC
Created attachment 23653
dmesg from failing boot
Comment 5 Torsten Krah 2009-11-04 21:55:21 UTC
Created attachment 23654
acpidump DSDT
Comment 6 Torsten Krah 2009-11-04 21:55:42 UTC
Created attachment 23655
config
Comment 7 ykzhao 2009-11-06 06:28:54 UTC
From the dmesg log it seems that there is no MADT table. In that case the kernel can't find the SMP configuration for this box (local APIC and IO APIC).

Will you please boot with the "disableapic" option and attach the output of "cat /proc/cpuinfo"?

Thanks.
Comment 8 ykzhao 2009-11-06 06:30:00 UTC
Will you please also try the "lapic" boot option and attach the dmesg output?
Thanks.
Comment 9 Torsten Krah 2009-11-07 12:07:39 UTC
Created attachment 23690
disableapic dmesg output

Booting with "lapic" instantly hard-locks the machine at:

bio: create slab <bio-0> at 0

The dmesg output from booting with disableapic is attached.
Comment 10 Torsten Krah 2009-11-07 19:47:04 UTC
Created attachment 23695
cpuinfo

Sorry, I forgot the cpuinfo output.
Comment 11 Jose Marino 2009-11-14 19:22:11 UTC
I'm also seeing this warning since vanilla 2.6.31. I did a quick bisect between
v2.6.30 and v2.6.31 and found that the first commit that introduces the warning is:

commit 48b1fddbb100a64f3983ca9768b8ea629a09aa20
Merge: 3873607 ee4c24a
Author: H. Peter Anvin <hpa@zytor.com>
Date:   Mon Jun 1 15:13:02 2009 -0700

    Merge branch 'irq/numa' into x86/mce3
    
    Merge reason: arch/x86/kernel/irqinit_{32,64}.c unified in irq/numa
    and modified in x86/mce3; this merge resolves the conflict.
    
    Conflicts:
        arch/x86/kernel/irqinit.c
    
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Comment 12 another.invisible.woman 2009-12-02 15:30:28 UTC
Created attachment 23998
dmesg output for a Samsung P30 (CORONA)

I own a Samsung P30 which shows the same warning when upgrading from 2.6.30 to 2.6.31. I attach the dmesg output. Let me know if you need more.

M.
Comment 13 kernel 2009-12-12 02:15:13 UTC
Created attachment 24157
dmesg on Dell Inspiron 1100

I see the following section in dmesg on my Dell Inspiron 1100 laptop running Ubuntu 9.10 (Karmic Koala) with the 2.6.31-16-generic kernel:

[    0.002076] ------------[ cut here ]------------
[    0.002093] WARNING: at /build/buildd/linux-2.6.31/arch/x86/kernel/apic/apic.c:247 native_apic_write_dummy+0x33/0x40()
[    0.002099] Hardware name: Inspiron 1100                   
[    0.002103] Modules linked in:
[    0.002113] Pid: 0, comm: swapper Not tainted 2.6.31-14-generic #48-Ubuntu
[    0.002118] Call Trace:
[    0.002133]  [<c014518d>] warn_slowpath_common+0x6d/0xa0
[    0.002142]  [<c011d7e3>] ? native_apic_write_dummy+0x33/0x40
[    0.002150]  [<c011d7e3>] ? native_apic_write_dummy+0x33/0x40
[    0.002159]  [<c01451d5>] warn_slowpath_null+0x15/0x20
[    0.002167]  [<c011d7e3>] native_apic_write_dummy+0x33/0x40
[    0.002177]  [<c011278c>] intel_init_thermal+0xac/0x1a0
[    0.002185]  [<c0111dbb>] mce_intel_feature_init+0xb/0x60
[    0.002193]  [<c010fcf0>] mce_cpu_features+0x10/0x40
[    0.002205]  [<c056ac3c>] mcheck_init+0x14a/0x188
[    0.002213]  [<c0569078>] ? init_hypervisor+0xb/0x2c
[    0.002221]  [<c0569030>] identify_cpu+0x20e/0x21d
[    0.002234]  [<c0795732>] identify_boot_cpu+0xd/0x23
[    0.002241]  [<c07958c8>] check_bugs+0xb/0xe9
[    0.002252]  [<c078e8c3>] start_kernel+0x2dc/0x2ec
[    0.002260]  [<c078e406>] ? unknown_bootoption+0x0/0x1ab
[    0.002268]  [<c078e07c>] i386_start_kernel+0x7c/0x83
[    0.002285] ---[ end trace a7919e7f17c0a725 ]---

The full output of dmesg is attached.
Comment 14 kernel 2009-12-12 02:17:47 UTC
Created attachment 24158
dmesg with lapic on Dell Inspiron 1100

Adding the lapic kernel parameter appears to resolve this problem for me. I no longer see the warning mentioned above when using the lapic parameter. The full output of dmesg with the lapic option is attached.
Comment 15 kernel 2009-12-12 02:19:45 UTC
Output of "cat /proc/cpuinfo" is as follows:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Pentium(R) 4 CPU 2.20GHz
stepping	: 9
cpu MHz		: 2193.051
cache size	: 512 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe up pebs bts cid xtpr
bogomips	: 4386.10
clflush size	: 64
power management:
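
The flags line above lists apic. When comparing cpuinfo output across boots (for example with and without the lapic parameter), it can help to test for that flag as a whole token rather than by substring, so that apic does not match inside a longer flag name. The helper below is a small sketch in C; has_cpu_flag is a hypothetical name, not an existing tool or API, and it only inspects a string you pass in.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `flag` appears as a whole whitespace-separated token
 * in `line` (for example the "flags" line of /proc/cpuinfo).
 */
static bool has_cpu_flag(const char *line, const char *flag)
{
    size_t flen = strlen(flag);
    const char *p = line;

    if (flen == 0)
        return false;

    while (*p) {
        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;

        /* Measure the current token. */
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;

        size_t tlen = (size_t)(p - start);
        if (tlen == flen && memcmp(start, flag, flen) == 0)
            return true;
    }
    return false;
}
```

Passing the flags line quoted above with "apic" as the second argument returns true, while a line that contains only a longer name such as x2apic would not match.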
Comment 16 H. Peter Anvin 2009-12-12 02:20:22 UTC
This is a known bug, and a patch is already on the way upstream.
Comment 17 kernel 2009-12-12 18:27:25 UTC
Thanks, Peter. Is there a link to the bug on an upstream bug tracker?
Comment 18 Ozan Caglayan 2010-01-11 13:38:14 UTC
What's the situation here? I'm having the same issue with a Dell Latitude D600:

[    0.000041] Console: colour dummy device 80x25
[    0.000045] console [tty0] enabled
[    0.000094] Calibrating delay loop (skipped), value calculated using timer frequency.. 3189.41 BogoMIPS (lpj=1594709)
[    0.000121] Security Framework initialized
[    0.000133] Mount-cache hash table entries: 512
[    0.000289] Initializing cgroup subsys ns
[    0.000295] Initializing cgroup subsys cpuacct
[    0.000300] Initializing cgroup subsys memory
[    0.000310] Initializing cgroup subsys devices
[    0.000313] Initializing cgroup subsys freezer
[    0.000335] CPU: L1 I cache: 32K, L1 D cache: 32K
[    0.000339] CPU: L2 cache: 2048K
[    0.000344] mce: CPU supports 5 MCE banks
[    0.000352] ------------[ cut here ]------------
[    0.000362] WARNING: at arch/x86/kernel/apic/apic.c:247 native_apic_write_dummy+0x2d/0x39()
[    0.000365] Hardware name: Latitude D600                   
[    0.000368] Modules linked in:
[    0.000373] Pid: 0, comm: swapper Not tainted 2.6.31.9-129 #1
[    0.000376] Call Trace:
[    0.000383]  [<c01384f5>] warn_slowpath_common+0x60/0x90
[    0.000388]  [<c0138532>] warn_slowpath_null+0xd/0x10
[    0.000392]  [<c0119d48>] native_apic_write_dummy+0x2d/0x39
[    0.000398]  [<c011012c>] intel_init_thermal+0x55/0x178
[    0.000403]  [<c010f376>] ? mce_init+0xaa/0xbe
[    0.000409]  [<c010e102>] ? mce_cap_init+0x86/0x105
[    0.000414]  [<c010f929>] mce_intel_feature_init+0xb/0x4e
[    0.000419]  [<c010e073>] mce_cpu_features+0x16/0x1f
[    0.000426]  [<c04326f3>] mcheck_init+0x14b/0x187
[    0.000431]  [<c0430b6e>] identify_cpu+0x1fe/0x20d
[    0.000437]  [<c01be39d>] ? kmem_cache_alloc+0x7b/0xeb
[    0.000444]  [<c01828df>] ? __delayacct_tsk_init+0x15/0x28
[    0.000452]  [<c0631b0b>] identify_boot_cpu+0xd/0x23
[    0.000456]  [<c0631b5d>] check_bugs+0xb/0xdb
[    0.000461]  [<c0182934>] ? delayacct_init+0x42/0x46
[    0.000465]  [<c062b803>] start_kernel+0x2c4/0x2d3
[    0.000470]  [<c062b06a>] i386_start_kernel+0x6a/0x6f
[    0.000481] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.000484] CPU0: Thermal monitoring enabled (TM1)
[    0.000496] Performance Counters: 
[    0.000499] no APIC, boot with the "lapic" boot parameter to force-enable it.
[    0.000502] no hardware sampling interrupt available.
[    0.000505] p6 PMU driver.
[    0.000512] ... version:                 0
[    0.000515] ... bit width:               32
[    0.000517] ... generic counters:        2
[    0.000519] ... value mask:              00000000ffffffff
[    0.000522] ... max period:              000000007fffffff
[    0.000524] ... fixed-purpose counters:  0
[    0.000527] ... counter mask:            0000000000000003
[    0.000532] Checking 'hlt' instruction... OK.
Comment 19 Ozan Caglayan 2010-01-13 16:34:19 UTC
The following two commits from linux-2.6 fix the issue on 2.6.31.11 on my system. I think they should at least be sent to stable@kernel.org for inclusion in 2.6.32.y if they are the correct and complete fixes:

From 485a2e1973fd9f98c2c6776e66ac4721882b69e0 Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov <gorcunov@openvz.org>
Date: Mon, 14 Dec 2009 17:56:34 +0900
Subject: [PATCH] x86, mce: Thermal monitoring depends on APIC being enabled

From 70fe440718d9f42bf963c2cffe12008eb5556165 Mon Sep 17 00:00:00 2001
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Date: Mon, 14 Dec 2009 17:57:00 +0900
Subject: [PATCH] x86, mce: Clean up thermal init by introducing intel_thermal_supported()
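
Going only by the two subject lines, the idea appears to be to check that the APIC is usable before the thermal init path programs it, and to gather that check into one helper. The following is a user-space sketch of that idea in C, not the content of either patch. The struct cpu_model, its fields, thermal_supported_model, thermal_init_guarded and lvt_writes are all hypothetical names chosen for illustration.

```c
#include <stdbool.h>

/* Illustrative CPU description; the fields are assumptions, not kernel state. */
struct cpu_model {
    bool has_apic;            /* local APIC is usable */
    bool has_thermal_monitor; /* CPU advertises thermal monitoring */
};

static int lvt_writes; /* counts simulated APIC register writes */

/* Single place that decides whether thermal init may touch the APIC. */
static bool thermal_supported_model(const struct cpu_model *c)
{
    if (!c->has_apic)
        return false;
    return c->has_thermal_monitor;
}

/* Thermal init that bails out early instead of writing to a disabled APIC. */
static bool thermal_init_guarded(const struct cpu_model *c)
{
    if (!thermal_supported_model(c))
        return false;

    lvt_writes++; /* stands in for programming the thermal interrupt entry */
    return true;
}
```

With has_apic set to false, thermal_init_guarded() returns before any simulated register write, so a warning-on-write path like the one in the traces above would never be reached.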
