Bug 15238 - Oops on startup: Kernel failure: EDAC amd64: WARNING: ECC is disabled by BIOS
Status: CLOSED CODE_FIX
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: doug thompson
URL:
Keywords:
Duplicates: 15335
Depends on:
Blocks:
 
Reported: 2010-02-06 12:00 UTC by Thomas PIERSON
Modified: 2010-05-01 16:30 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.32-5 amd64 (Debian package)
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel log (27.02 KB, text/plain)
2010-02-06 12:00 UTC, Thomas PIERSON

Description Thomas PIERSON 2010-02-06 12:00:04 UTC
Created attachment 24927
Kernel log

Hi,

I just reinstalled my distribution, switching from the x86 architecture to amd64.
I am using Debian testing (amd64).

On the first startup, once KDE has finished loading, a kernel oops is raised:
"
Kernel failure message 1: EDAC amd64: WARNING: ECC is disabled by BIOS. Module will NOT be loaded.
Either Enable ECC in the BIOS, or set 'ecc_enable_override'.
Also, use of the override can cause unknown side effects.
amd64_edac: probe of 0000:00:18.2 failed with error -22
k8temp 0000:00:18.3: Temperature readouts might be wrong - check erratum #141
input: PC Speaker as /devices/platform/pcspkr/input/input5
parport_pc 00:0a: reported by Plug and Play ACPI
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE,EPP]
i2c i2c-0: nForce2 SMBus adapter at 0x1c00
ACPI: I/O resource nForce2_smbus [0x1c40-0x1c7f] conflicts with ACPI region SM00 [0x1c40-0x1c45]
ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
nForce2_smbus 0000:00:01.1: Error probing SMB2.
ACPI: PCI Interrupt Link [AAZA] enabled at IRQ 20
HDA Intel 0000:00:06.1: PCI INT B -> Link[AAZA] -> GSI 20 (level, low) -> IRQ 20
HDA Intel 0000:00:06.1: setting latency timer to 64
input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:06.1/input/input6
Adding 2650684k swap on /dev/sda5. Priority:-1 extents:1 across:2650684k
loop: module loaded
EXT4-fs (sda3): mounted filesystem with ordered data mode
alloc irq_desc for 30 on node 0
alloc kstat_irqs on node 0
forcedeth 0000:00:08.0: irq 30 for MSI/MSI-X
fuse init (API version 7.13)
powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ processors (2 cpu cores) (version 2.20.00)
[Firmware Bug]: powernow-k8: No compatible ACPI _PSS objects found.
[Firmware Bug]: powernow-k8: Try again with latest BIOS.
"

I found the same problem on the Ubuntu bug tracker:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/422536 and it seems
to be fixed there.
These links may explain my problem:
http://patchwork.kernel.org/patch/36833/
http://www.pubbs.net/kernel/200909/98769/

I have not noticed any visible consequences on the system yet.
Is there a risk of data loss or other serious consequences? (I am hesitant to
reconnect and set up a software RAID 1 on this system.)

Best regards,

Thomas PIERSON
Comment 1 Andrew Morton 2010-02-08 21:42:40 UTC
Doug, could you please take a look at this one?
Comment 2 doug thompson 2010-02-10 17:46:45 UTC
This comes from detecting that the BIOS did not enable ECC operation during POST.

To enable ECC, the BIOS must turn on ECC and then write to all memory locations in order to properly set the ECC bits in memory.

The AMD EDAC module detects that ECC is not on, notifies the admin of this fact, and then FAILS the module load. It also notifies the admin that a FORCE LOAD override is available to force the module to load and check ECC status.

BUT that allows for the possibility of false positives being harvested, which could cause a panic on a double memory error.

You indicate you got an OOPS. What was the OOPS message? Did it occur with the override ON and the EDAC module force-loaded? If so, that is really an unsafe condition.

OK, I read the Ubuntu link, where "WARNING" looks like a WARN().

I suggest the module be modified to use a "NOTICE" label instead.

I will forward a link to this to "Borislav Petkov" <borislav.petkov@amd.com>.

doug t
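
The load decision described in Comment 2 can be sketched as follows. This is a minimal illustration in Python, not the driver's actual code: the function name `probe` and its two boolean parameters are hypothetical stand-ins for the BIOS ECC state and the 'ecc_enable_override' setting mentioned in the log. It assumes the refusal is reported as EINVAL, which is consistent with the "failed with error -22" line in the reporter's log.

```python
import errno


def probe(ecc_enabled_by_bios: bool, ecc_enable_override: bool = False) -> int:
    """Hypothetical model of the load decision: 0 means load, negative errno means refuse."""
    if ecc_enabled_by_bios:
        # BIOS turned ECC on and initialised the ECC bits during POST: safe to load.
        return 0
    if ecc_enable_override:
        # Forced load: the ECC bits in memory were never initialised by the BIOS,
        # so reported errors may be false positives (the unsafe case from Comment 2).
        return 0
    # ECC is off and no override was given: refuse to load.
    return -errno.EINVAL


if __name__ == "__main__":
    # The reporter's situation: ECC disabled by BIOS, no override set.
    print(probe(ecc_enabled_by_bios=False))
```

With ECC disabled and no override, the sketch returns a negative EINVAL, which is simply a refusal to load rather than a kernel fault.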
Comment 3 Thomas PIERSON 2010-02-10 19:09:55 UTC
OK, thanks for your help.

Maybe it is not an oops message, but a popup window appears at each startup, and the above message is displayed in that window. I thought it was a kernel exception. Sorry if that is wrong.

I did not try the "force load module" override.

So this is just a notice and not a serious problem?

Best regards,

Thomas Pierson
Comment 4 doug thompson 2010-02-10 19:53:29 UTC
The module failing to load is not a serious problem if your Mobo has ECC disabled; it is just a warning.

The problem is the kerneloops system, which maps WARNING to WARN() and "thinks" it is an OOPS.

I have suggested that the word WARNING be changed in the module. The Ubuntu link listed MANY bug reports on this, which is not a good thing really.

It is just a notice that the AMD EDAC module is not being loaded.

doug t
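
The false positive described in Comment 4 can be illustrated with a small sketch. It is not kerneloops code: both patterns and both function names are assumptions made for illustration. The naive pattern flags any log line containing "WARNING:", while the stricter one only matches the "WARNING: at file:line" header that WARN() prints. The WARN() sample line below is made up.

```python
import re

# Assumed, simplified matchers; the real kerneloops logic may differ.
NAIVE_PATTERN = re.compile(r"WARNING:")
STRICT_PATTERN = re.compile(r"WARNING: at \S+:\d+")


def naive_is_oops(line: str) -> bool:
    """Flag any line that merely contains the word WARNING followed by a colon."""
    return bool(NAIVE_PATTERN.search(line))


def strict_is_oops(line: str) -> bool:
    """Flag only lines that look like a WARN() header (file and line number present)."""
    return bool(STRICT_PATTERN.search(line))


# Message from the reporter's log, and an invented WARN()-style header for contrast.
EDAC_LINE = "EDAC amd64: WARNING: ECC is disabled by BIOS. Module will NOT be loaded."
WARN_LINE = "WARNING: at kernel/example.c:123 example_function+0x1a/0x30()"

if __name__ == "__main__":
    for line in (EDAC_LINE, WARN_LINE):
        print(naive_is_oops(line), strict_is_oops(line), line)
```

Either tightening the matcher or rewording the driver message avoids the false report; the suggestion in this thread is the latter.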
Comment 5 Thomas PIERSON 2010-02-10 22:04:43 UTC
Ok, I understand now! Thanks a lot for all these explanations.

Best regards,

Thomas PIERSON
Comment 6 Borislav Petkov 2010-02-12 06:43:21 UTC
Hi,

a fix for this just went upstream: 

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=cab4d27764d5a8654212b3e96eb0ae793aec5b94

Closing...
Comment 7 Cristian Aravena Romero 2010-05-01 16:30:34 UTC
*** Bug 15335 has been marked as a duplicate of this bug. ***
