Bug 14568 - Kernel panic: EDAC MC0: INTERNAL ERROR: channel-b out of range
Summary: Kernel panic: EDAC MC0: INTERNAL ERROR: channel-b out of range
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: EDAC
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_edac@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-09 15:23 UTC by Tamas Vincze
Modified: 2012-06-14 16:51 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.18
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Full console log (36.05 KB, text/plain)
2009-11-09 15:23 UTC, Tamas Vincze
Fix i5000 panic (1.18 KB, patch)
2009-12-09 11:28 UTC, Mauro Carvalho Chehab

Description Tamas Vincze 2009-11-09 15:23:39 UTC
Created attachment 23713
Full console log

EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
Kernel panic - not syncing: EDAC MC0: Uncorrected Error
 (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Kernel is 2.6.18-164.el5xen
Server board is a Supermicro X7DBi+ (Intel 5000P chipset) with 16GB RAM.
Comment 1 Tamas Vincze 2009-11-09 15:38:10 UTC
The bug is probably in the function i5000_process_nonfatal_error_info() in i5000_edac.c.
It is also present in kernel 2.6.31.

The i5000X datasheet (page 211) says that FERR_NF_FBD bit 28 has no significance for M4Err through M12Err (= the FERR_NF_UNCORRECTABLE bits). However, the driver code uses both bit 29 and bit 28.

The channel number is taken from bits 29:28 by this macro:
#define EXTRACT_FBDCHAN_INDX(x) (((x)>>28) & 0x3)

The resulting two-bit value is stored as the channel and then incremented:

ue_errors = allErrors & FERR_NF_UNCORRECTABLE;
if (ue_errors) {
    debugf0("\tUncorrected bits= 0x%x\n", ue_errors);
    branch = EXTRACT_FBDCHAN_INDX(info->ferr_nf_fbd);
    channel = branch;
    [...]
    /* Call the helper to output message */
    edac_mc_handle_fbd_ue(mci, rank, channel, channel + 1, msg);
}

If both of bits 29:28 are set, then channel + 1 in the last line above yields 4,
which is out of range (the panic message reports "4 >= 4", so valid indices are 0 through 3).
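
The arithmetic can be checked outside the kernel with a small sketch. This is not the driver code: unpatched_second_channel() and NUM_CHANNELS are hypothetical names used only for illustration, and the channel count of 4 is assumed from the "(4 >= 4)" message above.

```c
#include <stdint.h>

/* Same shape as the driver macro quoted above: bits 29:28 of FERR_NF_FBD. */
#define EXTRACT_FBDCHAN_INDX(x) (((x) >> 28) & 0x3)

/* Hypothetical constant: 4 channels, assumed from the "(4 >= 4)" message. */
#define NUM_CHANNELS 4

/*
 * Hypothetical helper: returns the second channel index that the unpatched
 * logic would hand to edac_mc_handle_fbd_ue(), i.e. channel + 1.
 */
static int unpatched_second_channel(uint32_t ferr_nf_fbd)
{
    int branch = EXTRACT_FBDCHAN_INDX(ferr_nf_fbd);
    int channel = branch;   /* bit 28 is kept, although it has no meaning here */

    return channel + 1;     /* reaches NUM_CHANNELS when bits 29:28 are both set */
}
```

With a register value of 0x30000000 (bits 29 and 28 both set) the helper returns 4, which equals NUM_CHANNELS and so fails a "index < NUM_CHANNELS" range check.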

Bit 28 should be masked, for example by replacing
    channel = branch;
with:
    channel = branch & 2;
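
A minimal sketch of the effect of that mask, again outside the kernel. masked_channel() is a hypothetical helper, not a function in i5000_edac.c, and the attached patch may differ in detail.

```c
#include <stdint.h>

/* Same shape as the driver macro quoted above: bits 29:28 of FERR_NF_FBD. */
#define EXTRACT_FBDCHAN_INDX(x) (((x) >> 28) & 0x3)

/*
 * Hypothetical helper: applies the proposed mask so that only bit 29
 * contributes to the channel index. The result is always 0 or 2, so
 * channel + 1 is always 1 or 3 and stays below 4.
 */
static int masked_channel(uint32_t ferr_nf_fbd)
{
    int branch = EXTRACT_FBDCHAN_INDX(ferr_nf_fbd);

    return branch & 2;      /* drop bit 28, which the datasheet says is not significant */
}
```

For any register value, the pair (masked_channel(v), masked_channel(v) + 1) is either (0, 1) or (2, 3), so both indices passed to the reporting helper remain in range.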
Comment 2 doug thompson 2009-11-09 19:38:40 UTC
Thanks for the bug report and your investigation

will look at it

doug t
Comment 3 Mauro Carvalho Chehab 2009-12-09 11:09:24 UTC
Doug,

I've checked the specs and the fix seems right to me.

Cheers,
Mauro.
Comment 4 Mauro Carvalho Chehab 2009-12-09 11:28:25 UTC
Created attachment 24118
Fix i5000 panic

The enclosed patch should fix the bug.
