Bug 14568

Summary: Kernel panic: EDAC MC0: INTERNAL ERROR: channel-b out of range
Product: Drivers
Reporter: Tamas Vincze (tom)
Component: EDAC
Assignee: drivers_edac (drivers_edac)
Status: CLOSED CODE_FIX
Severity: normal
CC: alan, mchehab, norsk5
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.18
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Full console log; Fix i5000 panic

Description Tamas Vincze 2009-11-09 15:23:39 UTC
Created attachment 23713
Full console log

EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
Kernel panic - not syncing: EDAC MC0: Uncorrected Error
 (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Kernel is 2.6.18-164.el5xen
Server board is a Supermicro X7DBi+ (Intel 5000P chipset) with 16GB RAM.
Comment 1 Tamas Vincze 2009-11-09 15:38:10 UTC
The bug is probably in the function i5000_process_nonfatal_error_info() in i5000_edac.c.
It is also present in kernel 2.6.31.

The i5000X datasheet (page 211) says that FERR_NF_FBD bit 28 has no significance for M4Err through M12Err (the FERR_NF_UNCORRECTABLE bits). However, the driver code uses both bits 29 and 28.

The channel number is taken from bits 29:28 by this macro:
EXTRACT_FBDCHAN_INDX(x) (((x)>>28) & 0x3)

Both bits are stored in channel, and channel + 1 is then passed to the handler:

ue_errors = allErrors & FERR_NF_UNCORRECTABLE;
if (ue_errors) {
    debugf0("\tUncorrected bits= 0x%x\n", ue_errors);
    branch = EXTRACT_FBDCHAN_INDX(info->ferr_nf_fbd);
    channel = branch;
    [...]
    /* Call the helper to output message */
    edac_mc_handle_fbd_ue(mci, rank, channel, channel + 1, msg);
}

If bits 29 and 28 are both set, channel is 3 and channel + 1 in the call above yields 4,
which is out of range (the panic message reports 4 >= 4).
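
The arithmetic can be checked outside the kernel. The following is a minimal user-space sketch, not the driver code: the macro is copied from the line quoted above, while NR_CHANNELS (taken from the "4 >= 4" in the panic message) and the helper names fbd_chan_index(), second_channel_unmasked() and channel_in_range() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Macro as quoted from the driver: bits 29:28 of FERR_NF_FBD. */
#define EXTRACT_FBDCHAN_INDX(x) (((x) >> 28) & 0x3)

/* Hypothetical constant: the limit reported in the panic message. */
#define NR_CHANNELS 4

/* Extract the 2-bit index; the 0x3 mask guarantees a value in 0..3. */
static int fbd_chan_index(uint32_t ferr_nf_fbd)
{
    int idx = (int)EXTRACT_FBDCHAN_INDX(ferr_nf_fbd);
    assert(idx >= 0 && idx <= 3);
    return idx;
}

/* Second channel argument as the unpatched code computes it. */
static int second_channel_unmasked(uint32_t ferr_nf_fbd)
{
    int branch = fbd_chan_index(ferr_nf_fbd);
    int channel = branch;      /* bit 28 is not masked here */
    return channel + 1;        /* becomes 4 when bits 29:28 are both set */
}

/* Range check equivalent to the one that reports "4 >= 4". */
static int channel_in_range(int channel)
{
    return channel >= 0 && channel < NR_CHANNELS;
}
```

With a register value of 0x30000000 (bits 29 and 28 set), second_channel_unmasked() returns 4 and channel_in_range() rejects it.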

Bit 28 should be masked, for example by replacing
    channel = branch;
with:
    channel = branch & 2;
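
As a rough check of the proposed mask, here is a user-space sketch, assuming the channel limit is 4 as in the panic message. Only the macro comes from the driver; NR_CHANNELS and first_channel_masked() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Macro as quoted from the driver: bits 29:28 of FERR_NF_FBD. */
#define EXTRACT_FBDCHAN_INDX(x) (((x) >> 28) & 0x3)

/* Hypothetical constant: the limit reported in the panic message. */
#define NR_CHANNELS 4

/* First channel with bit 28 ignored, as proposed: the result is 0 or 2. */
static int first_channel_masked(uint32_t ferr_nf_fbd)
{
    int branch = (int)EXTRACT_FBDCHAN_INDX(ferr_nf_fbd);
    int channel = branch & 2;  /* drop the bit the datasheet calls insignificant */

    /* channel + 1 is the second channel passed to the handler (1 or 3). */
    assert(channel + 1 < NR_CHANNELS);
    return channel;
}
```

For the four possible values of bits 29:28, the channel pair passed to the handler is (0, 1), (0, 1), (2, 3) and (2, 3), so the second channel never reaches 4.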
Comment 2 doug thompson 2009-11-09 19:38:40 UTC
Thanks for the bug report and your investigation.

I will look at it.

doug t
Comment 3 Mauro Carvalho Chehab 2009-12-09 11:09:24 UTC
Doug,

I've checked the specs and the fix seems right to me.

Cheers,
Mauro.
Comment 4 Mauro Carvalho Chehab 2009-12-09 11:28:25 UTC
Created attachment 24118
Fix i5000 panic

The enclosed patch should fix the bug.