Bug 14568
Summary: | Kernel panic: EDAC MC0: INTERNAL ERROR: channel-b out of range | ||
---|---|---|---|
Product: | Drivers | Reporter: | Tamas Vincze (tom) |
Component: | EDAC | Assignee: | drivers_edac (drivers_edac) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | alan, mchehab, norsk5 |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.18 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Full console log, Fix i5000 panic | ||
The bug is probably in i5000_edac.c, in the function i5000_process_nonfatal_error_info(). It is also present in kernel 2.6.31. The i5000X datasheet (page 211) says that bit 28 of FERR_NF_FBD has no significance for M4Err through M12Err (the FERR_NF_UNCORRECTABLE bits). However, the driver code uses both bits 29:28. The channel number is taken from bits 29:28 by this macro:

    EXTRACT_FBDCHAN_INDX(x) (((x)>>28) & 0x3)

This 2-bit value is stored as the channel, and channel + 1 is later passed as the second channel:

    ue_errors = allErrors & FERR_NF_UNCORRECTABLE;
    if (ue_errors) {
        debugf0("\tUncorrected bits= 0x%x\n", ue_errors);
        branch = EXTRACT_FBDCHAN_INDX(info->ferr_nf_fbd);
        channel = branch;
        [...]
        /* Call the helper to output message */
        edac_mc_handle_fbd_ue(mci, rank, channel, channel + 1, msg);
    }

If both bits 29:28 are set, channel is 3, so channel + 1 in the last line above yields 4, which is out of range. Bit 28 should be masked, for example by replacing

    channel = branch;

with:

    channel = branch & 2;

Thanks for the bug report and your investigation. Will look at it.
doug t

Doug, I've checked the specs and the fix seems right to me.
Cheers,
Mauro.

Created attachment 24118
Fix i5000 panic
The enclosed patch should fix the bug.
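The effect of the proposed one-line change can be illustrated in user space. This is a sketch, not the attached patch: the macro is copied from the report, while the helper names and the NUM_FBD_CHANNELS constant (4, as the "(4 >= 4)" console message suggests) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Macro quoted in the report: extracts bits 29:28 of FERR_NF_FBD. */
#define EXTRACT_FBDCHAN_INDX(x) (((x) >> 28) & 0x3)

/* Hypothetical: channel count implied by the "(4 >= 4)" log message. */
#define NUM_FBD_CHANNELS 4

/* Hypothetical helper: the second channel passed on by the original code,
 * which uses both bits 29:28 and can therefore return 4. */
int second_channel_unmasked(uint32_t ferr_nf_fbd)
{
	int channel = EXTRACT_FBDCHAN_INDX(ferr_nf_fbd);
	return channel + 1;
}

/* Hypothetical helper: the same computation with the proposed fix.
 * Masking with 2 drops bit 28, so channel is 0 or 2 and
 * channel + 1 is 1 or 3, always below NUM_FBD_CHANNELS. */
int second_channel_masked(uint32_t ferr_nf_fbd)
{
	int channel = EXTRACT_FBDCHAN_INDX(ferr_nf_fbd) & 2;
	assert(channel + 1 < NUM_FBD_CHANNELS);
	return channel + 1;
}
```

With bits 29:28 both set (FERR_NF_FBD = 0x30000000), the unmasked helper returns 4, while the masked helper returns 3.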
Created attachment 23713
Full console log

    EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
    Kernel panic - not syncing: EDAC MC0: Uncorrected Error
    (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

The kernel is 2.6.18-164.el5xen. The server board is a Supermicro X7DBi+ (Intel 5000P chipset) with 16 GB of RAM.
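The first log line shows a bounds check rejecting a channel-b value of 4 when the valid channels are 0 to 3. Below is a hypothetical user-space sketch of such a check. It assumes the check compares each channel index against the channel count and prints in the format seen in the log. The function name, the channel-a branch, and the return convention are invented for illustration and are not the EDAC core's code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: rejects channel indices that are not below
 * nr_channels, printing messages in the format seen in the console log.
 * Returns 0 when both channels are valid and -1 otherwise. */
int check_fbd_channels(int channel_a, int channel_b, int nr_channels)
{
	assert(nr_channels > 0); /* precondition: at least one channel */

	if (channel_a >= nr_channels) {
		printf("INTERNAL ERROR: channel-a out of range (%d >= %d)\n",
		       channel_a, nr_channels);
		return -1;
	}
	if (channel_b >= nr_channels) {
		printf("INTERNAL ERROR: channel-b out of range (%d >= %d)\n",
		       channel_b, nr_channels);
		return -1;
	}
	return 0;
}
```

Calling check_fbd_channels(3, 4, 4) prints the same message text as the first log line and returns -1. The pair (2, 3), which the masked driver code can produce, passes the check.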