Created attachment 23713 [details]
Full console log

    EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
    Kernel panic - not syncing: EDAC MC0: Uncorrected Error
    (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

Kernel is 2.6.18-164.el5xen.
Server board is a Supermicro X7DBi+ (Intel 5000P chipset) with 16GB RAM.
The bug is probably in i5000_edac.c, function i5000_process_nonfatal_error_info(), and it is also present in kernel 2.6.31.

The i5000X datasheet (page 211) says that FERR_NF_FBD bit 28 has no significance for M4Err through M12Err (= the FERR_NF_UNCORRECTABLE bits). However, the driver code uses both bits. The channel number is taken from bits 29:28 by this macro:

    EXTRACT_FBDCHAN_INDX(x) (((x)>>28) & 0x3)

Both bits are stored in channel, and channel + 1 is then passed on as the second channel:

    ue_errors = allErrors & FERR_NF_UNCORRECTABLE;
    if (ue_errors) {
            debugf0("\tUncorrected bits= 0x%x\n", ue_errors);

            branch = EXTRACT_FBDCHAN_INDX(info->ferr_nf_fbd);
            channel = branch;
            [...]
            /* Call the helper to output message */
            edac_mc_handle_fbd_ue(mci, rank, channel, channel + 1, msg);
    }

If both of bits 29:28 are set, then channel is 3 and channel + 1 in the last line above yields 4, which is out of range. Bit 28 should be masked, for example by replacing

    channel = branch;

with:

    channel = branch & 2;
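To make the arithmetic concrete, here is a small user-space sketch of the two variants. It is not driver code: only EXTRACT_FBDCHAN_INDX is copied from the report (with a #define added so it compiles). NUM_CHANNELS and the two helper functions are hypothetical names made up for this illustration, and the value 4 is assumed from the "(4 >= 4)" in the console log.

```c
#include <stdint.h>

/* Bit extraction as quoted in the report: bits 29:28 of FERR_NF_FBD. */
#define EXTRACT_FBDCHAN_INDX(x) (((x) >> 28) & 0x3)

/* Assumption for this sketch: 4 channels, as implied by "(4 >= 4)". */
#define NUM_CHANNELS 4

/* Hypothetical helper: second channel as the current code computes it. */
static unsigned int second_channel_unmasked(uint32_t ferr_nf_fbd)
{
	unsigned int channel = EXTRACT_FBDCHAN_INDX(ferr_nf_fbd);

	return channel + 1;	/* reaches 4 when bits 29:28 are both set */
}

/* Hypothetical helper: second channel with bit 28 masked off. */
static unsigned int second_channel_masked(uint32_t ferr_nf_fbd)
{
	unsigned int channel = EXTRACT_FBDCHAN_INDX(ferr_nf_fbd) & 2;

	return channel + 1;	/* only 1 or 3, always below NUM_CHANNELS */
}
```

With bits 29:28 both set (register value 0x30000000), the unmasked variant returns 4, which matches the range check failure in the log, while the masked variant returns 3. With the mask applied, the channel pair is always (0, 1) or (2, 3).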
Thanks for the bug report and your investigation. Will look at it.

doug t
Doug, I've checked the specs and the fix seems right to me. Cheers, Mauro.
Created attachment 24118 [details]
Fix i5000 panic

The enclosed patch should fix the bug.