The function toggle_ecc_err_reporting() is defined in file drivers/edac/amd64_edac.c. It calls zalloc_cpumask_var() (at line 2540) to allocate memory, and it returns "false" to its callers if the allocation fails. The real value of "false" is 0. We then review the source code of the callers of function toggle_ecc_err_reporting(). Take enable_ecc_error_reporting() for an example. It calls toggle_ecc_err_reporting(), and checks the return value. From the check behavior we can infer that toggle_ecc_err_reporting() is believed success if it returns 0, which may be inconsistent with the fact. I guess the statement "return false;" in function toggle_ecc_err_reporting() may be a typo, the author may want to return an error code (e.g. -ENOMEM). Codes related to this bug are summarised as follows. toggle_ecc_err_reporting @@ drivers/edac/amd64_edac.c 2535 static int toggle_ecc_err_reporting(struct ecc_settings *s, u16 nid, bool on) 2536 { 2537 cpumask_var_t cmask; 2538 int cpu; 2539 2540 if (!zalloc_cpumask_var(&cmask, GFP_KERNEL)) { 2541 amd64_warn("%s: error allocating mask\n", __func__); // Bug. "return -ENOMEM;" ? 2542 return false; 2543 } ... 2570 return 0; 2571 } enable_ecc_error_reporting @@ drivers/edac/amd64_edac.c 2573 static bool enable_ecc_error_reporting(struct ecc_settings *s, u16 nid, 2574 struct pci_dev *F3) 2575 { 2576 bool ret = true; 2577 u32 value, mask = 0x3; /* UECC/CECC enable */ 2578 // It believes 0 is success, while non-zero means failures 2579 if (toggle_ecc_err_reporting(s, nid, ON)) { 2580 amd64_warn("Error enabling ECC reporting over MCGCTL!\n"); 2581 return false; 2582 } ... 2622 return ret; 2623 } Thanks very much!
Created attachment 256455 [details] The patch fixes the bug The patch has been merged into the latest version of the Linux kernel. So I will close the bug.