Bug 189071

Summary: Function toggle_ecc_err_reporting() returns an incorrect value when the call to zalloc_cpumask_var() fails
Product: Drivers Reporter: bianpan (bianpan2010)
Component: EDACAssignee: drivers_edac (drivers_edac)
Status: RESOLVED CODE_FIX    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: linux-4.9-rc6 Subsystem:
Regression: No Bisected commit-id:
Attachments: The patch fixes the bug

Description bianpan 2016-11-25 11:21:25 UTC
The function toggle_ecc_err_reporting() is defined in file drivers/edac/amd64_edac.c. It calls zalloc_cpumask_var() (at line 2540) to allocate memory, and it returns "false" to its callers if the allocation fails. The real value of "false" is 0. We then review the source code of the callers of function toggle_ecc_err_reporting(). Take enable_ecc_error_reporting() for an example. It calls toggle_ecc_err_reporting(), and checks the return value. From the check behavior we can infer that toggle_ecc_err_reporting() is believed success if it returns 0, which may be inconsistent with the fact. I guess the statement "return false;" in function toggle_ecc_err_reporting() may be a typo, the author may want to return an error code (e.g. -ENOMEM). Codes related to this bug are summarised as follows.

toggle_ecc_err_reporting @@ drivers/edac/amd64_edac.c
2535 static int toggle_ecc_err_reporting(struct ecc_settings *s, u16 nid, bool on)
2536 {
2537     cpumask_var_t cmask;
2538     int cpu;
2539 
2540     if (!zalloc_cpumask_var(&cmask, GFP_KERNEL)) {
2541         amd64_warn("%s: error allocating mask\n", __func__);
             // Bug. "return -ENOMEM;" ?
2542         return false;
2543     }
         ...
2570     return 0;
2571 }

enable_ecc_error_reporting @@ drivers/edac/amd64_edac.c
2573 static bool enable_ecc_error_reporting(struct ecc_settings *s, u16 nid,
2574                        struct pci_dev *F3)
2575 {
2576     bool ret = true;
2577     u32 value, mask = 0x3;      /* UECC/CECC enable */
2578 
         // It believes 0 is success, while non-zero means failures
2579     if (toggle_ecc_err_reporting(s, nid, ON)) {
2580         amd64_warn("Error enabling ECC reporting over MCGCTL!\n");
2581         return false;
2582     }
         ...
2622     return ret;
2623 }

Thanks very much!
Comment 1 bianpan 2017-05-12 00:29:00 UTC
Created attachment 256455 [details]
The patch fixes the bug

The patch has been merged into the latest version of the Linux kernel. So I will close the bug.