Bug 189071 - Function toggle_ecc_err_reporting() returns an incorrect value when the call to zalloc_cpumask_var() fails
Summary: Function toggle_ecc_err_reporting() returns an incorrect value when the call ...
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: EDAC (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_edac@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-25 11:21 UTC by bianpan
Modified: 2017-05-12 00:29 UTC (History)
0 users

See Also:
Kernel Version: linux-4.9-rc6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
The patch fixes the bug (1.02 KB, patch)
2017-05-12 00:29 UTC, bianpan
Details | Diff

Description bianpan 2016-11-25 11:21:25 UTC
The function toggle_ecc_err_reporting() is defined in file drivers/edac/amd64_edac.c. It calls zalloc_cpumask_var() (at line 2540) to allocate memory, and it returns "false" to its callers if the allocation fails. The real value of "false" is 0. We then review the source code of the callers of function toggle_ecc_err_reporting(). Take enable_ecc_error_reporting() for an example. It calls toggle_ecc_err_reporting(), and checks the return value. From the check behavior we can infer that toggle_ecc_err_reporting() is believed success if it returns 0, which may be inconsistent with the fact. I guess the statement "return false;" in function toggle_ecc_err_reporting() may be a typo, the author may want to return an error code (e.g. -ENOMEM). Codes related to this bug are summarised as follows.

toggle_ecc_err_reporting @@ drivers/edac/amd64_edac.c
2535 static int toggle_ecc_err_reporting(struct ecc_settings *s, u16 nid, bool on)
2536 {
2537     cpumask_var_t cmask;
2538     int cpu;
2539 
2540     if (!zalloc_cpumask_var(&cmask, GFP_KERNEL)) {
2541         amd64_warn("%s: error allocating mask\n", __func__);
             // Bug. "return -ENOMEM;" ?
2542         return false;
2543     }
         ...
2570     return 0;
2571 }

enable_ecc_error_reporting @@ drivers/edac/amd64_edac.c
2573 static bool enable_ecc_error_reporting(struct ecc_settings *s, u16 nid,
2574                        struct pci_dev *F3)
2575 {
2576     bool ret = true;
2577     u32 value, mask = 0x3;      /* UECC/CECC enable */
2578 
         // It believes 0 is success, while non-zero means failures
2579     if (toggle_ecc_err_reporting(s, nid, ON)) {
2580         amd64_warn("Error enabling ECC reporting over MCGCTL!\n");
2581         return false;
2582     }
         ...
2622     return ret;
2623 }

Thanks very much!
Comment 1 bianpan 2017-05-12 00:29:00 UTC
Created attachment 256455 [details]
The patch fixes the bug

The patch has been merged into the latest version of the Linux kernel. So I will close the bug.

Note You need to log in before you can comment on or make changes to this bug.