Bug 219360
| Summary: | IBECC interrupt not working, with segmentation fault during "rmmod igen6_edac" | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Orange Kao (orange) |
| Component: | EDAC | Assignee: | drivers_edac (drivers_edac) |
| Status: | NEW --- | | |
| Severity: | normal | | |
| Priority: | P3 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Orange Kao
2024-10-08 07:47:01 UTC
Hello. I dug into the code and created the following patch (based on 6.11.5 05b1367d372aca98a4e09c1a0e7ff0b9d721b2bc). It seems to resolve the segmentation fault and adds support for polling.

The segmentation fault happens because of the following sequence.

During modprobe:
1. In igen6_probe(), igen6_pvt is allocated with kzalloc().
2. In igen6_register_mci(), mci->pvt_info is set to point to &igen6_pvt->imc[mc].

During rmmod:
1. mci_release() in edac_mc.c calls kfree(mci->pvt_info).
2. igen6_remove() calls kfree(igen6_pvt).

This causes a double kfree on the same memory address. My proposal is to set mci->pvt_info to NULL before the mci is freed, to avoid the double kfree.

The other part of the patch adds support for polling. I have tested it on my machine (N100, with PCI device 8086:461c, DID_ADL_N_SKU4) and it seems to work as expected. I also ran "modprobe igen6_edac edac_op_state=0" and "rmmod" repeatedly and did not observe any issue.

Thank you.

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 189a2fc29e74..5027070410a5 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1170,6 +1170,19 @@ static int igen6_pci_setup(struct pci_dev *pdev, u64 *mchbar)
 	return -ENODEV;
 }
 
+static void igen6_check(struct mem_ctl_info *mci)
+{
+	struct igen6_imc *imc = mci->pvt_info;
+
+	/* errsts_clear() isn't NMI-safe. Delay it in the IRQ context */
+	u64 ecclog = ecclog_read_and_clear(imc);
+	if (!ecclog)
+		return;
+	if (!ecclog_gen_pool_add(imc->mc, ecclog))
+		irq_work_queue(&ecclog_irq_work);
+
+}
+
 static int igen6_register_mci(int mc, u64 mchbar, struct pci_dev *pdev)
 {
 	struct edac_mc_layer layers[2];
@@ -1211,6 +1224,9 @@ static int igen6_register_mci(int mc, u64 mchbar, struct pci_dev *pdev)
 	mci->edac_cap = EDAC_FLAG_SECDED;
 	mci->mod_name = EDAC_MOD_STR;
 	mci->dev_name = pci_name(pdev);
+	if (edac_op_state == EDAC_OPSTATE_POLL) {
+		mci->edac_check = igen6_check;
+	}
 	mci->pvt_info = &igen6_pvt->imc[mc];
 
 	imc = mci->pvt_info;
@@ -1245,6 +1261,7 @@ static int igen6_register_mci(int mc, u64 mchbar, struct pci_dev *pdev)
 	imc->mci = mci;
 	return 0;
 fail3:
+	mci->pvt_info = NULL;
 	kfree(mci->ctl_name);
 fail2:
 	edac_mc_free(mci);
@@ -1269,6 +1286,7 @@ static void igen6_unregister_mcis(void)
 
 		edac_mc_del_mc(mci->pdev);
 		kfree(mci->ctl_name);
+		mci->pvt_info = NULL;
 		edac_mc_free(mci);
 		iounmap(imc->window);
 	}
@@ -1448,7 +1466,9 @@ static int __init igen6_init(void)
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
 
-	edac_op_state = EDAC_OPSTATE_NMI;
+	if (edac_op_state == EDAC_OPSTATE_INVAL) {
+		edac_op_state = EDAC_OPSTATE_NMI;
+	}
 
 	rc = pci_register_driver(&igen6_driver);
 	if (rc)
@@ -1472,3 +1492,6 @@ module_exit(igen6_exit);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Qiuxu Zhuo");
 MODULE_DESCRIPTION("MC Driver for Intel client SoC using In-Band ECC");
+
+module_param(edac_op_state, int, 0444);
+MODULE_PARM_DESC(edac_op_state, "EDAC Error Reporting state: 0=Poll,1=NMI. Default=1");
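
For illustration, the double-kfree sequence described in this report can be modelled in user space. This is only a rough sketch, not driver code: the struct definitions, mock_kfree(), mock_mci_release() and count_pvt_frees() are hypothetical stand-ins, and frees are only recorded, never performed. It covers the mc == 0 case only, under the assumption that imc[] is the first member of the private structure, so that &pvt->imc[0] and pvt share one address.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct imc { int mc; };
struct pvt { struct imc imc[2]; };   /* assumed: imc[] is the first member */
struct mci { void *pvt_info; };

static const void *watched;   /* address of the private allocation */
static int watched_frees;     /* how often that address was handed to "kfree" */

/* Records a free instead of performing it; NULL is ignored, as with kfree(NULL). */
static void mock_kfree(const void *p)
{
	if (p && p == watched)
		watched_frees++;
}

/* Models the behaviour described for mci_release(): it frees pvt_info. */
static void mock_mci_release(struct mci *mci)
{
	mock_kfree(mci->pvt_info);
}

/*
 * Runs the teardown sequence and returns how many times the private
 * allocation was freed. With clear_pvt_info set, pvt_info is reset to
 * NULL before release, which is the approach the patch proposes.
 */
static int count_pvt_frees(int clear_pvt_info)
{
	struct pvt pvt = { { { 0 }, { 1 } } };
	struct mci mci = { &pvt.imc[0] };   /* same address as &pvt */

	watched = &pvt;
	watched_frees = 0;

	if (clear_pvt_info)
		mci.pvt_info = NULL;

	mock_mci_release(&mci);   /* rmmod step 1: release of the mci */
	mock_kfree(&pvt);         /* rmmod step 2: the owner frees its allocation */

	return watched_frees;
}
```

In this model, count_pvt_frees(0) reports two frees of the same address, while count_pvt_frees(1) reports one, which matches the reasoning behind setting mci->pvt_info to NULL before edac_mc_free().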