Bug 216730
| Summary: | System hang with kernel NULL pointer dereference | | |
|---|---|---|---|
| Product: | File System | Reporter: | Mandraxx (mandraxx) |
| Component: | NFS | Assignee: | Chuck Lever (cel) |
| Status: | RESOLVED IMPLEMENTED | | |
| Severity: | normal | CC: | jlayton, regressions |
| Priority: | P1 | | |
| Hardware: | AMD | | |
| OS: | Linux | | |
| Kernel Version: | 6.0.6 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Mandraxx
2022-11-22 19:34:25 UTC
Regarding differences between kernels 5.16.16 and 6.0.x: could this commit have changed the behavior of __list_lru_walk_one and introduced the bug? https://github.com/torvalds/linux/commit/5abc1e37afa0335c52608d640fd30910b2eeda21

Mandraxx.

This is not my area of expertise, but there were a few recent fixes in the NFS code. It might be wise to test whether the issue occurs with the latest code; ideally, test mainline.

Yes, I had that in mind. I first encountered the bug with 6.0.6 and upgraded to 6.0.8, with the same behavior (the crash occurred after 6 days of uptime). I just downgraded my server to 5.15.79 (LTS) to make sure that it becomes stable again.

Can you try v6.1-rc? There have been some recent fixes in that area. I suspect that the nfsd_file being examined in nfsd_file_lru_cb() is getting freed elsewhere, and the resulting reuse of that memory triggers a bad pointer dereference.

Hi,

Sorry, I have not had much time since our last contact. First of all, I wish you a happy new year ;-) I just upgraded my setup from 5.15.79 LTS (which has been stable for 2 months now) to v6.1.4; let's see if it is stable again.

Hi,

The server has now been stable for 24 days with kernel v6.1.4. With 6.0.x it usually crashed after 14 to 15 days. So I think the issue is fixed. Thank you for your help.