Bug 215696
| Summary: | Kernel Oops since kernel-5.17 on dual socket Intel Xeon Gold servers - kernel NULL pointer dereference | | |
|---|---|---|---|
| Product: | Other | Reporter: | Jirka Hladky (hladky.jiri) |
| Component: | Other | Assignee: | other_other |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | regressions |
| Priority: | P1 | | |
| Hardware: | Intel | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Description
Jirka Hladky
2022-03-17 11:20:48 UTC
You are likely better off reporting the issue by mail, as explained in https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html. I assume you won't reach the right people here, and I guess nobody might see a bug reported against this particular component. Also: some developers don't care about distro kernels, even if they are patched only lightly. So you will increase your chances by reproducing this with a vanilla kernel.

Would be great if you tried to bisect it.

Thorsten, thanks a lot for the hint! I have started a mailing thread here:
https://lore.kernel.org/lkml/CAE4VaGDZr_4wzRn2___eDYRtmdPaGGJdzu_LCSkJYuY9BEO3cw@mail.gmail.com/

(In reply to Artem S. Tashkinov from comment #2)
> Would be great if you tried to bisect it.

I will try that later this week when I get access to the server.

I have found the commit causing the trouble [1]. Any ideas how to fix it?

$ git bisect visualize
commit 393c3714081a53795bbff0e985d24146def6f57f (refs/bisect/bad)
Author: Minchan Kim <minchan@kernel.org>
Date:   Thu Nov 18 15:00:08 2021 -0800

    kernfs: switch global kernfs_rwsem lock to per-fs lock

    The kernfs implementation has big lock granularity(kernfs_rwsem) so
    every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the
    lock. It makes trouble for some cases to wait the global lock
    for a long time even though they are totally independent contexts
    each other.

    A general example is process A goes under direct reclaim with holding
    the lock when it accessed the file in sysfs and process B is waiting
    the lock with exclusive mode and then process C is waiting the lock
    until process B could finish the job after it gets the lock from
    process A.

    This patch switches the global kernfs_rwsem to per-fs lock, which
    put the rwsem into kernfs_root.

    Suggested-by: Tejun Heo <tj@kernel.org>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The issue is fixed by this patch:
https://lore.kernel.org/all/YmLznjFdpblHzZiM@google.com/

Fixes: 393c3714081a (kernfs: switch global kernfs_rwsem lock to per-fs lock)
Reported-by: Jirka Hladky <jhladky@redhat.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 fs/kernfs/dir.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 61a8edc4ba8b..e205fde7163a 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1406,7 +1406,12 @@ static void __kernfs_remove(struct kernfs_node *kn)
  */
 void kernfs_remove(struct kernfs_node *kn)
 {
-	struct kernfs_root *root = kernfs_root(kn);
+	struct kernfs_root *root;
+
+	if (!kn)
+		return;
+
+	root = kernfs_root(kn);
 
 	down_write(&root->kernfs_rwsem);
 	__kernfs_remove(kn);
-- 

I'm going to close this BZ.
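
For readers who want to see why the ordering in the patch matters, here is a minimal userspace sketch of the same pattern. It is not kernel code: struct model_node, struct model_root, model_root_of() and model_remove() are hypothetical stand-ins invented for illustration, and the counter only simulates taking the per-root write lock. The assumption, based on the diff above, is that the remove function can be handed a NULL node, so the NULL check has to come before anything that dereferences the node to find its root.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_root {
    int write_locks_taken;      /* counts simulated down_write() calls */
};

struct model_node {
    struct model_node *parent;  /* NULL for the top-level node */
    struct model_root *root;    /* only consulted on the top-level node */
};

/*
 * Looks up the root by dereferencing the node. Like any such helper,
 * it must never be called with NULL; the assert documents that.
 */
static struct model_root *model_root_of(struct model_node *kn)
{
    assert(kn != NULL);
    if (kn->parent)
        kn = kn->parent;
    return kn->root;
}

/*
 * Mirrors the shape of the patched function: check for NULL first,
 * and only then look up the root and take its lock.
 * Returns 1 if a removal was attempted, 0 if kn was NULL.
 */
static int model_remove(struct model_node *kn)
{
    struct model_root *root;

    if (!kn)
        return 0;               /* nothing to remove, nothing to lock */

    root = model_root_of(kn);   /* safe: kn is known to be non-NULL */
    root->write_locks_taken++;  /* stands in for down_write() */
    /* ... the actual removal and the unlock would follow here ... */
    return 1;
}
```

With a global lock there is no need to touch the node before locking, so a NULL argument is harmless at that point. Once the lock lives in the root, the lookup itself dereferences the node, which is why the guard moves to the top of the function in this sketch.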