Bug 199593 - "BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 in ida_remove()" due to race between remove_proc_subtree() and proc_readdir_de()
Summary: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 in...
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: fs_other
URL: https://jira.hpdd.intel.com/browse/LU...
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-02 12:37 UTC by Bruno Faccini
Modified: 2023-06-20 14:32 UTC
CC List: 1 user

See Also:
Kernel Version: 3.x, 4.x
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Reproducer's module source code (884 bytes, text/x-csrc)
2018-05-02 12:37 UTC, Bruno Faccini
Patch (1.95 KB, text/plain)
2018-05-02 12:42 UTC, Bruno Faccini
New patch version for v4.18.0 Kernel (663 bytes, patch)
2023-06-19 08:45 UTC, Bruno Faccini

Description Bruno Faccini 2018-05-02 12:37:30 UTC
Created attachment 275729 [details]
Reproducer's module source code

The problem has occurred several times on a Lustre client when the filesystem is being unmounted while, at the same time, a user-land thread is still trying to access some of the Lustre tunables in /proc.
 
The panic stack looks like the following:
=======================================
[ 7093.049261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[ 7093.050110] IP: [<ffffffff813261e1>] ida_remove+0x91/0xe0
[ 7093.050655] PGD 8000000064f1b067 PUD 5f9a7067 PMD 0 
[ 7093.051182] Oops: 0000 [#1] SMP 
[ 7093.051534] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel nfsd ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 i2c_core pcspkr joydev virtio_balloon nfs_acl lockd parport_pc parport grace auth_rpcgss sunrpc ip_tables ata_generic pata_acpi ext4 mbcache jbd2 ata_piix libata virtio_blk 8139too crct10dif_pclmul crct10dif_common crc32c_intel virtio_pci virtio_ring serio_raw virtio 8139cp floppy mii
[ 7093.060088] CPU: 0 PID: 23436 Comm: lctl Tainted: G           OE  ------------   3.10.0-693.17.1.el7.x86_64 #1
[ 7093.061013] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 7093.061555] task: ffff880062b33f40 ti: ffff8800622f8000 task.ti: ffff8800622f8000
[ 7093.062250] RIP: 0010:[<ffffffff813261e1>]  [<ffffffff813261e1>] ida_remove+0x91/0xe0
[ 7093.063009] RSP: 0018:ffff8800622fbe20  EFLAGS: 00010086
[ 7093.063511] RAX: 0000000000000180 RBX: 0000000000000000 RCX: 00000000000000f4
[ 7093.064172] RDX: 00000000001ec6f4 RSI: 00000000736a1480 RDI: ffff88007c0a0008
[ 7093.064838] RBP: ffff8800622fbe30 R08: ffff88007c0a0000 R09: 00000000000000f4
[ 7093.065506] R10: 00000000636a1001 R11: ffffea00018da840 R12: ffffffff81fd15e0
[ 7093.066164] R13: ffff88007ab27600 R14: ffff8800636a1000 R15: ffffffff81217790
[ 7093.066850] FS:  00007f4e87410740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 7093.067600] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7093.068132] CR2: 0000000000000038 CR3: 000000006091e000 CR4: 00000000000606f0
[ 7093.068797] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7093.069461] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 7093.070114] Call Trace:
[ 7093.070355]  [<ffffffff8127880f>] proc_free_inum+0x2f/0x50
[ 7093.070879]  [<ffffffff81278c97>] pde_put+0x27/0x50
[ 7093.071333]  [<ffffffff81278e1c>] proc_readdir_de+0x15c/0x220
[ 7093.071887]  [<ffffffff81217790>] ? fillonedir+0xe0/0xe0
[ 7093.072382]  [<ffffffff81278f02>] proc_readdir+0x22/0x30
[ 7093.072884]  [<ffffffff81217680>] vfs_readdir+0xb0/0xe0
[ 7093.073379]  [<ffffffff81217aa5>] SyS_getdents+0x95/0x120
[ 7093.073894]  [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b
[ 7093.074459]  [<ffffffff816b889d>] ? system_call_after_swapgs+0xca/0x214
[ 7093.075071] Code: 45 0f b3 48 08 4f 8b 44 c8 28 83 e9 08 4d 85 c0 75 e0 4d 85 c0 74 4b 0f b6 ca 49 8d 78 08 41 0f b3 48 08 4c 63 c9 4b 8b 5c c8 28 <0f> a3 43 08 45 19 c0 45 85 c0 74 2b 0f b3 43 08 48 83 2b 01 75 
[ 7093.078106] RIP  [<ffffffff813261e1>] ida_remove+0x91/0xe0
[ 7093.078655]  RSP <ffff8800622fbe20>
[ 7093.078987] CR2: 0000000000000038
=======================================

Meanwhile, the umount thread can be found removing /proc entries in the same directory, with the following stack in the crash dumps:
=======================================
PID: 23433  TASK: ffff88000016eeb0  CPU: 1   COMMAND: "umount"
 #0 [ffff88007fd08e48] crash_nmi_callback at ffffffff8104fd11
 #1 [ffff88007fd08e58] nmi_handle at ffffffff816b0c57
 #2 [ffff88007fd08eb0] do_nmi at ffffffff816b0e8d
 #3 [ffff88007fd08ef0] end_repeat_nmi at ffffffff816b00b9
    [exception RIP: __pv_queued_spin_lock_slowpath+218]
    RIP: ffffffff810fbf0a  RSP: ffff8800608a7978  RFLAGS: 00000206
    RAX: 0000000000000005  RBX: ffffffff81fd15c0  RCX: 0000000000007b84
    RDX: 0000000000000005  RSI: 0000000000000005  RDI: 0000000000000246
    RBP: ffff8800608a79a8   R8: 0000000000000000   R9: 0000000000000040
    R10: 0000000000000000  R11: 0000000000000001  R12: ffff88007fd195c0
    R13: 0000000000000001  R14: 0000000000090000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff8800608a7978] __pv_queued_spin_lock_slowpath at ffffffff810fbf0a
 #5 [ffff8800608a79b0] queued_spin_lock_slowpath at ffffffff816a08df
 #6 [ffff8800608a79c0] _raw_spin_lock at ffffffff816adff0
 #7 [ffff8800608a79d0] remove_proc_subtree at ffffffff812791bc
 #8 [ffff8800608a7a20] proc_remove at ffffffff8127922e
 #9 [ffff8800608a7a30] lprocfs_obd_cleanup at ffffffffc0724b33 [obdclass]
#10 [ffff8800608a7a48] ptlrpc_lprocfs_unregister_obd at ffffffffc094d462 [ptlrpc]
#11 [ffff8800608a7a60] osc_precleanup at ffffffffc0ae25df [osc]
#12 [ffff8800608a7a78] class_cleanup at ffffffffc07388b9 [obdclass]
#13 [ffff8800608a7af0] class_process_config at ffffffffc0739edc [obdclass]
#14 [ffff8800608a7bb0] class_manual_cleanup at ffffffffc073c1b6 [obdclass]
#15 [ffff8800608a7c58] lov_putref at ffffffffc0b557f2 [lov]
#16 [ffff8800608a7cb8] lov_disconnect at ffffffffc0b5d062 [lov]
#17 [ffff8800608a7ce0] obd_disconnect at ffffffffc0bd9ecb [lustre]
#18 [ffff8800608a7d00] ll_put_super at ffffffffc0bdd361 [lustre]
#19 [ffff8800608a7e38] generic_shutdown_super at ffffffff812056d2
#20 [ffff8800608a7e60] kill_anon_super at ffffffff81205aa2
#21 [ffff8800608a7e78] lustre_kill_super at ffffffffc073ea75 [obdclass]
#22 [ffff8800608a7e90] deactivate_locked_super at ffffffff81205e59
#23 [ffff8800608a7eb0] deactivate_super at ffffffff812065c6
#24 [ffff8800608a7ec8] cleanup_mnt at ffffffff8122395f
#25 [ffff8800608a7ee0] __cleanup_mnt at ffffffff812239f2
#26 [ffff8800608a7ef0] task_work_run at ffffffff810aefe5
#27 [ffff8800608a7f30] do_notify_resume at ffffffff8102ab52
#28 [ffff8800608a7f50] int_signal at ffffffff816b8d37
    RIP: 00007f3942ac33e7  RSP: 00007fff3665cdd8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 000055ab33267040  RCX: ffffffffffffffff
    RDX: 0000000000000001  RSI: 0000000000000000  RDI: 000055ab332682e0
    RBP: 000055ab332682e0   R8: 0000000000000008   R9: 000000000000000c
    R10: 00007fff3665c860  R11: 0000000000000246  R12: 00007f394363ad78
    R13: 0000000000000000  R14: 000055ab33267210  R15: 000055ab33267040
    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
=======================================

The full story, with my complete crash-dump analysis, can be found at https://jira.hpdd.intel.com/browse/LU-10952. In a few words, this crash can occur during a race between the proc_readdir_de() loop (which gathers the available proc tunables within a directory) and the remove_proc_subtree() loop (which destroys that same set of proc tunable entries), and it seems to be due to the current sequence of spin_[un]lock() calls in both loops, which can let proc_readdir_de() sometimes work on, and use, a pde that has already been removed and freed.
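
To make the race window concrete, here is an annotated paraphrase of the two pre-patch loops, reconstructed from the removed ("-") and context lines of the patches quoted later in this report. It is a sketch only, not verbatim kernel code; note that proc_subdir_lock is a plain spinlock in 3.10 but a rwlock by 4.18, hence the mixed spin_/write_ primitives:
=======================================
/* reader side: proc_readdir_de() inner loop (3.10-era, pre-patch) */
do {
        struct proc_dir_entry *next;

        pde_get(de);                    /* ref taken under the lock...       */
        spin_unlock(&proc_subdir_lock); /* ...then lock dropped for filldir  */
        if (filldir(dirent, de->name, de->namelen, filp->f_pos,
                    de->low_ino, de->mode >> 12) < 0) {
                pde_put(de);
                goto out;
        }
        spin_lock(&proc_subdir_lock);
        filp->f_pos++;
        next = pde_subdir_next(de);     /* the writer below may be tearing   */
        pde_put(de);                    /* these sibling entries down        */
        de = next;                      /* concurrently                      */
} while (de);
spin_unlock(&proc_subdir_lock);

/* writer side: remove_proc_subtree() loop (4.18-era, pre-patch) */
while (1) {
        /* (descend into non-empty subdirectories first) */
        write_unlock(&proc_subdir_lock); /* window: a concurrent readdir     */
        proc_entry_rundown(de);          /* may still be using de            */
        next = de->parent;
        if (S_ISDIR(de->mode))
                next->nlink--;
        de->nlink = 0;
        if (de == root)
                break;
        pde_put(de);                     /* reference dropped with no lock   */
        write_lock(&proc_subdir_lock);   /* held                             */
        de = next;
}
=======================================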

I have created a reproducer, with the attached module source code in the bfs.c file, which can easily trigger the same crash when the following two scripts are run concurrently (a rough sketch of such a module is given after the scripts):
=======================================
# while true ; do
> insmod bfi.ko
> rmmod bfi
> done &
---------------------------
# while true ; do
> ls -la /proc/bfi/
> done &
=======================================
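
For illustration, a minimal module along these lines could look like the sketch below. This is illustrative only, not the attached bfs.c: it creates a /proc/bfi/ directory with a set of read-only entries on load and tears the whole subtree down on unload, and it targets pre-5.6 kernels that still register procfs entries through struct file_operations:
=======================================
/* bfi.c - hypothetical minimal reproducer sketch */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

#define BFI_NENT 64

static struct proc_dir_entry *bfi_dir;

static int bfi_show(struct seq_file *m, void *v)
{
        seq_puts(m, "bfi\n");
        return 0;
}

static int bfi_open(struct inode *inode, struct file *file)
{
        return single_open(file, bfi_show, NULL);
}

static const struct file_operations bfi_fops = {
        .owner   = THIS_MODULE,
        .open    = bfi_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = single_release,
};

static int __init bfi_init(void)
{
        char name[16];
        int i;

        bfi_dir = proc_mkdir("bfi", NULL);
        if (!bfi_dir)
                return -ENOMEM;

        /* many sibling entries widen the readdir vs. removal race window */
        for (i = 0; i < BFI_NENT; i++) {
                snprintf(name, sizeof(name), "entry%d", i);
                proc_create(name, 0444, bfi_dir, &bfi_fops);
        }
        return 0;
}

static void __exit bfi_exit(void)
{
        /* tear the whole /proc/bfi subtree down, as in the umount stack above */
        remove_proc_subtree("bfi", NULL);
}

module_init(bfi_init);
module_exit(bfi_exit);
MODULE_LICENSE("GPL");
=======================================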

Based on my analysis, and in order to verify it, I have created the following patch:
===========================================================================
This patch fixes a bug in PDEs processing when a race occurs
between a thread presently browsing entries in a /proc
subdir using proc_readdir_de() and another thread removing
the whole subdir using remove_proc_subtree().

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>

--- orig/fs/proc/generic.c      2018-01-14 15:02:54.000000000 +0000
+++ bfi/fs/proc/generic.c       2018-04-26 13:53:25.139261920 +0000
@@ -295,8 +295,9 @@ int proc_readdir_de(struct proc_dir_entr
                filldir_t filldir)
 {
        unsigned int ino;
-       int i;
+       int i, j;
        struct inode *inode = file_inode(filp);
+       struct proc_dir_entry *pde[4];
        int ret = 0;
 
        ino = inode->i_ino;
@@ -317,6 +318,7 @@ int proc_readdir_de(struct proc_dir_entr
                        filp->f_pos++;
                        /* fall through */
                default:
+loop4more:
                        spin_lock(&proc_subdir_lock);
                        de = pde_subdir_first(de);
                        i -= 2;
@@ -332,24 +334,39 @@ int proc_readdir_de(struct proc_dir_entr
                                i--;
                        }
 
+                       i = 0;
+                       /* process entries 4 by 4 with lock held */
                        do {
                                struct proc_dir_entry *next;
 
-                               /* filldir passes info to user space */
+                               /* take a ref on de */
                                pde_get(de);
-                               spin_unlock(&proc_subdir_lock);
+                               filp->f_pos++;
+                               /* save de for further filldir usage without
+                                * lock
+                                */
+                               pde[i++] = de;
+                               next = pde_subdir_next(de);
+                               de = next;
+                       } while (de && i < 4);
+                       spin_unlock(&proc_subdir_lock);
+
+                       /* process entries found */
+                       for (j = 0; j < i ; j++) {
+                               de = pde[j];
+                               /* filldir passes info to user space, no lock
+                                * can be held */
                                if (filldir(dirent, de->name, de->namelen, filp->f_pos,
                                            de->low_ino, de->mode >> 12) < 0) {
-                                       pde_put(de);
+                                       while (j < i)
+                                               pde_put(pde[j++]);
                                        goto out;
                                }
-                               spin_lock(&proc_subdir_lock);
-                               filp->f_pos++;
-                               next = pde_subdir_next(de);
                                pde_put(de);
-                               de = next;
-                       } while (de);
-                       spin_unlock(&proc_subdir_lock);
+                       }
+                       i = filp->f_pos;
+                       if (j == 4)
+                               goto loop4more;
        }
        ret = 1;
 out:
===========================================================================
With this patch applied, I have then been able to run the previously described reproducer successfully, without hitting the crash.

The problem still appears to be present in the latest 3.x and 4.x kernels.
Comment 1 Bruno Faccini 2018-05-02 12:42:31 UTC
Created attachment 275731 [details]
Patch
Comment 2 Iry Mor 2023-05-31 06:28:05 UTC
Reproduced this issue on kernel 4.18.0:

=======================================
[ 1772.531943][T39767] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 1772.540551][T39767] PGD 0 P4D 0    
[ 1772.543862][T39767] Oops: 0000 [#1] SMP
[ 1772.547781][T39767] CPU: 0 PID: 39767 Comm: ls Tainted: G        W IO     --------- -  - 4.18.0-infinidat-5767045 #1
[ 1772.558378][T39767] Hardware name:  /0YWR7D, BIOS 2.10.2 02/24/2021 
[ 1772.564726][T39767] RIP: 0010:ida_free+0xc6/0x130
[ 1772.569520][T39767] Code: f5 00 00 48 8b 3c 24 4c 89 e6 e8 25 33 02 00 48 8b 44 24 30 65 48 33 04 25 28 00 00 00 75 68 48 83 c4 38 5b 5d 41 5c 41 5d c3 <4c> 0f a3 28 73 37 4c 0f b3 28 31 f6 48 89 e7 e8 26 e7 00 00 be 00
[ 1772.589101][T39767] RSP: 0018:ffffc900072cfdf0 EFLAGS: 00010046
[ 1772.595113][T39767] RAX: 0000000000000000 RBX: 0000000000000c63 RCX: 0000000000000000
[ 1772.603041][T39767] RDX: 0000000000000003 RSI: ffffc900072cfdf0 RDI: ffff897821233d80
[ 1772.610971][T39767] RBP: 0000000000000000 R08: 00000000f0000c63 R09: 0000000000000008
[ 1772.618903][T39767] R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000206
[ 1772.626842][T39767] R13: 0000000000000063 R14: ffffffff81212c50 R15: 0000000000000001
[ 1772.634781][T39767] FS:  00007f1d2168a800(0000) GS:ffff89007f600000(0000) knlGS:0000000000000000
[ 1772.643672][T39767] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1772.650201][T39767] CR2: 0000000000000008 CR3: 000000790ee3e001 CR4: 00000000001706f0
[ 1772.658136][T39767] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1772.666066][T39767] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1772.673996][T39767] Call Trace:
[ 1772.677230][T39767]  pde_put+0x2c/0x40
[ 1772.681068][T39767]  proc_readdir_de+0x1ae/0x1e0
[ 1772.685775][T39767]  iterate_dir+0x108/0x160
[ 1772.690136][T39767]  ksys_getdents64+0x9c/0x130
[ 1772.694763][T39767]  ? iterate_dir+0x160/0x160
[ 1772.699295][T39767]  __x64_sys_getdents64+0x16/0x20
[ 1772.704262][T39767]  do_syscall_64+0x4f/0x1a0
[ 1772.708709][T39767]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 1772.714551][T39767] RIP: 0033:0x7f1d20d3b80b
[ 1772.718916][T39767] Code: 00 00 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 8b 47 20 c3 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 49 a6 2f 00 f7 d8
[ 1772.738491][T39767] RSP: 002b:00007ffc448a8708 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
[ 1772.746859][T39767] RAX: ffffffffffffffda RBX: 0000562a0ffcece0 RCX: 00007f1d20d3b80b
[ 1772.754788][T39767] RDX: 0000000000008000 RSI: 0000562a0ffced10 RDI: 0000000000000003
[ 1772.762717][T39767] RBP: 0000562a0ffced10 R08: 0000000010000000 R09: 0000000000000003
[ 1772.770650][T39767] R10: 0000000000000001 R11: 0000000000000246 R12: fffffffffffffe98
[ 1772.778588][T39767] R13: 0000000000000000 R14: 0000562a0ffce0d0 R15: 0000562a0f90f8e4
=======================================

Originally this happened when running "find /", but it also reproduces with Bruno's reproducer module. Is there any update on this bug?
Comment 3 Christian Brauner 2023-05-31 07:47:55 UTC
Is it reproducible on mainline?
Comment 4 Iry Mor 2023-06-05 08:34:13 UTC
The kernel panic did not reproduce on Ubuntu with kernel 6.3.5 using Bruno's reproducer module (after adjusting it to the newer kernel, which takes a struct proc_ops instead of a struct file_operations; a rough sketch of that adjustment is given after the trace below). There is, however, a warning about a refcount_t underflow when iterating over the procfs files that the module creates (ida_kdb_dummy is the reproducer module):

=======================================
[  126.302067] ------------[ cut here ]------------
[  126.302069] refcount_t: underflow; use-after-free.
[  126.302102] WARNING: CPU: 2 PID: 5628 at lib/refcount.c:28 refcount_warn_saturate+0xa3/0x150
[  126.302113] Modules linked in: ida_kdb_dummy(OE-) vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock binfmt_misc intel_rapl_msr vmw_balloon intel_rapl_common rapl joydev input_leds serio_raw vmw_vmci mac_hid dm_multipath sch_fq_codel scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear vmwgfx drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul sysfillrect polyval_clmulni mptspi sysimgblt polyval_generic mptscsih ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd psmouse mptbase ahci drm vmxnet3 libahci i2c_piix4 scsi_transport_spi pata_acpi [last unloaded: ida_kdb_dummy(OE)]
[  126.302272] CPU: 2 PID: 5628 Comm: ls Tainted: G        W  OE      6.3.5-060305-generic #202305301445
[  126.302279] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[  126.302284] RIP: 0010:refcount_warn_saturate+0xa3/0x150
[  126.302292] Code: cc cc 0f b6 1d ae 07 e0 01 80 fb 01 0f 87 2c e8 8a 00 83 e3 01 75 dd 48 c7 c7 d8 a5 59 b0 c6 05 92 07 e0 01 01 e8 2d 5e 93 ff <0f> 0b eb c6 0f b6 1d 85 07 e0 01 80 fb 01 0f 87 ec e7 8a 00 83 e3
[  126.302298] RSP: 0018:ffffb24400eebd28 EFLAGS: 00010246
[  126.302304] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  126.302309] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  126.302313] RBP: ffffb24400eebd30 R08: 0000000000000000 R09: 0000000000000000
[  126.302318] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb24400eebde0
[  126.302323] R13: ffff9859c5e0f540 R14: 0000000000000001 R15: ffffffffaf09e0e0
[  126.302328] FS:  00007fa858afb800(0000) GS:ffff9859fed00000(0000) knlGS:0000000000000000
[  126.302334] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  126.302339] CR2: 000055e20eb95ba8 CR3: 000000000622e001 CR4: 00000000001706e0
[  126.302371] Call Trace:
[  126.302374]  <TASK>
[  126.302378]  ? show_regs+0x6d/0x80
[  126.302386]  ? __warn+0x89/0x160
[  126.302398]  ? refcount_warn_saturate+0xa3/0x150
[  126.302407]  ? report_bug+0x17e/0x1b0
[  126.302416]  ? handle_bug+0x46/0x90
[  126.302423]  ? exc_invalid_op+0x18/0x80
[  126.302431]  ? asm_exc_invalid_op+0x1b/0x20
[  126.302441]  ? __pfx_filldir64+0x10/0x10
[  126.302456]  ? refcount_warn_saturate+0xa3/0x150
[  126.302465]  ? refcount_warn_saturate+0xa3/0x150
[  126.302474]  pde_put+0x9b/0xc0
[  126.302481]  proc_readdir_de+0x1f8/0x270
[  126.302490]  proc_readdir+0x27/0x50
[  126.302498]  iterate_dir+0x98/0x1e0
[  126.302508]  __x64_sys_getdents64+0x84/0x130
[  126.302518]  ? __pfx_filldir64+0x10/0x10
[  126.302530]  do_syscall_64+0x58/0x90
[  126.302542]  ? handle_mm_fault+0x135/0x340
[  126.302554]  ? do_user_addr_fault+0x1e8/0x720
[  126.302562]  ? exit_to_user_mode_prepare+0x30/0xb0
[  126.302569]  ? irqentry_exit_to_user_mode+0x9/0x20
[  126.302576]  ? irqentry_exit+0x43/0x50
[  126.302582]  ? exc_page_fault+0x91/0x1b0
[  126.302595]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  126.302605] RIP: 0033:0x7fa8588e66b7
[  126.302617] Code: ac fa ff 4c 89 e0 5b 5d 41 5c c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 49 27 13 00 f7 d8 64 89 02 48
[  126.302622] RSP: 002b:00007ffcfde2caa8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
[  126.302629] RAX: ffffffffffffffda RBX: 000055e20eb8dba0 RCX: 00007fa8588e66b7
[  126.302634] RDX: 0000000000008000 RSI: 000055e20eb8dba0 RDI: 0000000000000003
[  126.302638] RBP: 000055e20eb8db74 R08: 0000000000000000 R09: 000055e20eb8db70
[  126.302643] R10: 00007fa858a1a2a0 R11: 0000000000000293 R12: fffffffffffffe98
[  126.302648] R13: 0000000000000000 R14: 000055e20eb8db70 R15: 0000000000000000
[  126.302660]  </TASK>
[  126.302662] ---[ end trace 0000000000000000 ]---
=======================================
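
For reference, the proc_ops adjustment mentioned above roughly amounts to the following sketch (illustrative only, reusing the names from the module sketch earlier in this report rather than the actual ida_kdb_dummy code); since kernel 5.6, proc_create() takes a struct proc_ops whose fields carry a proc_ prefix:
=======================================
/* procfs registration on kernels >= 5.6 */
static const struct proc_ops bfi_pops = {
        .proc_open    = bfi_open,       /* same single_open() wrapper as before */
        .proc_read    = seq_read,
        .proc_lseek   = seq_lseek,
        .proc_release = single_release,
};

        /* in the module init function: */
        proc_create(name, 0444, bfi_dir, &bfi_pops);
=======================================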
Comment 5 Bruno Faccini 2023-06-19 08:42:25 UTC
I have made some more investigations with the 4.18.0 kernel, and confirmed that my previously detailed reproducer, made of the custom "bfi" module and the two concurrent scripts, still works.
After digging into the associated source code, I have successfully tested the following new version of the patch:

--- orig/fs/proc/generic.c      2022-10-07 14:45:37.000000000 +0000
+++ bfi/fs/proc/generic.c       2023-06-15 15:05:37.365019769 +0000
@@ -713,18 +713,20 @@ int remove_proc_subtree(const char *name
                        de = next;
                        continue;
                }
-               write_unlock(&proc_subdir_lock);
-
-               proc_entry_rundown(de);
                next = de->parent;
                if (S_ISDIR(de->mode))
                        next->nlink--;
                de->nlink = 0;
+
+               write_unlock(&proc_subdir_lock);
+
+               proc_entry_rundown(de);
                if (de == root)
                        break;
+               write_lock(&proc_subdir_lock);
+               /* call pde_put() with lock set to serialize vs concurrent readdir */
                pde_put(de);
 
-               write_lock(&proc_subdir_lock);
                de = next;
        }
        pde_put(root);
Comment 6 Bruno Faccini 2023-06-19 08:45:06 UTC
Created attachment 304453 [details]
New patch version for v4.18.0 Kernel
Comment 7 Bruno Faccini 2023-06-20 14:32:11 UTC
Oops, hold on: I just got a new crash signature after running my reproducer for longer than before :-(
