Created attachment 275729 [details] Reproducer's module source code Problem has occurred several times on a Lustre Client when filesystem is being unmounted and during the same time some user-land thread is still trying to access some of Lustre tunables in /proc. The Panic stack looks like following : ======================================= [ 7093.049261] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 [ 7093.050110] IP: [<ffffffff813261e1>] ida_remove+0x91/0xe0 [ 7093.050655] PGD 8000000064f1b067 PUD 5f9a7067 PMD 0 [ 7093.051182] Oops: 0000 [#1] SMP [ 7093.051534] Modules linked in: lustre(OE) obdecho(OE) mgc(OE) lov(OE) mdc(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE) ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) libcfs(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod crc_t10dif crct10dif_generic ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core iosf_mbi crc32_pclmul ghash_clmulni_intel nfsd ppdev aesni_intel lrw gf128mul glue_helper ablk_helper cryptd i2c_piix4 i2c_core pcspkr joydev virtio_balloon nfs_acl lockd parport_pc parport grace auth_rpcgss sunrpc ip_tables ata_generic pata_acpi ext4 mbcache jbd2 ata_piix libata virtio_blk 8139too crct10dif_pclmul crct10dif_common crc32c_intel virtio_pci virtio_ring serio_raw virtio 8139cp floppy mii [ 7093.060088] CPU: 0 PID: 23436 Comm: lctl Tainted: G OE ------------ 3.10.0-693.17.1.el7.x86_64 #1 [ 7093.061013] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 7093.061555] task: ffff880062b33f40 ti: ffff8800622f8000 task.ti: ffff8800622f8000 [ 7093.062250] RIP: 0010:[<ffffffff813261e1>] [<ffffffff813261e1>] ida_remove+0x91/0xe0 [ 7093.063009] RSP: 0018:ffff8800622fbe20 EFLAGS: 00010086 [ 7093.063511] RAX: 0000000000000180 RBX: 0000000000000000 RCX: 00000000000000f4 [ 7093.064172] RDX: 00000000001ec6f4 RSI: 00000000736a1480 RDI: ffff88007c0a0008 [ 7093.064838] RBP: ffff8800622fbe30 R08: ffff88007c0a0000 R09: 00000000000000f4 [ 7093.065506] R10: 00000000636a1001 R11: ffffea00018da840 R12: ffffffff81fd15e0 [ 7093.066164] R13: ffff88007ab27600 R14: ffff8800636a1000 R15: ffffffff81217790 [ 7093.066850] FS: 00007f4e87410740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 7093.067600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7093.068132] CR2: 0000000000000038 CR3: 000000006091e000 CR4: 00000000000606f0 [ 7093.068797] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 7093.069461] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 7093.070114] Call Trace: [ 7093.070355] [<ffffffff8127880f>] proc_free_inum+0x2f/0x50 [ 7093.070879] [<ffffffff81278c97>] pde_put+0x27/0x50 [ 7093.071333] [<ffffffff81278e1c>] proc_readdir_de+0x15c/0x220 [ 7093.071887] [<ffffffff81217790>] ? fillonedir+0xe0/0xe0 [ 7093.072382] [<ffffffff81278f02>] proc_readdir+0x22/0x30 [ 7093.072884] [<ffffffff81217680>] vfs_readdir+0xb0/0xe0 [ 7093.073379] [<ffffffff81217aa5>] SyS_getdents+0x95/0x120 [ 7093.073894] [<ffffffff816b89fd>] system_call_fastpath+0x16/0x1b [ 7093.074459] [<ffffffff816b889d>] ? system_call_after_swapgs+0xca/0x214 [ 7093.075071] Code: 45 0f b3 48 08 4f 8b 44 c8 28 83 e9 08 4d 85 c0 75 e0 4d 85 c0 74 4b 0f b6 ca 49 8d 78 08 41 0f b3 48 08 4c 63 c9 4b 8b 5c c8 28 <0f> a3 43 08 45 19 c0 45 85 c0 74 2b 0f b3 43 08 48 83 2b 01 75 [ 7093.078106] RIP [<ffffffff813261e1>] ida_remove+0x91/0xe0 [ 7093.078655] RSP <ffff8800622fbe20> [ 7093.078987] CR2: 0000000000000038 ======================================= When umount thread can be found working removing /proc entries in same directory with the following stack in crash-dumps: ======================================= PID: 23433 TASK: ffff88000016eeb0 CPU: 1 COMMAND: "umount" #0 [ffff88007fd08e48] crash_nmi_callback at ffffffff8104fd11 #1 [ffff88007fd08e58] nmi_handle at ffffffff816b0c57 #2 [ffff88007fd08eb0] do_nmi at ffffffff816b0e8d #3 [ffff88007fd08ef0] end_repeat_nmi at ffffffff816b00b9 [exception RIP: __pv_queued_spin_lock_slowpath+218] RIP: ffffffff810fbf0a RSP: ffff8800608a7978 RFLAGS: 00000206 RAX: 0000000000000005 RBX: ffffffff81fd15c0 RCX: 0000000000007b84 RDX: 0000000000000005 RSI: 0000000000000005 RDI: 0000000000000246 RBP: ffff8800608a79a8 R8: 0000000000000000 R9: 0000000000000040 R10: 0000000000000000 R11: 0000000000000001 R12: ffff88007fd195c0 R13: 0000000000000001 R14: 0000000000090000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 --- <NMI exception stack> --- #4 [ffff8800608a7978] __pv_queued_spin_lock_slowpath at ffffffff810fbf0a #5 [ffff8800608a79b0] queued_spin_lock_slowpath at ffffffff816a08df #6 [ffff8800608a79c0] _raw_spin_lock at ffffffff816adff0 #7 [ffff8800608a79d0] remove_proc_subtree at ffffffff812791bc #8 [ffff8800608a7a20] proc_remove at ffffffff8127922e #9 [ffff8800608a7a30] lprocfs_obd_cleanup at ffffffffc0724b33 [obdclass] #10 [ffff8800608a7a48] ptlrpc_lprocfs_unregister_obd at ffffffffc094d462 [ptlrpc] #11 [ffff8800608a7a60] osc_precleanup at ffffffffc0ae25df [osc] #12 [ffff8800608a7a78] class_cleanup at ffffffffc07388b9 [obdclass] #13 [ffff8800608a7af0] class_process_config at ffffffffc0739edc [obdclass] #14 [ffff8800608a7bb0] class_manual_cleanup at ffffffffc073c1b6 [obdclass] #15 [ffff8800608a7c58] lov_putref at ffffffffc0b557f2 [lov] #16 [ffff8800608a7cb8] lov_disconnect at ffffffffc0b5d062 [lov] #17 [ffff8800608a7ce0] obd_disconnect at ffffffffc0bd9ecb [lustre] #18 [ffff8800608a7d00] ll_put_super at ffffffffc0bdd361 [lustre] #19 [ffff8800608a7e38] generic_shutdown_super at ffffffff812056d2 #20 [ffff8800608a7e60] kill_anon_super at ffffffff81205aa2 #21 [ffff8800608a7e78] lustre_kill_super at ffffffffc073ea75 [obdclass] #22 [ffff8800608a7e90] deactivate_locked_super at ffffffff81205e59 #23 [ffff8800608a7eb0] deactivate_super at ffffffff812065c6 #24 [ffff8800608a7ec8] cleanup_mnt at ffffffff8122395f #25 [ffff8800608a7ee0] __cleanup_mnt at ffffffff812239f2 #26 [ffff8800608a7ef0] task_work_run at ffffffff810aefe5 #27 [ffff8800608a7f30] do_notify_resume at ffffffff8102ab52 #28 [ffff8800608a7f50] int_signal at ffffffff816b8d37 RIP: 00007f3942ac33e7 RSP: 00007fff3665cdd8 RFLAGS: 00000246 RAX: 0000000000000000 RBX: 000055ab33267040 RCX: ffffffffffffffff RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055ab332682e0 RBP: 000055ab332682e0 R8: 0000000000000008 R9: 000000000000000c R10: 00007fff3665c860 R11: 0000000000000246 R12: 00007f394363ad78 R13: 0000000000000000 R14: 000055ab33267210 R15: 000055ab33267040 ORIG_RAX: 00000000000000a6 CS: 0033 SS: 002b ======================================= The full story, with my complete crash-dumps analysis, can be found at https://jira.hpdd.intel.com/browse/LU-10952, but in a few words, this crash can occur during a race between proc_readdir_de() routine's loop (to gather the available proc tunables within the same directory) and remove_proc_subtree() loop (to destroy the same set of proc tunable entries), and seems to be due to the current sequence of spin_[un]lock()s in both remove_proc_subtree() and proc_readdir_de() loops that may lead to the latter to sometime work-on/use a removed/freed pde. I have created a reproducer, with the attached module source code in bfs.c file, which can easily trigger the same crash when running the following 2 scripts concurrently : ======================================= # while true ; do > insmod bfi.ko > rmmod bfi > done & --------------------------- # while true ; do > ls -la /proc/bfi/ > done & ======================================= Based on my analysis and to verify it, I have created the following patch : =========================================================================== This patch fixes a bug in PDEs processing when a race occurs between a thread presently browsing entries in a /proc subdir using proc_readdir_de() and an other thread removing the whole subdir using remove_proc_subtree(). Signed-off-by: Bruno Faccini <bruno.faccini@intel.com> --- orig/fs/proc/generic.c 2018-01-14 15:02:54.000000000 +0000 +++ bfi/fs/proc/generic.c 2018-04-26 13:53:25.139261920 +0000 @@ -295,8 +295,9 @@ int proc_readdir_de(struct proc_dir_entr filldir_t filldir) { unsigned int ino; - int i; + int i, j; struct inode *inode = file_inode(filp); + struct proc_dir_entry *pde[4]; int ret = 0; ino = inode->i_ino; @@ -317,6 +318,7 @@ int proc_readdir_de(struct proc_dir_entr filp->f_pos++; /* fall through */ default: +loop4more: spin_lock(&proc_subdir_lock); de = pde_subdir_first(de); i -= 2; @@ -332,24 +334,39 @@ int proc_readdir_de(struct proc_dir_entr i--; } + i = 0; + /* process entries 4 by 4 with lock held */ do { struct proc_dir_entry *next; - /* filldir passes info to user space */ + /* take a ref on de */ pde_get(de); - spin_unlock(&proc_subdir_lock); + filp->f_pos++; + /* save de for further filldir usage without + * lock + */ + pde[i++] = de; + next = pde_subdir_next(de); + de = next; + } while (de && i < 4); + spin_unlock(&proc_subdir_lock); + + /* process entries found */ + for (j = 0; j < i ; j++) { + de = pde[j]; + /* filldir passes info to user space, no lock + * can be held */ if (filldir(dirent, de->name, de->namelen, filp->f_pos, de->low_ino, de->mode >> 12) < 0) { - pde_put(de); + while (j < i) + pde_put(pde[j]); goto out; } - spin_lock(&proc_subdir_lock); - filp->f_pos++; - next = pde_subdir_next(de); pde_put(de); - de = next; - } while (de); - spin_unlock(&proc_subdir_lock); + } + i = filp->f_pos; + if (j == 4) + goto loop4more; } ret = 1; out: =========================================================================== And I have then been able to successfully expose it to the previously described reproducer. Problem looks still present in latest 3.x and 4.x Kernels.
Created attachment 275731 [details] Patch
Reproduced this issue on kernel 4.18.0: ======================================= [ 1772.531943][T39767] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 1772.540551][T39767] PGD 0 P4D 0 [ 1772.543862][T39767] Oops: 0000 [#1] SMP [ 1772.547781][T39767] CPU: 0 PID: 39767 Comm: ls Tainted: G W IO --------- - - 4.18.0-infinidat-5767045 #1 [ 1772.558378][T39767] Hardware name: /0YWR7D, BIOS 2.10.2 02/24/2021 [ 1772.564726][T39767] RIP: 0010:ida_free+0xc6/0x130 [ 1772.569520][T39767] Code: f5 00 00 48 8b 3c 24 4c 89 e6 e8 25 33 02 00 48 8b 44 24 30 65 48 33 04 25 28 00 00 00 75 68 48 83 c4 38 5b 5d 41 5c 41 5d c3 <4c> 0f a3 28 73 3 7 4c 0f b3 28 31 f6 48 89 e7 e8 26 e7 00 00 be 00 [ 1772.589101][T39767] RSP: 0018:ffffc900072cfdf0 EFLAGS: 00010046 [ 1772.595113][T39767] RAX: 0000000000000000 RBX: 0000000000000c63 RCX: 0000000000000000 [ 1772.603041][T39767] RDX: 0000000000000003 RSI: ffffc900072cfdf0 RDI: ffff897821233d80 [ 1772.610971][T39767] RBP: 0000000000000000 R08: 00000000f0000c63 R09: 0000000000000008 [ 1772.618903][T39767] R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000206 [ 1772.626842][T39767] R13: 0000000000000063 R14: ffffffff81212c50 R15: 0000000000000001 [ 1772.634781][T39767] FS: 00007f1d2168a800(0000) GS:ffff89007f600000(0000) knlGS:0000000000000000 [ 1772.643672][T39767] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1772.650201][T39767] CR2: 0000000000000008 CR3: 000000790ee3e001 CR4: 00000000001706f0 [ 1772.658136][T39767] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1772.666066][T39767] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1772.673996][T39767] Call Trace: [ 1772.677230][T39767] pde_put+0x2c/0x40 [ 1772.681068][T39767] proc_readdir_de+0x1ae/0x1e0 [ 1772.685775][T39767] iterate_dir+0x108/0x160 [ 1772.690136][T39767] ksys_getdents64+0x9c/0x130 [ 1772.694763][T39767] ? iterate_dir+0x160/0x160 [ 1772.699295][T39767] __x64_sys_getdents64+0x16/0x20 [ 1772.704262][T39767] do_syscall_64+0x4f/0x1a0 [ 1772.708709][T39767] entry_SYSCALL_64_after_hwframe+0x65/0xca [ 1772.714551][T39767] RIP: 0033:0x7f1d20d3b80b [ 1772.718916][T39767] Code: 00 00 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 8b 47 20 c3 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff f f 77 05 c3 0f 1f 40 00 48 8b 15 49 a6 2f 00 f7 d8 [ 1772.738491][T39767] RSP: 002b:00007ffc448a8708 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9 [ 1772.746859][T39767] RAX: ffffffffffffffda RBX: 0000562a0ffcece0 RCX: 00007f1d20d3b80b [ 1772.754788][T39767] RDX: 0000000000008000 RSI: 0000562a0ffced10 RDI: 0000000000000003 [ 1772.762717][T39767] RBP: 0000562a0ffced10 R08: 0000000010000000 R09: 0000000000000003 [ 1772.770650][T39767] R10: 0000000000000001 R11: 0000000000000246 R12: fffffffffffffe98 [ 1772.778588][T39767] R13: 0000000000000000 R14: 0000562a0ffce0d0 R15: 0000562a0f90f8e4 ======================================= Originally, this happened when running "find /", but also with Bruno's reproducer module. Is there any update on this bug?
On Wed, May 31, 2023 at 06:28:05AM +0000, bugzilla-daemon@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=199593 > > Iry Mor (imor@infinidat.com) changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > CC| |imor@infinidat.com > > --- Comment #2 from Iry Mor (imor@infinidat.com) --- > Reproduced this issue on kernel 4.18.0: > > ======================================= > [ 1772.531943][T39767] BUG: unable to handle kernel NULL pointer dereference > at > 0000000000000008 > [ 1772.540551][T39767] PGD 0 P4D 0 > [ 1772.543862][T39767] Oops: 0000 [#1] SMP > [ 1772.547781][T39767] CPU: 0 PID: 39767 Comm: ls Tainted: G W IO > --------- - - 4.18.0-infinidat-5767045 #1 > [ 1772.558378][T39767] Hardware name: /0YWR7D, BIOS 2.10.2 02/24/2021 > [ 1772.564726][T39767] RIP: 0010:ida_free+0xc6/0x130 > [ 1772.569520][T39767] Code: f5 00 00 48 8b 3c 24 4c 89 e6 e8 25 33 02 00 48 > 8b > 44 24 30 65 48 33 04 25 28 00 00 00 75 68 48 83 c4 38 5b 5d 41 5c 41 5d c3 > <4c> > 0f a3 28 73 3 > 7 4c 0f b3 28 31 f6 48 89 e7 e8 26 e7 00 00 be 00 > [ 1772.589101][T39767] RSP: 0018:ffffc900072cfdf0 EFLAGS: 00010046 > [ 1772.595113][T39767] RAX: 0000000000000000 RBX: 0000000000000c63 RCX: > 0000000000000000 > [ 1772.603041][T39767] RDX: 0000000000000003 RSI: ffffc900072cfdf0 RDI: > ffff897821233d80 > [ 1772.610971][T39767] RBP: 0000000000000000 R08: 00000000f0000c63 R09: > 0000000000000008 > [ 1772.618903][T39767] R10: 0000000000000003 R11: 0000000000000000 R12: > 0000000000000206 > [ 1772.626842][T39767] R13: 0000000000000063 R14: ffffffff81212c50 R15: > 0000000000000001 > [ 1772.634781][T39767] FS: 00007f1d2168a800(0000) GS:ffff89007f600000(0000) > knlGS:0000000000000000 > [ 1772.643672][T39767] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1772.650201][T39767] CR2: 0000000000000008 CR3: 000000790ee3e001 CR4: > 00000000001706f0 > [ 1772.658136][T39767] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [ 1772.666066][T39767] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: > 0000000000000400 > [ 1772.673996][T39767] Call Trace: > [ 1772.677230][T39767] pde_put+0x2c/0x40 > [ 1772.681068][T39767] proc_readdir_de+0x1ae/0x1e0 > [ 1772.685775][T39767] iterate_dir+0x108/0x160 > [ 1772.690136][T39767] ksys_getdents64+0x9c/0x130 > [ 1772.694763][T39767] ? iterate_dir+0x160/0x160 > [ 1772.699295][T39767] __x64_sys_getdents64+0x16/0x20 > [ 1772.704262][T39767] do_syscall_64+0x4f/0x1a0 > [ 1772.708709][T39767] entry_SYSCALL_64_after_hwframe+0x65/0xca > [ 1772.714551][T39767] RIP: 0033:0x7f1d20d3b80b > [ 1772.718916][T39767] Code: 00 00 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 f3 > 0f > 1e fa 48 8b 47 20 c3 0f 1f 80 00 00 00 00 f3 0f 1e fa b8 d9 00 00 00 0f 05 > <48> > 3d 00 f0 ff f > f 77 05 c3 0f 1f 40 00 48 8b 15 49 a6 2f 00 f7 d8 > [ 1772.738491][T39767] RSP: 002b:00007ffc448a8708 EFLAGS: 00000246 ORIG_RAX: > 00000000000000d9 > [ 1772.746859][T39767] RAX: ffffffffffffffda RBX: 0000562a0ffcece0 RCX: > 00007f1d20d3b80b > [ 1772.754788][T39767] RDX: 0000000000008000 RSI: 0000562a0ffced10 RDI: > 0000000000000003 > [ 1772.762717][T39767] RBP: 0000562a0ffced10 R08: 0000000010000000 R09: > 0000000000000003 > [ 1772.770650][T39767] R10: 0000000000000001 R11: 0000000000000246 R12: > fffffffffffffe98 > [ 1772.778588][T39767] R13: 0000000000000000 R14: 0000562a0ffce0d0 R15: > 0000562a0f90f8e4 > ======================================= > > Originally, this happened when running "find /", but also with Bruno's > reproducer module. Is there any update on this bug? Is it reproducible on mainline?
The kernel panic did not reproduce on Ubuntu with kernel 6.3.5 using Bruno's reproducer module (after adjusting it to the new kernel with struct proc_ops instead of struct file_operations), but there is a warning about an underflow in refcount_t when iterating over the procfs files that the module creates (ida_kdb_dummy is the reproducer module): ======================================= [ 126.302067] ------------[ cut here ]------------ [ 126.302069] refcount_t: underflow; use-after-free. [ 126.302102] WARNING: CPU: 2 PID: 5628 at lib/refcount.c:28 refcount_warn_saturate+0xa3/0x150 [ 126.302113] Modules linked in: ida_kdb_dummy(OE-) vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock binfmt_misc intel_rapl_msr vmw_balloon intel_rapl_common rapl joydev input_leds serio_raw vmw_vmci mac_hid dm_multipath sch_fq_codel scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear vmwgfx drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul sysfillrect polyval_clmulni mptspi sysimgblt polyval_generic mptscsih ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd psmouse mptbase ahci drm vmxnet3 libahci i2c_piix4 scsi_transport_spi pata_acpi [last unloaded: ida_kdb_dummy(OE)] [ 126.302272] CPU: 2 PID: 5628 Comm: ls Tainted: G W OE 6.3.5-060305-generic #202305301445 [ 126.302279] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018 [ 126.302284] RIP: 0010:refcount_warn_saturate+0xa3/0x150 [ 126.302292] Code: cc cc 0f b6 1d ae 07 e0 01 80 fb 01 0f 87 2c e8 8a 00 83 e3 01 75 dd 48 c7 c7 d8 a5 59 b0 c6 05 92 07 e0 01 01 e8 2d 5e 93 ff <0f> 0b eb c6 0f b6 1d 85 07 e0 01 80 fb 01 0f 87 ec e7 8a 00 83 e3 [ 126.302298] RSP: 0018:ffffb24400eebd28 EFLAGS: 00010246 [ 126.302304] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 126.302309] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 126.302313] RBP: ffffb24400eebd30 R08: 0000000000000000 R09: 0000000000000000 [ 126.302318] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb24400eebde0 [ 126.302323] R13: ffff9859c5e0f540 R14: 0000000000000001 R15: ffffffffaf09e0e0 [ 126.302328] FS: 00007fa858afb800(0000) GS:ffff9859fed00000(0000) knlGS:0000000000000000 [ 126.302334] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 126.302339] CR2: 000055e20eb95ba8 CR3: 000000000622e001 CR4: 00000000001706e0 [ 126.302371] Call Trace: [ 126.302374] <TASK> [ 126.302378] ? show_regs+0x6d/0x80 [ 126.302386] ? __warn+0x89/0x160 [ 126.302398] ? refcount_warn_saturate+0xa3/0x150 [ 126.302407] ? report_bug+0x17e/0x1b0 [ 126.302416] ? handle_bug+0x46/0x90 [ 126.302423] ? exc_invalid_op+0x18/0x80 [ 126.302431] ? asm_exc_invalid_op+0x1b/0x20 [ 126.302441] ? __pfx_filldir64+0x10/0x10 [ 126.302456] ? refcount_warn_saturate+0xa3/0x150 [ 126.302465] ? refcount_warn_saturate+0xa3/0x150 [ 126.302474] pde_put+0x9b/0xc0 [ 126.302481] proc_readdir_de+0x1f8/0x270 [ 126.302490] proc_readdir+0x27/0x50 [ 126.302498] iterate_dir+0x98/0x1e0 [ 126.302508] __x64_sys_getdents64+0x84/0x130 [ 126.302518] ? __pfx_filldir64+0x10/0x10 [ 126.302530] do_syscall_64+0x58/0x90 [ 126.302542] ? handle_mm_fault+0x135/0x340 [ 126.302554] ? do_user_addr_fault+0x1e8/0x720 [ 126.302562] ? exit_to_user_mode_prepare+0x30/0xb0 [ 126.302569] ? irqentry_exit_to_user_mode+0x9/0x20 [ 126.302576] ? irqentry_exit+0x43/0x50 [ 126.302582] ? exc_page_fault+0x91/0x1b0 [ 126.302595] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 126.302605] RIP: 0033:0x7fa8588e66b7 [ 126.302617] Code: ac fa ff 4c 89 e0 5b 5d 41 5c c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 ff ff ff 7f 48 39 c2 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 49 27 13 00 f7 d8 64 89 02 48 [ 126.302622] RSP: 002b:00007ffcfde2caa8 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9 [ 126.302629] RAX: ffffffffffffffda RBX: 000055e20eb8dba0 RCX: 00007fa8588e66b7 [ 126.302634] RDX: 0000000000008000 RSI: 000055e20eb8dba0 RDI: 0000000000000003 [ 126.302638] RBP: 000055e20eb8db74 R08: 0000000000000000 R09: 000055e20eb8db70 [ 126.302643] R10: 00007fa858a1a2a0 R11: 0000000000000293 R12: fffffffffffffe98 [ 126.302648] R13: 0000000000000000 R14: 000055e20eb8db70 R15: 0000000000000000 [ 126.302660] </TASK> [ 126.302662] ---[ end trace 0000000000000000 ]--- =======================================
I have made some more investigations running with 4.18.0 Kernel, and confirmed that my previously detailed reproducer, made of custom "bfi" module and 2 concurrent scripts, still works. After digging into the associated source code, I have successfully tested the following patch new version : --- orig/fs/proc/generic.c 2022-10-07 14:45:37.000000000 +0000 +++ bfi/fs/proc/generic.c 2023-06-15 15:05:37.365019769 +0000 @@ -713,18 +713,20 @@ int remove_proc_subtree(const char *name de = next; continue; } - write_unlock(&proc_subdir_lock); - - proc_entry_rundown(de); next = de->parent; if (S_ISDIR(de->mode)) next->nlink--; de->nlink = 0; + + write_unlock(&proc_subdir_lock); + + proc_entry_rundown(de); if (de == root) break; + write_lock(&proc_subdir_lock); + /* call pde_put() with lock set to serialize vs concurrent readdir */ pde_put(de); - write_lock(&proc_subdir_lock); de = next; } pde_put(root);
Created attachment 304453 [details] New patch version for v4.18.0 Kernel
Oops hold-on, just got a new crash/signature after running my reproducer more longer than before :-(