Bug 217405

Summary: overlayfs: ovl_permission: Null pointer dereference at realinode in rcu-walk mode
Product: File System Reporter: Zhihao Cheng (chengzhihao1)
Component: OtherAssignee: fs_other
Status: NEW ---    
Severity: normal CC: quic_bhaskarv, wnsdud1861
Priority: P3    
Hardware: All   
OS: Linux   
Kernel Version: Subsystem:
Regression: No Bisected commit-id:
Attachments: test.sh
diff

Description Zhihao Cheng 2023-05-05 08:13:18 UTC
1. Apply diff and compile kernel
2. useradd freg
3. ./test.sh

[  334.663558] overlayfs: get inode, wait drop cache, dir
[  334.688095] destroy inode
[  335.664586] overlayfs: wait done
[  335.664932] overlayfs: XXXX inodestate 60 unhash 1 upper dentry ffff88817b64c600 dentry->inode 0000000000000000
[  335.664979] BUG: kernel NULL pointer dereference, address: 0000000000000002
[  335.666508] #PF: supervisor read access in kernel mode
[  335.666965] #PF: error_code(0x0000) - not-present page
[  335.667429] PGD 0 P4D 0 
[  335.667673] Oops: 0000 [#1] PREEMPT SMP
[  335.668032] CPU: 0 PID: 2592 Comm: ls Not tainted 6.3.0-12065-g3fafea7f3ad0-dirty #1221
[  335.668752] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproje4
[  335.669956] RIP: 0010:inode_permission+0x33/0x2c0
[  335.670392] Code: 41 56 41 55 41 89 d5 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 48 83 05 8b 1f f2 0c 01 41 83 e4 02 75 6c 48 8a
[  335.672101] RSP: 0018:ffffc9000190bba8 EFLAGS: 00010206
[  335.672578] RAX: ffff888103e32520 RBX: 0000000000000000 RCX: 0000000000000027
[  335.673216] RDX: 0000000000000081 RSI: 0000000000000000 RDI: ffffffff8a4aff50
[  335.673868] RBP: ffffffff8a4aff50 R08: ffff88817a97cf00 R09: ffffc9000190ba78
[  335.674509] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[  335.675145] R13: 0000000000000081 R14: ffff88817b73e0f8 R15: ffffffff8a4aff50
[  335.675789] FS:  00007fa2bf908540(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[  335.676513] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  335.677028] CR2: 0000000000000002 CR3: 0000000173508000 CR4: 00000000000006f0
[  335.677675] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  335.678308] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  335.678939] Call Trace:
[  335.679165]  <TASK>
[  335.679371]  ovl_permission+0xde/0x320
[  335.679723]  inode_permission+0x15e/0x2c0
[  335.680090]  link_path_walk+0x115/0x550
[  335.680443]  ? path_init+0x3b1/0x560
[  335.680771]  path_lookupat.isra.0+0xb2/0x200
[  335.681170]  filename_lookup+0xda/0x240
[  335.681540]  ? __create_object+0x2b3/0x5c0
[  335.681922]  vfs_statx+0xa6/0x1f0
[  335.682233]  vfs_fstatat+0x7b/0xb0
[  335.682556]  __do_sys_newlstat+0x3b/0x90
[  335.682923]  ? exit_to_user_mode_prepare+0x130/0x340
[  335.683380]  __x64_sys_newlstat+0x1e/0x30
[  335.683758]  do_syscall_64+0x39/0x80
[  335.684103]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  335.684568] RIP: 0033:0x7fa2beaffa75
[  335.684910] Code: 19 f4 2c 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 83 ff 01 48 89 f0 77 30 48 89 c7 48 89 d6 b9
[  335.686562] RSP: 002b:00007ffedf0503e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
[  335.687243] RAX: ffffffffffffffda RBX: 00007ffedf0524d0 RCX: 00007fa2beaffa75
[  335.687887] RDX: 0000559a4ec6b150 RSI: 0000559a4ec6b150 RDI: 00007ffedf0524d0
[  335.688529] RBP: 00007ffedf050790 R08: 0000000000000004 R09: 00007ffedf0524e7
[  335.689176] R10: 00000000df052400 R11: 0000000000000246 R12: 00007ffedf0524d0
[  335.689828] R13: 0000000000000000 R14: 0000559a4ec6b140 R15: 0000000000000000
[  335.690471]  </TASK>
[  335.690680] Modules linked in:
[  335.690975] CR2: 0000000000000002
[  335.691304] ---[ end trace 0000000000000000 ]---
[  335.691743] RIP: 0010:inode_permission+0x33/0x2c0
[  335.692175] Code: 41 56 41 55 41 89 d5 41 54 41 89 d4 55 48 89 fd 53 48 89 f3 48 83 05 8b 1f f2 0c 01 41 83 e4 02 75 6c 48 8a
[  335.693858] RSP: 0018:ffffc9000190bba8 EFLAGS: 00010206
[  335.694330] RAX: ffff888103e32520 RBX: 0000000000000000 RCX: 0000000000000027
[  335.694992] RDX: 0000000000000081 RSI: 0000000000000000 RDI: ffffffff8a4aff50
[  335.695649] RBP: ffffffff8a4aff50 R08: ffff88817a97cf00 R09: ffffc9000190ba78
[  335.696287] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
[  335.696944] R13: 0000000000000081 R14: ffff88817b73e0f8 R15: ffffffff8a4aff50
[  335.697619] FS:  00007fa2bf908540(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[  335.698341] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  335.698881] CR2: 0000000000000002 CR3: 0000000173508000 CR4: 00000000000006f0
[  335.699524] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  335.700173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  335.700829] Kernel panic - not syncing: Fatal exception
[  335.701374] Kernel Offset: disabled
[  335.701720] ---[ end Kernel panic - not syncing: Fatal exception ]---
Comment 1 Zhihao Cheng 2023-05-05 08:13:31 UTC
Created attachment 304220 [details]
test.sh
Comment 2 Zhihao Cheng 2023-05-05 08:13:43 UTC
Created attachment 304221 [details]
diff
Comment 3 Bhaskar Valaboju 2023-05-08 06:56:56 UTC
I am also facing this issue and I am checking if dentry is in KILLED state to fix the issue.  I see it makes sense to check this instead of inode NULL check, comment?
Comment 4 Zhihao Cheng 2023-05-08 07:13:41 UTC
(In reply to Bhaskar Valaboju from comment #3)
> I am also facing this issue and I am checking if dentry is in KILLED state
> to fix the issue.  I see it makes sense to check this instead of inode NULL
> check, comment?

Do you judge the KILLED state by checking d_unhashed(dentry)? When do the check? Could you please sendout the fix code? Thanks.
Comment 5 Bhaskar Valaboju 2023-05-08 10:21:29 UTC
Please find below the change I am testing:

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 541cf3717fc2..69c481cc820f 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -289,7 +289,7 @@ int ovl_permission(struct mnt_idmap *idmap,
 
        /* Careful in RCU walk mode */
        ovl_i_path_real(inode, &realpath);
-       if (!realpath.dentry) {
+       if (!realpath.dentry || realpath.dentry->d_flags & DCACHE_DENTRY_KILLED) {
                WARN_ON(!(mask & MAY_NOT_BLOCK));
                return -ECHILD;
        }
@@ -402,6 +402,9 @@ int ovl_xattr_get(struct dentry *dentry, struct inode *inode, const char *name,
        struct path realpath;
 
        ovl_i_path_real(inode, &realpath);
+       if (!realpath.dentry || realpath.dentry->d_flags & DCACHE_DENTRY_KILLED)
+               return -ECHILD;
+
        old_cred = ovl_override_creds(dentry->d_sb);
        res = vfs_getxattr(mnt_idmap(realpath.mnt), realpath.dentry, name, value, size);
        revert_creds(old_cred);
@@ -568,7 +571,7 @@ struct posix_acl *do_ovl_get_acl(struct mnt_idmap *idmap,
 
        /* Careful in RCU walk mode */
        ovl_i_path_real(inode, &realpath);
-       if (!realpath.dentry) {
+       if (!realpath.dentry || realpath.dentry->d_flags & DCACHE_DENTRY_KILLED) {
                WARN_ON(!rcu);
                return ERR_PTR(-ECHILD);
        }
Comment 6 Zhihao Cheng 2023-05-08 11:27:09 UTC
(In reply to Bhaskar Valaboju from comment #5)
> Please find below the change I am testing:
> 
> diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> index 541cf3717fc2..69c481cc820f 100644
> --- a/fs/overlayfs/inode.c
> +++ b/fs/overlayfs/inode.c
> @@ -289,7 +289,7 @@ int ovl_permission(struct mnt_idmap *idmap,
>  
>         /* Careful in RCU walk mode */
>         ovl_i_path_real(inode, &realpath);
> -       if (!realpath.dentry) {
> +       if (!realpath.dentry || realpath.dentry->d_flags &
> DCACHE_DENTRY_KILLED) {
>                 WARN_ON(!(mask & MAY_NOT_BLOCK));
>                 return -ECHILD;
>         }

We'd better hold realpath.dentry->d_lock to check 'd_flags' and get 'd_inode', otherwise:
      P1                      P2
ovl_permission
 if (realpath.dentry->d_flags & DCACHE_DENTRY_KILLED) // false
                         drop_cache
                          __dentry_kill
                           dentry_unlist
                            dentry->d_flags |= DCACHE_DENTRY_KILLED
                           dentry_unlink_inode
                            dentry->d_inode = NULL
 realinode = d_inode(realpath.dentry) // NULL

Am I understanding right?

> @@ -402,6 +402,9 @@ int ovl_xattr_get(struct dentry *dentry, struct inode
> *inode, const char *name,
>         struct path realpath;
>  
>         ovl_i_path_real(inode, &realpath);
> +       if (!realpath.dentry || realpath.dentry->d_flags &
> DCACHE_DENTRY_KILLED)
> +               return -ECHILD;
> +
>         old_cred = ovl_override_creds(dentry->d_sb);
>         res = vfs_getxattr(mnt_idmap(realpath.mnt), realpath.dentry, name,
> value, size);
>         revert_creds(old_cred);

BTW, 
path_getxattr
 user_path_at(path) // hold refcnt for path.dentry
  getxattr(path.dentry)
   vfs_getxattr
    ovl_xattr_get  // realpath.dentry cannot be killed
 path_put(&path)

Are there any other paths to trigger null-ptr-deference in ovl_xattr_get()?

> @@ -568,7 +571,7 @@ struct posix_acl *do_ovl_get_acl(struct mnt_idmap *idmap,
>  
>         /* Careful in RCU walk mode */
>         ovl_i_path_real(inode, &realpath);
> -       if (!realpath.dentry) {
> +       if (!realpath.dentry || realpath.dentry->d_flags &
> DCACHE_DENTRY_KILLED) {
>                 WARN_ON(!rcu);
>                 return ERR_PTR(-ECHILD);
>         }
Comment 7 Bhaskar Valaboju 2023-05-09 04:00:43 UTC
(In reply to Zhihao Cheng from comment #6)
> (In reply to Bhaskar Valaboju from comment #5)
> > Please find below the change I am testing:
> > 
> > diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
> > index 541cf3717fc2..69c481cc820f 100644
> > --- a/fs/overlayfs/inode.c
> > +++ b/fs/overlayfs/inode.c
> > @@ -289,7 +289,7 @@ int ovl_permission(struct mnt_idmap *idmap,
> >  
> >         /* Careful in RCU walk mode */
> >         ovl_i_path_real(inode, &realpath);
> > -       if (!realpath.dentry) {
> > +       if (!realpath.dentry || realpath.dentry->d_flags &
> > DCACHE_DENTRY_KILLED) {
> >                 WARN_ON(!(mask & MAY_NOT_BLOCK));
> >                 return -ECHILD;
> >         }
> 
> We'd better hold realpath.dentry->d_lock to check 'd_flags' and get
> 'd_inode', otherwise:
>       P1                      P2
> ovl_permission
>  if (realpath.dentry->d_flags & DCACHE_DENTRY_KILLED) // false
>                          drop_cache
>                           __dentry_kill
>                            dentry_unlist
>                             dentry->d_flags |= DCACHE_DENTRY_KILLED
>                            dentry_unlink_inode
>                             dentry->d_inode = NULL
>  realinode = d_inode(realpath.dentry) // NULL
> 
> Am I understanding right?

Makes sense.

> 
> > @@ -402,6 +402,9 @@ int ovl_xattr_get(struct dentry *dentry, struct inode
> > *inode, const char *name,
> >         struct path realpath;
> >  
> >         ovl_i_path_real(inode, &realpath);
> > +       if (!realpath.dentry || realpath.dentry->d_flags &
> > DCACHE_DENTRY_KILLED)
> > +               return -ECHILD;
> > +
> >         old_cred = ovl_override_creds(dentry->d_sb);
> >         res = vfs_getxattr(mnt_idmap(realpath.mnt), realpath.dentry, name,
> > value, size);
> >         revert_creds(old_cred);
> 
> BTW, 
> path_getxattr
>  user_path_at(path) // hold refcnt for path.dentry
>   getxattr(path.dentry)
>    vfs_getxattr
>     ovl_xattr_get  // realpath.dentry cannot be killed
>  path_put(&path)
> 
> Are there any other paths to trigger null-ptr-deference in ovl_xattr_get()?
>

There are no other parts, we can ignore this change.
 
> > @@ -568,7 +571,7 @@ struct posix_acl *do_ovl_get_acl(struct mnt_idmap
> *idmap,
> >  
> >         /* Careful in RCU walk mode */
> >         ovl_i_path_real(inode, &realpath);
> > -       if (!realpath.dentry) {
> > +       if (!realpath.dentry || realpath.dentry->d_flags &
> > DCACHE_DENTRY_KILLED) {
> >                 WARN_ON(!rcu);
> >                 return ERR_PTR(-ECHILD);
> >         }