Bug 205567

Summary: potential (possibly benign) data race on ext4_dir_entry_2->inode when getdents64 and rename happens on the same directory
Product: File System Reporter: Meng Xu (mengxu.gatech)
Component: ext4Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED WILL_NOT_FIX    
Severity: normal CC: sandeen, tytso
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.4-rc5 Subsystem:
Regression: No Bisected commit-id:

Description Meng Xu 2019-11-18 20:15:09 UTC
I am reporting a potential data race (maybe benign) in the ext4 layer on  ext4_dir_entry_2->inode when getdents64 and rename happens on the same directory.

[Setup]
mkdir(dir_foo, 0777);
open(dir_foo, 0x10000, 0777) = 0;
dup2(0, 199) = 199;

[Thread 1] getdents64(199, <some buffer>, 4469) = 48;
[Thread 2] rename(dir_foo, aaaaa) = 0;

The function call trace is shown below:

[Thread 1: SYS_getdents64]
__do_sys_getdents64
  ksys_getdents64
    iterate_dir
      ext4_readdir
        ext4_dx_readdir
          ext4_htree_fill_tree
            htree_dirblock_to_tree
              [READ] if (de->inode == 0)

[Thread 2: SYS_rename]
__do_renameat2
  do_renameat2
    vfs_rename
      ext4_rename2
        ext4_rename
          ext4_rename_dir_finish
            [WRITE] ent->parent_de->inode = cpu_to_le32(dir_ino);


I could confirm that the WRITE may happen before and after the READ operation by controlling the timing of the two threads, i.e., by setting breakpoints before the WRITE statement.

However, I am not very sure about the implication of such a data race (e.g., causing violations of assumptions). I would appreciate if you could help check on this potential bug and advise whether this is a harmful data race or it
is intended. Thank you!
Comment 1 Eric Sandeen 2019-11-18 23:08:41 UTC
As a suggestion - if you do not know for sure that these are bugs, it might be better to ask these questions the list, as opposed to filing bugs.

Is this from code inspection, or are you using KCSAN or a similar tool?
Comment 2 Meng Xu 2019-11-18 23:10:24 UTC
(In reply to Eric Sandeen from comment #1)
> As a suggestion - if you do not know for sure that these are bugs, it might
> be better to ask these questions the list, as opposed to filing bugs.
> 
> Is this from code inspection, or are you using KCSAN or a similar tool?

I am using a tool developed by ourselves, not KCSAN, it is still a work in progress so it may raise some false positives :(
Comment 3 Meng Xu 2019-11-18 23:11:06 UTC
(In reply to Eric Sandeen from comment #1)
> As a suggestion - if you do not know for sure that these are bugs, it might
> be better to ask these questions the list, as opposed to filing bugs.
> 
> Is this from code inspection, or are you using KCSAN or a similar tool?

Thank you for the suggestion, I'll post them to the list then.
Comment 4 Theodore Tso 2019-11-19 01:29:29 UTC
POSIX specifically states that it is undefined that if there is a rename taking place during a readdir() scan, it is undefined whether the entry will appear or not.

So, not a race.
Comment 5 Theodore Tso 2019-11-19 01:29:55 UTC
(Or rather, it's allowed by the standard, so it's no big deal.)
Comment 6 Meng Xu 2019-11-19 01:32:01 UTC
(In reply to Theodore Tso from comment #5)
> (Or rather, it's allowed by the standard, so it's no big deal.)

Many thanks for the confirmation Ted!
Comment 7 Meng Xu 2019-11-19 03:03:36 UTC
(In reply to Theodore Tso from comment #5)
> (Or rather, it's allowed by the standard, so it's no big deal.)

Hi Ted,

Just a quick thought in my mind: do they need to be wrapped with READ_ONCE, WRITE_ONCE and/or memory barriers so that the visibility of the [WRITE] and all operations before that [WRITE] are seen by the [READ] and after?

Best Regards,
Meng