Bug 205567 - potential (possibly benign) data race on ext4_dir_entry_2->inode when getdents64 and rename happens on the same directory
Summary: potential (possibly benign) data race on ext4_dir_entry_2->inode when getdent...
Status: RESOLVED WILL_NOT_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ext4 (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-18 20:15 UTC by Meng Xu
Modified: 2019-11-19 03:03 UTC (History)
2 users (show)

See Also:
Kernel Version: 5.4-rc5
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Meng Xu 2019-11-18 20:15:09 UTC
I am reporting a potential data race (maybe benign) in the ext4 layer on  ext4_dir_entry_2->inode when getdents64 and rename happens on the same directory.

[Setup]
mkdir(dir_foo, 0777);
open(dir_foo, 0x10000, 0777) = 0;
dup2(0, 199) = 199;

[Thread 1] getdents64(199, <some buffer>, 4469) = 48;
[Thread 2] rename(dir_foo, aaaaa) = 0;

The function call trace is shown below:

[Thread 1: SYS_getdents64]
__do_sys_getdents64
  ksys_getdents64
    iterate_dir
      ext4_readdir
        ext4_dx_readdir
          ext4_htree_fill_tree
            htree_dirblock_to_tree
              [READ] if (de->inode == 0)

[Thread 2: SYS_rename]
__do_renameat2
  do_renameat2
    vfs_rename
      ext4_rename2
        ext4_rename
          ext4_rename_dir_finish
            [WRITE] ent->parent_de->inode = cpu_to_le32(dir_ino);


I could confirm that the WRITE may happen before and after the READ operation by controlling the timing of the two threads, i.e., by setting breakpoints before the WRITE statement.

However, I am not very sure about the implication of such a data race (e.g., causing violations of assumptions). I would appreciate if you could help check on this potential bug and advise whether this is a harmful data race or it
is intended. Thank you!
Comment 1 Eric Sandeen 2019-11-18 23:08:41 UTC
As a suggestion - if you do not know for sure that these are bugs, it might be better to ask these questions the list, as opposed to filing bugs.

Is this from code inspection, or are you using KCSAN or a similar tool?
Comment 2 Meng Xu 2019-11-18 23:10:24 UTC
(In reply to Eric Sandeen from comment #1)
> As a suggestion - if you do not know for sure that these are bugs, it might
> be better to ask these questions the list, as opposed to filing bugs.
> 
> Is this from code inspection, or are you using KCSAN or a similar tool?

I am using a tool developed by ourselves, not KCSAN, it is still a work in progress so it may raise some false positives :(
Comment 3 Meng Xu 2019-11-18 23:11:06 UTC
(In reply to Eric Sandeen from comment #1)
> As a suggestion - if you do not know for sure that these are bugs, it might
> be better to ask these questions the list, as opposed to filing bugs.
> 
> Is this from code inspection, or are you using KCSAN or a similar tool?

Thank you for the suggestion, I'll post them to the list then.
Comment 4 Theodore Tso 2019-11-19 01:29:29 UTC
POSIX specifically states that it is undefined that if there is a rename taking place during a readdir() scan, it is undefined whether the entry will appear or not.

So, not a race.
Comment 5 Theodore Tso 2019-11-19 01:29:55 UTC
(Or rather, it's allowed by the standard, so it's no big deal.)
Comment 6 Meng Xu 2019-11-19 01:32:01 UTC
(In reply to Theodore Tso from comment #5)
> (Or rather, it's allowed by the standard, so it's no big deal.)

Many thanks for the confirmation Ted!
Comment 7 Meng Xu 2019-11-19 03:03:36 UTC
(In reply to Theodore Tso from comment #5)
> (Or rather, it's allowed by the standard, so it's no big deal.)

Hi Ted,

Just a quick thought in my mind: do they need to be wrapped with READ_ONCE, WRITE_ONCE and/or memory barriers so that the visibility of the [WRITE] and all operations before that [WRITE] are seen by the [READ] and after?

Best Regards,
Meng

Note You need to log in before you can comment on or make changes to this bug.