I am reporting a potential data race (maybe benign) in the ext4 layer on ext4_dir_entry_2->inode when getdents64 and rename happens on the same directory. [Setup] mkdir(dir_foo, 0777); open(dir_foo, 0x10000, 0777) = 0; dup2(0, 199) = 199; [Thread 1] getdents64(199, <some buffer>, 4469) = 48; [Thread 2] rename(dir_foo, aaaaa) = 0; The function call trace is shown below: [Thread 1: SYS_getdents64] __do_sys_getdents64 ksys_getdents64 iterate_dir ext4_readdir ext4_dx_readdir ext4_htree_fill_tree htree_dirblock_to_tree [READ] if (de->inode == 0) [Thread 2: SYS_rename] __do_renameat2 do_renameat2 vfs_rename ext4_rename2 ext4_rename ext4_rename_dir_finish [WRITE] ent->parent_de->inode = cpu_to_le32(dir_ino); I could confirm that the WRITE may happen before and after the READ operation by controlling the timing of the two threads, i.e., by setting breakpoints before the WRITE statement. However, I am not very sure about the implication of such a data race (e.g., causing violations of assumptions). I would appreciate if you could help check on this potential bug and advise whether this is a harmful data race or it is intended. Thank you!
As a suggestion - if you do not know for sure that these are bugs, it might be better to ask these questions the list, as opposed to filing bugs. Is this from code inspection, or are you using KCSAN or a similar tool?
(In reply to Eric Sandeen from comment #1) > As a suggestion - if you do not know for sure that these are bugs, it might > be better to ask these questions the list, as opposed to filing bugs. > > Is this from code inspection, or are you using KCSAN or a similar tool? I am using a tool developed by ourselves, not KCSAN, it is still a work in progress so it may raise some false positives :(
(In reply to Eric Sandeen from comment #1) > As a suggestion - if you do not know for sure that these are bugs, it might > be better to ask these questions the list, as opposed to filing bugs. > > Is this from code inspection, or are you using KCSAN or a similar tool? Thank you for the suggestion, I'll post them to the list then.
POSIX specifically states that it is undefined that if there is a rename taking place during a readdir() scan, it is undefined whether the entry will appear or not. So, not a race.
(Or rather, it's allowed by the standard, so it's no big deal.)
(In reply to Theodore Tso from comment #5) > (Or rather, it's allowed by the standard, so it's no big deal.) Many thanks for the confirmation Ted!
(In reply to Theodore Tso from comment #5) > (Or rather, it's allowed by the standard, so it's no big deal.) Hi Ted, Just a quick thought in my mind: do they need to be wrapped with READ_ONCE, WRITE_ONCE and/or memory barriers so that the visibility of the [WRITE] and all operations before that [WRITE] are seen by the [READ] and after? Best Regards, Meng