Bug 218850

Summary: Unexpected failure when writing to a file through two file descriptors
Product: File System
Reporter: Chi (zhangchi_seg)
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED INVALID
Severity: normal
CC: tytso
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: reproduce.c

Description Chi 2024-05-16 15:02:08 UTC
Created attachment 306300
reproduce.c

Hi,

I mounted an ext4 image, created a file, and created a hard link to it. I then wrote to the file through both names, and with a specific order of reads and writes the final write fails. I can reproduce this with the latest Linux kernel (https://git.kernel.org/torvalds/t/linux-6.9-rc7.tar.gz).

The following script triggers the failure:
```
dd if=/dev/zero of=ext4-0.img bs=1M count=120
mkfs.ext4 ext4-0.img
g++ -static reproduce.c
losetup /dev/loop0 ext4-0.img
mkdir /root/mnt
./a.out
```

After running the script, the program prints:

```
write failure
```

The contents of `reproduce.c`:
```
#include <assert.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <stddef.h>
#include <unistd.h>
#include <pthread.h>
#include <errno.h>
#include <dirent.h>

#include <string>

#include <sys/mount.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <sys/xattr.h>
#include <sys/mount.h>
#include <sys/statfs.h>
#include <fcntl.h>

#define ALIGN 4096

void* align_alloc(size_t size) {
    void *ptr = NULL;
    int ret = posix_memalign(&ptr, ALIGN, size);
    if (ret) {
      printf("align error\n");
      exit(1);
    }
    return ptr;
}

int main()
{
    char *buf_15 = (char*)align_alloc(4096*20);
    memset(buf_15, 'a', 4096*20);

    char *buf_4 = (char*)align_alloc(4096*20);
    memset(buf_4, 'a', 4096*20);

    mount("/dev/loop0", "/root/mnt", "f2fs", 0, "");

    creat("/root/mnt/a", S_IRWXG);
    link("/root/mnt/a", "/root/mnt/b");
    int fd_a = open("/root/mnt/a", O_RDWR);
    int fd_b = open("/root/mnt/b", O_RDWR | O_DIRECT);

    lseek(fd_a, 100, SEEK_SET); 
    write(fd_a, buf_15, 9900); 

    read(fd_b, buf_4, 73728); 
    
    int state = write(fd_b, buf_15, 65536); 
    if (state == -1) {
        printf("write failure\n");
    }

    return 0;
}

```

If I move the statement `read(fd_b, buf_4, 73728);` before the first write, or reduce the read size from `73728` to a smaller value such as `63728`, the program does not fail.

Did I do anything wrong?

Comment 1 Chi 2024-05-16 15:19:41 UTC
Sorry, the mount call should be `mount("/dev/loop0", "/root/mnt", "ext4", 0, "");`

Comment 2 Chi 2024-05-27 07:28:38 UTC
Could you please help me review this report? I am still able to reproduce the issue with the following test case:

```
#include <assert.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <stddef.h>
#include <unistd.h>
#include <pthread.h>
#include <errno.h>
#include <dirent.h>

#include <string>

#include <sys/mount.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <sys/xattr.h>
#include <sys/mount.h>
#include <sys/statfs.h>
#include <fcntl.h>

#define ALIGN 4096

void* align_alloc(size_t size) {
    void *ptr = NULL;
    int ret = posix_memalign(&ptr, ALIGN, size);
    if (ret) {
      printf("align error\n");
      exit(1);
    }
    return ptr;
}

int main()
{
    char *buf_15 = (char*)align_alloc(4096*20);
    memset(buf_15, 'a', 4096*20);

    char *buf_4 = (char*)align_alloc(4096*20);
    memset(buf_4, 'a', 4096*20);

    mount("/dev/loop0", "/root/mnt", "ext4", 0, "");

    creat("/root/mnt/a", S_IRWXG);
    link("/root/mnt/a", "/root/mnt/b");
    int fd_a = open("/root/mnt/a", O_RDWR);
    int fd_b = open("/root/mnt/b", O_RDWR | O_DIRECT);

    lseek(fd_a, 100, SEEK_SET); 
    write(fd_a, buf_15, 9900); 

    read(fd_b, buf_4, 73728); 
    
    int state = write(fd_b, buf_15, 65536); 
    if (state == -1) {
        printf("write failure\n");
    }

    return 0;
}
```

Comment 3 Theodore Tso 2024-05-27 21:14:06 UTC
Hint: check the return value of all system calls. In particular, check what read(fd_b, buf_4, 73728) returns. Check what the size of the file is after write(fd_a, buf_15, 9900), and then reflect on what happens if the read ends up hitting the end of the file, and what the offset of fd_b is after the short read when hitting EOF.

Finally, read the documentation for the O_DIRECT flag in the NOTES section of the open man page[1], and understand the requirements for O_DIRECT writes, in particular the alignment requirements on the starting offset when performing an O_DIRECT write (or O_DIRECT read). Then also check the errno value (for example, replace the printf("write failure\n") with perror("write")).

[1] https://man7.org/linux/man-pages/man2/open.2.html

In any case, this is not a bug, and this is not a good place for you to be asking for instruction in basic Unix system call programming.