Bug 218180

Summary: ntfs3: empty file on update without forced cache drop
Product: File System
Reporter: Giovanni Santini (giovannisantini93)
Component: Other
Assignee: fs_other
Status: RESOLVED CODE_FIX
Severity: normal
CC: almaz.alexandrovich, bagasdotme, eslaamber, regressions
Priority: P3
Hardware: All
OS: Linux
Kernel Version: 6.2
Subsystem:
Regression: Yes
Bisected commit-id: ad26a9c84510af7252e582e811de970433a9758f
Attachments: NVME info
The test script I run and its output
system journalctl
user journalctl
dmesg
Bisection log
Test output for mainline with commit reverted

Description Giovanni Santini 2023-11-22 12:47:09 UTC
Hi,
After reporting the bug to my distribution:
https://bugs.archlinux.org/task/80283
I've decided to report the issue here, since using a "clean" kernel does not solve my issue.

The issue appears on every release from 6.2 onward (6.1 releases do not show the issue).

The problem I am facing is the following:
1. I mount an NTFS partition via NTFS3
2. I create a file
3. I write to the file
4. The file is empty
5. I remount the partition
6. The file has the changes I made before the remount

I can avoid the remount by doing:
sudo sysctl vm.drop_caches=3

I would like some help in figuring out why this happens.
I can rebuild the kernel and test things out.

Here is the log (taken from the ArchLinux issue) of my shell presenting the issue:

(17:17) giovanni @ ~ $ udisksctl mount -b /dev/nvme0n1p5 -t ntfs3
Mounted /dev/nvme0n1p5 at /run/media/giovanni/Data
(17:17) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
Using udisks and ntfs3
(17:17) giovanni @ ~ $ echo "Using udisks and ntfs3" > /run/media/giovanni/Data/mount_test.txt
(17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
(17:18) giovanni @ ~ $ sync; sudo sysctl vm.drop_caches=3
vm.drop_caches = 3
(17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
Using udisks and ntfs3
(17:18) giovanni @ ~ $ echo "Using udisks and ntfs3 again" > /run/media/giovanni/Data/mount_test.txt
(17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
(17:18) giovanni @ ~ $ sync
(17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
(17:18) giovanni @ ~ $ sync; sudo sysctl vm.drop_caches=3
vm.drop_caches = 3
(17:19) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
Using udisks and ntfs3 again
(17:19) giovanni @ ~ $

Using "mount" shows the same issue.
Comment 1 Bagas Sanjaya 2023-11-23 00:37:36 UTC
(In reply to Giovanni Santini from comment #0)

First, check the latest mainline (currently v6.7-rc2). If the regression still persists there, please bisect it (see Documentation/admin-guide/bug-bisect.rst in the kernel sources for reference).
Comment 2 Bagas Sanjaya 2023-11-23 02:31:46 UTC
(In reply to Giovanni Santini from comment #0)

Hi Giovanni,

I can't reproduce this regression on kernel v6.7-rc2, either with a loop device or with my SanDisk flash drive.
Comment 3 Giovanni Santini 2023-11-23 22:51:37 UTC
Hello,

While I do understand your point, I still have the issue...
I tried wiping and setting up a USB flash drive, but that didn't work either.
I can try seeing whether a loop device will work or not; how did you do it?

I am building the latest mainline RC for testing.

Is there any log I can provide you to continue the investigation?
Comment 4 Bagas Sanjaya 2023-11-24 06:48:57 UTC
(In reply to Giovanni Santini from comment #3)
> Hello,
> 
> While I do understand your point, I still have the issue...
> I tried wiping and setting an USB flash drive but also that won't work.
> I can try seeing whether a loop device will work or not; how did you do it?
> 
> I am building the latest mainline RC for testing.
> 
> Is there any log I can provide you to continue the investigation?

What is your NVMe device? And can you attach full dmesg output?
Comment 5 Giovanni Santini 2023-11-24 08:50:24 UTC
Created attachment 305468 [details]
NVME info
Comment 6 Giovanni Santini 2023-11-24 08:52:36 UTC
(In reply to Bagas Sanjaya from comment #4)
> What is your NVMe device? And can you attach full dmesg output?

I've attached the output of `nvme-cli` above.

I can for sure share the dmesg output.
What do you want me to do specifically?
Mount / write / unmount?
Comment 7 Bagas Sanjaya 2023-11-24 14:11:47 UTC
(In reply to Giovanni Santini from comment #6)
> (In reply to Bagas Sanjaya from comment #4)
> > What is your NVMe device? And can you attach full dmesg output?
> 
> I've attached the output of `nvme-cli` above.
> 
> I can for sure share the dmesg output.
> What do you want me to do specifically?
> Mount / write / unmount?

dmesg is for kernel developers to aid debugging.

As for testing mainline, make sure to repeat the reproducer exactly as it is.
Comment 8 Giovanni Santini 2023-11-25 22:49:06 UTC
(In reply to Bagas Sanjaya from comment #7)
> 
> dmesg is for kernel developers to aid debugging.
> 
> As for testing mainline, make sure to repeat the reproducer exactly as it is.

Alright, I am going to attach all logs I could think of:
1. A shell session where I show my ntfs3 test script and run it, showing that I still have the issue on Linux 6.7 RC2.
2. All journalctl system log
3. All journalctl user log
4. All dmesg log

Let me know if you want other logs :)
Comment 9 Giovanni Santini 2023-11-25 22:49:51 UTC
Created attachment 305474 [details]
The test script I run and its output
Comment 10 Giovanni Santini 2023-11-25 22:50:14 UTC
Created attachment 305475 [details]
system journalctl
Comment 11 Giovanni Santini 2023-11-25 22:50:30 UTC
Created attachment 305476 [details]
user journalctl
Comment 12 Giovanni Santini 2023-11-25 22:50:51 UTC
Created attachment 305477 [details]
dmesg
Comment 13 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-26 12:27:04 UTC
I might be missing something here, but why did Artem close this as RESOLVED CODE_FIX? 

Anyway, as Bagas can't reproduce this, it's likely something pretty specific. Forwarding the problem to the maintainers likely won't help much. But what really could help is a git bisection between 6.1 and 6.2. Could you perform one, Giovanni?
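
For a bisection like this, every step needs a good/bad verdict after booting the kernel built at that step. A minimal sketch, assuming a hypothetical helper `bisect_verdict` that probes a directory on the mounted partition and prints the word to hand to `git bisect`; the `skip` case for a missing or unwritable directory is an added assumption, not something requested in this report:

```shell
# Hypothetical helper: print the verdict for the current bisection step.
# "good" - data written to a new file is readable right away
# "bad"  - the read returns something else (for example nothing)
# "skip" - the test directory is missing or unwritable, so the step
#          cannot be judged
bisect_verdict() {
    dir=$1
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "skip"
        return 0
    fi
    file="$dir/bisect_test.$$"
    printf 'bisect probe\n' > "$file"
    if [ "$(cat "$file")" = "bisect probe" ]; then
        echo "good"
    else
        echo "bad"
    fi
    rm -f "$file"
}
```

After `git bisect start`, `git bisect good v6.1` and `git bisect bad v6.2`, each rebuilt and booted kernel could then be marked with `git bisect "$(bisect_verdict /path/to/mountpoint)"`, where the mount point is a placeholder.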
Comment 14 Giovanni Santini 2023-11-26 19:36:55 UTC
Hi Thorsten,

I am unsure why the issue was closed as "resolved" and "code fix", since the mainline kernel didn't work for me...

I agree it is something obscure with my machine, since even the Arch dev who supported me couldn't replicate it.

I am working on the bisect already, currently at this point:
---
$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [830b3c68c1fb1e9176028d02ef86f3cf76aa2476] Linux 6.1
git bisect good 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
# status: waiting for bad commit, 1 good commit known
# bad: [c9c3395d5e3dcc6daee66c6908354d47bf98cb0c] Linux 6.2
git bisect bad c9c3395d5e3dcc6daee66c6908354d47bf98cb0c
# good: [1ca06f1c1acecbe02124f14a37cce347b8c1a90c] Merge tag 'xtensa-20221213' of https://github.com/jcmvbkbc/linux-xtensa
git bisect good 1ca06f1c1acecbe02124f14a37cce347b8c1a90c
# good: [b83a7080d30032cf70832bc2bb04cc342e203b88] Merge tag 'staging-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good b83a7080d30032cf70832bc2bb04cc342e203b88
# bad: [06d65a6f640118430b894273914aa8d62d2cf637] Merge tag 'mips_6.2_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
git bisect bad 06d65a6f640118430b894273914aa8d62d2cf637
# good: [a6e3e6f138058ff184d8ef5064a033b3f5fee8f8] Merge tag 'mm-nonmm-stable-2022-12-17-20-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect good a6e3e6f138058ff184d8ef5064a033b3f5fee8f8
# good: [3c202d14a9d73fb63c3dccb18feac5618c21e1c4] prandom: remove prandom_u32_max()
git bisect good 3c202d14a9d73fb63c3dccb18feac5618c21e1c4
# good: [f2855eec19cadddad2900da3a009ee39df6116a7] Merge tag 'mailbox-v6.2' of git://git.linaro.org/landing-teams/working/fujitsu/integration
git bisect good f2855eec19cadddad2900da3a009ee39df6116a7
# bad: [6022ec6ee2c3a16b26f218d7abb538afb839bd6d] Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3
git bisect bad 6022ec6ee2c3a16b26f218d7abb538afb839bd6d
# good: [5461e079009ae2732c833281c4b50dfb58d15ba5] Merge tag 'media/v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect good 5461e079009ae2732c833281c4b50dfb58d15ba5
# good: [0d19f3d71394b0b03b8775c958b3354fa2259609] fs/ntfs3: Add system.ntfs_attrib_be extended attribute
git bisect good 0d19f3d71394b0b03b8775c958b3354fa2259609
---

I think it should be another 4 steps. Will update you :)
Comment 15 Giovanni Santini 2023-11-26 22:06:46 UTC
I finished the bisection. The commit that causes the issue for me is:
ad26a9c84510af7252e582e811de970433a9758f

git log -1 ad26a9c84510af7252e582e811de970433a9758f
commit ad26a9c84510af7252e582e811de970433a9758f (HEAD)
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Oct 7 20:08:06 2022 +0300

    fs/ntfs3: Fixing wrong logic in attr_set_size and ntfs_fallocate
    
    There were 2 problems:
    - in some cases we lost dirty flag;
    - cluster allocation can be called even when it wasn't needed.
    Fixes xfstest generic/465
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

I am attaching the bisection log, maybe it is needed.
Comment 16 Giovanni Santini 2023-11-26 22:08:40 UTC
Created attachment 305479 [details]
Bisection log
Comment 17 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-27 06:27:36 UTC
(In reply to Giovanni Santini from comment #14)
> I am unsure why the issue was closed as "resolved" and "code fix", since the
> mainline kernel didn't work for me...

I guess Artem thought a comment Bagas made came from you. 

Anyway, thx for the tests and the bisection. I forwarded the issue by mail to the developer that has to handle this: https://lore.kernel.org/regressions/138ed123-0f84-4d7a-8a17-67fe2418cf29@leemhuis.info/
Comment 18 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-27 06:28:56 UTC
Forgot: If you have a minute, could you try reverting the commit on top of 6.7-rc2 and, if the revert works, check whether that fixes things?
Comment 19 Giovanni Santini 2023-11-27 14:57:25 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #18)
> Forgot: If you have a minute, could you try reverting the commit ontop of
> 6.7-rc2 -- and if that works check if that fixes things?

Sure can do!
Will do it soon :)
Comment 20 Giovanni Santini 2023-11-27 16:06:29 UTC
Hello! I confirmed that mainline with that commit reverted does not present the issue.
Reverting the commit solves my problem.

I am attaching the shell log, recorded via script.
Comment 21 Giovanni Santini 2023-11-27 16:07:15 UTC
Created attachment 305484 [details]
Test output for mainline with commit reverted
Comment 22 Giovanni Santini 2023-11-30 11:00:41 UTC
Hello again,
Any progress regarding the issue?
Is there anything I can do to help? :)
/ Giovanni
Comment 23 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-30 13:00:58 UTC
(In reply to Giovanni Santini from comment #22)
> Any progress regarding the issue?

No reply to the mail, so doesn't look like it.

> Is there anything I can do to help? :)

Not that I can see. We really need an ntfs3 expert now, as reverting the change might cause more trouble than it solves for other use cases.
Comment 24 almaz.alexandrovich 2024-01-26 12:44:08 UTC
(In reply to Giovanni Santini from comment #22)
> Hello again,
> Any progress regarding the issue?
> Is there anything I can do to help? :)
> / Giovanni

Hello Giovanni,

I am in the final stages of checking the reported bug and will be ready to upload the patch within a couple of working days. I was able to reproduce the bug for a compressed file.
Is the directory in which you created the file compressed?

Best regards,
Konstantin Komarov
Comment 25 Giovanni Santini 2024-01-30 09:00:56 UTC
Hi Konstantin

(In reply to almaz.alexandrovich from comment #24)
> ...
> Is the directory in which you created the file compressed?
> 

I have set compression on the root node of the partition, so I believe so.
I also verified that creating and writing to a file in a compressed folder shows the issue, while doing so in an uncompressed folder works fine.
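
One way to tell the two cases apart is the NTFS attribute word that ntfs3 exposes through the `system.ntfs_attrib_be` extended attribute (the attribute named in the bisection log in comment 14). A minimal sketch, assuming the value is already available as a hex string; `has_compressed_flag` is a hypothetical helper, and 0x800 is the NTFS FILE_ATTRIBUTE_COMPRESSED bit:

```shell
# Hypothetical helper: succeed if the NTFS attribute word has the
# FILE_ATTRIBUTE_COMPRESSED bit (0x800) set.
# $1 is the attribute value as hex, e.g. 0x00000820.
has_compressed_flag() {
    attrs=$(( $1 ))
    [ $(( attrs & 0x800 )) -ne 0 ]
}
```

The value could be read with something like `getfattr -n system.ntfs_attrib_be -e hex <path>`, assuming the `attr` tools are installed; the exact output format should be checked before parsing it. With that, `has_compressed_flag 0x00000820` succeeds (archive plus compressed), while `has_compressed_flag 0x00000020` fails (archive only).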
Comment 26 Artem S. Tashkinov 2024-02-03 06:55:56 UTC
Please write to https://www.spinics.net/lists/ntfs3/index.html

NTFS3 developers are not here and don't see any messages posted in this bug report.
Comment 27 Artem S. Tashkinov 2024-02-03 06:57:36 UTC
Though my bad, never mind, Konstantin is here. Sorry for the noise.
Comment 28 Giovanni Santini 2024-02-13 13:45:18 UTC
Tried the latest mainline kernel via the ArchLinux package `linux-mainline` and the issue seems to be fixed. My script is happy with both a compressed and a normal file :)

Is the change being backported to the LTS releases too?
Comment 29 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-02-13 14:08:35 UTC
(In reply to Giovanni Santini from comment #28)
> Is the change being backported to the LTS releases too?

Good question I already brought up, but no reply yet from Konstantin:
https://lore.kernel.org/all/bdea4053-b978-489d-a4a2-927685eee4a8@leemhuis.info/
Comment 30 Giovanni Santini 2024-03-05 15:41:16 UTC
Patches seem to have landed in 6.6 LTS and 6.7; I can use compressed files with no issues :)

I suppose the patches also landed in the other LTS kernels, so I will mark this as solved.