Bug 218180 - ntfs3: empty file on update without forced cache drop
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P3 normal
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-11-22 12:47 UTC by Giovanni Santini
Modified: 2024-03-05 15:41 UTC
CC List: 3 users

See Also:
Kernel Version: 6.2
Subsystem:
Regression: Yes
Bisected commit-id: ad26a9c84510af7252e582e811de970433a9758f


Attachments
NVME info (2.61 KB, text/plain)
2023-11-24 08:50 UTC, Giovanni Santini
The test script I run and its output (1.78 KB, text/plain)
2023-11-25 22:49 UTC, Giovanni Santini
system journalctl (190.14 KB, text/plain)
2023-11-25 22:50 UTC, Giovanni Santini
user journalctl (7.48 KB, text/plain)
2023-11-25 22:50 UTC, Giovanni Santini
dmesg (94.69 KB, text/plain)
2023-11-25 22:50 UTC, Giovanni Santini
Bisection log (2.84 KB, text/plain)
2023-11-26 22:08 UTC, Giovanni Santini
Test output for mainline with commit reverted (2.87 KB, text/plain)
2023-11-27 16:07 UTC, Giovanni Santini

Description Giovanni Santini 2023-11-22 12:47:09 UTC
Hi,
After reporting the bug to my distribution:
https://bugs.archlinux.org/task/80283
I've decided to report the issue here, since using a "clean" kernel does not solve my issue.

The issue appears on every release from 6.2 onward (6.1 releases do not show the issue).

The problem I am facing is the following:
1. I mount an NTFS partition via NTFS3
2. I create a file
3. I write to the file
4. The file is empty
5. I remount the partition
6. The file has the changes I made before the remount

I can avoid the remount by doing:
sudo sysctl vm.drop_caches=3

I would like some help in figuring out why this happens.
I can rebuild the kernel and test things out.

Here is the log (taken from the ArchLinux issue) of my shell presenting the issue:

(17:17) giovanni @ ~ $ udisksctl mount -b /dev/nvme0n1p5 -t ntfs3
Mounted /dev/nvme0n1p5 at /run/media/giovanni/Data
(17:17) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
Using udisks and ntfs3
(17:17) giovanni @ ~ $ echo "Using udisks and ntfs3" > /run/media/giovanni/Data/mount_test.txt
(17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
(17:18) giovanni @ ~ $ sync; sudo sysctl vm.drop_caches=3
vm.drop_caches = 3
(17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
Using udisks and ntfs3
(17:18) giovanni @ ~ $ echo "Using udisks and ntfs3 again" > /run/media/giovanni/Data/mount_test.txt
(17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
(17:18) giovanni @ ~ $ sync
(17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
(17:18) giovanni @ ~ $ sync; sudo sysctl vm.drop_caches=3
vm.drop_caches = 3
(17:19) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
Using udisks and ntfs3 again
(17:19) giovanni @ ~ $

Using "mount" shows the same issue.
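
The write-then-read pattern from the session above can be wrapped in a small check. This is only a sketch: the function name `check_write_visible` and its optional second argument are made up for illustration, and the file name simply mirrors the one used in the log. It writes a line, reads it back immediately, and reports whether the two match.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): write a line to a file
# under DIR, read it back right away, and report whether the read matches.
check_write_visible() {
    dir=$1
    expected=${2:-"Using udisks and ntfs3"}
    file="$dir/mount_test.txt"

    # Same pattern as the session above: truncate-and-write, then read.
    echo "$expected" > "$file" || return 2
    actual=$(cat "$file")

    if [ "$actual" = "$expected" ]; then
        echo "OK: read back the written content"
        return 0
    fi
    echo "BUG: wrote '$expected' but read '$actual'"
    return 1
}
```

For the setup in this report the call would be `check_write_visible /run/media/giovanni/Data`; on an affected kernel the read is expected to come back empty, so the function should return 1.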
Comment 1 Bagas Sanjaya 2023-11-23 00:37:36 UTC
(In reply to Giovanni Santini from comment #0)

First, check the latest mainline (currently v6.7-rc2). Then, if the regression persists, bisect it (see Documentation/admin-guide/bug-bisect.rst in the kernel sources for reference).
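
As a rough illustration of what one bisection test step could look like, here is a sketch that follows the exit-code convention of `git bisect run` (0 = good, 1 = bad, 125 = skip). The function name `bisect_step`, the probe file name, and the mount point argument are assumptions made for this example. Automating it with `git bisect run` would only work if every step can build, boot, and mount on its own (for instance inside a VM); otherwise the same check can be run by hand after each reboot and the result reported with `git bisect good` or `git bisect bad`.

```shell
#!/bin/sh
# Sketch of a test step using the "git bisect run" exit-code convention:
#   0   -> current revision is good
#   1   -> current revision is bad
#   125 -> revision cannot be tested (skip)
# The mount point and probe file name are hypothetical; adapt as needed.
bisect_step() {
    mountpoint=$1
    file="$mountpoint/bisect_probe.txt"
    payload="bisect probe"

    # No writable test directory: ask git bisect to skip this revision.
    [ -d "$mountpoint" ] && [ -w "$mountpoint" ] || return 125

    echo "$payload" > "$file" || return 125
    # On an affected kernel the immediate read comes back empty.
    [ "$(cat "$file")" = "$payload" ] && return 0
    return 1
}
```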
Comment 2 Bagas Sanjaya 2023-11-23 02:31:46 UTC
(In reply to Giovanni Santini from comment #0)

Hi Giovanni,

I can't reproduce this regression on kernel v6.7-rc2 with either a loop device or my SanDisk flash drive.
Comment 3 Giovanni Santini 2023-11-23 22:51:37 UTC
Hello,

While I do understand your point, I still have the issue...
I tried wiping and setting up a USB flash drive, but that doesn't work either.
I can try seeing whether a loop device will work or not; how did you do it?

I am building the latest mainline RC for testing.

Is there any log I can provide you to continue the investigation?
Comment 4 Bagas Sanjaya 2023-11-24 06:48:57 UTC
(In reply to Giovanni Santini from comment #3)
> Hello,
> 
> While I do understand your point, I still have the issue...
> I tried wiping and setting an USB flash drive but also that won't work.
> I can try seeing whether a loop device will work or not; how did you do it?
> 
> I am building the latest mainline RC for testing.
> 
> Is there any log I can provide you to continue the investigation?

What is your NVMe device? And can you attach full dmesg output?
Comment 5 Giovanni Santini 2023-11-24 08:50:24 UTC
Created attachment 305468 [details]
NVME info
Comment 6 Giovanni Santini 2023-11-24 08:52:36 UTC
(In reply to Bagas Sanjaya from comment #4)
> What is your NVMe device? And can you attach full dmesg output?

I've attached the output of `nvme-cli` above.

I can for sure share the dmesg output.
What do you want me to do specifically?
Mount / write / unmount?
Comment 7 Bagas Sanjaya 2023-11-24 14:11:47 UTC
(In reply to Giovanni Santini from comment #6)
> (In reply to Bagas Sanjaya from comment #4)
> > What is your NVMe device? And can you attach full dmesg output?
> 
> I've attached the output of `nvme-cli` above.
> 
> I can for sure share the dmesg output.
> What do you want me to do specifically?
> Mount / write / unmount?

dmesg is for kernel developers to aid debugging.

As for testing mainline, make sure to repeat the reproducer exactly as it is.
Comment 8 Giovanni Santini 2023-11-25 22:49:06 UTC
(In reply to Bagas Sanjaya from comment #7)
> 
> dmesg is for kernel developers to aid debugging.
> 
> As for testing mainline, make sure to repeat the reproducer exactly as it is.

Alright, I am going to attach all logs I could think of:
1. A shell session where I show my ntfs3 test script and run it, showing that I still have the issue on Linux 6.7-rc2.
2. The full journalctl system log
3. The full journalctl user log
4. The full dmesg log

Let me know if you want other logs :)
Comment 9 Giovanni Santini 2023-11-25 22:49:51 UTC
Created attachment 305474 [details]
The test script I run and its output
Comment 10 Giovanni Santini 2023-11-25 22:50:14 UTC
Created attachment 305475 [details]
system journalctl
Comment 11 Giovanni Santini 2023-11-25 22:50:30 UTC
Created attachment 305476 [details]
user journalctl
Comment 12 Giovanni Santini 2023-11-25 22:50:51 UTC
Created attachment 305477 [details]
dmesg
Comment 13 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-26 12:27:04 UTC
I might be missing something here, but why did Artem close this as RESOLVED CODE_FIX? 

Anyway, as Bagas can't reproduce this, it's likely something pretty specific. Forwarding the problem to the maintainers likely won't help much. But what really could help is a git bisection between 6.1 and 6.2. Could you perform one, Giovanni Santini?
Comment 14 Giovanni Santini 2023-11-26 19:36:55 UTC
Hi Thorsten,

I am unsure why the issue was closed as "resolved" and "code fix", since the mainline kernel didn't work for me...

I agree it is something obscure with my machine, since even the Arch dev who supported me couldn't replicate it.

I am working on the bisect already, currently at this point:
---
$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [830b3c68c1fb1e9176028d02ef86f3cf76aa2476] Linux 6.1
git bisect good 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
# status: waiting for bad commit, 1 good commit known
# bad: [c9c3395d5e3dcc6daee66c6908354d47bf98cb0c] Linux 6.2
git bisect bad c9c3395d5e3dcc6daee66c6908354d47bf98cb0c
# good: [1ca06f1c1acecbe02124f14a37cce347b8c1a90c] Merge tag 'xtensa-20221213' of https://github.com/jcmvbkbc/linux-xtensa
git bisect good 1ca06f1c1acecbe02124f14a37cce347b8c1a90c
# good: [b83a7080d30032cf70832bc2bb04cc342e203b88] Merge tag 'staging-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good b83a7080d30032cf70832bc2bb04cc342e203b88
# bad: [06d65a6f640118430b894273914aa8d62d2cf637] Merge tag 'mips_6.2_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
git bisect bad 06d65a6f640118430b894273914aa8d62d2cf637
# good: [a6e3e6f138058ff184d8ef5064a033b3f5fee8f8] Merge tag 'mm-nonmm-stable-2022-12-17-20-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect good a6e3e6f138058ff184d8ef5064a033b3f5fee8f8
# good: [3c202d14a9d73fb63c3dccb18feac5618c21e1c4] prandom: remove prandom_u32_max()
git bisect good 3c202d14a9d73fb63c3dccb18feac5618c21e1c4
# good: [f2855eec19cadddad2900da3a009ee39df6116a7] Merge tag 'mailbox-v6.2' of git://git.linaro.org/landing-teams/working/fujitsu/integration
git bisect good f2855eec19cadddad2900da3a009ee39df6116a7
# bad: [6022ec6ee2c3a16b26f218d7abb538afb839bd6d] Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3
git bisect bad 6022ec6ee2c3a16b26f218d7abb538afb839bd6d
# good: [5461e079009ae2732c833281c4b50dfb58d15ba5] Merge tag 'media/v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect good 5461e079009ae2732c833281c4b50dfb58d15ba5
# good: [0d19f3d71394b0b03b8775c958b3354fa2259609] fs/ntfs3: Add system.ntfs_attrib_be extended attribute
git bisect good 0d19f3d71394b0b03b8775c958b3354fa2259609
---

I think it should be another 4 steps. Will update you :)
Comment 15 Giovanni Santini 2023-11-26 22:06:46 UTC
I finished the bisection. The commit that causes the issue for me is:
ad26a9c84510af7252e582e811de970433a9758f

git log -1 ad26a9c84510af7252e582e811de970433a9758f
commit ad26a9c84510af7252e582e811de970433a9758f (HEAD)
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Oct 7 20:08:06 2022 +0300

    fs/ntfs3: Fixing wrong logic in attr_set_size and ntfs_fallocate
    
    There were 2 problems:
    - in some cases we lost dirty flag;
    - cluster allocation can be called even when it wasn't needed.
    Fixes xfstest generic/465
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

I am attaching the bisection log in case it is needed.
Comment 16 Giovanni Santini 2023-11-26 22:08:40 UTC
Created attachment 305479 [details]
Bisection log
Comment 17 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-27 06:27:36 UTC
(In reply to Giovanni Santini from comment #14)
> I am unsure why the issue was closed as "resolved" and "code fix", since the
> mainline kernel didn't work for me...

I guess Artem thought a comment Bagas made came from you. 

Anyway, thx for the tests and the bisection. I forwarded the issue by mail to the developer that has to handle this: https://lore.kernel.org/regressions/138ed123-0f84-4d7a-8a17-67fe2418cf29@leemhuis.info/
Comment 18 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-27 06:28:56 UTC
Forgot: If you have a minute, could you try reverting the commit on top of 6.7-rc2 and, if the revert applies, check whether that fixes things?
Comment 19 Giovanni Santini 2023-11-27 14:57:25 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #18)
> Forgot: If you have a minute, could you try reverting the commit ontop of
> 6.7-rc2 -- and if that works check if that fixes things?

Sure can do!
Will do it soon :)
Comment 20 Giovanni Santini 2023-11-27 16:06:29 UTC
Hello! I confirmed that mainline with that commit reverted does not present the issue.
Reverting the commit solves my problem.

I am attaching the shell log, recorded via script.
Comment 21 Giovanni Santini 2023-11-27 16:07:15 UTC
Created attachment 305484 [details]
Test output for mainline with commit reverted
Comment 22 Giovanni Santini 2023-11-30 11:00:41 UTC
Hello again,
Any progress regarding the issue?
Is there anything I can do to help? :)
/ Giovanni
Comment 23 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-30 13:00:58 UTC
(In reply to Giovanni Santini from comment #22)
> Any progress regarding the issue?

No reply to the mail, so doesn't look like it.

> Is there anything I can do to help? :)

Not that I can see. We really need an ntfs3 expert now, as reverting the change might cause more trouble than it solves for other use cases.
Comment 24 almaz.alexandrovich 2024-01-26 12:44:08 UTC
(In reply to Giovanni Santini from comment #22)
> Hello again,
> Any progress regarding the issue?
> Is there anything I can do to help? :)
> / Giovanni

Hello Giovanni,

I am in the final stages of checking the reported bug and will be ready to upload the patch within a couple of working days. I was able to reproduce the bug for a compressed file.
Is the directory in which you created the file compressed?

Best regards,
Konstantin Komarov
Comment 25 Giovanni Santini 2024-01-30 09:00:56 UTC
Hi Konstantin

(In reply to almaz.alexandrovich from comment #24)
> ...
> Is the directory in which you created the file compressed?
> 

I have set compression on the root node of the partition, so I believe so.
I also verified that creating and writing to a file in a compressed folder shows the issue, while doing so in an uncompressed folder works fine.
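
For anyone wanting to confirm whether a file or directory carries the NTFS compressed attribute, the flag is bit 0x800 (FILE_ATTRIBUTE_COMPRESSED) of the attribute mask. The sketch below only does the bit test; the function name `is_ntfs_compressed` is made up for this example. How the attribute value is obtained is an assumption: with ntfs3 it may be readable through the `system.ntfs_attrib_be` extended attribute (for example `getfattr -n system.ntfs_attrib_be -e hex <path>`), but that depends on the kernel and tools in use.

```shell
#!/bin/sh
# FILE_ATTRIBUTE_COMPRESSED is bit 0x800 in the NTFS file attribute mask.
# Returns 0 if the given value (decimal or 0x-prefixed hex) has that bit set.
# Hypothetical helper; reading the value from the filesystem is left out.
is_ntfs_compressed() {
    attrs=$1
    [ $(( $attrs & 0x800 )) -ne 0 ]
}
```

A value such as `0x00000820` (archive plus compressed) makes the function succeed, while `0x00000020` (archive only) makes it fail.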
Comment 26 Artem S. Tashkinov 2024-02-03 06:55:56 UTC
Please write to https://www.spinics.net/lists/ntfs3/index.html

NTFS3 developers are not here and don't see any messages posted in this bug report.
Comment 27 Artem S. Tashkinov 2024-02-03 06:57:36 UTC
Though my bad, never mind, Konstantin is here. Sorry for the noise.
Comment 28 Giovanni Santini 2024-02-13 13:45:18 UTC
Tried the latest mainline kernel via the ArchLinux package `linux-mainline` and the issue seems to be fixed. My script is happy with both a compressed and a normal file :)

Is the change being backported to the LTS releases too?
Comment 29 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-02-13 14:08:35 UTC
(In reply to Giovanni Santini from comment #28)
> Is the change being backported to the LTS releases too?

Good question. I already brought it up, but there is no reply yet from Konstantin:
https://lore.kernel.org/all/bdea4053-b978-489d-a4a2-927685eee4a8@leemhuis.info/
Comment 30 Giovanni Santini 2024-03-05 15:41:16 UTC
The patches seem to have landed on 6.6 LTS and 6.7; I can use compressed files with no issues :)

I suppose the patches also landed on the other LTS kernels, so I will mark this as solved.
