Bug 218180
Summary: ntfs3: empty file on update without forced cache drop
Product: File System
Component: Other
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P3
Hardware: All
OS: Linux
Kernel Version: 6.2
Regression: Yes
Bisected commit-id: ad26a9c84510af7252e582e811de970433a9758f
Reporter: Giovanni Santini (giovannisantini93)
Assignee: fs_other
CC: almaz.alexandrovich, bagasdotme, eslaamber, regressions
Attachments:
  NVME info
  The test script I run and its output
  system journalctl
  user journalctl
  dmesg
  Bisection log
  Test output for mainline with commit reverted
Description
Giovanni Santini
2023-11-22 12:47:09 UTC
Hi,
After reporting the bug to my distribution:
https://bugs.archlinux.org/task/80283
I've decided to report the issue here, since using a "clean" kernel does not solve my issue.

The issue appears in every release from 6.2 onward (6.1 releases do not show the issue).

The problem I am facing is the following:
1. I mount an NTFS partition via NTFS3
2. I create a file
3. I write to the file
4. Reading the file back shows it as empty
5. I remount the partition
6. The file has the changes I made before the remount

I can avoid the remount by doing:
sudo sysctl vm.drop_caches=3

I would like some help in figuring out why this happens. I can rebuild the kernel and test things out.

Here is the log (taken from the ArchLinux issue) of my shell showing the issue:

    (17:17) giovanni @ ~ $ udisksctl mount -b /dev/nvme0n1p5 -t ntfs3
    Mounted /dev/nvme0n1p5 at /run/media/giovanni/Data
    (17:17) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
    Using udisks and ntfs3
    (17:17) giovanni @ ~ $ echo "Using udisks and ntfs3" > /run/media/giovanni/Data/mount_test.txt
    (17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
    (17:18) giovanni @ ~ $ sync; sudo sysctl vm.drop_caches=3
    vm.drop_caches = 3
    (17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
    Using udisks and ntfs3
    (17:18) giovanni @ ~ $ echo "Using udisks and ntfs3 again" > /run/media/giovanni/Data/mount_test.txt
    (17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
    (17:18) giovanni @ ~ $ sync
    (17:18) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
    (17:18) giovanni @ ~ $ sync; sudo sysctl vm.drop_caches=3
    vm.drop_caches = 3
    (17:19) giovanni @ ~ $ cat /run/media/giovanni/Data/mount_test.txt
    Using udisks and ntfs3 again
    (17:19) giovanni @ ~ $

Using "mount" shows the same issue.

(In reply to Giovanni Santini from comment #0)
First, check the latest mainline (currently v6.7-rc2). Then, if this regression still persists, do a bisection (see Documentation/admin-guide/bug-bisect.rst in the kernel sources for reference).

(In reply to Giovanni Santini from comment #0)
Hi Giovanni, I can't reproduce this regression on kernel v6.7-rc2 using either a loop device or my SanDisk flash drive.

Hello,
While I do understand your point, I still have the issue... I tried wiping and setting up a USB flash drive, but that doesn't work either. I can try whether a loop device works; how did you set it up?

I am building the latest mainline RC for testing.

Is there any log I can provide to continue the investigation?

(In reply to Giovanni Santini from comment #3)
> Is there any log I can provide to continue the investigation?
What is your NVMe device? And can you attach the full dmesg output?

Created attachment 305468
NVME info
(In reply to Bagas Sanjaya from comment #4)
> What is your NVMe device? And can you attach the full dmesg output?
I've attached the output of `nvme-cli` above. I can certainly share the dmesg output. What do you want me to do specifically? Mount / write / unmount?

(In reply to Giovanni Santini from comment #6)
> What do you want me to do specifically?
> Mount / write / unmount?
dmesg is for kernel developers to aid debugging. As for testing mainline, make sure to repeat the reproducer exactly as it is.

(In reply to Bagas Sanjaya from comment #7)
> dmesg is for kernel developers to aid debugging.
>
> As for testing mainline, make sure to repeat the reproducer exactly as it is.
Alright, I am going to attach all the logs I could think of:
1. A shell session where I show my ntfs3 test script and run it, showing that I still have the issue on Linux 6.7-rc2.
2. The full journalctl system log
3. The full journalctl user log
4. The full dmesg log
Let me know if you want other logs :)

Created attachment 305474
The test script I run and its output
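The attached test script itself is not reproduced in this report. As a hedged illustration only, a minimal Python sketch of the same write-then-read check could look like this. The helper names `write_then_read` and `check_dir` are hypothetical, and the file name mirrors `mount_test.txt` from the shell log. On an affected kernel, in a compressed ntfs3 directory, the read-back would presumably come back empty, as `cat` did, until caches are dropped or the volume is remounted.

```python
import os


def write_then_read(path: str, text: str) -> str:
    """Truncate-and-write `text` (like `echo ... > file`), then read it back."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Re-open and read through the page cache; per the report, an affected
    # ntfs3 mount returned nothing here until caches were dropped.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def check_dir(directory: str, text: str = "Using udisks and ntfs3\n") -> bool:
    """Return True if a file written in `directory` reads back unchanged."""
    path = os.path.join(directory, "mount_test.txt")
    got = write_then_read(path, text)
    ok = got == text
    if ok:
        print(f"{path}: OK")
    else:
        print(f"{path}: MISMATCH (read {len(got)} chars, expected {len(text)})")
    return ok
```

For example, `check_dir("/run/media/giovanni/Data")` would test the reporter's mount point. Running the same call against a directory on another filesystem, or an uncompressed ntfs3 directory, gives a control result.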
Created attachment 305475
system journalctl
Created attachment 305476
user journalctl
Created attachment 305477
dmesg
I might be missing something here, but why did Artem close this as RESOLVED CODE_FIX? Anyway, as Bagas can't reproduce this, it's likely something pretty specific, so forwarding the problem to the maintainers likely won't help much. What really could help is a git bisection between 6.1 and 6.2. Could you perform one, Giovanni Santini?

Hi Thorsten, I am unsure why the issue was closed as "resolved" and "code fix", since the mainline kernel didn't work for me... I agree it is something obscure with my machine, since even the Arch dev who supported me couldn't replicate it. I am already working on the bisection; this is where I am currently:
---
    $ git bisect log
    git bisect start
    # status: waiting for both good and bad commits
    # good: [830b3c68c1fb1e9176028d02ef86f3cf76aa2476] Linux 6.1
    git bisect good 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
    # status: waiting for bad commit, 1 good commit known
    # bad: [c9c3395d5e3dcc6daee66c6908354d47bf98cb0c] Linux 6.2
    git bisect bad c9c3395d5e3dcc6daee66c6908354d47bf98cb0c
    # good: [1ca06f1c1acecbe02124f14a37cce347b8c1a90c] Merge tag 'xtensa-20221213' of https://github.com/jcmvbkbc/linux-xtensa
    git bisect good 1ca06f1c1acecbe02124f14a37cce347b8c1a90c
    # good: [b83a7080d30032cf70832bc2bb04cc342e203b88] Merge tag 'staging-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
    git bisect good b83a7080d30032cf70832bc2bb04cc342e203b88
    # bad: [06d65a6f640118430b894273914aa8d62d2cf637] Merge tag 'mips_6.2_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
    git bisect bad 06d65a6f640118430b894273914aa8d62d2cf637
    # good: [a6e3e6f138058ff184d8ef5064a033b3f5fee8f8] Merge tag 'mm-nonmm-stable-2022-12-17-20-32' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
    git bisect good a6e3e6f138058ff184d8ef5064a033b3f5fee8f8
    # good: [3c202d14a9d73fb63c3dccb18feac5618c21e1c4] prandom: remove prandom_u32_max()
    git bisect good 3c202d14a9d73fb63c3dccb18feac5618c21e1c4
    # good: [f2855eec19cadddad2900da3a009ee39df6116a7] Merge tag 'mailbox-v6.2' of git://git.linaro.org/landing-teams/working/fujitsu/integration
    git bisect good f2855eec19cadddad2900da3a009ee39df6116a7
    # bad: [6022ec6ee2c3a16b26f218d7abb538afb839bd6d] Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3
    git bisect bad 6022ec6ee2c3a16b26f218d7abb538afb839bd6d
    # good: [5461e079009ae2732c833281c4b50dfb58d15ba5] Merge tag 'media/v6.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
    git bisect good 5461e079009ae2732c833281c4b50dfb58d15ba5
    # good: [0d19f3d71394b0b03b8775c958b3354fa2259609] fs/ntfs3: Add system.ntfs_attrib_be extended attribute
    git bisect good 0d19f3d71394b0b03b8775c958b3354fa2259609
---
I think it should take another 4 steps. Will update you :)

I finished the bisection; the commit that causes the issue for me is ad26a9c84510af7252e582e811de970433a9758f:

    git log -1 ad26a9c84510af7252e582e811de970433a9758f
    commit ad26a9c84510af7252e582e811de970433a9758f (HEAD)
    Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
    Date:   Fri Oct 7 20:08:06 2022 +0300

        fs/ntfs3: Fixing wrong logic in attr_set_size and ntfs_fallocate

        There were 2 problems:
        - in some cases we lost dirty flag;
        - cluster allocation can be called even when it wasn't needed.
        Fixes xfstest generic/465

        Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

I am attaching the bisection log, in case it is needed.

Created attachment 305479
Bisection log
(In reply to Giovanni Santini from comment #14)
> I am unsure why the issue was closed as "resolved" and "code fix", since the
> mainline kernel didn't work for me...
I guess Artem thought a comment Bagas made came from you. Anyway, thanks for the tests and the bisection. I forwarded the issue by mail to the developer who has to handle this:
https://lore.kernel.org/regressions/138ed123-0f84-4d7a-8a17-67fe2418cf29@leemhuis.info/

Forgot: if you have a minute, could you try reverting the commit on top of 6.7-rc2 and, if that works, check whether it fixes things?

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #18)
> Forgot: if you have a minute, could you try reverting the commit on top of
> 6.7-rc2 and, if that works, check whether it fixes things?
Sure, can do! Will do it soon :)

Hello! I confirmed that mainline with that commit reverted does not show the issue; reverting the commit solves my problem. I am attaching the shell log, recorded via `script`.

Created attachment 305484
Test output for mainline with commit reverted
Hello again,
Any progress regarding the issue? Is there anything I can do to help? :)
/ Giovanni

(In reply to Giovanni Santini from comment #22)
> Any progress regarding the issue?
No reply to the mail, so it doesn't look like it.
> Is there anything I can do to help? :)
Not that I can see. We really need an ntfs3 expert now, as reverting the change might cause more trouble than it solves for other use cases.

(In reply to Giovanni Santini from comment #22)
> Any progress regarding the issue?
Hello Giovanni,
I am in the final stages of checking the reported bug and will be ready to upload the patch within a couple of working days. I was able to reproduce the bug for a compressed file. Is the directory in which you created the file compressed?
Best regards,
Konstantin Komarov

Hi Konstantin,
(In reply to almaz.alexandrovich from comment #24)
> ...
> Is the directory in which you created the file compressed?
I have set compression on the root node of the partition, so I believe so. I also verified that creating and writing to a file in a compressed folder shows the issue, while doing so in an uncompressed folder works fine.

Please write to https://www.spinics.net/lists/ntfs3/index.html; the NTFS3 developers are not here and don't see any messages posted in this bug report.

My bad, never mind: Konstantin is here. Sorry for the noise.

I tried the latest mainline kernel via the ArchLinux package `linux-mainline`, and the issue seems to be fixed. My script is happy with both a compressed and a normal file :) Is the change being backported to the LTS releases too?

(In reply to Giovanni Santini from comment #28)
> Is the change being backported to the LTS releases too?
Good question; I already brought it up, but there is no reply from Konstantin yet:
https://lore.kernel.org/all/bdea4053-b978-489d-a4a2-927685eee4a8@leemhuis.info/

The patches seem to have landed in 6.6 LTS and 6.7; I can use compressed files with no issues :) I suppose the patches also landed in the other LTS kernels, so I will mark this as solved.
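The problem turned out to affect only files in compressed directories, so it may help to confirm a directory's compression flag before running a reproducer. The sketch below rests on several assumptions:
- ntfs3 reports the NTFS attribute word through the `system.ntfs_attrib_be` extended attribute as a 4-byte big-endian value. That attribute was added by the "fs/ntfs3: Add system.ntfs_attrib_be extended attribute" commit in the bisection log.
- The compressed bit is the Windows `FILE_ATTRIBUTE_COMPRESSED` value (0x800).
- Directories report that flag just as files do.

The helper names are hypothetical.

```python
import os
import struct

# Windows/NTFS "compressed" attribute bit (FILE_ATTRIBUTE_COMPRESSED).
FILE_ATTRIBUTE_COMPRESSED = 0x00000800

# Assumption: ntfs3 exposes the attribute word, big-endian, under this xattr name.
NTFS_ATTRIB_BE = "system.ntfs_attrib_be"


def decode_attrib_be(raw: bytes) -> int:
    """Decode the 4-byte big-endian attribute word returned by the xattr."""
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    (value,) = struct.unpack(">I", raw)
    return value


def is_compressed_flag(attrib: int) -> bool:
    """True if the compressed bit is set in an NTFS attribute word."""
    return bool(attrib & FILE_ATTRIBUTE_COMPRESSED)


def ntfs_is_compressed(path: str) -> bool:
    """Return True if ntfs3 reports the compressed attribute for `path`.

    Raises OSError if the xattr is unavailable (e.g. not an ntfs3 mount).
    """
    raw = os.getxattr(path, NTFS_ATTRIB_BE)  # Linux-only in Python's os module
    return is_compressed_flag(decode_attrib_be(raw))
```

On the reporter's setup, `ntfs_is_compressed("/run/media/giovanni/Data")` would be expected to return True, given that compression was set on the partition's root. The decoding is split out from the `getxattr` call so it can be checked without an NTFS volume.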