Bug 214873

Summary: man 2 fsync implies possibility to return early
Product: Documentation
Reporter: sworddragon2
Component: man-pages
Assignee: documentation_man-pages (documentation_man-pages)
Status: REOPENED
Resolution: ---
Severity: low
CC: alx
Priority: P1
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:

Description sworddragon2 2021-10-29 21:25:56 UTC
The man page for the fsync system call ( https://man7.org/linux/man-pages/man2/fsync.2.html ) describes it as flushing the related caches to a storage device so that the information can be retrieved even after a crash or reboot. But then it states "The call blocks until the device reports that the transfer has completed.", which leaves room for interpretation: what happens if the device reports completion of the transfer early (e.g. because of buggy firmware) while the kernel still sees unsent caches in its context? Does fsync() then indeed return, as the quoted sentence implies, or does it continue to send the caches the kernel sees in order to guarantee data integrity as well as possible, as the preceding part of the documentation might imply?

I noticed this discrepancy when reporting a bug against dd ( https://debbugs.gnu.org/cgi/bugreport.cgi?bug=51345 ) that causes dd to return early when it is used with its fsync capability while the kernel still sees caches. Consulting the fsync() man page did not make clear whether such a theoretical possibility in the fsync() system call would be intended or not, so this part could perhaps be slightly enhanced.
Comment 1 Alejandro Colomar 2021-10-30 12:05:31 UTC
[CC += LKML and a few kernel programmers]

Hi,

I don't know how fsync(2) works.  Could some kernel fs programmer please
check whether the text matches the implementation, and whether the manual
page should be reworded to address the reported issue?

Thanks,

Alex
Comment 2 Jens Axboe 2021-10-30 15:17:20 UTC
I don't know what "see caches" means in a few spots in the above
text. In simplified terms, fsync will write out dirty data and then
ensure that it is stable on media. The latter is your cache flush, if
the underlying device is using some sort of writeback caching. When the
flush is issued, there is no more dirty kernel cached data.
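
A minimal userspace sketch of that sequence, assuming Python's os module as a thin wrapper around write(2) and fsync(2). The helper name write_durably is hypothetical and only serves the illustration; it is not taken from dd or from the man page.

```python
import os


def write_durably(path, data):
    """Write data to path, then block in fsync(2) until it returns.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        written = 0
        while written < len(data):
            # os.write may accept fewer bytes than requested, so loop
            # until everything has been handed to the kernel.
            written += os.write(fd, view[written:])
        # At this point the data may exist only as dirty pages in the
        # kernel. os.fsync asks the kernel to write them out and flush
        # the device cache; it raises OSError if the kernel reports an
        # error.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

When os.fsync returns without raising, the kernel has done its part. Whether the data is really on stable media still depends on the device honoring the flush.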

If the device doesn't honor a cache flush (e.g. "all writes previously
acked are now stable"), then there's nothing the kernel can do about it.
It would not even know. The only way to find out is if a power cut happens
after a flush and, once power is restored, the media contains stale
data.

There is no issue here. If your storage device is lying to you, buy
better storage devices.
Comment 3 Alejandro Colomar 2021-10-30 19:03:02 UTC
Thanks, Jens.  I'll close this bug.
Comment 4 sworddragon2 2021-10-31 12:33:43 UTC
(In reply to Jens Axboe from comment #2)
> I don't know what the "see caches" mean in a few spots in the above
> text?

Dirty kernel cached data - as you described it.


(In reply to Jens Axboe from comment #2)
> If the device doesn't honor a cache flush (eg "all writes previously
> acked are now stable"), then there's nothing the kernel can do about it.

In such a case the kernel could still send out all dirty kernel cached data, but with "The call blocks until the device reports that the transfer has completed." the man page strictly states that fsync() would return early here, while the preceding sentences state that it would not.

I assume that if a storage device falsely claims the transfer has completed, fsync() would still send out any remaining dirty kernel cached data and block until this is done, as this would make sense. This ticket is about clarifying this in the man page: if this assumption is correct, the sentence quoted above could be changed to "The call blocks until dirty writes are sent out and the device reports that the transfer has completed.", or to something more appropriate if needed.
Comment 5 sworddragon2 2021-11-07 23:24:34 UTC
This ticket was closed pretty quickly after comment #2, so I could not write comment #4 before it was closed, and now it seems that, being closed, it no longer receives attention. I'm therefore reopening this ticket so that comment #4 can be evaluated to see whether it justifies changes to the man page.

But if you think there is really nothing that needs to be changed, feel free to close this ticket again; I won't pursue it here any further.
Comment 6 Alejandro Colomar 2021-11-12 20:22:21 UTC
[Add CCs]

Hi Jens,

On 11/8/21 00:24, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=214873
> 
> sworddragon2@gmail.com changed:
> 
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>               Status|RESOLVED                    |REOPENED
>           Resolution|INVALID                     |---
> 
> --- Comment #5 from sworddragon2@gmail.com ---
> This ticket was closed pretty fast after comment #2 so I could not write
> comment #4 before closing this ticket and now it seems due to it being closed
> it does not receive attention anymore. Thus I'm reopening this ticket so that
> comment #4 can be evaluated if this makes changes to the manpage valid.
> 
> But if you think there is really nothing that needs to be changed feel free
> to
> close this ticket again as I then won't bother about it here anymore.
> 

That comment (and the previous one) was directed to you, but since you're
not CCed on this Bugzilla issue, you didn't receive it.  Could you please
have a look at it?

I also CCed the same other addresses as in my previous email, since some of
them may want to have a look at it too.

Thanks,
Alex
Comment 7 Jens Axboe 2021-11-12 21:22:17 UTC
This is still mixing up multiple things. There are two things to consider here:

1) The dirty page cache for the file/device. This is what the kernel knows about, and fsync will flush all of it.

2) The device-side write-back cache, if there is one. The kernel has no knowledge of the state of this. The kernel will issue a synchronize cache command for the device, upon which the device should ensure that all previously acked data is now on stable storage (i.e. the cache is clean).

The kernel will ensure that _all_ dirty cache is flushed out to the device, and then it will issue a flush command. That's all the kernel can do, and it will not leave dirty data unwritten for that mapping when fsync(2) is invoked. That's outside of errors that can happen, for which fsync(2) will return an error.

There's no issue here, outside of the potential buggy device case. For that case, the kernel still does what it's supposed to, which is flush all dirty kernel cache to the device. If the device is buggy and doesn't commit it to stable storage when a synchronize cache command is issued, the kernel has no knowledge of this nor is there anything it can do about it. There's no early return _unless_ the device is buggy! The man page clearly states that the call blocks until the device has told you that the data is stable. If the device violates the storage standards it belongs to, then you are likely screwed in more ways than just this one.
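
The two layers described above can be sketched as a toy model. This is an illustration only, not kernel code: ToyKernel, ToyDevice and the honors_flush switch are hypothetical names, and real page cache and block layer behavior is far more involved.

```python
class ToyDevice:
    """Hypothetical device with a volatile write-back cache."""

    def __init__(self, honors_flush=True):
        self.honors_flush = honors_flush
        self.cache = {}  # acknowledged writes, lost on power cut
        self.media = {}  # stable storage

    def write(self, block, data):
        # The write is acknowledged as soon as it sits in the cache.
        self.cache[block] = data

    def flush(self):
        if self.honors_flush:
            self.media.update(self.cache)
            self.cache.clear()
        # A buggy device reports success either way.
        return True

    def power_cut(self):
        self.cache.clear()


class ToyKernel:
    """Hypothetical kernel side: only knows its own dirty pages."""

    def __init__(self, device):
        self.device = device
        self.dirty = {}

    def write(self, block, data):
        self.dirty[block] = data

    def fsync(self):
        # Step 1: send every dirty page to the device.
        for block, data in sorted(self.dirty.items()):
            self.device.write(block, data)
        self.dirty.clear()
        # Step 2: issue the cache flush and wait for its result.
        if not self.device.flush():
            raise OSError("flush failed")
        return 0
```

In this model fsync never returns while dirty pages remain on the kernel side, whatever the device does. With honors_flush=False the call still succeeds and the kernel's dirty set is empty, but a later power_cut leaves media without the data, which the kernel has no way to observe.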

Please close this one.
Comment 8 sworddragon2 2021-11-12 23:38:11 UTC
(In reply to Jens Axboe from comment #7)
> The kernel will ensure that _all_ dirty cache is flushed out to the device,
> and then it will issue a flush command. That's all the kernel can do, and it
> will not leave dirty data unwritten for that mapping when fsync(2) is
> invoked.

That is pretty clear, and now I would say the mentioned sentence should indeed be updated, but...


(In reply to Jens Axboe from comment #7)
> The man page clearly states that the
> call blocks until the device has told you that the data is stable.

This would probably go better with an example: userspace requests 600 MiB to be written to an external storage device. fsync(2) has been called, 500 MiB have been sent to the storage device, and 100 MiB are still in the dirty kernel cached data. At this point, due to a slight firmware bug, the device falsely signals that the transfer has completed (but might not reject further received data). The referenced sentence in the man page strictly claims that fsync(2) returns here despite the kernel still holding 100 MiB of dirty cached data, while the part before it claims the 100 MiB would also have been flushed. That is the conflict I am pointing out here.
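
One rough way to check whether the kernel still holds dirty data around an fsync(2) call is to read the Dirty and Writeback counters from /proc/meminfo on Linux. The sketch below assumes that file's usual "Name:   value kB" layout; the helper names are hypothetical. The counters are system-wide, so writes to other files also show up in them and the numbers are only an indication.

```python
def meminfo_kb(text, field):
    """Return the value in kB of one field from /proc/meminfo-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])
    raise KeyError(field)


def dirty_and_writeback_kb(path="/proc/meminfo"):
    """Read the system-wide Dirty and Writeback counters (Linux only)."""
    with open(path) as f:
        text = f.read()
    return meminfo_kb(text, "Dirty"), meminfo_kb(text, "Writeback")
```

Calling dirty_and_writeback_kb() before and after fsync on an otherwise idle system gives a hint of how much dirty data the kernel wrote out, but it cannot attribute the counters to a single file or device.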