Bug 217541 - Issue about whiteout characters
Summary: Issue about whiteout characters
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: HFS/HFSPLUS
Hardware: All Linux
Importance: P3 high
Assignee: fs_hfs@kernel-bugs.osdl.org
 
Reported: 2023-06-11 19:13 UTC by hps
Modified: 2023-06-11 19:18 UTC
CC List: 0 users

Regression: No

Description hps 2023-06-11 19:13:44 UTC
Hi,

After discovering that all Apple products mistreat whiteout characters, I wonder whether Linux ought to add an optional filter that blocks creation of files whose names contain the faulty character pattern.

When MS-DOS and FAT were invented, the byte 0xE5 was used as a whiteout character: a directory entry whose name starts with 0xE5 is treated as deleted. Unfortunately, both Norway and Denmark got assigned 0xE5 for their "å" character. The byte 0x05 was then used to escape a real leading 0xE5.
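
As a rough illustration of the escaping described above, here is a small Python sketch that models only the first byte of a FAT short-name directory entry. The helper names are hypothetical, and the sketch ignores everything else about real directory entries (long names, code pages, padding).

```python
DELETED_MARKER = 0xE5  # first byte of a deleted ("whited-out") directory entry
ESCAPED_E5 = 0x05      # stored in place of a genuine leading 0xE5


def encode_first_byte(name: bytes) -> bytes:
    """Return the stored form of a short name (hypothetical helper)."""
    if name and name[0] == DELETED_MARKER:
        # A real 0xE5 would look like a deleted entry, so store 0x05 instead.
        return bytes([ESCAPED_E5]) + name[1:]
    return name


def decode_first_byte(raw: bytes):
    """Return the real name, or None if the entry is empty or marked deleted."""
    if not raw or raw[0] == DELETED_MARKER:
        return None
    if raw[0] == ESCAPED_E5:
        # Undo the escape: 0x05 stands for a genuine 0xE5.
        return bytes([DELETED_MARKER]) + raw[1:]
    return raw
```

A name that is written without the escape, with a raw 0xE5 as its first byte, is indistinguishable from a deleted entry in this model, which is the kind of silent disappearance described in this report.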

When Unicode was established, "å" carried over to being escaped, but now as "a" followed by a combining ring above (U+030A). There appears to have been a regression at Apple: at one point you could create file names that way, and now you no longer can, but the files are still there, just not accessible by name and not listed in Finder.
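
To make the two spellings of "å" concrete, here is a minimal Python sketch using the standard `unicodedata` module. The function name `same_name` is made up for illustration, and nothing here claims to reproduce what any particular file system does internally.

```python
import unicodedata

composed = "\u00e5"     # "å" as a single code point
decomposed = "a\u030a"  # "a" followed by COMBINING RING ABOVE


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two strings render alike but differ as code points and as UTF-8 bytes.
print(composed == decomposed)             # False
print(composed.encode("utf-8").hex())     # c3a5
print(decomposed.encode("utf-8").hex())   # 61cc8a
print(same_name(composed, decomposed))    # True
```

A lookup that compares raw bytes will treat these as two different names, while a lookup that normalizes first will treat them as one. A file created under one rule and searched for under the other can appear to be missing.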

I've contacted Apple twice, to no avail; I've only been met with silence. I guess this issue is non-fixable. The consequence is that important documents may suddenly disappear through the use of the wrong character, when doing backups or when transferring files between systems.

I've tried to collect information at the following locations:

Test program to figure out what characters are supported:
https://github.com/hselasky/invalidchar

Full reproduction, in Norwegian, using Windows 11 (works great in Xfce 4 too, btw) and macOS:
https://www.reddit.com/r/norge/comments/144h6wx/har_du_mistet_innhold_dokumenter_bilder_musikk/

Documentation:
https://superuser.com/questions/204287/what-characters-are-forbidden-in-os-x-filenames/

This also affects exFAT!

Does anyone in the Linux community know the history bits here?

Other file systems, like FFS (FreeBSD), separate the whiteout bit from the filename.

What can be done about this issue?

--HPS
Comment 1 hps 2023-06-11 19:18:49 UTC
Further, the escape method used is dubious and leads to undefined characters that are not properly handled _everywhere_, so to speak. Try it yourself! A small hint: PostScript and DAB.

https://nvd.nist.gov/vuln/detail/CVE-2023-25193

Like this:

N=1 å
N=2 å̊ 
N=4 å̊̊̊
N=8 å̊̊̊̊̊̊̊ 
N=16 å̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊
N=32 å̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊̊

According to the Norwegian language council, having more than one ring over "a" is meaningless. Text processing tools should therefore just collapse multiple ring-above marks into a single one. The same goes for other such marks, for example the "umlauts" in German.
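
The collapsing suggested above could look roughly like the following Python sketch. It is only one possible reading of the idea: the function names are hypothetical, it only removes directly adjacent repeats of the same combining mark, and it returns its result in NFC form, which a real tool might not want.

```python
import unicodedata


def ring_a(n: int) -> str:
    """Build "a" followed by n COMBINING RING ABOVE marks, as in the N= list."""
    return "a" + "\u030a" * n


def collapse_repeated_marks(text: str) -> str:
    """Drop adjacent duplicates of the same combining mark (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        # unicodedata.combining() is non-zero for combining marks.
        if out and unicodedata.combining(ch) and ch == out[-1]:
            continue
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


for n in (1, 2, 4, 8, 16, 32):
    # Every variant collapses to the single code point U+00E5.
    assert collapse_repeated_marks(ring_a(n)) == "\u00e5"
```

Base letters are never touched, so a doubled plain letter such as the "aa" in "Haakon" is left alone; only stacked copies of one mark on the same base are merged.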
