Hi everybody,

The f2fs filesystem is unable to read some files whose names contain special characters, such as ❤️, after the kernel was updated with the following patch:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=18b5f47e7da46d3a0d7331e48befcaf151ed2ddf

The problem can be reproduced with the following steps:

1. First, roll back the unicode-related change above and create a file or folder with a special character in its name:

    ./tools/mkfs.f2fs -f -O casefold -C utf8 f2fs.img
    mount f2fs.img f2fs_dir/
    mkdir Picture
    ./f2fs_io setflags casefold Picture
    cd Picture
    touch ❤️

2. Then apply the unicode patch above. After mounting the filesystem, the special-character file is reported as not found:

    mount f2fs.img f2fs_dir/
    cd Picture
    ls -alh
    ls: cannot access '❤️': No such file or directory
    total 8
    drwxr-xr-x 2 root root 3488 Dec 10 06:11 .
    drwxr-xr-x 3 root root 4096 Dec 9 10:21 ..
    -????????? ? ? ? ? ? ❤️

Here are the conclusions of my preliminary analysis.

In casefold-enabled f2fs filesystems, file names are converted to lowercase by the utf8_casefold function when querying for a file, and the hash is then calculated from the lowercase file name and stored on disk. The functions involved are:

    f2fs_lookup
    f2fs_prepare_lookup
    __f2fs_setup_filename
    f2fs_init_casefolded_name
    utf8_casefold
    f2fs_hash_filename
    __f2fs_find_entry

For some file names that contain special characters, such as ❤️, the length of the output of utf8_casefold differs before and after the patch, which ultimately changes the calculated hash. Files created before the patch are therefore not readable once the patch is applied.

I think we need to modify the f2fs filesystem to be compatible with the unicode-related changes.
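To make the analysis above concrete, here is a minimal user-space sketch, not kernel code. It assumes the length change comes from whether VARIATION SELECTOR-16 (U+FE0F, the second code point of ❤️) is dropped or kept by the fold; the report itself only states that the output length changed. The two fold functions are hypothetical stand-ins for utf8_casefold before and after the patch, and SHA-256 is only a stand-in for f2fs_hash_filename, which uses a different algorithm.

```python
import hashlib

# "❤️" is two code points: U+2764 HEAVY BLACK HEART + U+FE0F VARIATION SELECTOR-16.
HEART = "\u2764\ufe0f"


def fold_dropping_selectors(name: str) -> bytes:
    """Hypothetical model of the older fold: casefold and drop U+FE00..U+FE0F."""
    kept = "".join(ch for ch in name if not 0xFE00 <= ord(ch) <= 0xFE0F)
    return kept.casefold().encode("utf-8")


def fold_keeping_selectors(name: str) -> bytes:
    """Hypothetical model of the newer fold: casefold only, keep every code point."""
    return name.casefold().encode("utf-8")


def name_hash(folded: bytes) -> str:
    """Stand-in for f2fs_hash_filename (the real hash is a different algorithm)."""
    return hashlib.sha256(folded).hexdigest()


old_folded = fold_dropping_selectors(HEART)
new_folded = fold_keeping_selectors(HEART)
print(len(old_folded), len(new_folded))  # 3 6

# Directory entry written under the old fold: hash of the folded name -> name.
on_disk = {name_hash(old_folded): HEART}

# Lookup under the new fold hashes a different byte string and misses the entry.
found = on_disk.get(name_hash(new_folded))
print(found)  # None
```

The directory entry is still present, which matches ls listing the name but failing to stat it; only the hash computed at lookup time no longer agrees with the one computed at creation time.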
Hi,

I think that unicode patch introduced a regression, as the old and new paths give a different file length. Wasn't that broken to fix?
(In reply to Jaegeuk Kim from comment #1)
> Hi,
>
> I think that unicode patch introduced a regression, as old and new paths
> gives a different file length. Wasn't that broken to fix?

Hi Kim,

I'm using the latest version of the kernel, and following the steps above still reproduces the problem. Can you tell me which patch fixed it? My kernel version is as follows:

    uname -r
    6.13.0-rc2+

Currently, the fsck.f2fs utility can only repair directories or files that are not encrypted. It cannot repair encrypted files, because it cannot get the key needed to calculate the hash.
IMO, you need to revert the unicode patch.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=231825b2e1ff6ba799c5eaf396d3ab2354e37c6b reverts the culprit and supposedly fixes the issue.

Can you confirm, and close the issue if it does?
(In reply to HanQi from comment #0)
> Hi everybody,
> The f2fs filesystem is unable to read some files with special characters,
> such as ❤️, after the kernel was updated with the following patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/
> ?id=18b5f47e7da46d3a0d7331e48befcaf151ed2ddf

Hi HanQi, I guess you can report this bug to Gabriel Krisman Bertazi <krisman@kernel.org>?

Thanks,
(In reply to Chao Yu from comment #5)
> (In reply to HanQi from comment #0)
> > Hi everybody,
> > The f2fs filesystem is unable to read some files with special characters,
> > such as ❤️, after the kernel was updated with the following patch:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/
> > ?id=18b5f47e7da46d3a0d7331e48befcaf151ed2ddf
>
> Hi HanQi, I guess you can report this bug to Gabriel Krisman Bertazi
> <krisman@kernel.org>?
>
> Thanks,

Hi Chao,

Krisman already knows about the bug. You can see the link: https://lore.kernel.org/lkml/875xnqudr1.fsf@mailhost.krisman.be/