Bug 219586

Summary: Unable to find file after unicode change
Product: File System
Reporter: HanQi (hanqi)
Component: f2fs
Assignee: Default virtual assignee for f2fs (filesystem_f2fs)
Status: RESOLVED CODE_FIX
Severity: blocking
CC: chao, hanqi, jaegeuk, pmenzel+bugzilla.kernel.org
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: Yes
Bisected commit-id: 18b5f47e7da46d3a0d7331e48befcaf151ed2ddf

Description HanQi 2024-12-10 06:58:44 UTC
Hi everybody,
The f2fs filesystem is unable to read some files with special characters,
such as ❤️, after the kernel was updated with the following patch:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=18b5f47e7da46d3a0d7331e48befcaf151ed2ddf

We can reproduce this with the following steps:
1. First, revert the unicode change above and create a file or directory
whose name contains the special character:
./tools/mkfs.f2fs -f -O casefold -C utf8 f2fs.img
mount f2fs.img f2fs_dir/
mkdir Picture
./f2fs_io setflags casefold Picture
cd Picture
touch ❤️

2. Then apply the unicode patch above. After mounting the filesystem again,
the file with the special character is reported as not found:
mount f2fs.img f2fs_dir/
cd Picture
ls -alh
ls: cannot access '❤️': No such file or directory
total 8
drwxr-xr-x 2 root root 3488 Dec 10 06:11 .
drwxr-xr-x 3 root root 4096 Dec  9 10:21 ..
-????????? ? ?    ?       ?            ? ❤️

Here are the conclusions of my preliminary analysis.
In casefold-enabled f2fs filesystems, the utf8_casefold function converts a
file name to its case-folded (lowercase) form when the file is looked up. The
hash is then calculated from the case-folded name and stored on disk. The
call path is:
f2fs_lookup
    f2fs_prepare_lookup
        __f2fs_setup_filename
            f2fs_init_casefolded_name
                utf8_casefold
            f2fs_hash_filename
    __f2fs_find_entry
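
The lookup flow above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the f2fs implementation: `ToyDir`, `fold_keep`, `fold_drop`, and `toy_hash` are hypothetical names, SHA-256 stands in for the real f2fs hash, and the two fold functions only assume that one folding rule keeps U+FE0F while the other drops it, without claiming which one matches the kernel before or after the patch.

```python
import hashlib

HEART = "\u2764\ufe0f"  # emoji heart: U+2764 followed by U+FE0F
VS16 = "\ufe0f"         # VARIATION SELECTOR-16


def fold_keep(name: str) -> bytes:
    """Hypothetical folding rule that keeps every code point."""
    return name.casefold().encode("utf-8")


def fold_drop(name: str) -> bytes:
    """Hypothetical folding rule that drops U+FE0F."""
    return name.casefold().replace(VS16, "").encode("utf-8")


def toy_hash(folded: bytes) -> str:
    """Stand-in hash over the folded bytes (not the f2fs hash)."""
    return hashlib.sha256(folded).hexdigest()


class ToyDir:
    """Toy directory that indexes entries by the hash of the folded name."""

    def __init__(self):
        self.entries = {}  # hash -> list of names as originally typed

    def create(self, name, fold):
        self.entries.setdefault(toy_hash(fold(name)), []).append(name)

    def lookup(self, name, fold):
        # Hash the folded query first, then compare folded names in that bucket.
        for stored in self.entries.get(toy_hash(fold(name)), []):
            if fold(stored) == fold(name):
                return stored
        return None

    def listing(self):
        # Enumeration does not go through the hash at all.
        return [n for names in self.entries.values() for n in names]


d = ToyDir()
d.create(HEART, fold_drop)                 # entry written under one folding rule
print(d.lookup(HEART, fold_drop) == HEART) # True: same rule, same hash
print(d.lookup(HEART, fold_keep))          # None: different folded bytes, different hash
print(len(d.listing()))                    # 1: the entry is still enumerable
```

In this model the entry is still returned by `listing()` even though the hashed lookup misses, which is consistent with the `ls` output in the report, where the name is listed but cannot be accessed.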

For some file names that contain special characters, such as ❤️, we found that
the length of the output of the utf8_casefold function differs before and
after the patch, which ultimately changes the calculated hash. Files created
before the patch are not readable after the patch is applied.
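
To make the length change concrete, the sketch below breaks the file name into code points. It assumes the name in this report is the emoji heart as it is usually typed, U+2764 followed by U+FE0F (the report shows only the rendered glyph), and that the length difference comes from the variation selector being kept or dropped. Both are assumptions, and the helper names `describe` and `utf8_len` are hypothetical.

```python
import unicodedata

NAME = "\u2764\ufe0f"  # assumed content of the file name in this report


def describe(name):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in name]


def utf8_len(name):
    """Length of the name in UTF-8 bytes, the unit a byte-wise hash sees."""
    return len(name.encode("utf-8"))


for code_point, label in describe(NAME):
    print(code_point, label)

print(utf8_len(NAME))                        # 6 bytes with the variation selector
print(utf8_len(NAME.replace("\ufe0f", "")))  # 3 bytes without it
```

Any hash computed over the folded bytes would see a 6-byte input in one case and a 3-byte input in the other, so a change in how the selector is treated would be enough to change the hash.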

I think we need to modify the f2fs filesystem to be compatible with the unicode-related changes.
Comment 1 Jaegeuk Kim 2024-12-11 00:08:25 UTC
Hi,

I think that unicode patch introduced a regression, as old and new paths give a different file length. Wasn't that broken to fix?
Comment 2 HanQi 2024-12-11 02:11:06 UTC
(In reply to Jaegeuk Kim from comment #1)
> Hi,
> 
> I think that unicode patch introduced a regression, as old and new paths
> give a different file length. Wasn't that broken to fix?

Hi Kim,
I'm using the latest version of the kernel, and following the steps above still reproduces the problem. Can you tell me which patch fixed it? My kernel version is as follows:
uname -r
6.13.0-rc2+

Currently, the fsck.f2fs utility can only repair directories and files that are not encrypted. It cannot repair encrypted files, because it cannot get the key needed to calculate the hash.
Comment 3 Jaegeuk Kim 2024-12-11 04:13:44 UTC
IMO, you need to revert the unicode patch.
Comment 4 Paul Menzel 2024-12-12 08:35:31 UTC
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=231825b2e1ff6ba799c5eaf396d3ab2354e37c6b reverts the culprit and supposedly fixes the issue. Can you confirm, and close the issue if it does?
Comment 5 Chao Yu 2024-12-12 15:25:27 UTC
(In reply to HanQi from comment #0)
> Hi everybody,
> The f2fs filesystem is unable to read some files with special characters,
> such as ❤️, after the kernel was updated with the following patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=18b5f47e7da46d3a0d7331e48befcaf151ed2ddf

Hi HanQi, I guess you can report this bug to Gabriel Krisman Bertazi <krisman@kernel.org>?

Thanks,
Comment 6 HanQi 2024-12-13 01:32:19 UTC
(In reply to Chao Yu from comment #5)
> (In reply to HanQi from comment #0)
> > Hi everybody,
> > The f2fs filesystem is unable to read some files with special characters,
> > such as ❤️, after the kernel was updated with the following patch:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=18b5f47e7da46d3a0d7331e48befcaf151ed2ddf
> 
> Hi HanQi, I guess you can report this bug to Gabriel Krisman Bertazi
> <krisman@kernel.org>?
> 
> Thanks,

Hi Chao, Krisman already knows about the bug. See this link: https://lore.kernel.org/lkml/875xnqudr1.fsf@mailhost.krisman.be/