The kernel NTFS driver shares a problem with UDF (bug 199291): it handles UTF-16 code units one at a time and fails on surrogate pairs.

Steps to reproduce:
1. On a Windows box, create a file called 🐧.txt on some NTFS media.
2. Mount it on Linux with the RO driver.
3. Run `ls` on the mounted directory.

Expected results: 🐧.txt is listed.
Actual results: 🐧.txt is not shown. Running `dmesg | tail` reveals "... contains characters that cannot be converted to utf8. try [...] nls=utf8".
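
To make the surrogate issue concrete, here is a minimal userspace sketch of a conversion that consumes a surrogate pair as a single code point instead of going one code unit at a time. The function name `utf16_to_utf8_sketch` and its signature are made up for illustration; this is not the NTFS driver's code or an NLS API, and it assumes the input is already in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: convert `len` UTF-16 code units
 * (host byte order) to UTF-8. Returns the number of bytes written, or -1
 * on an unpaired surrogate or if `out` is too small.
 */
static int utf16_to_utf8_sketch(const uint16_t *in, size_t len,
                                unsigned char *out, size_t outlen)
{
    size_t i = 0, o = 0;

    while (i < len) {
        uint32_t cp = in[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: the next unit must be a low surrogate. */
            if (i >= len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10)
                         + ((uint32_t)in[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return -1;
        }

        if (cp < 0x80) {
            if (outlen - o < 1)
                return -1;
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            if (outlen - o < 2)
                return -1;
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            if (outlen - o < 3)
                return -1;
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            /* Code points above U+FFFF need four UTF-8 bytes. */
            if (outlen - o < 4)
                return -1;
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return (int)o;
}
```

🐧 is U+1F427, which UTF-16 stores as the pair D83D DC27. Fed both units, the sketch emits the four bytes F0 9F 90 A7. A converter that is handed one 16-bit unit per call only ever sees D83D or DC27 on its own, and neither is a valid character.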
VFAT has a similar problem: there, 🐧.txt becomes ??.txt. The HFSplus driver calls uni2char, which is known to accept only a 16-bit wchar_t, so it is likely broken too. JFS has jfs_strfromUCS_le, whose name (UCS) seems to absolve it, but following the reasoning applied to UDF's "Unicode", it should be fixed too. Joliet's uni16_to_x8 uses uni2char on Windows "Unicode" (UTF-16).

* * *

I mean, just grep for "unichar" under fs/. You could probably open 10 separate reports from that grep. The NLS interface does not correctly handle SMP characters to begin with.
> grep for "unichar"

*uni2char