The kernel NTFS driver shares a problem with UDF (bug 199291): it handles UTF-16 code units one at a time and fails on surrogate pairs.
Steps to reproduce:
1. On a Windows box, create a file called 🐧.txt on some NTFS media.
2. Mount it on Linux with the read-only (RO) NTFS driver.
3. Run `ls` on the mounted directory.
🐧.txt is not shown. Running `dmesg | tail` reveals "... contains characters that cannot be converted to utf8. try [...] nls=utf8".
VFAT has a similar problem where 🐧.txt becomes ??.txt.
The HFSplus driver calls uni2char, which is known to accept only a 16-bit wchar_t; it's therefore likely broken too.
JFS has a jfs_strfromUCS_le, whose name seems to excuse it by promising only UCS, but by the reasoning applied to UDF "Unicode" it should be fixed too.
Joliet uni16_to_x8 uses uni2char on Windows "Unicode" (UTF-16).
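
For illustration, here is a minimal sketch of the surrogate handling that a per-code-unit conversion misses. It is not kernel code: utf16_decode and utf8_encode are hypothetical helper names made up for this example, and the input is assumed to be UTF-16 already in host byte order. The file name 🐧 is U+1F427, stored as the two code units 0xD83D 0xDC27; neither unit is a character on its own, so a converter that sees them one at a time has nothing valid to emit.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): decode one code point from a
 * UTF-16 buffer in host byte order. Returns the number of code units
 * consumed (1 or 2), or 0 on empty input or an unpaired surrogate.
 */
static size_t utf16_decode(const uint16_t *s, size_t len, uint32_t *cp)
{
    if (len == 0)
        return 0;

    uint16_t hi = s[0];
    if (hi < 0xD800 || hi > 0xDFFF) {   /* ordinary BMP code unit */
        *cp = hi;
        return 1;
    }
    if (hi >= 0xDC00)                   /* low surrogate with no lead */
        return 0;
    if (len < 2 || s[1] < 0xDC00 || s[1] > 0xDFFF)
        return 0;                       /* high surrogate with no trail */

    /* Combine the pair into a code point above U+FFFF. */
    *cp = 0x10000 + (((uint32_t)(hi - 0xD800)) << 10)
                  + (uint32_t)(s[1] - 0xDC00);
    return 2;
}

/*
 * Hypothetical helper: encode one code point as UTF-8. Returns the number
 * of bytes written (1 to 4), or 0 if cp is a surrogate or out of range.
 */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* lone surrogate: not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Feeding the pair 0xD83D 0xDC27 through utf16_decode and then utf8_encode gives the four bytes F0 9F 90 A7. Passing either unit alone makes utf16_decode return 0, which is the same situation the drivers above end up in when they hand one 16-bit unit at a time to uni2char.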
* * *
I mean, just grep for "unichar" under fs/. You can probably open 10 separate reports from that grep. The NLS interface does not correctly handle SMP characters to start with.
> grep for "unichar"