Bug 3452 - [Patch] Update fs/nls/nls_cp936.c (Chinese codepage)
Status: CLOSED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 normal
Assignee: xexz
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-09-24 02:59 UTC by hashao
Modified: 2007-10-22 02:16 UTC
CC List: 1 user

See Also:
Kernel Version: 2.6.9
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Patch to update the cp936 with mapping from MS site. (114.60 KB, patch)
2004-09-24 03:02 UTC, hashao
Patch to update the cp936 with mapping from MS site. (119.26 KB, patch)
2004-09-24 03:16 UTC, hashao
Patch to update the cp936 with mapping from MS site. (119.25 KB, patch)
2004-09-24 03:19 UTC, hashao
Patch to update the cp950 with mapping from MS site. (127.93 KB, patch)
2004-09-25 03:55 UTC, hashao
[Patch] Update the cp936 with correct mapping from MS site. (119.65 KB, patch)
2005-01-06 22:51 UTC, hashao
[Patch] Update the cp950 with correct mapping from MS site. (128.31 KB, patch)
2005-01-06 22:55 UTC, hashao

Description hashao 2004-09-24 02:59:13 UTC
The current conversion table for codepage cp936 (Chinese Simplified) contains
many wrong mappings. I don't know where the original table came from. As a
result, Chinese filenames created on a vfat partition under Linux can contain
characters that cannot be accessed under Windows.

The cp936 table can be found at:

http://www.microsoft.com/globaldev/reference/dbcs/936.htm

Examples of affected CP936 code points: 0x8179, 0x81ED
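
To see what a given CP936 code point should map to, one option is to decode it
with an independent implementation and compare the result with the Microsoft
table. The following is only a rough sketch: it uses Python's built-in "cp936"
codec (an alias for its GBK codec) as the reference, which is an assumption of
this example and is not the kernel's NLS table. The helper name
cp936_to_unicode is hypothetical.

```python
def cp936_to_unicode(code_point):
    """Decode a two-byte CP936 code point (e.g. 0x8179) to a Unicode string.

    Uses Python's own cp936/GBK codec as a stand-in reference; this is
    not the table from fs/nls/nls_cp936.c.
    """
    raw = code_point.to_bytes(2, "big")  # lead byte first, then trail byte
    return raw.decode("cp936")


if __name__ == "__main__":
    # The two code points mentioned in the report.
    for cp in (0x8179, 0x81ED):
        ch = cp936_to_unicode(cp)
        print(f"CP936 0x{cp:04X} -> U+{ord(ch):04X}")
```

The printed Unicode values can then be checked by hand against the 936 table
linked above.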
Comment 1 hashao 2004-09-24 03:02:37 UTC
Created attachment 3712
Patch to update the cp936 with mapping from MS site.

Also adds a GBK alias for the code page.
Comment 2 hashao 2004-09-24 03:16:04 UTC
Created attachment 3713
Patch to update the cp936 with mapping from MS site.

The Unicode mapping now starts from 0x0000 instead of 0x0100, because some
symbols fall in the 0x0000-0x0100 range.
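
As an illustration of why that range matters, the sketch below lists the
non-ASCII Unicode code points below 0x0100 that encode to a double-byte
sequence. It relies on Python's built-in "cp936" codec as a stand-in for the
Microsoft table, which is an assumption; the function name
latin1_range_symbols is made up for this example.

```python
def latin1_range_symbols(codec="cp936"):
    """Return {unicode_code_point: encoded_bytes} for code points in
    0x0080-0x00FF that the codec encodes as two bytes."""
    found = {}
    for cp in range(0x80, 0x100):
        try:
            raw = chr(cp).encode(codec)
        except UnicodeEncodeError:
            continue  # this code point has no mapping in the codec
        if len(raw) == 2:
            found[cp] = raw
    return found


if __name__ == "__main__":
    for cp, raw in sorted(latin1_range_symbols().items()):
        print(f"U+{cp:04X} -> 0x{raw.hex().upper()}")
```

A table that only starts at 0x0100 would have no slot for symbols such as the
plus-minus sign (U+00B1) or the multiplication sign (U+00D7).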
Comment 3 hashao 2004-09-24 03:19:14 UTC
Created attachment 3714
Patch to update the cp936 with mapping from MS site.

Remove debug garbage.
Comment 4 hashao 2004-09-25 03:55:29 UTC
Created attachment 3720
Patch to update the cp950 with mapping from MS site.


This one is for codepage CP950, which is for traditional Chinese.

The conversion table was based on GNU glibc's BIG5.gz charmap
(/usr/share/i18n/charmaps/BIG5.gz), which has some additional mappings for
popular extensions.

The actual Microsoft table can be found at:
http://www.microsoft.com/globaldev/reference/dbcs/950.htm

P.S. The GBK table in glibc is the same as the MS table.
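
One way to see how two such tables differ is to decode every candidate
double-byte sequence with both and record the mismatches. This is a minimal
sketch only: it compares Python's built-in "big5" and "cp950" codecs, which are
assumptions of this example and are neither glibc's BIG5 charmap nor the kernel
table. The names decode_or_none and table_differences are hypothetical.

```python
def decode_or_none(raw, codec):
    """Decode raw bytes with the codec, or return None if unmapped."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def table_differences(codec_a, codec_b):
    """Return (hex_bytes, result_a, result_b) for every two-byte sequence
    that the two codecs decode differently."""
    diffs = []
    for lead in range(0x81, 0xFF):       # candidate lead bytes
        for trail in range(0x40, 0xFF):  # candidate trail bytes
            raw = bytes([lead, trail])
            a = decode_or_none(raw, codec_a)
            b = decode_or_none(raw, codec_b)
            if a != b:
                diffs.append((raw.hex(), a, b))
    return diffs


if __name__ == "__main__":
    diffs = table_differences("big5", "cp950")
    print(f"{len(diffs)} code points decode differently")
    for hex_bytes, a, b in diffs[:10]:
        print(hex_bytes, repr(a), repr(b))
```

The same loop could be pointed at any pair of codecs, for example to check a
claim that two GBK tables agree.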
Comment 5 hashao 2005-01-06 22:51:47 UTC
Created attachment 4346
[Patch] Update the cp936 with correct mapping from MS site.

Fixes a bug in the mapping function's handling of ASCII characters.
Comment 6 hashao 2005-01-06 22:55:10 UTC
Created attachment 4347
[Patch] Update the cp950 with correct mapping from MS site.

The same ASCII character fix as for cp936.
Comment 7 Natalie Protasevich 2007-10-16 06:31:54 UTC
Hashao,
Is the problem still there with recent kernels? I would be surprised if it is; it has probably been fixed by now. Can you please confirm so we can close the bug?
Thanks.
Comment 8 hashao 2007-10-22 02:16:26 UTC
Yes, it is fixed.
