Bug 8310 (usb-id-encoding) - USB device names are not sanitized for UTF-8
Summary: USB device names are not sanitized for UTF-8
Status: CLOSED CODE_FIX
Alias: usb-id-encoding
Product: Drivers
Classification: Unclassified
Component: Input Devices
Hardware: i386 Linux
Importance: P2 normal
Assignee: drivers_input-devices
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-04-07 03:12 UTC by Nicolas Mailhot
Modified: 2009-09-05 06:36 UTC

See Also:
Kernel Version: 2.6.21-rc5
Subsystem:
Regression: No
Bisected commit-id:



Description Nicolas Mailhot 2007-04-07 03:12:30 UTC
The USB layer will output text device names "as is", sometimes not in 7bit ASCII
or UTF-8 but ISO-8859-1

These names are reused by the xorg evdev driver to identify devices:

Section "InputDevice"
    Identifier "track-expl"
    Driver     "evdev"
    Option     "Protocol" "evdev"
    Option     "Name" "Microsoft Microsoft Trackball Explorer
Comment 1 Anonymous Emailer 2007-04-07 03:42:32 UTC
Reply-To: akpm@linux-foundation.org

On Sat, 7 Apr 2007 03:12:49 -0700 bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8310
> 
>            Summary: USB device names are not sanitized for UTF-8
>     Kernel Version: 2.6.21-rc5
>             Status: NEW
>           Severity: normal
>              Owner: drivers_input-devices@kernel-bugs.osdl.org
>          Submitter: Nicolas.Mailhot@LaPoste.net
> 
> 
> The USB layer will output text device names "as is", sometimes not in 7bit ASCII
> or UTF-8 but ISO-8859-1
> 
> These names are reused by the xorg evdev driver to identify devices:
> 
> Section "InputDevice"
>     Identifier "track-expl"
>     Driver     "evdev"
>     Option     "Protocol" "evdev"
>     Option     "Name" "Microsoft Microsoft Trackball Explorer_"
>     Option     "ZAxisMapping" "4 5"
>     Option     "Buttons" "7"
> EndSection
> 
> evdev matching requires the "Name" string match byte-to-byte to the string
> exposed by the kernel
> 
> That means finding a way to create an ISO-8859-1 conf file on a UTF-8 distro (not
> easy nowadays)
> 
> Also all the Xorg.conf tools will rewrite the file in UTF-8 at the slightest
> opportunity, breaking the matching and killing X startup. Some of the commonly
> installed tools rewrite the file at each boot, and you get to fix X setup
> manually every time
> 
> Can't the USB layer filter byte strings that are incompatible with today's main
> Linux encoding ?
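
To illustrate why the byte-to-byte match breaks, here is a minimal C sketch. It assumes, purely for illustration, that the non-ASCII character in the device name is U+00AE (registered sign); the report does not say which character the device actually sends. names_match() is a hypothetical stand-in for an exact comparison, not the evdev driver's real code.

```c
#include <string.h>

/* Hypothetical stand-in for an exact, byte-for-byte name comparison. */
static int names_match(const char *kernel_name, const char *conf_name)
{
    return strcmp(kernel_name, conf_name) == 0;
}

/* Assumption: the name ends in U+00AE. The same character is one byte
 * in ISO-8859-1 and two bytes in UTF-8, so the two strings differ. */
static const char name_latin1[] = "Trackball Explorer\xAE";     /* 0xAE */
static const char name_utf8[]   = "Trackball Explorer\xC2\xAE"; /* 0xC2 0xAE */
```

With these definitions, names_match(name_latin1, name_utf8) is false even though both strings display as the same text, which is what happens when a tool rewrites the conf file in UTF-8.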

Comment 2 David Brownell 2007-04-07 06:05:37 UTC
Sure enough, hid-core.c just does strlcpy() of the ISO-8859-1 strings returned 
by usbcore, based on the UTF-16 returned by various devices. 
 
It's not clear what "bug" this intends to report though.  Complaining that 
non-ASCII characters are emitted?  That it doesn't convert the UTF-16 into 
UTF-8?  That it doesn't return UTF-16 in the first place?  That the X tools are 
stupid? 
 
It'd be simple enough for hid-core to morph characters with the high bit set 
into the '?' used by usbcore for characters with the high byte set... 
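
A rough sketch of that last idea, written as plain userspace C rather than as an actual hid-core patch. The function name ascii_sanitize() is made up for this example.

```c
/* Replace every byte with the high bit set (>= 0x80) by '?', in place.
 * Hypothetical helper for illustration, not kernel code. */
static void ascii_sanitize(char *s)
{
    for (; *s; s++)
        if ((unsigned char)*s & 0x80)
            *s = '?';
}
```

The result is always 7-bit ASCII, so it is also valid UTF-8, at the cost of losing whatever character the device reported.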
Comment 3 Nicolas Mailhot 2007-04-07 06:33:20 UTC
I didn't want to suggest a solution, just point out the problem. But since you
insist, my preferred resolutions would be (in this order):
1. convert the UTF-16 into UTF-8
2. filter strings to report 7bit ASCII only (will probably fail spectacularly as
soon as the Chinese start using Chinese names for gadgets targeted at their
internal market)

Anything else cannot be used with a UTF-8 userspace (also, why is the kernel
reporting UTF-16 strings when almost no one uses it under Linux?)

I definitely agree xorg is stupid to do matching that's encoding-sensitive.
This can be mitigated by moving to the de-facto common Linux encoding today:
UTF-8. Asking the tools that process xorg.conf to know that encoding conversion
rules apply to everything but the evdev name strings (which must be treated as
opaque byte strings) is way too baroque to succeed.
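
For reference, option 1 could look roughly like the sketch below. It is a self-contained userspace illustration of UTF-16LE to UTF-8 conversion, not the code that went into the kernel; the function name utf16le_to_utf8() and the choice to replace unpaired surrogates with '?' are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert in_len bytes of UTF-16LE to a NUL-terminated UTF-8 string.
 * Returns the number of bytes written (excluding the NUL) and stops
 * early if the output buffer fills up. Unpaired surrogates become '?'.
 */
static size_t utf16le_to_utf8(const uint8_t *in, size_t in_len,
                              char *out, size_t out_size)
{
    size_t i = 0, o = 0;

    if (out_size == 0)
        return 0;

    while (i + 1 < in_len) {
        uint32_t cp = in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            uint32_t lo = 0;
            if (i + 1 < in_len)
                lo = in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = '?';
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   /* stray low surrogate */
            cp = '?';
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= out_size)   /* keep room for the terminating NUL */
            break;

        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

Unlike option 2, this keeps non-ASCII characters intact, so names using any script survive the conversion.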
Comment 4 Dmitry Torokhov 2007-04-07 07:10:00 UTC
If we do any sanitization in kernel I think it should be done when we populate 
dev->manufacturer and dev->product in USB core. It would also make sense for X 
to use strstr when matching name and phys strings.
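
As a sketch of the strstr idea (name_contains() is a hypothetical helper, not actual X.org code), a substring match would let the config file give only the ASCII part of the name:

```c
#include <string.h>

/* Hypothetical helper: succeed if the configured name occurs anywhere
 * inside the name reported by the kernel. */
static int name_contains(const char *kernel_name, const char *conf_name)
{
    return strstr(kernel_name, conf_name) != NULL;
}
```

With this, a config entry of "Trackball Explorer" would match the kernel name regardless of how the trailing non-ASCII byte is encoded, though it could also match other devices whose names contain the same substring.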
Comment 5 Natalie Protasevich 2007-06-14 09:19:28 UTC
Any more thought on how to proceed on this issue?
Should Xorg developers be contacted also?
Thanks.
Comment 6 Nicolas Mailhot 2007-06-14 09:52:59 UTC
The xorg devs already wrote me that it was not their problem and they didn't want to fix the xorg side (actually, xorg input has its share of problems, so trying to work around a kernel mistake would not really be a good use of developer time).

A patch to fix this issue kernel-side has been posted on mailing lists and is being discussed. It works for my system.
Comment 7 Natalie Protasevich 2007-06-14 10:11:57 UTC
Subject: Re:  USB device names are not sanitized for UTF-8

On 6/14/07, bugme-daemon@bugzilla.kernel.org
<bugme-daemon@bugzilla.kernel.org> wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=8310
>
> ------- Comment #6 from Nicolas.Mailhot@LaPoste.net  2007-06-14 09:52 -------
> The xorg devs already wrote me that it was not their problem and they didn't
> want to fix the xorg side (actually, xorg input has its share of problems, so
> trying to work around a kernel mistake would not really be a good use of
> developer time)
>
> A patch to fix this issue kernel-side has been posted on mailing lists and is
> being discussed. It works for my system
>

This is great, can you please attach the patch and maybe link to the discussion?
Thanks!
Comment 8 Nicolas Mailhot 2007-06-14 10:27:38 UTC
> > A patch to fix this issue kernel-side has been posted on mailing lists and
> > is being discussed. It works for my system

> This is great, can you please attach the patch and maybe link to the
> discussion?

http://article.gmane.org/gmane.linux.usb.devel/54768
Comment 9 Natalie Protasevich 2007-06-14 10:45:25 UTC
Alan, your patch worked great, are you planning to submit it?
Then we can close the bug :)
Thanks,
--Natalie
Comment 10 Alan Stern 2007-06-14 11:59:25 UTC
No, the patch isn't suitable.  I'm going to add some library routines to the kernel for converting to/from UTF-8.  Then the USB code will use the library routines.
Comment 11 Dmitry Torokhov 2009-09-05 06:36:22 UTC
USB core has been updated to translate strings from UTF-16LE to UTF-8, and it seems to work well with my Microsoft Intellimouse Explorer, closing.
