Kernel Bug Tracker – Bug 7746
Accented letters don't get capitalized by caps lock in unicode mode
Last modified: 2009-03-31 23:20:37 UTC
Quoting a mail from Vojtech Pavlik:
"Several languages (polish, czech, slovak, ...) use dead keys (keys that
don't do anything per se, but put an accent on the next letter). And now
almost everyone is switching to unicode. And Linux kernel doesn't
support unicode for dead keys. This means trouble."
(full mail at http://www.ussg.iu.edu/hypermail/linux/kernel/0405.3/1387.html)
And indeed, see http://bugs.debian.org/404503
There is a more recent patch proposed on
Is there any objection to the proposed approach (extending the internal type and adding a new ioctl for uploading unicode dead keys)? Otherwise, I'll work on updating the patch to the latest versions of Linux.
Does this sound like a good idea?
See http://bugs.debian.org/404503 : now that distributions use UTF-8, Polish, Czech, Slovak, etc. people can't type their accented characters. This is quite a bad regression for them (that's 10-15% of the Czech alphabet, for instance).
I think this is a really good idea. I install Linux with iptables on a router and I need to comment the iptables rules. No X system is installed on this hardware. Other people have to be able to read and change the original rules. I am sure that 10 years ago nobody expected accented characters here. Today end users expect the full national alphabet in such cases. It looks a bit unprofessional when some characters cannot be typed on a keyboard.
Guys, we recently merged some unicode changes (http://grmso.net:8090/commit/1ed8a2b3c501bedd4b35130c8a52662ccf78abad/ was the latest).
Does that get us what we need here? If not, now would be a good time
to follow up on lkml. Please cc Egmont Koblinger <firstname.lastname@example.org>
if doing this.
Mmm, it doesn't solve the issue at stake here at all, actually :) Now writing a mail to lkml
Andrew: the patch you've mentioned is independent from this. That one fixes utf-8 output issues, while dead keys is the input part.
By the way, if I recall correctly, there's another major bug in the input part: caps lock doesn't capitalize accented letters (those entered directly by a keypress, not by a dead or compose key). As far as I see, both issues require adding some ioctl and adjusting user-space software, so I think they'd better be addressed together. (I have a nasty workaround for the caps lock problem here:
but it's clearly disgusting and is very far from being suitable for mainstream commit.)
Samuel, I think our best shot here would be for you to
dust off Jirka's patch and start shepherding it into
the tree, if you have time.
I know basically nothing about console and unicode, so
careful explanations will be needed ;)
One commit did already help, it's http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=759448f459234bfcf34b82471f0dba77a9aca498 which fixes dead keys. However, it won't be completely sufficient for non-latin1 characters. I've cleaned the old patch, here it is for review.
Created attachment 12255
add interface for unicode diacritics
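The new interface replaces 8-bit accent-table entries with entries that hold full code points. A minimal user-space sketch of the dead-key lookup, assuming a table of (dead accent, base, result) triples shaped like `struct kbdiacruc` in `<linux/kd.h>`; the struct, table contents and function name below are hypothetical and this is not the kernel's code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dead-key entry: every field is a full Unicode code point,
 * so non-latin1 results such as U+0151 fit. */
struct dead_entry {
    uint32_t diacr;   /* the dead accent */
    uint32_t base;    /* the letter typed after the dead key */
    uint32_t result;  /* the composed character */
};

/* Hypothetical example table. */
static const struct dead_entry dead_table[] = {
    { 0x02DD, 'o', 0x0151 },  /* double acute + o -> o with double acute */
    { 0x02DD, 'O', 0x0150 },  /* double acute + O -> O with double acute */
    { 0x02C7, 'c', 0x010D },  /* caron + c -> c with caron */
    { 0x00B4, 'e', 0x00E9 },  /* acute + e -> e with acute */
};

/* Return the composed character, or 0 if no entry matches (a real console
 * would then emit the accent and the base letter separately). */
static uint32_t compose_dead(uint32_t diacr, uint32_t base)
{
    size_t n = sizeof dead_table / sizeof dead_table[0];
    for (size_t i = 0; i < n; i++)
        if (dead_table[i].diacr == diacr && dead_table[i].base == base)
            return dead_table[i].result;
    return 0;
}
```

For example, `compose_dead(0x02DD, 'o')` yields U+0151, a result that an 8-bit accent table cannot hold in unicode mode.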
Created attachment 12256
This version fixes non-latin1 characters: it translates them into unicode according to the current 8-bit character map. Of course, kbd should still be fixed to use the new unicode interface for non-8-bit characters, but that can be done later.
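The translation step can be pictured as a 256-entry table from the bytes of the active 8-bit character map to Unicode. The sketch below is hypothetical: it lists only a few ISO 8859-2 (latin2) entries for illustration, and unlisted bytes fall back to latin1, where byte value equals code point (a real table would list all 256 entries). It is not the kernel's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partial latin2 table; 0 means "not listed". */
static const uint16_t latin2_partial[256] = {
    [0xD5] = 0x0150,  /* O with double acute */
    [0xF5] = 0x0151,  /* o with double acute */
    [0xDB] = 0x0170,  /* U with double acute */
    [0xFB] = 0x0171,  /* u with double acute */
    [0xE8] = 0x010D,  /* c with caron */
};

/* Translate a byte of the active 8-bit map to a Unicode code point.
 * With map == NULL (or an unlisted byte) the byte is taken as latin1. */
static uint32_t byte_to_unicode(const uint16_t *map, uint8_t byte)
{
    if (map != NULL && map[byte] != 0)
        return map[byte];
    return byte;  /* latin1 fallback: byte value == code point */
}
```

With latin2 active, byte 0xF5 becomes U+0151 (ő); read as latin1, the same byte is U+00F5 (õ), which is exactly the kind of mix-up a wrong map or font produces.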
About capitalization of accented letters, the proposed way is actually exactly the same as setting
keycode 3 = +eacute Eacute
in the kmap file. The "+" means that KT_LETTER type will be prepended to the keysym, which will make it pass through the regular capitalization rules. So it's the kmap files that need to be fixed.
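The effect of `+` can be sketched with a toy model loosely based on the KT_LETTER branch of the kernel's keypress handling: table entries are 16 bits, a high byte of 0xF0 or more marks a typed keysym, and for KT_LETTER with caps lock on, the entry is re-read from the row with shift toggled. The K/KTYP/KVAL macros and KT_*/KG_SHIFT values mirror `<linux/keyboard.h>`; the table and `press()` are hypothetical, and LEDs, dead keys and Unicode output are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in <linux/keyboard.h>. */
#define KG_SHIFT  0
#define KT_LATIN  0
#define KT_LETTER 11
#define K(t, v)   (((t) << 8) | (v))
#define KTYP(x)   ((x) >> 8)
#define KVAL(x)   ((x) & 0xff)

/* Typed keysyms sit in the table with 0xF0 in the high byte. */
#define STORED(t, v) ((uint16_t)(0xF000 | K(t, v)))

/* Hypothetical two-row keymap: row 0 = no modifier, row 1 = shift. */
enum { NR_KEYS_DEMO = 5 };
static const uint16_t demo_map[2][NR_KEYS_DEMO] = {
    [0] = { [3] = STORED(KT_LETTER, 0xE9),    /* keycode 3 = +eacute */
            [4] = STORED(KT_LATIN,  0xE8) },  /* keycode 4 = egrave (no +) */
    [1] = { [3] = STORED(KT_LETTER, 0xC9),    /* shifted: Eacute */
            [4] = STORED(KT_LATIN,  0xC8) },  /* shifted: Egrave */
};

/* Return the latin1 value a key produces, or -1 for anything this toy
 * model does not handle (direct unicode entries, other KT_* types). */
static int press(int keycode, unsigned shift, bool capslock)
{
    uint16_t keysym = demo_map[shift & 1u][keycode];
    if (KTYP(keysym) < 0xF0)
        return -1;
    unsigned type = KTYP(keysym) - 0xF0;
    if (type == KT_LETTER) {
        type = KT_LATIN;
        /* Caps lock: re-read the entry from the row with shift toggled. */
        if (capslock)
            keysym = demo_map[(shift ^ (1u << KG_SHIFT)) & 1u][keycode];
    }
    return type == KT_LATIN ? (int)KVAL(keysym) : -1;
}
```

Under caps lock, keycode 3 (declared with `+`) gives É, while keycode 4 (declared without `+`) keeps è; caps lock plus shift on keycode 3 gives é again.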
Oops, sorry, my second patch is actually bogus: it uses the font translation, not the input translation, and they might differ.
Created attachment 12259
Ok, this version works fine.
Created attachment 12260
Sorry for the flood; the correct file with comments was on another machine.
Created attachment 12313
patch for kbd unicode diacritics support
Here is a patch for the kbd support of unicode diacritics.
Created attachment 12314
Ok, this time it should completely work: I've tested in all of latin1, latin9, latin2 and unicode with french and czech dead keys.
On Tue, 7 Aug 2007 18:41:45 -0700 (PDT)
> Here is a patch
pleeeeze, send patches via email to me, lkml, add signoffs, changelog, etc?
bugzilla isn't a suitable way of sending patches!
Well, that was mostly for letting people test it before submitting, but ok I'll do this.
Samuel: is this patch meant to solve capitalization issue (comment #11)? It still doesn't work for me.
I've patched my kernel 2.6.22 with the patches from comment #8 and comment #16, installed it and its headers, and rebooted. Then I patched kbd successfully, though your patch applied with some offsets, which means it was not made against mainstream kbd-1.12. In addition, it didn't even compile: it still referenced the structure member kd.kbdiacr three times near loadkeys.y:960, where only kd.kbdiacruc exists. I altered the source (inserted "uc" there), compiled and installed it, and reloaded my keymap using "loadkeys -u hu". Even though the "hu.map" file contains the plus signs before the names of accented characters, caps lock is still ignored on these keys.
Created attachment 12359
Fixed kbd support
Erf, sorry: the problem was that I didn't copy kd.h directly from my patched Linux tree to /usr/include/linux, because such an operation usually needs some tweaks; I did it by hand instead, and not in the same way. Here is the obviously fixed patch.
My patch won't solve the capitalization problem, that is a completely separate issue. I had a closer look and found:
/* silence the common usage dumpkeys | loadkeys -u */
fprintf(stderr, _("plus before %s ignored\n"), p);
i.e. your + just gets silently dropped by loadkeys in unicode mode. And indeed, since KDSKBENT keysyms are coded as unsigned shorts, there is not much room for telling what is capitalizable and what is not. However, latin characters should still work as in the non-unicode case, and I could check that they actually _do_ still work with upstream kbd. In the Debian package, however, they didn't notice that they had restricted capitalization to ASCII characters only, but that's really a kbd bug. Now, I don't see why you're hitting a bug here. Could you uncomment the fprintf cited above and check whether you get that message?
BTW, in your patch,
+ || (keysym >= 0xFF21 && keysym <= 0xFF3A)
+ || (keysym >= 0xFF41 && keysym <= 0xFF5A)
can't happen, since in such case you'd have type >= 0xf0.
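The remark above rests on how a stored 16-bit entry is classified: if its high byte is below 0xF0, the whole value is used as a Unicode code point; otherwise the high byte minus 0xF0 is a KT_* type and the low byte its value. A tiny sketch of that test (function names hypothetical) shows that the fullwidth ranges checked by the patch (0xFF21..0xFF3A, 0xFF41..0xFF5A) never reach the Unicode branch, and that no spare bit is left: every value below 0xF000 is already a code point.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: an entry whose high byte is below 0xF0 is used
 * directly as a Unicode code point. */
static bool is_direct_unicode(uint16_t entry)
{
    return (entry >> 8) < 0xF0;
}

/* Hypothetical helpers, only meaningful for typed entries (>= 0xF000):
 * the KT_* type and the 8-bit value. */
static unsigned typed_type(uint16_t entry)  { return (entry >> 8) - 0xF0u; }
static unsigned typed_value(uint16_t entry) { return entry & 0xFFu; }
```

For example, 0x0151 is taken as U+0151, whereas 0xFF21 decodes as type 15 with value 0x21, not as U+FF21.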
Actually, your patch poses a problem: on a French layout, for instance, we generally _don't_ want eacute to be shifted, because with the AZERTY layout it would not become Eacute, but 2. That's because the kernel uses keyboard layout shifting, not capitalization as X does with xkb.
However, what your patch reveals is that in unicode, capitalizable letters happen to be below U+3000. What we could do is declare that Hangul letters can't be typed directly on the Linux console (that won't prevent using compose sequences), and thus use U+B000-U+DFFF as representatives of the capitalizable U+0000-U+2FFF.
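The proposed remapping is simple offset arithmetic: U+B000 + U+2FFF = U+DFFF, so the whole capitalizable range fits into the reserved window. A sketch of how such entries could be encoded and recognized, assuming the proposal as stated; this scheme was only suggested here, and all names below are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding from the proposal: capitalizable code points
 * U+0000..U+2FFF are stored shifted into U+B000..U+DFFF. */
#define CAPS_BASE  0xB000u
#define CAPS_LIMIT 0x3000u  /* first code point that cannot be flagged */

/* Encode cp as a "capitalizable" entry; false if cp is out of range. */
static bool encode_capitalizable(uint32_t cp, uint16_t *out)
{
    if (cp >= CAPS_LIMIT)
        return false;
    *out = (uint16_t)(CAPS_BASE + cp);
    return true;
}

/* Decode an entry: report whether it was flagged, return the code point. */
static uint32_t decode_entry(uint16_t entry, bool *capitalizable)
{
    if (entry >= CAPS_BASE && entry < CAPS_BASE + CAPS_LIMIT) {
        *capitalizable = true;
        return entry - CAPS_BASE;
    }
    *capitalizable = false;
    return entry;
}
```

Presumably, as with KT_LETTER, a flagged entry would only tell the kernel to consult the shifted row under caps lock; the uppercase character itself would still come from the keymap.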
(but just to stress it again: the problem you're having is in kbd: I tried loading the hu keymap with upstream loadkeys while in unicode mode, and I could for instance type Õ by using caps-lock and õ)
My patch is crap, I know :-) There's no use in analyzing it.
There's no Õ/õ (o with tilde) in Hungarian, there's only Ő/ő (o with double acute accent). In kernels up to 2.6.21 you may see a wrong character because a latin1 font is loaded instead of the desired latin2, but 2.6.22 prints a replacement symbol and not a faulty glyph in this case. Did you pass "-u" to loadkeys?
I'm not sure I understand correctly: what is the desired behavior of the French layout? I guess the key that produces "2" on the English layout should print these:
- no modifier => é
- shift => 2
- caps lock => É
- caps lock + shift => 2
Is this okay? At least this is what X with xkb (xkeyboard-config 0.8) does for me.
If I load the Turkish layout under X, one of the keys produces ı and I and another key produces i and İ. For both keys, caps lock simply toggles the state of shift.
So: a caps lock that just toggles shift for certain keys doesn't work for French. On the other hand, a caps lock that toggles the case of the Unicode character for certain keys doesn't work (at least not in a locale-independent way) since there's no way to know what the uppercase version of i is (it's different in Turkish than in most other languages).
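The Turkish problem can be shown in a few lines: the uppercase of `i` depends on the language, so a locale-independent kernel has no single right answer. A hypothetical sketch with hard-coded rules (not a real casing library; a real implementation would need full Unicode and locale data):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: uppercase a few code points, with or without Turkish rules. */
static uint32_t upper_demo(uint32_t cp, bool turkish)
{
    if (cp == 'i')
        return turkish ? 0x0130 : 'I';  /* U+0130: capital I with dot above */
    if (cp == 0x0131)                   /* U+0131: small dotless i */
        return 'I';
    if (cp >= 'a' && cp <= 'z')
        return cp - ('a' - 'A');
    return cp;  /* everything else is left unchanged in this demo */
}
```

The same input `i` maps to `I` or to `İ` depending only on a language flag, information the console keyboard layer does not have.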
At this moment I only see one way: introduce the state of caps lock as yet another modifier in addition to the 8 current ones (shift, ctrl, alt, altgr, shiftl, shiftr, ctrll, ctrlr), so that the kernel table gets twice as big, and then adjust the user-space software.
Oops, sorry, yes, I wasn't using a latin2 font. And with -u, I indeed get no capitalization, but that's still a kbd bug: while it could send K(KT_LETTER,0xF3) for +oacute, it sends the unicode equivalent instead, and hence can't express that it wants the letter to be capitalizable.
> - no modifier => é
> - shift => 2
> - caps lock => É
> - caps lock + shift => 2
Yes, that's what you get in xkb, and that's useful. But no, that simply can't be achieved in the kernel without major console rework: by design, capitalization and shift have the same effect there.
About `I only see one way: introcude the state of caps lock as yet another modifier in addition to the 8 current ones', it's already available :) See KG_CAPSSHIFT in keyboard.h, which can be triggered by K_CAPSSHIFT. It's just up to the keymap authors to take advantage of it if + is not enough for them and they can afford the extra room.
When in comment #21 I said "latin", it was actually "latin1": in unicode mode, K(KT_LETTER,x) always expresses latin1 letters. So the kernel limitation is for non-latin1 accented letters.
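The latin1 limit follows from the keysym layout: K(KT_LETTER, x) has only an 8-bit value field, so ó (U+00F3) fits but ő (U+0151) does not. The K macro and KT_LETTER value mirror `<linux/keyboard.h>`; the helper is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in <linux/keyboard.h>. */
#define KT_LETTER 11
#define K(t, v)   (((t) << 8) | (v))

/* Hypothetical helper: build K(KT_LETTER, cp), failing when cp does not
 * fit in the 8-bit value field, i.e. for anything beyond latin1. */
static bool letter_keysym(uint32_t cp, uint16_t *out)
{
    if (cp > 0xFF)
        return false;
    *out = (uint16_t)K(KT_LETTER, cp);
    return true;
}
```

`letter_keysym(0xF3, &k)` yields 0x0BF3, while U+0151 is rejected.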
Mmm, but why use -u, actually? loadkeys should be able to do everything correctly on its own... And actually it does: unicode_start iso02.f08 iso02 ; loadkeys hu.kmap and then ó / Ó works fine.
I don't use unicode_start; I launch the commands of this shell script separately instead. There are cases when "loadkeys" and "loadkeys -u" make a difference for me, and, using a Unicode system, I always need "loadkeys -u".
One of the commands launched by unicode_start is "dumpkeys | loadkeys --unicode", this may have effect on further loadkeys commands as well, I don't know.
I could never get loadkeys to load a keymap where Hungarian accented letters that are part of latin1 are toggled by caps lock but those outside latin1 aren't. Actually this wouldn't make any sense at all. Either all the accented letters are toggled by caps lock (8-bit world) or none of them (utf-8 world). It may be possible in the kernel to have the latin1 accented letters toggled by caps lock even in utf-8, but then kbd doesn't use this feature, and it's really not the way we should follow.
Mmm, are you using upstream kbd or a patched version? The debian version actually forces -u, for instance. I was testing upstream kbd 1.12.
Anyway, as discussed in this bug report, the problem is that the unicode interface does not allow handling caps lock in unicode mode for non-latin1 letters by just using +, and in some cases (like French) capitalization can't even be expressed with just +. However, we can use CapsShift from kbd. It actually doesn't _double_ the keymap size; it only adds one or two keymap rows (depending on how many shift levels you want to express the caps change for). The only problem is that CapsShift_Lock won't light the CapsLock LED.
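Why CapsShift adds rows rather than doubling the map can be sketched with a keymap made of per-modifier-combination rows that exist only when a layout uses them, in the spirit of the kernel's per-modifier key_maps[] array. The bit values, names and AltGr entry below are hypothetical (not kbd's or the kernel's weights); the rows reproduce the French behavior listed earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical modifier bits for this model only. */
enum { MOD_SHIFT = 1, MOD_ALTGR = 2, MOD_CAPS = 4, NR_ROWS = 8 };
enum { KEY_2 = 3, NR_KEYS_DEMO = 4 };

/* Each row maps keycodes to code points; only needed rows exist. */
static const uint32_t row_plain[NR_KEYS_DEMO]      = { [KEY_2] = 0x00E9 }; /* é */
static const uint32_t row_shift[NR_KEYS_DEMO]      = { [KEY_2] = '2' };
static const uint32_t row_altgr[NR_KEYS_DEMO]      = { [KEY_2] = '~' };    /* illustrative */
static const uint32_t row_caps[NR_KEYS_DEMO]       = { [KEY_2] = 0x00C9 }; /* É */
static const uint32_t row_caps_shift[NR_KEYS_DEMO] = { [KEY_2] = '2' };

/* NULL = row never allocated (e.g. caps + AltGr). */
static const uint32_t *const rows[NR_ROWS] = {
    [0]                    = row_plain,
    [MOD_SHIFT]            = row_shift,
    [MOD_ALTGR]            = row_altgr,
    [MOD_CAPS]             = row_caps,
    [MOD_CAPS | MOD_SHIFT] = row_caps_shift,
};

/* Caps lock is simply one more modifier bit in the row index.
 * Returns 0 when the row is not allocated or an index is out of range. */
static uint32_t lookup(unsigned keycode, unsigned mods)
{
    if (mods >= NR_ROWS || keycode >= NR_KEYS_DEMO || rows[mods] == NULL)
        return 0;
    return rows[mods][keycode];
}
```

Five of the eight possible rows exist here: treating caps lock as a modifier cost two extra rows (caps, caps + shift), not a copy of every existing row.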
Just a thought: rather than extending the kernel console again and again while being limited by not having access to localized toupper() and similar functions, maybe it's about time to follow GNU/Hurd and just write a user-mode framebuffer console that uses xkb data for keymaps...
I have patched loadkeys so that the default is --unicode and a --no-unicode switch reverts it, but when submitting comments here I wrote what I think mainstream loadkeys would have done.
Personally I don't really like framebuffer: it's much slower than a graphical Unicode-aware terminal emulator under X. On the other hand, standard VGA is really obsolete and unable to do trickier stuff (e.g. displaying more than 512 characters at the same time), but at least it's okay as a fallback if something's wrong with the X setup. There are plenty of user-space terminal emulators out there; I see no point in writing another one. It would be better to have X on framebuffer, or some other hack to get an existing one (e.g. xterm, vte) to operate directly on framebuffer. However, as long as there is a terminal driver inside the kernel, I think it should be fixed in this respect.
Just another note:
As far as I see, caps lock doesn't even work correctly for French in an 8-bit setup. (And actually the keymap contains an opening brace instead of é for the key "2" -- strange...)
Nowadays the whole Linux world is switching from 8-bit locales to UTF-8. Dead keys were working in 8-bit world but not in UTF-8, that's what this bugreport is about originally, and this is what's fixed here.
Caps lock not working for layouts such as Hungarian, Turkish etc. is a similar kind of regression: it used to work in 8-bit mode, but doesn't work in UTF-8.
(It's not strictly speaking a regression in the kernel because the current kernel is not worse than the older ones. It's a regression if you look at whole linux distributions: it used to work in them a few years ago, but doesn't work nowadays.)
Caps lock not doing the right thing for French is different: it's an ancient bug.
If you look at the kernel only, these two bugs weigh the same, I think. If you look at complete Linux distributions, I think the Hungarian, Turkish etc. one is more serious, since it is a regression, and regression bugs are usually given higher priority (nothing the user has got used to should stop working). If we could at least fix this, the kernel would be no worse in UTF-8 mode than in 8-bit mode.
(PS. I'm not sure I would have the same opinion if I were French :-)))
About framebuffer, I do agree about sluggishness :), and I do know there are a lot of terms available; there is just not _one_ that suits all usages and could be used by default, and that is painful. Extending xterm or vte into framebuffer would indeed be an interesting idea for that concern, but of course that won't get any better speed than existing framebuffer terms. For our concern (latin2, which can cope with VGA text mode limitations), there could be an equivalent of a framebuffer term that just uses the normal text mode, a bit like screen, but that would directly fetch keyboard codes like X does, and hence be able to use xkb data. The problem is that supporting all features of all languages doesn't keep the kernel as simple as possible... We most probably _don't_ want to implement all xkb features in the kernel.
About regression/distributions/etc. the problem is unfortunately rather: not so many people actually care nowadays...
Just to repeat myself, maybe more clearly: with the current KDSKBENT interface we just can't do better. Defining a new interface is what the linuxconsole guys are supposed to be doing. I personally won't take the time to do it.
About the level of bugginess: not being able to type accented letters is a lot worse than not being able to use caps lock: with the caps lock problem you have the shift workaround (I actually never use caps lock personally...)
I forgot to comment on your testing of the french keymap: you are probably using loadkeys fr? That's an elderly one, we rather use fr-latin9 nowadays.
> not being able to type accented letters is a lot worse than not being able to use caps lock...
That's not the case here, you can't compare these two. None of the keymaps is buggy as long as you don't use caps lock. It's the caps lock that triggers the bug in both cases, though in a different way that would need different fix.
Yes, I used fr, fr-latin2 really has an é there, thanks for the info.
> It's the caps lock that triggers the bug in both cases, though in a different
> way that would need different fix.
The two cases you are referring to are the dead accent problem ("not being able
to type accented letters") and the capitalization problem, right?
The dead accent problem doesn't involve the caps lock at all, and truly doesn't
have any workaround except typing latin2 codes by hand. For the capitalization
problem, you want to type capital accented letters, you have the (not very
handy, though) "press shift" workaround.
Note: I meant to write "capitalization problem, *if* you want to type...", but the word "if" got lost while editing.
No, I'm no longer talking about the dead accent problem, there's a patch for it here, most likely it will be applied soon, so we can consider it (nearly) fixed.
I'm only thinking about how the caps lock problem could be solved, for two completely different kinds of keyboard layouts...
> I'm only thinking on how the caps lock problem could be solved, for two
> completely different kind of keyboard layouts...
Ah ok. Well, I personally agree that the latin2 issue is a bit more important,
yes. But anyway, they're part of the same issue: the current interface is
limited. As you suggested (and as is actually already implemented), the CapsShift
modifier can be used for fixing them from kbd without having to define a new
What is the status of this problem, has it been resolved?
The original request, yes. The capitalization of accented letters by caps lock has not been. I will clone the bug to express that.
Closing the non cloned half
Err, actually I had made the clone be the part that was solved so
this bug shouldn't have been closed. That being said, Debian's
console-setup now implements using a series of keymaps to properly
handle (localized, even) capitalization, so at least on that side the
issue is fixed.