Tkabber Wiki

Tk Windows keysyms bug explained

Multiple keymaps

Russian (and, in fact, most "non-Latin-1") users use several "keymaps" when typing. A "keymap" is the mapping the system uses to translate the user's keypresses into characters, which are then, for example, inserted into the text editor window the user is typing in.

Usually one of these keymaps is the standard "en_US" and the other (or others) is for some "incompatible" language, say, "ru_RU" for Russian Cyrillic.

Just to recap for those who aren't familiar with the idea of keymaps: the keyboard always generates the same "scancodes" for each physical key, but when the scancodes are translated to the corresponding characters, the currently active keymap is taken into account.

For example, on my system I have English and Russian keymaps, so if I start a text editor (or take any widget accepting textual input) and type "qwerty" on both keymaps (separating the two sequences by a newline), I will get:

qwerty
йцукен

because on my "Russian Windows keyboard" I have "й" depicted on the "q" key and so on, which corresponds to the appropriate keymap. You can look at this picture of such a keyboard to get the idea: [1].

A small testbed

Now let's create a small testbed that will serve as a demonstration tool for the explanations that follow. In a freshly started wish I type:

bind . <Key> { puts "%K - %A" }

i.e. when I press a key on the Tk root window I will get the keysym and the corresponding character for that key printed in the console window.

The problem

Let's start with the example which demonstrates why translation of keypresses to keysyms is broken in Windows Tk.

First I make sure I'm on the English keymap and type the standard incantation, "qwerty". I get

q - q
w - w
e - e
r - r
t - t
y - y

printed at the console. This is perfectly OK.

Then I switch to the Russian keymap.

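Now take a break and guess what a user should expect when typing our "qwerty" sequence on the Russian keymap. Clearly, %A is expected to be substituted by the relevant Cyrillic characters. When it comes to keysyms, there are naturally only two sane scenarios:

1. The keysyms stay the same as on the English keymap (q, w, e and so on), that is, they follow the physical key.

2. The keysyms are the X keysyms of the Cyrillic characters being typed (Cyrillic_shorti and so on).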

Now let's look at the actual output:

eacute - й
odiaeresis - ц
oacute - у
ecircumflex - к
aring - е
iacute - н

So we get neither of the expected scenarios. Instead we get keysyms corresponding to Latin-1 characters with code points >= 0x80 (as found in ISO8859-1, for instance, or in the first 256 code points of Unicode).

The explanation

Windows Tk uses two functions from the win\tkWinKey.c file to translate the "virtual-key codes" delivered by the system via the WM_CHAR messages to the window: TkpGetString() and TkpGetKeySym(). The first is used to get the character corresponding to the keyboard event(s); this is what is substituted for the %A placeholder in bound scripts. The second is used to get the relevant keysym, which is substituted for the %K placeholder.

Both functions use the ToAscii() Win32 API call to get the data they want. For TkpGetString() this does the Right Thing (it's used, for instance, to convert keypresses into the characters inserted into a widget that accepts text input), but for keysyms it breaks the developers' expectations.

The problem, it seems, is that the usage of ToAscii() relies on it returning codes of Latin-1 characters, and the relevant X keysyms are happily assigned codes which map perfectly to Latin-1; i.e. the X keysym for the physical key labeled "q" has the same code as ToAscii() returns for the scancode of this physical key on Windows.

Unfortunately this doesn't take into account the fact that ToAscii() uses the currently active keymap when processing. Each keymap is implicitly associated with some "code page" (the Windows notion of an 8-bit charset), so when I have a Russian keymap active, ToAscii() will use the Windows-1251 code page to map the keyboard scancodes to characters.

Now look at the Windows-1251 table [2]. You will see that our "йцукен" characters, corresponding to the "qwerty" series of physical keys, are assigned the codes 0xE9, 0xF6, 0xF3, 0xEA, 0xE5 and 0xED, respectively.

These codes are returned from TkpGetKeySym(), and they have nothing to do either with the codes of "q", "w", etc. or with the codes of the X keysyms Cyrillic_shorti (0x06ca), etc. Instead, they appear to the caller as keysyms corresponding to Latin-1 characters. That's what we saw in our testbed. Look at the ISO8859-1 chart [3] and see which characters correspond to the codes listed above: you'll notice a perfect match with our sample output.
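
The mismatch can be reproduced without Tk or a Russian keymap, using only Tcl's encoding command. The sketch below illustrates the effect and is not what Tk does internally: it assumes the keymap yields one cp1251 byte per character and simply re-reads that byte as ISO8859-1. The variable names are made up for the example.

```tcl
# The six characters produced by the "qwerty" keys on the Russian keymap,
# written with \u escapes so the script does not depend on its file encoding.
set typed "\u0439\u0446\u0443\u043a\u0435\u043d"

set codes {}
set misread ""
foreach ch [split $typed ""] {
    # The single byte this character occupies in the cp1251 code page.
    set byte [encoding convertto cp1251 $ch]
    binary scan $byte cu code
    lappend codes [format 0x%02X $code]
    # The same byte read as Latin-1 (assumption: this mirrors the effect
    # of treating the returned code as a Latin-1 keysym).
    append misread [encoding convertfrom iso8859-1 $byte]
}

puts $codes
puts $misread
```

With a stock Tcl this should print the six codes quoted above, followed by éöóêåí, i.e. the characters whose keysyms (eacute, odiaeresis and so on) showed up in the testbed.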

Since Windows uses Windows-1252 (WinLatin-1) [4] for the English (and not only English) keymaps, and this code page closely matches the "canonical" Latin-1 assignment of code points, ToAscii() does the Right Thing in the context of TkpGetKeySym() if and only if the currently active keymap is linked to cp1252. For any other code page whose "upper part" is incompatible with cp1252, this behaviour is broken.
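
To get a feel for how far this goes, one can compare a code page against ISO8859-1 byte by byte. The sketch below counts, for the range 0xA0..0xFF, how many bytes decode to a different character than in ISO8859-1. The proc name latin1Mismatches is hypothetical, and the range is an assumption chosen to skip 0x80..0x9F, where cp1252 itself deviates from ISO8859-1.

```tcl
# Count the bytes in 0xA0..0xFF that the given code page decodes
# differently from ISO8859-1.
proc latin1Mismatches {codepage} {
    set count 0
    for {set code 0xA0} {$code <= 0xFF} {incr code} {
        set byte [binary format c $code]
        set inCodepage [encoding convertfrom $codepage $byte]
        set inLatin1 [encoding convertfrom iso8859-1 $byte]
        if {$inCodepage ne $inLatin1} {
            incr count
        }
    }
    return $count
}

foreach cp {cp1252 cp1251} {
    puts "$cp: [latin1Mismatches $cp] of 96 bytes differ from ISO8859-1"
}
```

cp1252 is expected to report no differences in this range, while cp1251 reports many, which is the "if and only if" above put in numbers.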

Two ways to fix the problem

Unfortunately, it appears that there's no single clean solution that would "just fix" the current behaviour while keeping absolute backwards compatibility. The outlined problem also reveals something that I would qualify as a "blind spot" in Tk's handling of keyboard events.

(!) TODO: write up


A chat with Kevin Kenny (2008-04-10).

(!) TODO: prettify

[01:46]<kostix> if I ever write a patch for Tk this will be fixes for how keysyms are implemented in Windows. looks like only cyrillic speakers/writers care about these problems, so our happiness is in our hands :)

[01:47]<dkf_> it's not so much that only you care, as the rest of us find it hard to reproduce :/

[01:47]<dkf_> (I don't have cyrillic keyboard support, or a cyrillic keyboard either)

[01:47]<hat0> kostix, i've had no end of difficulty using cyrillic text in a tk application in windows

[01:47]<hat0> i'll be happy to test that out or help if i can

[01:49]<kostix> ok, "don't care" is not the right wording, I do understand this

[01:51]<kostix> also I think Tk on X also has some problems. for instance, you can't have russian locale and write accented Latin-5 letters in Tk even if you have all necessary keymaps installed and they work for, say, GTK2+ apps. I heard two reports of this problem

[01:57]<kbk> kostix - "you can't have russian locale and write accented Latin-5 letters in Tk" ... Do you mean accented Cyrillic characters that appear in Latin-5, like ё or ѓ ? (Tk, internally, knows nothing about Latin-5, everything is converted to Unicode on input.

[01:57]<kbk> )

[01:58]* kbk just entered those letters, but isn't running in a Russian locale...

[02:00]<kostix> kbk: to put it simple: I have a russian friend who lives in spain. so he has russian and spain keyboard layouts installed. since he's russian, he has russian system locale. this way, he can only enter cyrillic (in tkabber and plain wish), but not "special" spanish letters, just ascii part. and at some point he changed locale to spanish and realized in this mode it works.

[02:00]<kostix> I can request from him some more strict description, if needed

[02:01]<kbk> Oh. Ok. I remember discussing this problem with you before.

[02:01]<kostix> "it works" means he can write those funky characters with tildas, acute sign, etc ;)

[02:01]<kbk> Is there any application in which it works?

[02:01]<kostix> uh, any non Tk app (such as GTK), as I understand ;)

[02:02]* dkf_ wonders wtf is going on in that situation

[02:02]<kbk> Really? And he doesn't need to swap keyboard maps to do it?

[02:03]<kostix> dkf_: I can say that on windows it's "just fundamentally wrong", so may be X code also has some issues.

[02:03]<kostix> kbk: I think I should stop now, ask that guy for exhaustive explanations and file a bug report ;)

[02:04]* kbk routinely switches between US and US-International keyboards, and *that* much works, at least.

[02:05]<kostix> http://ru.tkabber.jabe.ru/index.php/Tk_Windows_keysyms_bug_explained -- here's an unfinished article on the subject. I intend to eventually complete it and initiate some discussion here or on c.l.t

[02:06]<kostix> unfortunately, the most interesting part isn't written :}

[02:07]<stu> How does one enable keyboard layouts & foreign chars in X? There seems to be more than one way ... this is all rather unclear to me.

[02:07]<kostix> stu: indeed, there are about 4 or 5 ways to configure Xkb

[02:08]* stu isn't even sure if it's Xkb that requires configuring .. or something else.

[02:08]<kostix> stu: http://paste.tclers.tk/868 -- here's a typical us+ru combo used by 99.9% of russian X users

[02:10]<kbk> kostix - OK, so...

[02:10]<kostix> stu: I beleive you can have up to four active groups in Xkb + "modifier keys" in each which allow to temporatily switch from one group to another

[02:10]<stu> thanks. what if I don't have an xorg.conf? it there a command-line utility to switch layouts?

[02:11]<stu> I don't have to use xkbcomp, do I?

[02:11]<kbk> kostix - So %A is correct all the time, but the keysym is not?

[02:11]<kostix> kbk: yes, and this breaks bindings

[02:12]<kostix> kbk: but in fact I beleive the problem is more fundamental and want to write even more text on it

[02:13]<kbk> And what you'd loke to see is that if someone presses Alt+й, the bindings will see <Alt-Cyrillic_shorti> and not <Alt-q>?

[02:13]<kbk> like to see... sheesh, I can't type.

[02:15]<kostix> kbk: no, I would expect that alt-й will be the same as alt-q, i.e. it should not depend on active keymap. and this is the second part of the problem since what I've just said is questionnable

[02:16]<kbk> I'm not convinced that what you're asking for (make it <Alt-q>) is either desirable or possible. (But I concede that making it Alt-eacute is Just Plain Wrong...)

[02:17]<kostix> kbk: for instance, when you have an "active" keys for menu entries (those which are introduced using & in win32 api), they trigger depending on the keymap. i.e. if I have a menu entry "з&апуск", it could be accelerated (activated) only by russian "а", not latin and not "f" which is on the same key

[02:17]<kostix> kbk: sure

[02:17]<kostix> that's why it's complicated

[02:17]<kostix> I have a thought that it would be cool to provide both ways

[02:17]<kostix> may be by having somethin like <Key-U+0463>

[02:18]<kostix> in bindings

[02:18]<kbk> kostix - But... Consider switching to a German keymap...

[02:18]<kostix> I've used it for about a year some time ago :)

[02:18]<kbk> In that case, I press the key marked 'Q' on my keyboard, and %A gives me 'a'...

[02:19]<kbk> Firing the 'Q' binding for the key that types 'a' would be, uhm, peculiar.

[02:19]<kostix> kbk: my (first) idea is simple: bindings are mostly for firing something up

[02:20]<kostix> for instance, in tkabber we have a "smiley palette" bound to alt-e (in english keymap) and a hack to make work on the same *physical key* on russian -- the idea is that the user routinely depresses the same *key*

[02:20]<kostix> she doesn't care what symbol it would provide if used for entering text

[02:21]<kostix> so when I use event add <<EmoticonsPalette>> <Alt-e> it should just work on any keymap

[02:21]<kbk> So for that case, you have a binding on <Alt-Cyillic_u>, another on <Alt-e> and probably (as a workaround for the bug) <Alt-oacute> as well?

[02:21]<kostix> but with menu accelerators the situation is reversed, as I've pointed out earlier :\

[02:22]<kostix> kbk: yes, except for cyrillic binding, they're nonexistent in Windows. and in X too

[02:22]<kostix> you just can't manage to type anything which would hit that binding

[02:22]<kostix> so we have alt-q + alt-oacute

[02:23]<kbk> If Windows were to change so that you *could* bind to Cyrillic_u (and we've already agreed that oacute is Just Plain Wrong!), would that help?

[02:23]<kostix> in fact we have some special remapping for some keys ;)

[02:23]<kostix> kbk: no, because that would require me to insert one special binding for every conceivable keymap in existence

[02:24]<kostix> i.e. it would be no better than currently

[02:24]<kostix> but just for such bindings like firing something up, like posting a menu or opening some window

[02:25]<kostix> which is bound to a physical key (I stress the word "physical" here)

[02:25]<kbk> It's not clear to me that there *is* a better way that doesn't totally break AZERTY keyboards.

[02:25]<kostix> kbk: i.e. in such case I would prefer to bind to scancode

[02:25]<kostix> for such bindings keymaps just get in the way

[02:29]<kostix> kbk: just to clear up a bit: in X pressing alt-q works the same way irrelevant to the currently active keymap.

[02:30]<kbk> It's not obvious to me how binding to scancode helps. Let's say that there's a menu entry '&Annuler' in a French application.

[02:31]<kbk> A French user wants that to be invoked by <Alt-a> - and a Canadian user wants that to be invoked by <Alt-a>.

[02:31]<kostix> I said this about binding that just do something, menu accelerators are *completely* different story, I mentioned this and agree on it

[02:31]<kostix> that's why I said the problem is more fundamental than it appears on the first glance

[02:32]<kbk> OK... instead of a menu entry, make it a button label, that's more likely anyway, that there will be two buttons in a dialog marked &OK and &Annuler.

[02:33]<kostix> yes, exactly

[02:33]<kostix> so there are two distinct cases about this problem

[02:33]<kbk> Both the French and Canadian users want <Alt-a> to be &Annuler -- but the French user has the A key on the key that to the Canadian user is Q

[02:35]<kostix> kbk: what that article is about is more of a "invisible" bindings if you like. say, I press Ctrl-L in tkabber to invoke it's login dialog. it doesn't correspond to any button -- it's just a binding on the root window. and it must work for the physical key

[02:35]<kostix> no matter whether I'm French or Africaans ;)

[02:36]<kbk> How do you document *that*? You can't say, "Press Control-L", because the physical key might not have an L on it.

[02:37]<kostix> kbk: I dunno. it appears that 99% users have L where it usually is on a "typical US QWERTY keyboard"

[02:37]<kostix> deviations like dworak or germans with their z and y swapped are inescapable but you can't deal with all this hell anyway

[02:38]<kbk> OK, but what if the function were Control-Z instead? Amreicans have Z at the left of the lower row, the French have the Z where the Americans have the W, and the Germans have the Z where the Americans have the Y.

[02:38]<kostix> hm, really? I didn't know about this French feature

[02:38]<kostix> it appears more idiotic I thought it is :(

[02:39]<kbk> kostix - That's why I'd say, you *don't* deal with all of it; you bind according to the character that the key generates, and you document according to the character that the key generates.

[02:40]<kbk> If (as you apparently do) you have a large enough user base that swaps keyboard maps for a single keyboard, then you perhaps add extra bindings so tat someone can hit Control-Cyrillic_ka and still get the Control-r functionality.

[02:40]<kostix> kbk: so how we then implement posting an emoticon palette? it's bound to alt-e in tkabber for the obvious reason: "e" stands for "emoticon", so how do we deal with russian? this keymap doesn't have *any* ascii letter

[02:41]<kostix> kbk: so, for any non-ascii keymap I have to add yet another binding, you say?

[02:41]<kostix> or may be provide "national keyboard plugins"?

[02:41]<kbk> Either that, or localize bindings...

[02:41]<kbk> they're likely to be more mnemonic that way, anyway.

[02:42]<kbk> If an English speaker wants to have Control-C == "Close", then a French one might well want to have Control-F = "Fermer"

[02:42]<kostix> if we go this route this also means the same part is broken in X, just in some other way

[02:44]<kbk> In any case, generating a keysym of "aring" for "Cyrillic-ye" is crazy.

[02:45]<kostix> I've nailed it so it's easy to fix. in fact I have a working patch somewhere. but it did what you consider to be wrong ;)

[02:46]<kostix> and anyway it would be better for this problem to be discussed in more detail with more people, I think

[02:46]<kostix> before really implementing any fixes

[02:47]<kostix> kbk: one another point prevented me from submitting my patch somewhere is what will happen to people who use composing key? say, japanese?

[02:47]<kbk> Sure. But I don't think there is any good solution other than "bind to the symbol that's called out on the key".

[02:48]<kostix> when I have seen some code related to this, which I was likely about to break, I started to scratch my head

[02:48]<kostix> so I would like to talk to a CJK person on this (if the community has one)

[02:48]<kbk> Well... in Windows, when someone hits a dead-key sequence, generally the app doesn't see any keystroke except the last.

[02:48]<kostix> dead != compose, I think

[02:49]<kostix> on Japanese keyboards it switches alphabets, as I understand this

[02:49]<kostix> may be pickhq could help here -- he appears to know Japanese and is able to enter it

[02:50]<kbk> Right - but the 'compose' key is itself dead, AFAIK, or at the very least ignorable.

[02:51]<kbk> suchenwi is a fluent Chinese speaker and may very well know how Chinese keyboards work.

[02:51]<kostix> kbk: I think he just has another hack like "ruslish" in his sleeve :P

[02:53]<kbk> Well, yes, he does. But he's lived in China and may well have insight into how real Chinese input methods work.

[02:53]<kbk> There are a lot of them...

[02:53]<kbk> http://zsigri.tripod.com/fontboard/cjk/input.html

[03:02]<kbk> In any case, though, it appears to me that binding to scancode will only make a bad situation worse. http://www.win.tue.nl/~aeb/linux/kbd/scancodes.html shows just what a mess you get into *there*.