Backspace and cluster deletion
Occasionally, a text editor is released or updated with issues in how it works with Keyman and other language input methods, specifically in how it handles a backspace key that is pressed or issued.
The problem revolves around the handling of deletion of clusters vs. individual code points. Instead of deleting only the code point preceding the insertion point, some editors delete the entire cluster when the backspace key is pressed.
In a left-to-right (LTR) language, a key sequence may insert combining code points such as "क" and " ्", producing the cluster (grapheme) "क्". With the cursor positioned to the right of the cluster क्, pressing backspace should delete the code point " ्" while leaving "क" intact. However, some editors recognise the cluster and delete both code points on a single backspace key press.
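The difference between the two behaviours can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and `backspace_cluster` uses a deliberately simplified notion of a cluster (a base character plus any trailing combining marks), not the full segmentation rules a real editor would apply.

```python
import unicodedata

KA = "\u0915"      # क DEVANAGARI LETTER KA
VIRAMA = "\u094D"  # ् DEVANAGARI SIGN VIRAMA


def backspace_code_point(text: str) -> str:
    """Expected behaviour: remove only the last code point."""
    return text[:-1]


def backspace_cluster(text: str) -> str:
    """Problem behaviour (simplified assumption): remove the last base
    character together with any combining marks that follow it."""
    end = len(text)
    # Skip backwards over trailing marks (general categories Mn, Mc, Me).
    while end > 0 and unicodedata.category(text[end - 1]).startswith("M"):
        end -= 1
    # Then drop the base character the marks were attached to.
    return text[:max(end - 1, 0)]


cluster = KA + VIRAMA
print([f"U+{ord(c):04X}" for c in cluster])  # ['U+0915', 'U+094D']
print(backspace_code_point(cluster) == KA)   # True: only the virama is removed
print(backspace_cluster(cluster) == "")      # True: the whole cluster is removed
```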
A more extreme example can be seen in some languages, such as Khmer, where a single backspace can end up deleting a whole syllable which took up to 7 keystrokes to type.
Deleting a whole cluster on backspace is a problem for two reasons:
- It is unfriendly to the end user, because the backspacing does not match the user's input expectations, nor conform to the pattern of the input method.
- For applications that are 'non-compliant' with modern input protocols/APIs, Keyman runs in a legacy compatibility mode, in which it emits backspace key events to delete characters one codepoint at a time. Applications which delete multiple codepoints from a single backspace event will break this legacy compatibility mode.
Keyman works as a rules-based input method: it maintains the context of what has been entered. If a certain codepoint sequence matches a rule, Keyman deletes the correct number of already-output codepoints, using backspaces, and replaces them with a new codepoint or sequence of codepoints. When the application deletes a whole cluster rather than a single codepoint for each backspace, too many characters are removed before the new characters are inserted. The end result for the user is jumbled text. See the Malayalam example below.
In Windows, the base-level behaviour with backspace is to delete a single codepoint. The decision of how much to delete with backspace should be the responsibility of the input method. This behaviour is the norm in Windows edit and richedit controls, Microsoft applications, including Office and Edge, and the majority of third-party applications.
Expected behaviour (each backspace deletes one codepoint):

Malayalam | Key | Unicode | Notes |
---|---|---|---|
സ് | s | 0D38 0D4D | |
സ്പ് | sp | 0D38 0D4D 0D2A 0D4D | |
സ്പെ | spe | 0D38 0D4D 0D2A 0D46 | |
സ്പീ | spee | 0D38 0D4D 0D2A 0D40 | 1 x bksp has deleted 1 codepoint |

Incorrect behaviour (a backspace deletes a whole cluster):

Malayalam | Key | Unicode | Notes |
---|---|---|---|
സ് | s | 0D38 0D4D | |
സ്പ് | sp | 0D38 0D4D 0D2A 0D4D | |
സ്പെ | spe | 0D38 0D4D 0D2A 0D46 | |
സ്ീ | spee | 0D38 0D4D 0D40 | 1 x bksp has deleted 2 codepoints (ERR) |
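
The final row of the Malayalam example can be reproduced with a small simulation. This is a sketch under stated assumptions, not Keyman's implementation: `apply_rule` is a hypothetical stand-in for the legacy compatibility mode (one backspace per codepoint to delete, then the replacement text), the rule for the second "e" is inferred from the Unicode column above, and `backspace_cluster` models the misbehaving editor with a simplified cluster definition.

```python
import unicodedata


def backspace_code_point(text: str) -> str:
    """Compliant editor: one backspace removes one codepoint."""
    return text[:-1]


def backspace_cluster(text: str) -> str:
    """Misbehaving editor (simplified assumption): one backspace removes
    the last base character plus its trailing combining marks."""
    end = len(text)
    while end > 0 and unicodedata.category(text[end - 1]).startswith("M"):
        end -= 1
    return text[:max(end - 1, 0)]


def apply_rule(text: str, delete_count: int, output: str, backspace) -> str:
    """Hypothetical legacy-mode emulation: send one backspace event per
    codepoint the rule wants removed, then insert the rule's output."""
    for _ in range(delete_count):
        text = backspace(text)
    return text + output


def hex_codes(text: str) -> str:
    return " ".join(f"{ord(c):04X}" for c in text)


# Document content after typing "spe": സ്പെ
context = "\u0D38\u0D4D\u0D2A\u0D46"

# Assumed rule for the second "e": delete U+0D46 (െ), output U+0D40 (ീ).
good = apply_rule(context, 1, "\u0D40", backspace_code_point)
bad = apply_rule(context, 1, "\u0D40", backspace_cluster)

print(hex_codes(good))  # 0D38 0D4D 0D2A 0D40
print(hex_codes(bad))   # 0D38 0D4D 0D40
```

With the cluster-deleting backspace, the base letter U+0D2A is lost along with the vowel sign, so the new vowel sign attaches to the wrong place, matching the row marked ERR.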

a. क + ् + क → क्क
b. क + ् + त → क्त

In row a, typing proceeds as follows:
क + ् → क्
क् + क → क्क
क्क + backspace → क्

The output of row b can then be obtained by entering त:
क् + त → क्त

When backspace deletes the whole cluster, the entire cluster is lost:
क्क + backspace → (nothing remains)
- Emojis