On Mon, 5 Oct 2015 16:51:25 +0200 Philippe Verdy <verd...@wanadoo.fr> wrote:
> 2015-10-05 13:50 GMT+02:00 Martin J. Dürst <due...@it.aoyama.ac.jp>:
>
> > In an editing tool (of which an editing interface is a part of), a
> > lone surrogate should just be removed! Apparently, that's what
> > happens in Richard's case, but only eventually.

> Not silently ! Even if this removal is required to go on editing,
> this must be notified to the user as it may occur in unedited parts
> of the file (and it may be the sign that the document is not fully
> plain text, so the user should not save the edited file)

> If this is caused by a quirk in the user input (defect of the input
> mode or keyboard layout), there should be a notification.

The lone surrogates (as I surmise) in this case are caused by the user
input being misinterpreted. The sequence of strings delivered to a
program running X receiving the same sequence of keystrokes is U+1148F,
U+114C0, U+0008, U+114BF, and I have no reason to doubt that the
offending program is receiving the same sequence. My working hypothesis
is that this is being simplified to U+1148F, U+D805, U+114BF; the
presence of U+D805 is a program error. I can reproduce the problem in a
previously empty file.

Now, on Windows, old MS keyboards at least deliver supplementary
characters in a pair of WM_CHAR messages. If one of these ligatures
were corrupted so that only the first of the messages was delivered, it
is not obvious to me how a program would readily detect the omission.
It would only become obvious when the start of the next *character* was
received.

Richard.
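
The working hypothesis above can be sketched as follows. This is only
an illustration, assuming an editor that stores text as UTF-16 code
units and handles U+0008 by deleting one code unit rather than one
character. The function names (to_utf16_units, naive_backspace,
lone_surrogates) are hypothetical and are not taken from the program
under discussion.

```python
def to_utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_high(unit):
    """True for a leading (high) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    """True for a trailing (low) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF


def naive_backspace(units):
    """Assumed bug: delete the last code unit, not the last character."""
    return units[:-1]


def lone_surrogates(units):
    """Return the indices of surrogate code units that are not in a pair.

    A high surrogate at the very end of the list is reported as lone,
    although in a live input stream it might merely be waiting for its
    low surrogate; the two cases cannot be told apart until the next
    code unit arrives.
    """
    lone = []
    i = 0
    while i < len(units):
        if is_high(units[i]) and i + 1 < len(units) and is_low(units[i + 1]):
            i += 2  # well-formed surrogate pair
            continue
        if is_high(units[i]) or is_low(units[i]):
            lone.append(i)
        i += 1
    return lone


# Keystrokes: U+1148F, U+114C0, U+0008 (backspace), U+114BF
buffer = to_utf16_units("\U0001148F\U000114C0")
buffer = naive_backspace(buffer)        # removes only the low surrogate of U+114C0
buffer += to_utf16_units("\U000114BF")

print([f"{u:04X}" for u in buffer])
print(lone_surrogates(buffer))
```

Under these assumptions the buffer ends up holding a stray D805
between the code units of U+1148F and U+114BF, which matches the
sequence U+1148F, U+D805, U+114BF. The check only flags that D805 once
the following code unit turns out not to be a low surrogate, which is
the detection delay described for the dropped WM_CHAR case.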