Hi Anton,

Anton Lindqvist wrote on Fri, May 19, 2017 at 08:42:05AM +0200:

> 1. Run ksh under tmux.
> 
> 2. Input the following characters, without spaces:
> 
>    a (any character) ^B (backward-char) ö (any UTF-8 character)
> 
> 3. At this point, the prompt gets overwritten.
> 
> Since ksh reads a single byte of input, it will display a partial UTF-8
> character before the whole character has been read. This is especially
> troublesome when the cursor is not placed at the end of the line. In the
> scenario above, after reading the first byte of 'ö' the following
> sequence will be displayed:
> 
>   0xc3 0x61 0x08
> 
> That is the first byte of 'ö' (0xc3), 'a' (0x61), '\b' (0x08). tmux
> does the right thing here: since 0xc3 is a valid UTF-8 start byte, it
> expects it to be followed by a UTF-8 continuation byte, which is not the
> case. The first two bytes (0xc3, 0x61) are discarded and the parser is
> reset to its initial state

I call that a bug in tmux.  At least for UTF-8, tmux should never reset
its parser.  Depending on the keyboard configuration, it may be possible
to enter single non-UTF-8 bytes.  For example, at the console, i can do
this:

 - type "printf x"
 - press Ctrl-V, Ctrl-Alt-Z
 - type "x | hexdump -C"

The result is:

  00000000  78 9a 78   |x.x|
  00000003

All of the shell, the console, and xterm more or less handle such
stunts, even though admittedly, that is not the usual way of typing
in a binary file.  tmux is a terminal emulator, so it ought to cope,
too, and not reset its state.

Note that this is yet another instance where the concept of arbitrary
locales is utterly broken and insecure.  On some non-OpenBSD system
supporting such insecure stuff, if the user has set an arbitrary,
non-UTF-8, state-dependent locale and somehow manages to insert an
invalid byte into the input stream, the state of the input stream
becomes invalid, no further characters can be read from that terminal,
and *there is no way to recover*.  The only remaining half-secure
option is for the shell to exit, which may also not be very secure,
on a different level.

But with UTF-8, there is no problem whatsoever dealing with invalid
bytes, so tmux ought to cope.
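
To illustrate what coping could look like, here is a minimal sketch in C
of a decoder that resynchronizes instead of resetting.  It is not tmux
code: decode_resync() and REPLACEMENT are names made up for this example.
It assumes the whole input is already in a buffer (so a sequence cut off
at the end counts as invalid), and it deliberately omits the checks for
overlong encodings and surrogates.  The point is only that an invalid
byte costs exactly one byte and nothing after it is swallowed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu	/* U+FFFD, stands in for each invalid byte */

/*
 * Hypothetical helper: decode len bytes from in[] into code points in
 * out[], which must have room for len entries.  Returns the number of
 * code points written.
 */
size_t
decode_resync(const unsigned char *in, size_t len, uint32_t *out)
{
	size_t i = 0, n = 0;

	assert(in != NULL && out != NULL);
	while (i < len) {
		unsigned char b = in[i];
		uint32_t cp;
		size_t need, k;

		if (b < 0x80) {			/* plain ASCII */
			out[n++] = b;
			i++;
			continue;
		}
		if (b >= 0xc2 && b <= 0xdf) {
			need = 1; cp = b & 0x1f;
		} else if (b >= 0xe0 && b <= 0xef) {
			need = 2; cp = b & 0x0f;
		} else if (b >= 0xf0 && b <= 0xf4) {
			need = 3; cp = b & 0x07;
		} else {
			/* stray continuation byte or impossible start byte */
			out[n++] = REPLACEMENT;
			i++;
			continue;
		}
		/* Collect continuation bytes; stop at the first that is not one. */
		for (k = 1; k <= need; k++) {
			if (i + k >= len || (in[i + k] & 0xc0) != 0x80)
				break;
			cp = (cp << 6) | (in[i + k] & 0x3f);
		}
		if (k <= need) {
			/* Sequence broke off: give up on the start byte only. */
			out[n++] = REPLACEMENT;
			i++;
			continue;
		}
		out[n++] = cp;
		i += need + 1;
	}
	return n;
}
```

Fed the sequence 0xc3 0x61 0x08 from the report, this yields U+FFFD,
'a', '\b': the 'a' survives, so the backspace moves back over it and
not over the prompt.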

> causing the backspace to be accepted and the
> first character in the prompt to be overwritten.

That's part of the reason why tmux must not reset its state:
The shell won't know about it, and the screen will become garbled.

> Below is a diff that makes sure to read a whole UTF-8 character in
> x_emacs() prior to doing another iteration of the main loop

I don't think we can do that.  What if there is no next byte?
Then the shell will hang until, maybe considerably later, the next
character is typed.  Also, we cannot rely on parsing the UTF-8 start
byte (even if done correctly).  It tells the byte length of the
character only for valid byte sequences, and the byte sequence need
not be valid.
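
A small sketch in C of that second point, with made-up helper names
(announced_len() and actual_len() are not from ksh): the start byte only
announces a length, and the bytes that follow are free to break the
promise.  The overlong and surrogate checks are again left out to keep
it short.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: the length a start byte announces, or 0 if the
 * byte cannot start a character at all.
 */
size_t
announced_len(unsigned char b)
{
	if (b < 0x80)
		return 1;
	if (b >= 0xc2 && b <= 0xdf)
		return 2;
	if (b >= 0xe0 && b <= 0xef)
		return 3;
	if (b >= 0xf0 && b <= 0xf4)
		return 4;
	return 0;
}

/*
 * Hypothetical helper: how many bytes at the front of buf really belong
 * together, that is, the start byte plus the continuation bytes that
 * actually follow it, capped at the announced length.
 */
size_t
actual_len(const unsigned char *buf, size_t len)
{
	size_t want, n;

	assert(buf != NULL && len > 0);
	want = announced_len(buf[0]);
	if (want == 0)
		return 1;		/* an invalid byte stands alone */
	for (n = 1; n < want && n < len; n++)
		if ((buf[n] & 0xc0) != 0x80)
			break;
	return n;
}
```

For 0xc3 followed by 0x61, announced_len() says 2 but actual_len() says
1, so code that blocks until two bytes have arrived would wait for a
byte that belongs to the next character, or that never comes.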

Besides, i think patching the shell to work around terminal bugs
(or terminal emulator bugs) is the wrong approach in the first place.

Yours,
  Ingo
