On Tue, Apr 06 2021 13:09:11 +0200, Martijn van Duren wrote:
> On Thu, 2021-04-01 at 10:39 +0300, Lauri Tirkkonen wrote:
> > On Thu, Apr 01 2021 09:30:36 +0200, Martijn van Duren wrote:
> > > However, based on the description by the Unicode Consortium I think
> > > OpenBSD does the right thing and xterm and others should be fixed,
> > 
> > practically, I doubt this will happen. I don't think the glibc people will
> > be convinced to break compatibility to their older versions, for example. I
> > explicitly mentioned I don't wish to engage in a discussion about which way
> > is _correct_ - I am interested in interoperability with real, existing
> > systems.
> > 
> I´m not convinced that you´ve shown that it´s actually an
> interoperability issue. In your last mail you state that it´s a simple
> display difference between tmux and raw xterm on OpenBSD. To me that´s
> similar to most linux distro´s having grep being an alias for
> grep --color=auto by default and stating that we should do the same
> because you like pretty colours. What applications fail to operate or
> operate in an severely erroneous way because of this discrepency?

I'll try again to describe the problem, and show an example.

TUI applications often care, for layout purposes, how long a particular string
or line will be on the output device. A not insignificant number of those
applications use wcwidth() to figure out how much column space will be taken by
a certain character or string.

If the application performing the width calculations is running on a different
machine than the terminal, say, through ssh, it is important that the
application's idea of width matches what the terminal will eventually render; if
it doesn't, then the application could print the string over some other TUI
element, for example.

This is difficult and messy for many reasons already discussed, especially when
different operating systems disagree about the width or printability of a
character. Nevertheless, in 2021, wcwidth implementations mostly agree: even
emojis get a wcwidth of 2 everywhere I've observed (in contrast to the -1
wcwidths I have seen returned for some printable characters on other OSes in the
past). SHY, however, still seems to cause issues in terminals, at least for me.

As an example, I ran the command "/exec cat longshy.txt /etc/motd" inside irssi
in an 80x24 terminal window, using several combinations of terminal, OS running
the terminal, and OS running irssi. 'longshy.txt' is available at
https://hacktheplanet.fi/shy/longshy.txt

Let's start with st(1), since it's simple and uses wcwidth() directly to decide
how wide a character should be printed:

st on OpenBSD, local irssi https://hacktheplanet.fi/shy/st-openbsd-local.png
st on Debian, local irssi https://hacktheplanet.fi/shy/st-debian-local.png

Here we can see two key things:
 1) on Debian, st is rendering the SHY characters - on OpenBSD it is not
 2) on Debian, irssi considers the line long enough that it splits it and prints
    the remainder on the next line, indented

So, let's introduce ssh into the mix:

st on OpenBSD, irssi on Debian 
https://hacktheplanet.fi/shy/st-openbsd-ssh-debian.png
st on Debian, irssi on OpenBSD 
https://hacktheplanet.fi/shy/st-debian-ssh-openbsd.png

We begin to see differences that stem from wcwidth(SHY). These problems aren't
very big, since in both cases the output is still readable and no information is
lost.

Now, let's try xterm(1). It has been observed in this thread that xterm always
prints SHY.

xterm on OpenBSD, local irssi 
https://hacktheplanet.fi/shy/xterm-openbsd-local.png
xterm on Debian, local irssi https://hacktheplanet.fi/shy/xterm-debian-local.png

On OpenBSD, irssi thinks that the entire line fits into the 80 columns
available. But because xterm prints SHYs, the line overflows onto the next and
is promptly overwritten by the next line that irssi puts there (the motd).

And finally ssh with xterm:

xterm on OpenBSD, irssi on Debian 
https://hacktheplanet.fi/shy/xterm-openbsd-ssh-debian.png
xterm on Debian, irssi on OpenBSD 
https://hacktheplanet.fi/shy/xterm-debian-ssh-openbsd.png

This isn't the best example: the width calculation discrepancy can cause many
different problems, and I think some of them can be more spectacular, but this is
the only one I could come up with on demand.

Despite the bad example, I do consider it an interoperability bug when text on
the same terminal gets garbled in ways the application did not intend (in the
worst case, overwriting other text) depending on which operating systems are
involved. In this case the outputs differ because of interactions between OpenBSD
and systems that use wcwidth(SHY) = 1 (such as, apparently, xterm even when run
locally). I might not call it "operating in a severely erroneous way", but then I
don't consider "severely erroneous" a requirement for fixing an issue.

> If you want to show a hyphen in your text, use a hyphen. If you want to
> indicate where a word might be broken up in a hyphenated way across two
> lines if the software knows the localized grammar rules use a SHY.

Sometimes, I need to display text written or generated by others in my terminal
and it's difficult to tell them not to use SHYs because their text might appear
in my terminal :)

-- 
Lauri Tirkkonen | lotheac @ IRCnet
