On Tue, Apr 06 2021 13:09:11 +0200, Martijn van Duren wrote:
> On Thu, 2021-04-01 at 10:39 +0300, Lauri Tirkkonen wrote:
> > On Thu, Apr 01 2021 09:30:36 +0200, Martijn van Duren wrote:
> > > However, based on the description by the Unicode Consortium I think
> > > OpenBSD does the right thing and xterm and others should be fixed,
> >
> > practically, I doubt this will happen. I don't think the glibc people
> > will be convinced to break compatibility to their older versions, for
> > example. I explicitly mentioned I don't wish to engage in a discussion
> > about which way is _correct_ - I am interested in interoperability
> > with real, existing systems.
> >
> I´m not convinced that you´ve shown that it´s actually an
> interoperability issue. In your last mail you state that it´s a simple
> display difference between tmux and raw xterm on OpenBSD. To me that´s
> similar to most linux distro´s having grep being an alias for
> grep --color=auto by default and stating that we should do the same
> because you like pretty colours. What applications fail to operate or
> operate in an severely erroneous way because of this discrepency?

I'll try again to describe the problem, and show an example.

TUI applications often care, for layout purposes, how long a particular
string or line will be on the output device. A not insignificant number
of those applications use wcwidth() to figure out how much column space
will be taken by a certain character or string.

If the application performing the width calculations is running on a
different machine than the terminal, say, through ssh, it is important
that the application's idea of width matches what the terminal will
eventually render; if it doesn't, then the application could print the
string over some other TUI element, for example.

This is difficult and messy for many reasons already discussed,
especially when different operating systems disagree about the width or
printability of a character. Nevertheless, in 2021, wcwidth
implementations mostly agree, and even things like emojis get a wcwidth
of 2 everywhere I've observed (in contrast to some -1 wcwidths of
printable characters I observed on other OSes in the past). But SHY
still seems to be something that causes issues in terminals, at least
for me.

As the example, I ran the command "/exec cat longshy.txt /etc/motd"
inside of irssi, in an 80x24 terminal window, with a few different
terminal/OS-running-terminal/OS-running-irssi configurations.
'longshy.txt' is available at https://hacktheplanet.fi/shy/longshy.txt

Let's start with st(1), since it's simple and uses wcwidth() directly to
decide how wide a character should be printed:

st on OpenBSD, local irssi
https://hacktheplanet.fi/shy/st-openbsd-local.png

st on Debian, local irssi
https://hacktheplanet.fi/shy/st-debian-local.png

Here we can see two key things:
1) on Debian, st is rendering the SHY characters - on OpenBSD it is not
2) on Debian, irssi considers the line long enough that it splits it and
   prints the remainder on the next line, indented

So, let's introduce ssh into the mix:

st on OpenBSD, irssi on Debian
https://hacktheplanet.fi/shy/st-openbsd-ssh-debian.png

st on Debian, irssi on OpenBSD
https://hacktheplanet.fi/shy/st-debian-ssh-openbsd.png

We begin to see differences that stem from wcwidth(SHY). These problems
aren't very big, since in both cases the output is still readable and no
information is lost.

Now, let's try xterm(1). It has been observed in this thread that xterm
always prints SHY.

xterm on OpenBSD, local irssi
https://hacktheplanet.fi/shy/xterm-openbsd-local.png

xterm on Debian, local irssi
https://hacktheplanet.fi/shy/xterm-debian-local.png

On OpenBSD, irssi thinks that the entire line fits into the 80 columns
available. But because xterm prints SHYs, the line overflows onto the
next and is promptly overwritten by the next line that irssi puts there
(the motd).

And finally ssh with xterm:

xterm on OpenBSD, irssi on Debian
https://hacktheplanet.fi/shy/xterm-openbsd-ssh-debian.png

xterm on Debian, irssi on OpenBSD
https://hacktheplanet.fi/shy/xterm-debian-ssh-openbsd.png

This isn't the best example: there are many different problems that can
arise from the width calculation discrepancy - some of them can be more
spectacular I think, but I could only come up with this one on demand.
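
To make the xterm case above concrete, here is a rough sketch of the
arithmetic involved. It is only a model, not what irssi or xterm
actually run: columns() is a hypothetical stand-in for summing wcwidth()
over a string, it treats every character other than SHY as one column
wide, and it assumes the application side counts SHY as 0 columns while
the terminal side draws it in 1 column. The sample line is made up and
is not the content of longshy.txt.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def columns(text, shy_width):
    """Hypothetical model of a wcwidth()-based width sum.

    Every character counts as one column, except SHY, which counts as
    shy_width (assumed to be 0 on one side and 1 on the other).
    """
    return sum(shy_width if ch == SHY else 1 for ch in text)


def fits(text, term_cols, shy_width):
    """True if the modelled width of text is at most term_cols."""
    return columns(text, shy_width) <= term_cols


# Made-up line: 78 ordinary characters with 10 soft hyphens mixed in.
line = ("x" * 7 + SHY) * 10 + "x" * 8

app_view = columns(line, shy_width=0)   # application assumes SHY takes no column
term_view = columns(line, shy_width=1)  # terminal draws SHY in one column
overflow = max(0, term_view - 80)       # columns spilling past an 80-column row

print(app_view, term_view, overflow)
```

Under these assumptions the application believes the line fits in 80
columns and moves on to the next row, while the terminal has already
wrapped the last few columns onto that row, which is where the next
line then gets drawn.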

Despite the bad example, I do consider it an interoperability bug when
text gets messed up in ways the application did not intend (in the worst
case, overwriting other text) on the same terminal across different
operating systems. In this case the outputs differ because of
interactions between systems that use wcwidth(SHY) = 1 (such as,
apparently, xterm even locally) and OpenBSD. I might not say it is
"operating in a severely erroneous way", but then I don't consider
"severely erroneous" a requirement for fixing issues.

> If you want to show a hyphen in your text, use a hyphen. If you want to
> indicate where a word might be broken up in a hyphenated way across two
> lines if the software knows the localized grammar rules use a SHY.

Sometimes, I need to display text written or generated by others in my
terminal and it's difficult to tell them not to use SHYs because their
text might appear in my terminal :)

-- 
Lauri Tirkkonen | lotheac @ IRCnet