Hi,

Philip Guenther wrote on Tue, Jan 05, 2016 at 12:17:13AM -0800:
> On Mon, Jan 4, 2016 at 11:15 PM, Michal Mazurek <akf...@jasminek.net> wrote:
>> On 01:24:35,  5.01.16, Ingo Schwarze wrote:

>>> +If an output line would be broken after a non-blank character but
>>> +contains at least one blank character, break the line earlier,
>>> +after the last blank character.
>>> +This is useful to avoid line breaks in the middle of words, if
>>> +possible.

>> After a second look, even though the current documentation mentions
>> a "blank character" the source code shows that in fact a space is meant:
>>
>>         if (split_words) {
>>                 for (i = 0, last_space = -1; i < indx; i++)
>>                         if(buf[i] == ' ')
>>                                 last_space = i;
>>         }

> For a utility like fold(1) which is covered by the POSIX/Single Unix
> Specification, we should think hard about diverging from the spec.
> SUSv7 says:
> 
> -s If a segment of a line contains a <blank> within the first width
>    column positions (or bytes), break the line after the last such
>    <blank> meeting the width constraints.  If there is no <blank>
>    meeting the requirements, the -s option shall have no effect for
>    that output segment of the input line.
> 
> which seems to agree with the current manpage, suggesting the
> code--not the documentation--is wrong.

Exactly, so i committed my version of the diff for now.
Thanks to jmc@ for checking the wording.
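
For illustration only, here is a minimal sketch of what a search for the
last <blank> (rather than the last space) could look like.  It is not the
committed code: the helper name last_blank() is hypothetical, and it
assumes single-byte input, so isblank(3) matches space and tab in the
C locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the committed fold(1) code: return the
 * index of the last <blank> among the first len bytes of buf,
 * or -1 if there is none.
 */
static int
last_blank(const char *buf, int len)
{
	int i, last = -1;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		/* The cast avoids undefined behaviour for negative chars. */
		if (isblank((unsigned char)buf[i]))
			last = i;
	return last;
}
```

With such a helper, a tab inside the first width columns would count as
a break opportunity, which is what the SUSv7 wording quoted above asks for.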

> (Yes, OpenBSD diverges from POSIX on some points...where we explicitly
> think POSIX is unsafe or Wrong.  To my knowledge, this is not one of
> those.)

I agree.

In case nobody beats me to adding UTF-8 support, i will probably
propose to intentionally violate POSIX in one *other* respect:  If
a double-width character is followed by one single backspace character
(rather than by two, as required by POSIX), back up by *two* display
columns (rather than by one, as required by POSIX) to correctly
handle groff(1) and mandoc(1) output and for consistency with all
the other base system utilities.  Of course, if a double-width
character is followed by two backspace characters (as required by
POSIX), ignore the second backspace (as required by POSIX).  That's
harmless because i know of no program (except those intended to
feed their output directly to a terminal like ul(1)) that produces
two backspaces after double-width characters, and because obviously,
a useful pipeline looks like

  ... | nroff -c | fold | ul

(which works on any terminal) rather than

  ... | nroff -c | ul | fold

which only works with a very small set of obscure terminals like
TERM=mime that do underline and bold by backspace overstriking, but
which fails on anything modern like xterm, tmux, or even vt100 which
has the :us:, :md:, :mr:, or even just the :so: capability, because
the (for example, ANSI) mode-switching sequences utterly screw up
column counting.  (Amusingly, the second pipeline also fails on
even dumber terminals like TERM=lpr that do underline and bold by
carriage return overstriking, if input lines containing markup
are wider than the terminal width, because the line will be
broken before the carriage return and the carriage return cannot
reverse the line feed.)
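
To make the proposed backspace rule concrete, here is a rough sketch of
the column counting, under stated assumptions: char_width() is a
hypothetical stand-in for wcwidth(3) that only treats CJK unified
ideographs as double-width, end_column() is an invented name, and
nothing here is taken from an actual fold(1) patch.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for wcwidth(3): CJK ideographs are width 2. */
static int
char_width(wchar_t wc)
{
	return (wc >= 0x4E00 && wc <= 0x9FFF) ? 2 : 1;
}

/* Return the display column reached after processing the string s. */
static int
end_column(const wchar_t *s)
{
	int col = 0;
	int prev_width = 0;	/* width of the character a backspace undoes */
	int skip_bs = 0;	/* ignore one more backspace after a wide char */

	assert(s != NULL);
	for (; *s != L'\0'; s++) {
		if (*s == L'\b') {
			if (skip_bs) {
				/* Second backspace after a wide char: no effect. */
				skip_bs = 0;
				continue;
			}
			if (prev_width == 2) {
				/* One backspace backs up two columns. */
				col -= 2;
				skip_bs = 1;
			} else if (col > 0)
				col--;
			prev_width = 0;
			continue;
		}
		skip_bs = 0;
		prev_width = char_width(*s);
		col += prev_width;
	}
	return col;
}
```

In this sketch, a double-width character followed by either one or two
backspaces leaves the column where it was before that character, so both
the groff(1)/mandoc(1) style and the POSIX style of overstriking end up
with the same count.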

Yours,
  Ingo
