On Wed, Nov 04, 2015 at 12:09:45PM +0100, Ingo Schwarze wrote:
> Hi,
> 
> Jason McIntyre wrote on Wed, Nov 04, 2015 at 07:51:30AM +0000:
> 
> > the clarity is good, but i worry that you're making a clear sentence
> > harder to grasp. posix spec itself says -l counts "newline characters",
> > which i find easier to understand in itself, but for the -l flag i find it
> > much easier to think of the option as counting lines.
> > 
> > i guess we could leave the description as-is, but add a sentence to
> > qualify this statement.
> 
> POSIX requires that a text file must end in a newline character -
> if it doesn't, it is not a text file.
> 
> So we have the unusual situation here that POSIX specifies how a
> specific utility must deal with a specific kind of invalid input.
> More often, the behaviour for invalid input is left undefined.
> 
> Admittedly, this may seem somewhat confusing to people who don't
> know POSIX.  If you want it clarified, we would have to apply the
> following or a similar patch.  That *is* wordy, but making it shorter
> will make it even more confusing:
> 
>  - If we explicitly talk about the fact that a string of characters
>    not ending in a newline is not a line, we have to say that this
>    is implied by the definition of the term "text file", or we
>    provoke exactly the confusion Dan fell prey to, because many
>    people will consider that definition counterintuitive.
> 
>  - If we dive down to this level of detail, saying just "a string
>    of characters ending in a newline" is no longer enough because
>    that would disallow empty lines, so "zero or more characters"
>    is required.  Note that in the definition of a "word", "string
>    of characters" without further qualification does indeed require
>    at least one character.  Being extremely pedantic about invalid
>    trailing characters but at the same time slightly imprecise about
>    the much more common and perfectly legal case of empty lines
>    seems like a very bad idea to me.
> 
>  - If we try to be that precise, the sentence about words is no
>    longer correct.  If the file contains no whitespace character
>    whatsoever, its content is still considered to be a word, even
>    though it is not delimited by whitespace.  If we explicitly
>    distinguish between "ended by newline" and "ended by EOF" in the
>    definition of a line, then we have to apply the same precision
>    to the definition of a word and can no longer expect "delimited
>    by whitespace" to imply "delimited by EOF".
> 
>  - In any case, the word "maximal" is redundant and not helpful.
>    If a string is not maximal, it is obviously not delimited by
>    whitespace.
> 
> Do you want this?
> 
> Or do you say people should read POSIX to understand what a text
> file and what a line is?
> 

hi.

i don't expect anyone to read posix ;) i would rather just leave this as an
exercise to the reader.

jmc

> Yours,
>   Ingo
> 
> 
> Index: wc.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/wc/wc.1,v
> retrieving revision 1.25
> diff -u -p -r1.25 wc.1
> --- wc.1      21 Apr 2015 10:46:48 -0000      1.25
> +++ wc.1      4 Nov 2015 11:04:20 -0000
> @@ -52,9 +52,12 @@ contained in each input file to the stan
>  If more than one input file is specified,
>  a line of cumulative count(s) for all named files is output on a
>  separate line following the last file count.
> -.Nm
> -considers a word to be a maximal string of characters delimited by
> -whitespace.
> +.Pp
> +In a text file, a line is a string of zero or more characters ending
> +in a newline character, which means that trailing characters after
> +the last newline character do not count as a line.
> +A word is a string of characters delimited by whitespace or by the
> +beginning or the end of the file.
>  Whitespace characters are the set of characters for which the
>  .Xr isspace 3
>  function returns true.
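
For illustration only, here is a minimal sketch in Python of the counting rules discussed above. It is not the OpenBSD wc(1) implementation. It assumes single-byte input in the C locale, where isspace(3) is true for space, \t, \n, \v, \f and \r. The names wc_counts and C_SPACE are hypothetical. A line is counted only when its terminating newline is seen. A word starts at any non-whitespace byte that follows whitespace or the beginning of the file, so a file containing no whitespace at all still holds one word.

```python
# Hypothetical helper: whitespace as classified by isspace(3) in the "C" locale.
C_SPACE = frozenset(b" \t\n\v\f\r")


def wc_counts(data: bytes) -> tuple[int, int, int]:
    """Hypothetical sketch: return (lines, words, bytes) for data."""
    # Only newline-terminated lines count; trailing bytes after the
    # last newline do not form a line.
    lines = data.count(b"\n")
    words = 0
    in_word = False
    for byte in data:  # iterating over bytes yields ints
        if byte in C_SPACE:
            in_word = False
        elif not in_word:
            # A non-whitespace byte after whitespace, or at the very
            # beginning of the data, starts a new word.
            in_word = True
            words += 1
    return lines, words, len(data)
```

For example, wc_counts(b"one two\nthree") returns (1, 3, 13): the trailing "three" counts as a word but not as a line, while wc_counts(b"\n\n") returns (2, 0, 2) because empty lines are still lines.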