On Wed, Nov 04, 2015 at 12:09:45PM +0100, Ingo Schwarze wrote:
> Hi,
>
> Jason McIntyre wrote on Wed, Nov 04, 2015 at 07:51:30AM +0000:
>
> > the clarity is good, but i worry that you're making a clear sentence
> > harder to grasp. posix spec itself says -l counts "newline characters",
> > which i find easier to understand in itself, but as flag -l i find it
> > much easier to understand mentally that this option can count lines.
> >
> > i guess we could leave the description as-is, but add a sentence to
> > qualify this statement.
>
> POSIX requires that a text file must end in a newline character -
> if it doesn't, it is not a text file.
>
> So we have the unusual situation here that POSIX specifies how a
> specific utility must deal with a specific kind of invalid input.
> More often, the behaviour for invalid input is left undefined.
>
> Admittedly, this may seem somewhat confusing to people who don't
> know POSIX. If you want it clarified, we would have to apply the
> following or a similar patch. That *is* wordy, but making it shorter
> will make it even more confusing:
>
> - If we explicitly talk about the fact that a string of characters
>   not ending in a newline is not a line, we have to say that this
>   is implied by the definition of the term "text file", or we
>   provoke exactly the confusion Dan fell prey to, because many
>   people will consider that definition counterintuitive.
>
> - If we dive down to this level of detail, saying just "a string
>   of characters ending in a newline" is no longer enough because
>   that would disallow empty lines, so "zero or more characters"
>   is required. Note that in the definition of a "word", "string
>   of characters" without further qualification does indeed require
>   at least one character. Being extremely pedantic about invalid
>   trailing characters but at the same time slightly imprecise about
>   the much more common and perfectly legal case of empty lines
>   seems like a very bad idea to me.
>
> - If we try to be that precise, the sentence about words is no
>   longer correct. If the file contains no whitespace character
>   whatsoever, its content is still considered to be a word, even
>   though it is not delimited by whitespace. If we explicitly
>   distinguish between "ended by newline" and "ended by EOF" in the
>   definition of a line, then we have to apply the same precision
>   to the definition of a word and can no longer expect "delimited
>   by whitespace" to imply "delimited by EOF".
>
> - In any case, the word "maximal" is redundant and not helpful.
>   If a string is not maximal, it is obviously not delimited by
>   whitespace.
>
> Do you want this?
>
> Or do you say people should read POSIX to understand what a text
> file and what a line is?
>
hi.

i don't expect anyone to read posix ;) i would rather just leave this as
an exercise to the reader.

jmc

> Yours,
>   Ingo
>
>
> Index: wc.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/wc/wc.1,v
> retrieving revision 1.25
> diff -u -p -r1.25 wc.1
> --- wc.1 21 Apr 2015 10:46:48 -0000 1.25
> +++ wc.1 4 Nov 2015 11:04:20 -0000
> @@ -52,9 +52,12 @@ contained in each input file to the stan
>  If more than one input file is specified,
>  a line of cumulative count(s) for all named files is output on a
>  separate line following the last file count.
> -.Nm
> -considers a word to be a maximal string of characters delimited by
> -whitespace.
> +.Pp
> +In a text file, a line is a string of zero or more characters ending
> +in a newline character, which means that trailing characters after
> +the last newline character do not count as a line.
> +A word is a string of characters delimited by whitespace or by the
> +beginning or the end of the file.
>  Whitespace characters are the set of characters for which the
>  .Xr isspace 3
>  function returns true.
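
For anyone who wants to see what the two definitions in the proposed
wording amount to, here is a minimal sketch in Python. It is not the
wc(1) source. The function name count_lines_and_words is made up for
illustration, and the sketch assumes the whitespace set of the C locale,
which is what bytes.split() uses when called without an argument.

```python
def count_lines_and_words(data):
    """Return (lines, words) for a bytes object, per the patch wording.

    Hypothetical helper for illustration only; not taken from wc(1).
    """
    # A line is zero or more characters ending in a newline character,
    # so the line count is the number of newline characters. Trailing
    # characters after the last newline do not add to it.
    lines = data.count(b"\n")

    # With no argument, bytes.split() splits on runs of ASCII whitespace
    # (space, \t, \n, \r, \v, \f) and drops empty strings, so every
    # element is a word delimited by whitespace or by the beginning or
    # the end of the input.
    words = len(data.split())

    return lines, words


if __name__ == "__main__":
    samples = (
        b"one two\nthree\n",  # valid text file
        b"one two\nthree",    # trailing characters after the last newline
        b"\n\n",              # two empty lines, no words
        b"word",              # no whitespace at all, still one word
    )
    for sample in samples:
        print(sample, count_lines_and_words(sample))
```

The second sample is the case discussed in the thread: the trailing
"three" counts as a word but not as a line.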
