Hi, Jason McIntyre wrote on Wed, Nov 04, 2015 at 07:51:30AM +0000:
Hi,

Jason McIntyre wrote on Wed, Nov 04, 2015 at 07:51:30AM +0000:

> the clarity is good, but i worry that you're making a clear sentence
> harder to grasp. posix spec itself says -l counts "newline characters",
> which i find easier to understand in itself, but as flag -l i find it
> much easier to understand mentally that this option can count lines.
>
> i guess we could leave the description as-is, but add a sentence to
> qualify this statement.

POSIX requires that a text file end in a newline character; if it
doesn't, it is not a text file. So we have the unusual situation
here that POSIX specifies how a specific utility must deal with a
specific kind of invalid input. More often, the behaviour for
invalid input is left undefined.

Admittedly, this may seem somewhat confusing to people who don't
know POSIX. If you want it clarified, we would have to apply the
following or a similar patch. That *is* wordy, but making it
shorter would make it even more confusing:

 - If we explicitly say that a string of characters not ending in
   a newline is not a line, we have to say that this is implied by
   the definition of the term "text file", or we provoke exactly the
   confusion Dan fell prey to, because many people will consider
   that definition counterintuitive.

 - If we dive down to this level of detail, saying just "a string
   of characters ending in a newline" is no longer enough, because
   that would disallow empty lines, so "zero or more characters" is
   required. Note that in the definition of a "word", "string of
   characters" without further qualification does indeed require at
   least one character. Being extremely pedantic about invalid
   trailing characters while being slightly imprecise about the much
   more common and perfectly legal case of empty lines seems like a
   very bad idea to me.

 - If we try to be that precise, the sentence about words is no
   longer correct. If the file contains no whitespace character
   whatsoever, its content is still considered to be a word, even
   though it is not delimited by whitespace.
   If we explicitly distinguish between "ended by newline" and
   "ended by EOF" in the definition of a line, then we have to apply
   the same precision to the definition of a word and can no longer
   expect "delimited by whitespace" to imply "delimited by EOF".

 - In any case, the word "maximal" is redundant and not helpful.
   If a string is not maximal, it is obviously not delimited by
   whitespace.

Do you want this? Or do you think people should read POSIX to
understand what a text file and a line are?

Yours,
  Ingo


Index: wc.1
===================================================================
RCS file: /cvs/src/usr.bin/wc/wc.1,v
retrieving revision 1.25
diff -u -p -r1.25 wc.1
--- wc.1	21 Apr 2015 10:46:48 -0000	1.25
+++ wc.1	4 Nov 2015 11:04:20 -0000
@@ -52,9 +52,12 @@ contained in each input file to the stan
 If more than one input file is specified,
 a line of cumulative count(s) for all named files is output on a
 separate line following the last file count.
-.Nm
-considers a word to be a maximal string of characters delimited by
-whitespace.
+.Pp
+In a text file, a line is a string of zero or more characters ending
+in a newline character, which means that trailing characters after
+the last newline character do not count as a line.
+A word is a string of characters delimited by whitespace or by the
+beginning or the end of the file.
 Whitespace characters are the set of characters for which the
 .Xr isspace 3
 function returns true.
