bug#69369: wc -w ignores breaking space over UCHAR_MAX

2024-02-25 Thread Pádraig Brady
On 24/02/2024 20:44, Aearil via GNU coreutils Bug Reports wrote: Hi, wc -w doesn't seem to recognize whitespace characters with a codepoint over UCHAR_MAX (255) as word separators. For example, using the character EM SPACE U+2003: $ printf "foo\u2003bar" | ./wc -w 1 I should get a word count

bug#69369: wc -w ignores breaking space over UCHAR_MAX

2024-02-24 Thread Aearil via GNU coreutils Bug Reports
Hi, wc -w doesn't seem to recognize whitespace characters with a codepoint over UCHAR_MAX (255) as word separators. For example, using the character EM SPACE U+2003: $ printf "foo\u2003bar" | ./wc -w 1 I should get a word count of 2, but instead the space is ignored while counting words.
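
Illustration (not from the thread): a minimal C sketch of a word counter that decodes UTF-8 itself and treats code points above UCHAR_MAX as separators. The separator list is an assumption for this sketch: the Unicode White_Space set minus the no-break spaces U+00A0, U+2007 and U+202F. The helper names are hypothetical, and this is not how GNU wc is implemented.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical separator test: ASCII spaces via isspace() in the C locale,
   plus non-ASCII White_Space code points other than the no-break spaces.
   The list is an assumption for illustration, not wc's own table. */
static int is_breaking_space(uint32_t cp)
{
    if (cp < 0x80)
        return isspace((int) cp) != 0;
    switch (cp) {
    case 0x0085: case 0x1680: case 0x2028: case 0x2029:
    case 0x205F: case 0x3000:
        return 1;
    default:
        /* U+2000..U+200A, except U+2007 FIGURE SPACE (a no-break space) */
        return cp >= 0x2000 && cp <= 0x200A && cp != 0x2007;
    }
}

/* Minimal UTF-8 decoder (no overlong checks). Stores the code point in
   *cp and returns the number of bytes consumed; malformed or truncated
   input consumes one byte and yields U+FFFD. */
static size_t decode_utf8(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t c = s[0];
    if (c < 0x80) { *cp = c; return 1; }
    else if ((c & 0xE0) == 0xC0) { len = 2; c &= 0x1F; }
    else if ((c & 0xF0) == 0xE0) { len = 3; c &= 0x0F; }
    else if ((c & 0xF8) == 0xF0) { len = 4; c &= 0x07; }
    else { *cp = 0xFFFD; return 1; }
    if (len > n) { *cp = 0xFFFD; return 1; }
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Count words: maximal runs of code points that are not separators. */
size_t count_words_utf8(const char *buf, size_t n)
{
    const unsigned char *s = (const unsigned char *) buf;
    size_t words = 0, i = 0;
    int in_word = 0;
    while (i < n) {
        uint32_t cp;
        i += decode_utf8(s + i, n - i, &cp);
        if (is_breaking_space(cp))
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

With this list, `foo` + U+2003 + `bar` counts as 2 words, while `foo` + U+00A0 + `bar` still counts as 1, which is the separate no-break space question raised in bug#34524.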

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-06 Thread Paul Eggert
On 2/6/23 11:38, Pádraig Brady wrote: Note also if you really want to read, you can always `cat | wc -c` rather than just `wc -c` Even that's not guaranteed, as 'cat' is not required to use the 'read' system call if it can determine that the standard input contains only NULs without calling

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-06 Thread Pádraig Brady
On 06/02/2023 06:27, Stephane Chazelas wrote: On 2023-02-05 20:59, Paul Eggert wrote: On 2023-02-05 11:59, Pádraig Brady wrote: [...] Let's leave that as-is, please. If 'wc' can output the correct value without reading its input, POSIX does not require 'wc' to do the read, and it seems

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-05 Thread Stephane Chazelas
On 2023-02-05 20:59, Paul Eggert wrote: On 2023-02-05 11:59, Pádraig Brady wrote: [...] Let's leave that as-is, please. If 'wc' can output the correct value without reading its input, POSIX does not require 'wc' to do the read, and it seems perverse to modify 'wc' to go to the effort to refuse

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-05 Thread Paul Eggert
On 2023-02-05 11:59, Pádraig Brady wrote: Hopefully the attached addresses this. Thanks for fixing that. Note it doesn't add the constraint on the input being readable, which I'll think a bit more about. Let's leave that as-is, please. If 'wc' can output the correct value without reading

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-05 Thread Pádraig Brady
On 05/02/2023 18:27, Stephane Chazelas wrote: "wc -c" without filename arguments is meant to read stdin til EOF and report the number of bytes it has read. When stdin is on a regular file, GNU wc has that optimisation whereby it skips the reading, does a pos = lseek(0,0,SEEK_CUR) to find out

bug#61300: wc -c doesn't advance stdin position when it's a regular file

2023-02-05 Thread Stephane Chazelas
"wc -c" without filename arguments is meant to read stdin til EOF and report the number of bytes it has read. When stdin is on a regular file, GNU wc has that optimisation whereby it skips the reading, does a pos = lseek(0,0,SEEK_CUR) to find out its current position within the file, fstat(0) and

bug#47702: wc man page: first you are talking about bytes, then you are talking about characters

2021-04-11 Thread Pádraig Brady
On 11/04/2021 02:42, 積丹尼 Dan Jacobson wrote: Man wc says Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified. A word is a non-zero-length sequence of characters delimited by white space. first you are talking about

bug#47702: wc man page: first you are talking about bytes, then you are talking about characters

2021-04-10 Thread 積丹尼 Dan Jacobson
Man wc says Print newline, word, and byte counts for each FILE, and a total line if more than one FILE is specified. A word is a non-zero-length sequence of characters delimited by white space. first you are talking about bytes, then you are talking about characters. So

bug#46346: wc --human-readable or --verbose

2021-02-06 Thread Chris Elvidge
On 06/02/2021 01:38 pm, 積丹尼 Dan Jacobson wrote: wc needs a --verbose option. Else one is forced to do: $ file=e.html; echo $file:; for i in bytes chars lines words; do echo -en $i:\\t; wc --$i < $file; done e.html: bytes: 31655 chars: 29141 lines: 643 words: 1275 I mean sometimes we

bug#46346: wc --human-readable or --verbose

2021-02-06 Thread Pádraig Brady
On 06/02/2021 13:38, 積丹尼 Dan Jacobson wrote: wc needs a --verbose option. Else one is forced to do: $ file=e.html; echo $file:; for i in bytes chars lines words; do echo -en $i:\\t; wc --$i < $file; done e.html: bytes: 31655 chars: 29141 lines: 643 words: 1275 I mean sometimes we want to

bug#46346: wc --human-readable or --verbose

2021-02-06 Thread 積丹尼 Dan Jacobson
wc needs a --verbose option. Else one is forced to do: $ file=e.html; echo $file:; for i in bytes chars lines words; do echo -en $i:\\t; wc --$i < $file; done e.html: bytes: 31655 chars: 29141 lines: 643 words: 1275 I mean sometimes we want to send the output to a real person, and currently
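
Illustration (not from the thread): the shell loop above reads the file once per option. A C sketch that gathers byte, newline and word counts in a single pass and prints them with labels. `struct counts`, `count_all` and `print_verbose` are hypothetical names; words are split with `isspace()` in the C locale, and character counts (which depend on the locale) are left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

struct counts { size_t bytes, lines, words; };

/* One pass over buf: bytes, newline characters, and words
   (maximal runs of bytes for which isspace() is false). */
struct counts count_all(const char *buf, size_t n)
{
    struct counts c = {0, 0, 0};
    int in_word = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char b = (unsigned char) buf[i];
        c.bytes++;
        if (b == '\n')
            c.lines++;
        if (isspace(b))
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            c.words++;
        }
    }
    return c;
}

/* Print one labelled count per line, in the style of the loop above. */
void print_verbose(FILE *out, const char *name, struct counts c)
{
    fprintf(out, "%s:\n", name);
    fprintf(out, "bytes:\t%zu\n", c.bytes);
    fprintf(out, "lines:\t%zu\n", c.lines);
    fprintf(out, "words:\t%zu\n", c.words);
}
```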

bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)

2019-08-20 Thread Assaf Gordon
tag 37093 notabug close 37093 stop Hello, On 2019-08-19 10:44 p.m., Edward Huff wrote: In the demo below, dd uses 0.665s to write 1GiB of zeros. sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros. wc uses 32.160s to count 1GiB of zeros. [...] baseline results: $ dd if=/dev/zero

bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)

2019-08-20 Thread Bernhard Voelker
On 8/20/19 6:44 AM, Edward Huff wrote: > In the demo below, dd uses 0.665s to write 1GiB of zeros. > sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros. > wc uses 32.160s to count 1GiB of zeros. > > Linux localhost 5.2.8-200.fc30.x86_64 #1 SMP Sat Aug 10 13:21:39 UTC 2019 > x86_64

bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)

2019-08-19 Thread Edward Huff
In the demo below, dd uses 0.665s to write 1GiB of zeros. sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros. wc uses 32.160s to count 1GiB of zeros. Linux localhost 5.2.8-200.fc30.x86_64 #1 SMP Sat Aug 10 13:21:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux coreutils-8.31-2.fc30.x86_64

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-03-09 Thread Pádraig Brady
On 09/03/19 05:52, Bruno Haible wrote: > Hi Pádraig, > In regard to options for enabling various behaviors for wc(1), I'm thinking we might keep the strict POSIX isspace() behavior with LC_CTYPE=C and/or POSIXLY_CORRECT=1, and use iswnbspace() by default > > Since you plan to

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-03-09 Thread Bruno Haible
Hi Pádraig, > >> In regard to options for enabling various behaviors for wc(1), > >> I'm thinking we might keep the strict POSIX isspace() behavior > >> with LC_CTYPE=C and/or POSIXLY_CORRECT=1, and use iswnbspace() > >> by default Since you plan to add a --words=... option in the future (as
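
Illustration (not from the thread): a C sketch of the proposal discussed here, treating no-break spaces as word delimiters unless strict POSIX behaviour is requested. The function names and the code point list (U+00A0, U+2007, U+202F) are assumptions for this sketch; it is not the coreutils `iswnbspace()` code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: strict POSIX mode when POSIXLY_CORRECT is set. */
bool strict_mode_from_env(void)
{
    return getenv("POSIXLY_CORRECT") != NULL;
}

/* Hypothetical predicate: is cp a no-break space that should still
   separate words? Always false in strict POSIX mode. */
bool is_nbsp_delimiter(uint32_t cp, bool posixly_correct)
{
    if (posixly_correct)
        return false;
    return cp == 0x00A0     /* NO-BREAK SPACE */
        || cp == 0x2007     /* FIGURE SPACE */
        || cp == 0x202F;    /* NARROW NO-BREAK SPACE */
}
```

A word counter would consult this in addition to its ordinary whitespace test, so the default mode splits `foo` U+00A0 `bar` into two words while strict mode keeps one.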

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-25 Thread Pádraig Brady
On 24/02/19 19:55, Pádraig Brady wrote: > On 24/02/19 17:07, Pádraig Brady wrote: >> So non break space is generally considered a word delimiter, >> though there are complications you detail from unicode. >> >> In regard to options for enabling various behaviors for wc(1), >> I'm thinking we might

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-24 Thread Pádraig Brady
On 24/02/19 17:07, Pádraig Brady wrote: > So non break space is generally considered a word delimiter, > though there are complications you detail from unicode. > > In regard to options for enabling various behaviors for wc(1), > I'm thinking we might keep the strict POSIX isspace() behavior >

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-24 Thread Pádraig Brady
On 24/02/19 05:58, Bruno Haible wrote: > [Ccing bug-libunistring, because this is about Unicode handling in GNU. The > original thread is in .] > >>> The man page for wc states: "A word is a... sequence of characters >>> delimited by white

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-24 Thread Paul Eggert
Bruno Haible wrote: I would find it best to introduce an option '--unicode' to 'wc', that would produce Unicode compliant results, at the cost of - not following POSIX to the letter, It'd make sense to have an option. How about a more-general option --words, that would let the user define

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-24 Thread Bruno Haible
[Ccing bug-libunistring, because this is about Unicode handling in GNU. The original thread is in .] > > The man page for wc states: "A word is a... sequence of characters > > delimited by white space." > > > > But its concept of white space

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-23 Thread Pádraig Brady
On 18/02/19 00:12, vampyre...@gmail.com wrote: > $ wc --version > wc (GNU coreutils) 8.29 > Packaged by Gentoo (8.29-r1 (p1.0)) > > The man page for wc states: "A word is a... sequence of characters delimited > by white space." > > But its concept of white space only seems to include ASCII

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-22 Thread Bob Proulx
vampyre...@gmail.com wrote: > The man page for wc states: "A word is a... sequence of characters delimited > by white space." > > But its concept of white space only seems to include ASCII white > space. U+00A0 NO-BREAK SPACE, for instance, is not recognized. Indeed this is because wc and

bug#34524: wc: word count incorrect when words separated only by no-break space

2019-02-18 Thread vampyrebat
$ wc --version wc (GNU coreutils) 8.29 Packaged by Gentoo (8.29-r1 (p1.0)) The man page for wc states: "A word is a... sequence of characters delimited by white space." But its concept of white space only seems to include ASCII white space. U+00A0 NO-BREAK SPACE, for instance, is not

bug#20120: wc output padding differs when "-" is in the file list

2018-10-22 Thread Assaf Gordon
tags 20120 wontfix close 20120 stop (triaging old bugs) On 19/03/15 04:38 AM, Pádraig Brady wrote: On 18/03/15 17:54, Bernhard Voelker wrote: On 03/16/2015 06:42 AM, Eric Mrak wrote: It seems that whenever STDIN is involved the results padding reverts to the BSD-style 7/8 padding. Thanks

bug#28468: Bug in wc -l found

2017-09-15 Thread Assaf Gordon
tag 28468 notabug stop Hello Rob, On 2017-09-15 03:03 AM, Weidner, Robert (I/EE-31, extern) wrote: > seems I found a bug in wc, have a look: [[ the attach screen shot shows: $ wc -l monitore-serNr_all-run2.txt 16 while the attached file appears to have 17 lines. ]] This is not a

bug#28468: Bug in wc -l found

2017-09-15 Thread Ruediger Meier
On Friday 15 September 2017, Weidner, Robert (I/EE-31, extern) wrote: > Dear GNU Team, > > seems I found a bug in wc, have a look: > > [screenshot attached] > > Despite of it, I really want to say a BIG Thank you for great > tool-set, especially tree, whic

bug#28468: Bug in wc -l found

2017-09-15 Thread Weidner, Robert (I/EE-31, extern)
Dear GNU Team, seems I found a bug in wc, have a look: [screenshot attached] Despite of it, I really want to say a BIG Thank you for great tool-set, especially tree, which I use for 20 years now! THX Rob Mit freundlichen Gruessen Robert Weidner FAS Architektur / zFAS

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-20 Thread Bernhard Voelker
On 12/20/2016 02:12 PM, Pádraig Brady wrote: > Right! > > While st_size would have been incorrect for subsequent > files since v7.1, it was only used since v8.24. > > Fixed with: > http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=94d2c68 Thanks! Have a nice day, Berny

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-20 Thread Pádraig Brady
On 20/12/16 01:50, Bernhard Voelker wrote: > On 12/19/2016 08:00 PM, Pádraig Brady wrote: >> + [bug introduced in coreutils-7.1] > > FWIW I think that the bug was not introduced in v7.0-96-gc2e56e0: > I had a working 8.23 on a system here, so I took the time to search deeper. > I found the

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-19 Thread Bernhard Voelker
On 12/19/2016 08:00 PM, Pádraig Brady wrote: > + [bug introduced in coreutils-7.1] FWIW I think that the bug was not introduced in v7.0-96-gc2e56e0: I had a working 8.23 on a system here, so I took the time to search deeper. I found the reason to be the wrong value of the 'hi_pos' parameter

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-19 Thread William R. Fraser
Looks good :) On Mon, Dec 19, 2016 at 11:00 AM, Pádraig Brady wrote: > On 21/03/16 15:16, Pádraig Brady wrote: > > On 21/03/16 00:59, William R. Fraser wrote: > >> When wc gets its list of files by reading from stdin, using the argument > >> '--from-files0=-', it reuses the

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-12-19 Thread Pádraig Brady
On 21/03/16 15:16, Pádraig Brady wrote: > On 21/03/16 00:59, William R. Fraser wrote: >> When wc gets its list of files by reading from stdin, using the argument >> '--from-files0=-', it reuses the same fstatus struct for each file. >> >> The problem is that the 'wc' function checks the 'failed'

bug#23190: wc - Different output

2016-04-02 Thread Assaf Gordon
'{print $0}' /tmp/test.txt | wc -l Output: 21 cut /tmp/test.txt -f1 | wc -l Output: 21 [...] File, test.txt (attached here with) could be missing last "new line". This is not a bug in 'wc', but the way it works (perhaps not intuitively): 'wc' does not count

bug#23190: wc - Different output

2016-04-02 Thread Seva Adari
Hello, I am not sure if this a bug or expected behavior! Here is different output from each run variation of wc invocation: wc -l test.txt Output: 20 awk '{print $0}' /tmp/test.txt | wc -l Output: 21 cut /tmp/test.txt -f1 | wc -l
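
Illustration (not from the thread): `wc -l` counts newline characters, so a file whose last line lacks a trailing newline reports one fewer than tools such as awk, which print every record followed by a newline. A C sketch of both counts (hypothetical function names):

```c
#include <stddef.h>

/* What wc -l reports: the number of newline bytes. */
size_t count_newlines(const char *buf, size_t n)
{
    size_t nl = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\n')
            nl++;
    return nl;
}

/* What is often expected: also count a final line that has no
   terminating newline. */
size_t count_lines_incl_partial(const char *buf, size_t n)
{
    size_t lines = count_newlines(buf, n);
    if (n > 0 && buf[n - 1] != '\n')
        lines++;
    return lines;
}
```

The same rule explains `echo $s | wc -l` printing 1 for an empty `$s` in bug#20954: echo still writes a single newline.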

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-21 Thread Jim Meyering
On Sun, Mar 20, 2016 at 5:59 PM, William R. Fraser wrote: > When wc gets its list of files by reading from stdin, using the argument > '--from-files0=-', it reuses the same fstatus struct for each file. > > The problem is that the 'wc' function checks the 'failed' member of

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-21 Thread Bernhard Voelker
On 03/21/2016 04:16 PM, Pádraig Brady wrote: On 21/03/16 00:59, William R. Fraser wrote: When wc gets its list of files by reading from stdin, using the argument '--from-files0=-', it reuses the same fstatus struct for each file. The problem is that the 'wc' function checks the 'failed' member

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-21 Thread Pádraig Brady
On 21/03/16 00:59, William R. Fraser wrote: When wc gets its list of files by reading from stdin, using the argument '--from-files0=-', it reuses the same fstatus struct for each file. The problem is that the 'wc' function checks the 'failed' member of this struct and if it is <=0, it skips

bug#23073: wc reports wrong byte counts when using '--from-files0=-'

2016-03-20 Thread William R. Fraser
When wc gets its list of files by reading from stdin, using the argument '--from-files0=-', it reuses the same fstatus struct for each file. The problem is that the 'wc' function checks the 'failed' member of this struct and if it is <=0, it skips doing fstat on the file. The main loop

bug#20954: wc - linux

2015-07-05 Thread Bob Proulx
tele wrote: Maybe we did not understand. I don't want change old definitions but create new option for wc or echo, because this above examples not make logic sense, What would such an option do? ( and it I want fix, however with sed is also fixed ) Your original message asked if echo | wc

bug#20954: wc - linux

2015-07-03 Thread tele
tag 20954 + notabug close 20954 thanks Maybe we did not understand. I don't want change old definitions but create new option for wc or echo, because this above examples not make logic sense, ( and it I want fix, however with sed is also fixed ) however now I understand that they work

bug#20954: wc - linux

2015-07-02 Thread Stephane Chazelas
2015-07-01 19:41:00 -0600, Bob Proulx: [...] $ a= ; echo $s | wc -l 1 [...] No. Should be 1. You have forgotten about the newline at the end of the command. The echo will terminate with a newline. [...] Leaving a variable unquoted will also cause the shell to apply the split+glob

bug#20954: wc - linux

2015-07-02 Thread tele
tag 20954 + notabug close 20954 thanks tele wrote: Hi! Hi! From terminal: $ a= ; echo $s | wc -l 1 Do you mean $a instead of $s? Either way is the same though assuming $s is empty too. - Yes, my mistake :-) Should be 0 , yes ? No. Should be 1. You have forgotten about the

bug#20954: wc - linux

2015-07-02 Thread Bob Proulx
tele wrote: echo gives in new line, Yes. echo -n subtracts 1 line, echo -n is non-portable and shouldn't be used. echo -n suppresses emitting a trailing newline. Note that in both of these cases you are using the shell's internal builtin echo and not the coreutils echo. They behave the

bug#20954: wc - linux

2015-07-01 Thread Bob Proulx
tag 20954 + notabug close 20954 thanks tele wrote: Hi! Hi! :-) From terminal: $ a= ; echo $s | wc -l 1 Do you mean $a instead of $s? Either way is the same though assuming $s is empty too. Should be 0 , yes ? No. Should be 1. You have forgotten about the newline at the end of the

bug#20954: wc - linux

2015-07-01 Thread tele
Hi! From terminal: $ a= ; echo $s | wc -l 1 Should be 0 , yes ?

bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-07 Thread Valdis Vītoliņš
Thanks for clarification! I tested it with Bash script: chars=$(wc -m mylog|cut -d ' ' -f1) lines=$(wc -l mylog|cut -d ' ' -f1) let chars=$chars - $lines echo $chars and got the same number as given by vim :%s/.//gn (Which was place from what I got confused.) Hopefully this bug description

bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-07 Thread Stephane Chazelas
2015-06-06 21:49:16 +0300, Valdis Vītoliņš: Note, that UTF-8 characters can be counted by counting bytes with bit patterns 0xxx or 11xx: https://en.wikipedia.org/wiki/UTF-8#Description So, general logic should be, that, if: a) locale setting is utf-8 (e.g. LANG=xx_XX.UTF-8), or b)

bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-06 Thread Valdis Vītoliņš
Note, that UTF-8 characters can be counted by counting bytes with bit patterns 0xxx or 11xx: https://en.wikipedia.org/wiki/UTF-8#Description So, general logic should be, that, if: a) locale setting is utf-8 (e.g. LANG=xx_XX.UTF-8), or b) first two bytes of file are 0xFE 0xFF
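
Illustration (not from the thread): a C sketch of the counting rule described above, counting every byte that is not a UTF-8 continuation byte (10xxxxxx). It assumes valid UTF-8 input and performs no validation; GNU wc -m instead decodes characters according to the current locale.

```c
#include <stddef.h>

/* Count UTF-8 characters: every byte except continuation bytes
   (bit pattern 10xxxxxx) starts a new character. No validation. */
size_t count_utf8_chars(const char *buf, size_t n)
{
    size_t chars = 0;
    for (size_t i = 0; i < n; i++)
        if (((unsigned char) buf[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}
```

Newlines count as characters here, as they do for `wc -m`; that is why subtracting the `wc -l` result matched vim's count in Valdis's follow-up.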

bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-06 Thread Pádraig Brady
tag 20751 notabug close 20751 stop On 06/06/15 19:49, Valdis Vītoliņš wrote: Version: wc (GNU coreutils) 8.21 When 'wc -m' is invoked, it should print character count, but it counts incorrectly UTF-8 encoded characters. Attached files have 3, 4 an 6 bytes in them, but all have only two UTF-8

bug#20751: wc -m doesn't count UTF-8 characters properly

2015-06-06 Thread Glenn Morris
You mailed submit@debbugs without specifying a Package:, so your bug report ended up on the help-debbugs list. I have reassigned it to coreutils. (Please note there is no wc package.) (My mailer is messing up the UTF-8 characters in your report. Interested parties can see the original at

bug#20120: wc output padding differs when - is in the file list

2015-03-19 Thread Pádraig Brady
On 18/03/15 17:54, Bernhard Voelker wrote: On 03/16/2015 06:42 AM, Eric Mrak wrote: It seems that whenever STDIN is involved the results padding reverts to the BSD-style 7/8 padding. When files are given as input (excluding STDIN) the padding reflects the width of the largest count. When

bug#20120: wc output padding differs when - is in the file list

2015-03-19 Thread Bernhard Voelker
On 03/16/2015 06:42 AM, Eric Mrak wrote: It seems that whenever STDIN is involved the results padding reverts to the BSD-style 7/8 padding. When files are given as input (excluding STDIN) the padding reflects the width of the largest count. When files are given as input and one of these is -,

bug#20120: wc output padding differs when - is in the file list

2015-03-16 Thread Eric Mrak
It seems that whenever STDIN is involved the results padding reverts to the BSD-style 7/8 padding. When files are given as input (excluding STDIN) the padding reflects the width of the largest count. When files are given as input and one of these is -, the padding reverts again to the BSD 7/8
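
Illustration (not from the thread): a simplified C model of the padding behaviour reported here, where each column is as wide as the largest count when all inputs are regular files, and at least 7 wide when an input such as `-` is not. The function names and the exact rule are assumptions for this sketch; GNU wc's real width calculation may differ in detail.

```c
#include <stddef.h>
#include <stdio.h>

/* Number of decimal digits in v (at least 1). */
static int digits(unsigned long long v)
{
    int d = 1;
    while (v >= 10) {
        v /= 10;
        d++;
    }
    return d;
}

/* Hypothetical rule: width of the largest count, but at least 7
   when some input is not a regular file (sizes unknown up front). */
int column_width(unsigned long long largest, int all_regular)
{
    int w = digits(largest);
    if (!all_regular && w < 7)
        w = 7;
    return w;
}

/* Format lines, words and bytes right-aligned in the chosen width. */
int format_row(char *out, size_t size, int width,
               unsigned long long lines, unsigned long long words,
               unsigned long long bytes)
{
    return snprintf(out, size, "%*llu %*llu %*llu",
                    width, lines, width, words, width, bytes);
}
```

With a width of 7, the counts 1, 0 and 1 format as `      1       0       1`, the familiar `echo | wc` output.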

bug#9346: wc does not conform to POSIX (additional spaces)

2011-08-22 Thread Vincent Lefevre
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html says: STDOUT By default, the standard output shall contain an entry for each input file of the form: %d %d %d %s\n, newlines, words, bytes, file But wc from GNU coreutils 8.12 adds spaces: $ echo | wc 1

bug#9346: wc does not conform to POSIX (additional spaces)

2011-08-22 Thread Pádraig Brady
tags 9346 + notabug On 08/23/2011 01:39 AM, Vincent Lefevre wrote: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html says: STDOUT By default, the standard output shall contain an entry for each input file of the form: %d %d %d %s\n, newlines, words,

bug#9346: wc does not conform to POSIX (additional spaces)

2011-08-22 Thread Eric Blake
On 08/22/2011 07:07 PM, Pádraig Brady wrote: tags 9346 + notabug On 08/23/2011 01:39 AM, Vincent Lefevre wrote: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/wc.html says: STDOUT By default, the standard output shall contain an entry for each input file of the form:

bug#9166: wc -m that the resulting number is wrong

2011-07-25 Thread Eric Blake
tag 9166 invalid thanks On 07/24/2011 03:58 PM, Paul Ingerson wrote: for example echo A | wc -m yield 2 instead of 1. Why is this? Thanks for the report; however, this is not a bug. In your example, echo A really did output two characters: A and newline. Try: echo A | od -tx1z to see
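
Illustration (not from the thread): a C sketch of what `od -tx1` shows for the output of `echo A`, namely two bytes, 0x41 and the newline 0x0a, which is why `wc -m` reports 2. `hex_bytes` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Write buf as space-separated two-digit hex bytes into out,
   roughly like the data columns of `od -tx1`. Stops early rather
   than overflow out. */
void hex_bytes(const char *buf, size_t n, char *out, size_t outsize)
{
    size_t pos = 0;
    if (outsize == 0)
        return;
    out[0] = '\0';
    /* each byte needs at most " xx" plus the terminating NUL */
    for (size_t i = 0; i < n && pos + 4 <= outsize; i++)
        pos += (size_t) snprintf(out + pos, outsize - pos,
                                 i ? " %02x" : "%02x",
                                 (unsigned char) buf[i]);
}
```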

bug#9019: wc -l bug

2011-07-10 Thread L. A. Walsh
Andrey Sheyko wrote: Hello! I've found out that wc -l doen't count the last line if there is no CR in the end of line It's the 'CR' (or NL) at the end of the line that makes it a new line... without that, you just have text appended to the end of the file...

bug#9019: wc -l bug

2011-07-07 Thread Davide Brini
On Thu, 7 Jul 2011 12:31:16 +0400 Andrey Sheyko ashe...@yotateam.com wrote: Hello! I've found out that wc -l doen't count the last line if there is no CR in the end of line That is correct. The description for -l says: -l Write to the standard output the number of newline characters in

bug#9019: wc -l bug

2011-07-07 Thread Eric Blake
tag 9019 notabug thanks On 07/07/2011 02:31 AM, Andrey Sheyko wrote: Hello! I've found out that wc -l doen't count the last line if there is no CR in the end of line Thanks for the report. However, this is not a bug, but a requirement of POSIX. Furthermore, I think you meant NL, not CR.

Re: RFC: wc --max-line-length vs. TABs [Re: Bug in wc

2008-08-23 Thread Jim Meyering
Bruno Haible [EMAIL PROTECTED] wrote: Hi Jim, This behavior is not specified, and is currently untested. (it's a GNU invention, from Bruno Haible in textutils-1.22d, which was back in 1997) The intention of this option is and was to measure the maximum number of screen columns used by a

Bug in wc

2008-08-22 Thread Arnaldo Mandel
Dear maintainers, There is a bug in the implementation of the -L parameter in wc. It is triggered by http://www.ime.usp.br/~am/122/eps/gapqm2.gz Check this out: $ zcat gapqm2.gz |wc -l -c -L 1 6297954 6353180 That is, the single line is longer than the whole file. This was pointed out

Re: Bug in wc

2008-08-22 Thread Jim Meyering
is longer than the whole file. This was pointed out by William A. M. Gnann [EMAIL PROTECTED] Thanks for reporting it and for giving credit. FYI, here's a smaller reproducer: $ printf '\t'|wc -L 8 ___ Bug-coreutils mailing list Bug-coreutils
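
Illustration (not from the thread): a C sketch of a maximum-line-width count in which a TAB advances to the next multiple of 8 columns. It reproduces `printf '\t' | wc -L` printing 8 and shows how a file with a single line can report a width larger than its byte count. It assumes ASCII input where every other byte is one column wide; GNU wc -L also accounts for wide characters, which this sketch ignores.

```c
#include <stddef.h>

/* Maximum display width of any line, with tab stops every 8 columns.
   Assumes one column per non-TAB byte (ASCII only). */
size_t max_line_width(const char *buf, size_t n)
{
    size_t col = 0, max = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\n') {
            col = 0;
            continue;
        }
        if (buf[i] == '\t')
            col = (col / 8 + 1) * 8;   /* advance to the next tab stop */
        else
            col++;
        if (col > max)
            max = col;
    }
    return max;
}
```

For example, two TAB bytes give a width of 16, so TAB-heavy input easily reports a width above its byte count.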

Bug in wc (cont.)

2008-08-22 Thread Arnaldo Mandel
My earlier bug report lacked a possibly relevant piece of info: The bug showed up with versions 6.10 and 5.97 of wc, on Linux 2.6.24 and 2.6.11, i686 and x86_64, LC_ALL=C. am -- Arnaldo Mandel Departamento de Ciência da Computação - Computer Science Department

RFC: wc --max-line-length vs. TABs [Re: Bug in wc

2008-08-22 Thread Jim Meyering
Jim Meyering [EMAIL PROTECTED] wrote: Arnaldo Mandel [EMAIL PROTECTED] wrote: Dear maintainers, There is a bug in the implementation of the -L parameter in wc. It is triggered by http://www.ime.usp.br/~am/122/eps/gapqm2.gz Check this out: $ zcat gapqm2.gz |wc -l -c -L 1 6297954

Re: RFC: wc --max-line-length vs. TABs [Re: Bug in wc

2008-08-22 Thread Bo Borgerson
Jim Meyering wrote: I'm tempted to make the change, but it seems too drastic, after 11 years. Do any of you rely on the current TAB-counting behavior of GNU wc? Hi, It looks like TAB characters aren't alone in being counted by printed width rather than count: $ echo '好' | wc -L 2 Does it

Re: RFC: wc --max-line-length vs. TABs [Re: Bug in wc

2008-08-22 Thread Arnaldo Mandel
Bo Borgerson wrote (on Aug 22, 2008): Does it make sense to change the behavior for TAB, but not for wide characters? Relying on an undocumented tab length seems bad. However, on chars I suggest you just apply the bug-feature operator: document that line length is in chars, and explain

Re: RFC: wc --max-line-length vs. TABs [Re: Bug in wc

2008-08-22 Thread Bruno Haible
Hi Jim, This behavior is not specified, and is currently untested. (it's a GNU invention, from Bruno Haible in textutils-1.22d, which was back in 1997) The intention of this option is and was to measure the maximum number of screen columns used by a file. For many purposes, people are

bug in 'wc -l'

2004-02-06 Thread Thobias Salazar Trevisan
hi, I am sending you a patch to solve a 'problem' at the wc program. When used with -l option (to count the number of lines) the last line isn't counted. First of all, I do not know if it is really a bug, because a newline must end with a '\n' char. But assume a file, that the last line does

Re: bug in 'wc -l'

2004-02-06 Thread Alfred M. Szmidt
I am sending you a patch to solve a 'problem' at the wc program. When used with -l option (to count the number of lines) the last line isn't counted. It counts occurrences of '\n' (i.e. newline). So I guess that the behaviour is correct. The problem is: if the input does not end with

Re: bug in 'wc -l'

2004-02-06 Thread era+gmane
On Fri, 6 Feb 2004 13:05:09 +0100 (MET), Alfred M. Szmidt [EMAIL PROTECTED] posted to gmane.comp.gnu.core-utils.bugs: I am sending you a patch to solve a 'problem' at the wc program. When used with -l option (to count the number of lines) the last line isn't counted. It counts

Re: bug in 'wc -l'

2004-02-06 Thread Alfred M. Szmidt
I am sending you a patch to solve a 'problem' at the wc program. When used with -l option (to count the number of lines) the last line isn't counted. It counts occurrences of '\n' (i.e. newline). So I guess that the behaviour is correct. If that's your

Re: Bug in 'wc' --help output

2003-09-06 Thread DervishD
Hi Jim :) * Jim Meyering [EMAIL PROTECTED] dixit: Print newline, word and byte counts for each FILE Thanks for the report. That was fixed for coreutils-5.0.90. My excuses, I didn't remember to take a look at alpha.gnu.org :( Anyway, thanks for such a good job. The free software