Uniq *sometimes* fails to combine lines containing a null character:

# uniq --version
uniq (GNU coreutils) 8.4

##### Count duplicate text lines:
# printf "\n\x00\n\x00\n" | cat -e | uniq -c
      1 $
      2 ^@$

##### Count duplicate binary lines:
# printf "\x00\n\x00\n\n" | uniq -c | cat -e
      2 ^@$
      1 $

##### Whoops, fails to count duplicate binary lines:
# printf "\n\x00\n\x00\n" | uniq -c | cat -e
      1 $
      1 ^@$
      1 ^@$

This was the smallest test case I could find. The original file had hundreds of lines containing null (\x00) and Ctrl-A (\x01) characters, and it was quite a surprise when the output of 'sort testfile | uniq -c' had many pages of '1 ^@$' followed by '496 ^A$': uniq was counting the Ctrl-A lines correctly, but failing on the null-character lines.

For automated testing with 'delta' or 'git bisect', this works:

---
#!/bin/bash
# $1 is the test file. Running cat -e before uniq makes the nulls visible
# as text, so uniq counts them correctly; that gives the reference result.
a=$(sort "$1" | cat -e | uniq -c | md5sum -)
# Here uniq sees the raw null bytes, which is where the bug shows up.
b=$(sort "$1" | uniq -c | cat -e | md5sum -)
if [[ "$a" != "$b" ]]; then
    echo "PASS (bug present)"; exit 0
else
    echo "FAIL (bug absent)"; exit 1
fi
---

I regret not having the time to test this with coreutils 8.28, but I couldn't see anything in the git log to suggest this has been fixed:

http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=history;f=src/uniq.c;h=d1dac93c010d7333ced4b54fccbd965cbd5729c2;hb=HEAD

Cheers,
PD