[Trisquel-users] Another uniq -u feature emerges

amenex Thu, 02 Jul 2020 10:02:08 -0700

Over a year ago, I lamented that sort followed by uniq -u wasn't removing duplicates from a list:

https://trisquel.info/en/forum/sort-and-uniq-fail-remove-all-duplicates-list-hostnames-and-their-ipv4-addresses

Recently I've been faced with the results of grep searches in other files that overlap because they contain the same string on which grep was searching. After sorting the grep outputs, then cutting & pasting, I ended up with pairs of files that contain many duplicates because the strings were caught twice.

grep -h lns03.v6.018.net.il *Rev.oGnMap.txt >> PTR.IPv6-Data/IPv6-lns03.v6.018.net.il.txt
grep -h cable-lns03.v6.018.net.il *Rev.oGnMap.txt >> PTR.IPv6-Data/IPv6-cable-lns03.v6.018.net.il.txt
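One thing worth noting about these two patterns, whether or not it explains the duplicates: the first is a substring of the second, so the first grep also matches every cable-lns03 line. The sketch below uses made-up records (the host prefixes and the documentation-range addresses are assumptions, not data from the real files) and shows one possible way to keep the two sets apart.

```shell
# Hypothetical sample in the post's layout: PTR name, tab, IPv6 address.
sample=$(printf '%s\t%s\n' \
  'host1.lns03.v6.018.net.il' '2001:db8::1' \
  'host2.cable-lns03.v6.018.net.il' '2001:db8::2')

# The shorter pattern is a substring of the longer one, so it matches both lines.
# -F treats the pattern as a fixed string, so the dots are not regex wildcards.
broad=$(printf '%s\n' "$sample" | grep -c -F 'lns03.v6.018.net.il')

# Filtering out the longer name afterwards leaves only the non-cable records.
narrow=$(printf '%s\n' "$sample" | grep -F 'lns03.v6.018.net.il' \
  | grep -c -v -F 'cable-lns03.v6.018.net.il')

echo "broad=$broad narrow=$narrow"
```

Also note that >> appends, so running the same grep line a second time doubles the output file.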

The grep outputs were expected to list the PTR record in the first column and the corresponding IPv6 address in the second column, because I reversed the order of those columns in the outputs of the original nMap -oG searches as well as removing the parentheses enclosing the IPv6 addresses. In the sorting scripts below, $1 is the PTR and $2 is the IPv6 address, except for the uniq -c script, where I printed $2 and $3 to skip the counts column produced by uniq -c.
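The post does not show how the columns were swapped, so the following is only a sketch of one way such a reformatting could be done with awk. It assumes the usual grepable layout "Host: ADDRESS (NAME)" followed by a tab and a status field, and the sample line is made up. Here the parentheses are stripped from whichever field carries them, and the name is printed before the address.

```shell
# Hypothetical line in Nmap's grepable (-oG) layout: "Host: ADDRESS (NAME)<TAB>Status: Up".
line=$(printf 'Host: 2001:db8::1 (host1.lns03.v6.018.net.il)\tStatus: Up')

# Remove the parentheses from the third field, then print name, tab, address.
swapped=$(printf '%s\n' "$line" | awk '{ gsub(/[()]/, "", $3); print $3 "\t" $2 }')

printf '%s\n' "$swapped"
```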


Here are the three pairs of scripts intended to consolidate the files:

sort IPv6-lns03.v6.018.net.il.txt | uniq -u > IPv6-uniq.lns03.v6.018.net.il.txt
sort IPv6-cable-lns03.v6.018.net.il.txt | uniq -u > IPv6-uniq.cable-lns03.v6.018.net.il.txt

sort -k 2 IPv6-lns03.v6.018.net.il.txt | uniq -c | awk '{print $2"\t"$3}' '-' > IPv6-uniq.lns03.v6.018.net.il.txt
sort -k 2 IPv6-cable-lns03.v6.018.net.il.txt | uniq -c | awk '{print $2"\t"$3}' '-' > IPv6-uniq.cable-lns03.v6.018.net.il.txt


sort -u IPv6-lns03.v6.018.net.il.txt  > IPv6-uniqB.lns03.v6.018.net.il.txt

sort -u IPv6-cable-lns03.v6.018.net.il.txt >IPv6-uniqB.cable-lns03.v6.018.net.il.txt

The first pair produced zero bytes of output for both scripts; the original files were not empty.
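A likely explanation, sketched here on made-up data: uniq -u prints only the lines that are not repeated in its sorted input and drops every copy of a repeated line, whereas plain uniq and sort -u keep one copy of each. If every record was caught exactly twice, nothing survives uniq -u. The records below are hypothetical.

```shell
# Hypothetical records, each present exactly twice.
dupes=$(printf '%s\t%s\n' \
  'host1.example' '2001:db8::1' \
  'host2.example' '2001:db8::2' \
  'host1.example' '2001:db8::1' \
  'host2.example' '2001:db8::2')

# uniq -u: keep only lines that occur once; here every line occurs twice.
only_unrepeated=$(printf '%s\n' "$dupes" | sort | uniq -u | wc -l | tr -d ' ')

# Plain uniq and sort -u: keep one copy of every distinct line.
one_of_each=$(printf '%s\n' "$dupes" | sort | uniq | wc -l | tr -d ' ')
sort_u=$(printf '%s\n' "$dupes" | sort -u | wc -l | tr -d ' ')

echo "uniq -u: $only_unrepeated  uniq: $one_of_each  sort -u: $sort_u"
```

With this sample, uniq -u leaves no lines at all, while uniq and sort -u each leave two.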


The second pair reduced both files by half as expected.

Then I remembered to check this forum, wherein Magic Banana had suggested using sort -u instead of the first pair's combination of sort and uniq -u. This third pair produced the exact same halving of the original file sizes as my less efficient use of uniq -c and awk to eliminate the counts column. Thank you again, Magic Banana !
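As a rough check that the two approaches keep the same records, the sketch below runs both on made-up data and compares the results. It assumes two tab-separated fields with no embedded spaces. Because sort -k 2 orders by the second column, the awk result is re-sorted before comparing.

```shell
# Hypothetical records, each present twice; the address order differs from the name order.
dupes=$(printf '%s\t%s\n' \
  'host1.example' '2001:db8::2' \
  'host2.example' '2001:db8::1' \
  'host1.example' '2001:db8::2' \
  'host2.example' '2001:db8::1')

# Second pair's method: count duplicates, then drop the counts column with awk.
via_awk=$(printf '%s\n' "$dupes" | sort -k 2 | uniq -c | awk '{print $2 "\t" $3}' | sort)

# Third pair's method: let sort discard the duplicates itself.
via_sort_u=$(printf '%s\n' "$dupes" | sort -u)

if [ "$via_awk" = "$via_sort_u" ]; then
  echo "same records"
else
  echo "records differ"
fi
```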

I had tried to "fix" the uniq -u debacle of the first pair of sorting scripts by copying the affected file names directly from the File manager into the script text, as that has been a useful workaround in the past, but this time those scripts again produced zero bytes of output, the same as my first attempt.

What is it about uniq -u of which I should be wary ?

George Langford
