Because Apache server software on shared servers routinely performs
hostname lookups on requests made to the domains hosted on those servers,
I'm compiling a database of the thousands of example.com hostnames that
are on the Internet.
I've reached an impasse: LibreOffice's Calc spreadsheet will filter _most_ of
the many duplicated lines in my lists, but a great many pairs, triplicates,
and quadruplicates of the lines in my lists still remain. There are enough of
them that their manual removal is tedious.
I've tried uniq -d to print one of each duplicated line, followed by
uniq -u to print only the unique lines, but the outputs still retained
these duplicated lines.
Here's a sample of my predicament:
jaholper1.example.com 95.182.79.24
jaholper1.example.com 95.182.79.24
jaholper1.example.com 95.182.79.33
jaholper1.example.com 95.182.79.33
jaholper4.example.com 109.248.200.4
jaholper7.example.com 109.248.203.131
jaholper7.example.com 109.248.203.188
jaholper7.example.com 109.248.203.189
jaholper7.example.com 109.248.203.191
jaholper7.example.com 109.248.203.198
jaholper7.example.com 185.186.141.79
jaholper7.example.com 185.186.142.10
jaholper7.example.com 185.186.142.10
jaholper7.example.com 185.186.142.100
jaholper7.example.com 185.186.142.100
jaholper7.example.com 185.186.142.101
jaholper7.example.com 185.186.142.101
uniq -d returns only one line: jaholper7.example.com 185.186.142.101
uniq -u keeps everything _but_ the last two lines.
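
For reference, here is a minimal Python sketch of one way to drop such duplicates. uniq compares only adjacent lines, byte for byte, so the sketch assumes (this is not confirmed by the sample above) that the surviving duplicates differ in invisible characters such as trailing spaces, tabs, or carriage returns. The function name dedupe_lines is hypothetical.

```python
def dedupe_lines(lines):
    """Return the lines with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for line in lines:
        # Normalize each line: split() discards CR/LF and leading or trailing
        # whitespace, and joining with one space collapses tabs and runs of
        # spaces, so invisible differences no longer make lines "distinct".
        key = " ".join(line.split())
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Assumed input: lines that look identical but differ in whitespace.
sample = [
    "jaholper1.example.com 95.182.79.24\n",
    "jaholper1.example.com 95.182.79.24 \r\n",
    "jaholper1.example.com\t95.182.79.33\n",
    "jaholper1.example.com 95.182.79.33\n",
]

for line in dedupe_lines(sample):
    print(line)
```

Because a set records every line already seen, the input does not need to be sorted first, which uniq would require.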
Reversing the positions of the two columns in LibreOffice only makes
matters worse: I get either single-line output or complete erasure of
the file.
It's been suggested that the IPv4 addresses can each be presented as a
single decimal number, but the thought of doing that for my thousands
of IPv4 addresses makes manual editing look pretty good.
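
The decimal conversion need not be done by hand. The following is a minimal sketch using Python's standard ipaddress module. The helper name ipv4_to_int is hypothetical, and the sketch assumes every address is a well-formed dotted-quad IPv4 string.

```python
import ipaddress


def ipv4_to_int(address):
    """Convert a dotted-quad IPv4 string to its single decimal value."""
    # IPv4Address raises ValueError for malformed input; int() then yields
    # the 32-bit value (first octet is the most significant byte).
    return int(ipaddress.IPv4Address(address))


# Addresses taken from the sample list above.
addresses = ["109.248.200.4", "95.182.79.24", "185.186.142.10"]

# Sorting by the numeric value orders 95.x before 109.x, which a plain
# text sort would not do.
for addr in sorted(addresses, key=ipv4_to_int):
    print(addr, ipv4_to_int(addr))
```

The numeric key serves only for sorting or comparing. The original dotted form stays in the list unchanged.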
George Langford