As Apache server software on shared servers routinely performs hostname lookups
on requests made to the domains hosted on those servers, I'm compiling a
database of the thousands of example.com hostnames that are on the Internet.

I've reached an impasse: LibreOffice's Calc spreadsheet will filter _most_ of
the many duplicated lines in my lists, but a great many pairs, triplicates,
and quadruplicates of the lines in my lists still remain. There are enough of
them that their manual removal is tedious.

I've tried uniq -d to print one copy of each duplicated line, followed by
uniq -u to print only the unique lines, but the outputs retained these
duplicated lines nevertheless.

Here's a sample of my predicament:

jaholper1.example.com   95.182.79.24
jaholper1.example.com   95.182.79.24
jaholper1.example.com   95.182.79.33
jaholper1.example.com   95.182.79.33
jaholper4.example.com   109.248.200.4
jaholper7.example.com   109.248.203.131
jaholper7.example.com   109.248.203.188
jaholper7.example.com   109.248.203.189
jaholper7.example.com   109.248.203.191
jaholper7.example.com   109.248.203.198
jaholper7.example.com   185.186.141.79
jaholper7.example.com   185.186.142.10
jaholper7.example.com   185.186.142.10
jaholper7.example.com   185.186.142.100
jaholper7.example.com   185.186.142.100
jaholper7.example.com   185.186.142.101
jaholper7.example.com   185.186.142.101

uniq -d returns only one line: jaholper7.example.com    185.186.142.101
uniq -u keeps everything _but_ the last two lines.
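
One possible explanation, which is an assumption and is not confirmed by the
sample above, is that the surviving duplicates differ only in invisible
whitespace (tabs versus spaces, trailing blanks, or carriage returns). uniq
compares adjacent lines byte for byte, so such lines would not count as
duplicates. A minimal Python sketch under that assumption follows; the
function name dedupe_lines and the file name hosts.txt are hypothetical.

```python
def dedupe_lines(lines):
    """Return unique (hostname, address) pairs in first-seen order.

    Assumes each useful line holds exactly two whitespace-separated fields.
    """
    seen = set()
    result = []
    for line in lines:
        # split() with no argument splits on any run of spaces or tabs
        # and discards leading/trailing whitespace, including \r and \n.
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1])
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


if __name__ == "__main__":
    import os

    path = "hosts.txt"  # hypothetical input file name
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            for host, addr in dedupe_lines(handle):
                print(f"{host}\t{addr}")
```

Writing the pairs back out with a single tab between the fields also gives
every line the same layout, so a later pass with sort and uniq would see
byte-identical duplicates.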

Reversing the positions of the two columns in LibreOffice only makes
matters worse: I get either a single line of output or a completely
empty file.

It's been suggested that the IPv4 addresses can each be presented as a
single decimal number, but the thought of doing that for my thousands
of IPv4 addresses makes manual editing look pretty good.
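
The conversion to a single decimal number need not be done by hand. As a
rough sketch, Python's standard ipaddress module can do it; the helper names
ipv4_to_int and sort_rows below are hypothetical, and the rows are assumed
to be (hostname, address) pairs of text.

```python
import ipaddress


def ipv4_to_int(address):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(address))


def sort_rows(rows):
    """Sort (hostname, address) pairs by hostname, then by numeric address."""
    return sorted(rows, key=lambda row: (row[0], ipv4_to_int(row[1])))


if __name__ == "__main__":
    sample = [
        ("jaholper7.example.com", "185.186.142.100"),
        ("jaholper7.example.com", "109.248.203.131"),
        ("jaholper1.example.com", "95.182.79.24"),
    ]
    for host, addr in sort_rows(sample):
        print(host, addr, ipv4_to_int(addr))
```

Sorting on the integer value avoids the text-order problem in which an
address beginning with 109 sorts ahead of one beginning with 95.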

George Langford
