Yesterday in some despair, I wrote:

> I note on looking at the resulting files that my earlier generated domain visit
> counts are inaccurate, but that it is not the result of an inaccurate join.

With a better grasp of the scripting & join process, I restarted from the
original collected data, removed the invalid material and the (very many)
IPv6 entries, and repeated the appropriate steps. Now the domains-visited counts
are correct and the visits-per-domain data are carried through intact; there
are no empty cells.
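For anyone wanting to check the "no empty cells" claim without opening a spreadsheet, here is a minimal sketch. It assumes the joined output is tab-separated; the function name count_empty_cells and the sample rows are made up for illustration and are not my real data.

```shell
# count_empty_cells FILE
# Print how many lines of a tab-separated file contain at least one
# empty field. (Hypothetical helper; assumes tab is the separator.)
count_empty_cells() {
    awk -F '\t' '
        { for (i = 1; i <= NF; i++) if ($i == "") { n++; break } }
        END { print n + 0 }
    ' "$1"
}

# Made-up sample: the second row is missing its visit count.
sample=$(mktemp)
printf 'example.org\t12\nexample.net\t\nexample.com\t7\n' > "$sample"

empty_lines=$(count_empty_cells "$sample")
echo "$empty_lines"    # prints 1

rm -f "$sample"
```

A result of 0 on the real file would mean that every row kept all of its fields through the join.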

I had been guilty of not reading (indeed, I was unaware of!) the "info sort"
material; even if I had read it, Magic Banana would still have had to interpret
it for me. "cut" is not yet in my scripting vocabulary.

I looked into it to learn what "cut -d ' ' -f -2" means in the present
context. I've used "tr ':' '\t'" to separate the IPv6 data into fields so that I
could use LibreOffice Calc to sort on the fourth column and so collect all the
IPv6 data into one run of lines in the spreadsheet. Now I'll use it again
to reconstruct the IPv6 files in their own domain.txt, IPv6, Visits-per-domain
spreadsheet file. As long as I'm not making entries one at a time, I'm happy
when my inefficient script does the same task in less than an hour.
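For the record, here is a small sketch of what those two commands do, as far as I understand them. The sample line and address are invented for illustration (2001:db8:: is the documentation prefix), not taken from my files.

```shell
# Made-up sample line: domain, visit count, then further fields.
line='example.org 42 2001:db8:0:1:0:0:0:1 extra'

# cut -d ' ' -f -2 : split on single spaces and keep fields 1 through 2.
first_two=$(printf '%s\n' "$line" | cut -d ' ' -f -2)
printf '%s\n' "$first_two"    # prints: example.org 42

# tr ':' '\t' : turn every colon into a tab, so each group of an IPv6
# address lands in its own spreadsheet column.
addr='2001:db8:0:1:0:0:0:1'
columns=$(printf '%s\n' "$addr" | tr ':' '\t')
printf '%s\n' "$columns"
```

Two caveats: cut treats every single space as a delimiter, so runs of spaces produce empty fields; and a compressed address containing "::" comes out of tr with an empty column and fewer than eight groups.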

I can look up the PTR records that go with the IPv4 data, but in my parallel
study of the gratuitously resolved IP address data (now about 90% PTRs or
hostnames) the reverse is extraordinarily tedious. I've been using Google to
gather the server CIDR data (with Nmap's ASN-query function) for the
multi-addressed PTRs, and Google does not like scripts ... I am challenged
every few minutes while at that task.

I adjusted Magic Banana's suggested script for my actual data:

From:
join
