I'll restate the problem, unencumbered by distracting arrays of colons and
hexadecimals.
All 387,000 rows fall into one of three types, each IP address appearing only
once in the first column:
Type A: $1("key" IP address), $2(CIDR block), $3(country code), $4(AS number)
Type B: $1(IP address falling within the $2CIDR block of Type A), $2(Type A's
"key" IP address, repeated many times in successive rows)
Type C: $1(hostname), $2(Ip address from which $1hostname can be resolved),
$3(CIDR block), $4(country code), $5(AS number)
(Type C is not very populous and can be handled with Leafpad)
The desired script:
awk should locate Type A's $1Key and find all the Type B rows whose
$2Key match $1's Key, and then
copy Type A's columns $2, $3 & $4 in place of Type B's column $2 in
every instance of a match with Type A's $1Key
I have found a small number of Type A rows with no data, but those I can look
up with whois and fix easily.
The already looked-up hostnames are the only non-IP data in the $1 columns of
Types A & B, so awk can safely
concentrate on all the Columns $1.
Also, all the IP addresses of looked-up hostnames will not reappear as
not-looked-up IP addresses.
If awk can do everything described above with the first Type A $1Key before
proceeding, even if that
involves searching the entire 370,000 rows once for each Type A $1Key, then
we're on the right track.
George Langford