I'll restate the problem, unencumbered by distracting arrays of colons and hexadecimals.

All 387,000 rows fall into one of three types, each IP address appearing only once in the first column:

Type A: $1("key" IP address), $2(CIDR block), $3(country code), $4(AS number)

Type B: $1(IP address falling within the $2CIDR block of Type A), $2(Type A's "key" IP address, repeated many times in successive rows)

Type C: $1(hostname), $2(Ip address from which $1hostname can be resolved), $3(CIDR block), $4(country code), $5(AS number)

(Type C is not very populous and can be handled with Leafpad)

The desired script:
awk should locate Type A's $1Key and find all the Type B rows whose $2Key match $1's Key, and then copy Type A's columns $2, $3 & $4 in place of Type B's column $2 in every instance of a match with Type A's $1Key

I have found a small number of Type A rows with no data, but those I can look up with whois and fix easily.

The already looked-up hostnames are the only non-IP data in the $1 columns of Types A & B, so awk can safely
concentrate on all the Columns $1.

Also, all the IP addresses of looked-up hostnames will not reappear as not-looked-up IP addresses.

If awk can do everything described above with the first Type A $1Key before proceeding, even if that involves searching the entire 370,000 rows once for each Type A $1Key, then we're on the right track.

George Langford

Reply via email to