First of all:

  * SmallerFile_0.txt is not sorted (conceptcable.com would be first): below, I sort the files;
  * I do not understand why OutputFile_0.txt does not associate pool.mirgiga.net with Uhnagty, Yjnmase, and Bnhjyht: below, I assume it should.


The format you use is redundant. Moreover, in the output, it becomes hard (if not impossible) to tell apart what comes from the "larger file" and what comes from the "smaller file". I suggest transforming the two input files so that the first column has no duplicates and the second column holds a comma-separated list of values (if commas can appear in the files, change that character), using the same command line twice:
$ sort -k 1,1 LargerFile_0.txt | awk '{ if ($1 == key) printf "," $2; else { printf "\n" $0; key = $1 } }' | tail -n +2 > LargerFile_0.csv
$ sort -k 1,1 SmallerFile_0.txt | awk '{ if ($1 == key) printf "," $2; else { printf "\n" $0; key = $1 } }' | tail -n +2 > SmallerFile_0.csv
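To see what that command line does, here is a minimal sketch on made-up data. The function name group_by_key and the example host names are mine, for illustration only; the pipeline inside is the same sort | awk | tail as above. Note that the data is used as the printf format, so this assumes the files contain no "%" character.

```shell
# group_by_key: hypothetical wrapper around the pipeline above.
# Reads "key value" lines (from stdin or from the files given as arguments)
# and prints one line per key, with its values joined by commas.
group_by_key() {
    sort -k 1,1 "$@" |
        awk '{ if ($1 == key) printf "," $2; else { printf "\n" $0; key = $1 } }' |
        tail -n +2    # drop the empty first line produced by the leading "\n"
}

# Made-up input: b.example appears twice, a.example once, and it is unsorted.
printf '%s\n' 'b.example 2' 'a.example 1' 'b.example 3' | group_by_key
# prints:
# a.example 1
# b.example 2,3
```

The last output line has no trailing newline, because awk writes the newline before each record rather than after it.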

You then only need to "join" the two files (see https://en.wikipedia.org/wiki/Relational_algebra#Natural_join_(%E2%8B%88) for the theory):
$ join LargerFile_0.csv SmallerFile_0.csv
pool.giga.net.ru 91.210.179.94,91.210.179.95,91.210.179.96,91.210.179.97,91.210.179.98,91.210.179.99 Evgbhan,Ghbfght,Kmnslet,Loasfrt,Wnhmahy
pool.mirgiga.net 78.158.193.1,78.158.193.10,78.158.193.104,78.158.193.105,78.158.193.106,78.158.193.107,78.158.193.11,78.158.193.110,78.158.193.111,78.158.193.112,78.158.193.113 Bnhjyht,Uhnagty,Yjnmase
pool.sevtele.com 46.172.203.8,46.172.203.80,46.172.203.83,46.172.203.85,46.172.203.87,46.172.203.88 Ghbfght
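
The same join on a toy example, as a rough sketch: the file names larger.csv and smaller.csv and their contents are invented for illustration. It shows that only the keys present in both files are kept, and that the fields of the first file come before those of the second.

```shell
# Toy data only: the host names and values below are made up.
dir=$(mktemp -d)
printf '%s\n' 'a.example 1,2' 'b.example 3' > "$dir/larger.csv"
printf '%s\n' 'b.example X,Y' 'c.example Z' > "$dir/smaller.csv"

# Both inputs are already sorted on their first field, as join requires.
joined=$(join "$dir/larger.csv" "$dir/smaller.csv")
printf '%s\n' "$joined"    # prints: b.example 3 X,Y

rm -r "$dir"
```

a.example and c.example each appear in only one file, so they are dropped from the result.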

As a script taking the two files as arguments and running everything in parallel:
#!/bin/sh

if [ -z "$2" ]
then
    printf 'Usage: %s file1 file2\n' "$0"
    exit 1
fi

TMP=$(mktemp)
trap "rm $TMP* 2>/dev/null" 0

mkfifo $TMP.1 $TMP.2

sort -k 1,1 "$1" | awk '{ if ($1 == key) printf "," $2; else { printf "\n" $0; key = $1 } }' | tail -n +2 > $TMP.1 &
sort -k 1,1 "$2" | awk '{ if ($1 == key) printf "," $2; else { printf "\n" $0; key = $1 } }' | tail -n +2 > $TMP.2 &
join $TMP.1 $TMP.2 # | awk '{ for (i = 1; ++i
