[Trisquel-users] Re : Script needed to compare one two-column file with another two-column file

lcerf Thu, 06 Jun 2019 20:09:37 -0700

First of all:

- SmallerFile_0.txt is not sorted (conceptcable.com would be first): below, I sort the files;
- I do not understand why OutputFile_0.txt does not associate pool.mirgiga.net with Uhnagty, Yjnmase, and Bnhjyht: below, I assume it should.
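Since join(1) only works on inputs ordered on the join field, it can help to check a file before joining it. A minimal sketch, assuming POSIX `sort -C` (check-only mode, exit status 1 when the input is out of order); the function name is_sorted_on_key is hypothetical:

```shell
# Sketch: report whether a file is already ordered the way `sort -k 1,1`
# would order it, which is what join(1) expects of its inputs.
# The function name is_sorted_on_key is hypothetical.
is_sorted_on_key() {
    # -C: check only; exit status 0 if sorted, 1 if not, no message printed.
    sort -C -k 1,1 "$1"
}
```

For example, `is_sorted_on_key SmallerFile_0.txt || echo "SmallerFile_0.txt needs sorting"`.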

The format you use is redundant. Moreover, in the output, it becomes hard (if not impossible) to tell apart what comes from the "larger file" and what comes from the "smaller file". I suggest transforming the two input files so that they have no duplicates in the first column and a list of comma-separated values in the second column (if commas can appear in the files, change that character), using the same command line twice:

$ sort -k 1,1 LargerFile_0.txt | awk '{ if ($1 == key) printf ",%s", $2; else { printf "\n%s", $0; key = $1 } }' | tail -n +2 > LargerFile_0.csv
$ sort -k 1,1 SmallerFile_0.txt | awk '{ if ($1 == key) printf ",%s", $2; else { printf "\n%s", $0; key = $1 } }' | tail -n +2 > SmallerFile_0.csv
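The same "one line per key" transformation can be wrapped in a reusable function. This is a sketch, not the exact command line above: it drops the `tail -n +2` step by only printing a newline between keys, and it ends the output with a newline. The function name collapse_by_key is hypothetical, and the input is assumed to be whitespace-separated "key value" lines:

```shell
# Sketch: turn "key value" lines into one "key v1,v2,..." line per key.
# The function name collapse_by_key is hypothetical.
collapse_by_key() {
    sort -k 1,1 "$1" | awk '
        # Same key as the previous line: append the value after a comma.
        NR > 1 && $1 == key { printf ",%s", $2; next }
        # New key: end the previous output line (if any) and start a new one.
        { if (NR > 1) printf "\n"; printf "%s %s", $1, $2; key = $1 }
        # Terminate the last output line.
        END { if (NR > 0) printf "\n" }'
}
```

For example, `collapse_by_key LargerFile_0.txt > LargerFile_0.csv`. Using `printf ",%s", $2` rather than `printf "," $2` keeps a stray `%` in the data from being read as a format directive.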

You then only need to "join" the two files (see https://en.wikipedia.org/wiki/Relational_algebra#Natural_join_(%E2%8B%88) for the theory):

$ join LargerFile_0.csv SmallerFile_0.csv

pool.giga.net.ru 91.210.179.94,91.210.179.95,91.210.179.96,91.210.179.97,91.210.179.98,91.210.179.99 Evgbhan,Ghbfght,Kmnslet,Loasfrt,Wnhmahy
pool.mirgiga.net 78.158.193.1,78.158.193.10,78.158.193.104,78.158.193.105,78.158.193.106,78.158.193.107,78.158.193.11,78.158.193.110,78.158.193.111,78.158.193.112,78.158.193.113 Bnhjyht,Uhnagty,Yjnmase
pool.sevtele.com 46.172.203.8,46.172.203.80,46.172.203.83,46.172.203.85,46.172.203.87,46.172.203.88 Ghbfght
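To see what join does on its own, here is a minimal sketch on two tiny, hypothetical files that are already in the collapsed format and already sorted on their first field. Only keys present in both files appear in the output, and each output line is the key, then the remaining fields of the first file, then those of the second:

```shell
# Sketch: natural join of two small, hypothetical "key values" files.
larger=$(mktemp)
smaller=$(mktemp)
# Both inputs are sorted on their first field, as join(1) requires.
printf '%s\n' 'pool.giga.net.ru 91.210.179.94,91.210.179.95' \
    'pool.other.example 192.0.2.1' > "$larger"
printf '%s\n' 'pool.giga.net.ru Evgbhan,Ghbfght' \
    'pool.sevtele.com Ghbfght' > "$smaller"
# pool.other.example and pool.sevtele.com each appear in one file only,
# so they are dropped from the result.
joined=$(join "$larger" "$smaller")
printf '%s\n' "$joined"
rm -f "$larger" "$smaller"
```

This prints the single line `pool.giga.net.ru 91.210.179.94,91.210.179.95 Evgbhan,Ghbfght`.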

As a script taking the two files as arguments and running everything in parallel:

#!/bin/sh

if [ -z "$2" ]
then
    printf 'Usage: %s file1 file2\n' "$0" >&2
    exit 1
fi

TMP=$(mktemp)
trap "rm $TMP* 2>/dev/null" 0

mkfifo $TMP.1 $TMP.2

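# Collapse each input to one line per key, in the background, into the FIFOs.
sort -k 1,1 "$1" | awk '{ if ($1 == key) printf ",%s", $2; else { printf "\n%s", $0; key = $1 } }' | tail -n +2 > $TMP.1 &
sort -k 1,1 "$2" | awk '{ if ($1 == key) printf ",%s", $2; else { printf "\n%s", $0; key = $1 } }' | tail -n +2 > $TMP.2 &
# join reads both FIFOs while the background pipelines write to them.
join $TMP.1 $TMP.2 # | awk '{ for (i = 1; ++i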
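If the parallelism is not needed, a sequential variant with ordinary temporary files avoids named pipes and background jobs entirely. This is a sketch of that alternative, not the script above; the function names one_line_per_key and compare_files are hypothetical:

```shell
# Sketch: sequential alternative using regular temporary files, not FIFOs.
# The function names one_line_per_key and compare_files are hypothetical.
one_line_per_key() {
    sort -k 1,1 "$1" | awk '
        NR > 1 && $1 == key { printf ",%s", $2; next }   # same key: append
        { if (NR > 1) printf "\n"; printf "%s %s", $1, $2; key = $1 }
        END { if (NR > 0) printf "\n" }'
}

compare_files() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # Transform both inputs one after the other, then join them.
    one_line_per_key "$1" > "$tmp1" &&
        one_line_per_key "$2" > "$tmp2" &&
        join "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

For example, `compare_files LargerFile_0.txt SmallerFile_0.txt`. The FIFO version lets the two sorts run at the same time and never stores the intermediate files on disk; this version trades that for simpler control flow.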

