D'oh!  That's what I get for coding into the wee hours of the morning on too
much coffee and not enough food.  :-)

I've attached corrections to the addnodes.sh and dedupe.sh scripts.  It
occurred to me later that if addnodes.sh is run periodically without *really*
eliminating duplicate noderefs using the same version, then the seednodes.ref
file is going to just grow and grow and grow.

I've solved the problem by a sort of "brute force" method for now.  In the case
of duplicate noderefs using the same version, only one is saved; the others are
mindlessly tossed in the bit bucket.
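
For anyone who wants to see the idea in one place, here is a rough sketch of
the same keep-one-per-IP rule done in a single awk call.  It is not the
attached dedupe.sh, just an illustration: the function name dedupe_sketch is
made up, and it assumes (as dedupe.sh does) that each noderef ends with a line
reading "End", has a "physical.tcp=IP:port" line, and carries the build number
as the fourth comma-separated field of its "version=" line.

```shell
# Sketch only: keep one noderef per IP, preferring the highest build number.
# dedupe_sketch is a hypothetical name; the record layout is assumed as above.
dedupe_sketch()
{
        # The file is passed to awk twice: pass 1 finds the highest build
        # for each IP, pass 2 prints the first record that matches it and
        # drops every other record for that IP.
        awk '
        function field(line, sep, n,    parts) {
                split(line, parts, sep)
                return parts[n]
        }
        /^physical\.tcp=/ { ip = field($0, "[=:]", 2) }
        /^version=/       { ver = field($0, ",", 4) + 0 }
        NR != FNR         { rec = rec $0 "\n" }    # pass 2: buffer the record
        $0 == "End" {
                if (NR == FNR) {
                        if (!(ip in best) || ver > best[ip])
                                best[ip] = ver
                } else {
                        if (!(ip in seen) && ver == best[ip]) {
                                printf "%s", rec
                                seen[ip] = 1
                        }
                        rec = ""
                }
                ip = ""
                ver = 0
        }
        ' "$1" "$1"
}
```

It writes the deduped records to stdout and leaves the input alone, so
something like "dedupe_sketch seednodes.ref > seednodes.ref.new" would be one
way to try it without touching the real file.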

I've also improved the code in a few other places, removing some unnecessary
stuff, simplifying and speeding it up a good bit.

Enjoy!

On 17-Feb-2004 Conrad Sabatier wrote:
> OK, I've finally come up with a deduping script.  It's not perfect, in that I
> didn't know what to do with duplicate nodes with the same version, so I just
> left them alone, but it's a start.  Hopefully, over time, it'll work out OK,
> the idea being that as later versions of nodes are added, the older dupes
> will drop out.
> 
> For this reason, I've modified the addnodes.sh script to just add everything
> in noderefs.txt (after first deduping it) to seednodes.ref, and then dedupe
> seednodes.ref.  The problem with the earlier version of addnodes.sh was that
> if a node was already present in seednodes.ref, the one in noderefs.txt would
> never get added, even if it was newer.
> 
> I've also cleaned up and "normalized" all the other scripts, cleaning up the
> headers and adding usage tips, as well as made them safer by having them
> back up files before modifying them.
> 
> Hope you'll find them useful.
> 
> Conrad (my God, it's nearly 5 a.m.!)
> 
> -- 
> Conrad Sabatier <[EMAIL PROTECTED]> - "In Unix veritas"

-- 
Conrad Sabatier <[EMAIL PROTECTED]> - "In Unix veritas"
#!/bin/sh
#
# dedupe.sh
#
# Dedupe nodes in input file (seednodes.ref or noderefs.txt)
#
# Author: dolphin
#
# Usage: ./dedupe.sh inputfile
#
# Put this script in your main freenet dir and run from there
# (companion script addnodes.sh expects to find it in current dir)
#
# Algorithm:
#
# Step 1:       back up input file!  :-)
#
# Step 2:       read noderef records one at a time from input file,
#                       saving version info to a temp file for each unique IP
#
# Step 3:       remove all but the highest version number in each IP's
#                       temp file
#
# Step 4:       reread the input file one record at a time as above,
#                       saving only the first node whose version is found in
#                       its IP's temp file, discarding all others, writing
#                       output to a temporary nodes file
#
# Step 5:       move temporary nodes file to original input file
#
#                       Not terribly elegant, but it gets the job done  :-)
#
#                       One little bugaboo: if a node has only multiple
#                       references to a single version, all but one are deleted.
#                       Didn't know how else to handle this without more
#                       to go on.  :-)

if [ $# -ne 1 ]
then
        echo "Usage: ./dedupe.sh infile"
        exit 1
fi

infile=$1
cp "${infile}" "${infile}.bak"      #save a copy just in case!

outfile=/tmp/$(basename "${infile}").$$

rm -f /tmp/node.* "${outfile}"

#
#function: read a single noderef from input, save to temp file
#
read_one_noderef()
{
        rm -f /tmp/noderef.$$
        
        while read -r line
        do
                # quote the line so wildcards and spacing survive intact
                printf '%s\n' "$line" >> /tmp/noderef.$$
                if [ "$line" = "End" ]
                then
                        break
                fi
        done
}

# Collect node version info

i=0

while :
do
        read_one_noderef

        if [ ! -f /tmp/noderef.$$ ]
        then
                break   #end of file
        fi
        
        physical_tcp=$(awk '/^physical.tcp=/ {split($0, a, "[=:]"); print a[2]}' /tmp/noderef.$$)
        version=$(awk -F',' '/^version=/ {print $4}' /tmp/noderef.$$)
        echo "${version}" >> /tmp/node.${physical_tcp}
        i=$((i + 1))
done < "$infile"

echo "$i node references found in $infile"
echo "Deduping..."

#
# unique-ify version info (determine latest version of a node)
#
for f in /tmp/node.*
do
        sort -urn -o ${f}.tmp $f
        head -n 1 ${f}.tmp > $f
done

i=0

while :
do
        read_one_noderef

        if [ ! -f /tmp/noderef.$$ ]
        then
                break   #end of file
        fi
        
        physical_tcp=$(awk '/^physical.tcp=/ {split($0, a, "[=:]"); print a[2]}' /tmp/noderef.$$)
        version=$(awk -F',' '/^version=/ {print $4}' /tmp/noderef.$$)

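        # -x matches the whole line, so e.g. 506 cannot match 5068; quoting
        # keeps an empty version from making grep read the input file instead
        if grep -qx -- "$version" "/tmp/node.${physical_tcp}"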
        then
                cat /tmp/noderef.$$ >> $outfile
                cat /dev/null > /tmp/node.${physical_tcp}
                i=$((i + 1))
        fi
done < "$infile"

mv "$outfile" "$infile"

echo "Saved $i unique nodes to $infile"

rm -f /tmp/node*
#!/bin/sh
#
# addnodes.sh
#
# Add nodes from current routing table to seednodes.ref
# Dedupes seednodes.ref after adding nodes
#
# Author: dolphin
#
# Usage: ./addnodes.sh
#
# Put this script in your main freenet dir and run from there
# (expects to find noderefs.txt, seednodes.ref and dedupe.sh
# in current dir)
#
# Requires: wget

cd "$(realpath "$(dirname "$0")")" || exit 1
cp noderefs.txt noderefs.txt.bak
echo "Fetching noderefs.txt..."
wget -q -O noderefs.txt http://localhost:8888/servlet/nodestatus/noderefs.txt
./dedupe.sh noderefs.txt
cat noderefs.txt >> seednodes.ref
./dedupe.sh seednodes.ref