[Tech] Node credibility estimation

Ed Tomlinson Sun, 29 Jan 2006 09:40:43 -0500

On Saturday 28 January 2006 10:16, Matthew Toseland wrote:
> On Sat, Jan 28, 2006 at 02:52:27PM +0000, Matthew Toseland wrote:
> > Level 1: All our directly connected nodes get credibility of 100%.
> > Level 2: A node's credibility is simply the number of our directly
> > connected nodes which are connected to it, divided by the total number
> > of our directly connected nodes. For example, we have 6 nodes, E is
> > connected to 3 of them, so it gets a credibility of 50%.
> > Level 3: The sum of the credibility of each level 2 node it is connected
> > to, divided by the number of level 2 nodes. So if F is only connected to
> > E, and there are 12 level 2 nodes, it gets 50%/12. Better: divide by the
> > total credibility:
> > 
> > At level 1, we have 6 nodes with 100% cred.
> > At level 2, we have 18 nodes with total cred of 10.
> > At level 3, for example, we have a node connected to 3 level 2 nodes,
> > with a total credibility of 2. This gives it 20% credibility.
> > 
> > How to combine credibility for nodes which connect at multiple levels?
> > 
> > If A is connected to a level 2 node at cred 50%, and a level 3 node at
> > cred 20%, and the two are independent, in other words, the level 3 node
> > is not connected to the level 2 node, we can simply add the fractions:
> > 50% / 10 + 20% / total 3rd level cred.
> > 
> > If A is connected to a level 2 node at cred 50%, and a level 3 node at
> > 20%, and the level 3 node is connected to 3 level 2 nodes including the
> > mentioned one, we have two options:
> > a) We ignore the level 3 connection.
> > b) We factor out the level 2 connection when calculating the cred
> > proportion for the level 3 node.
> 
> Major correction:
> 
> We have to divide by the number of nodes which might provide
> credibility, NOT by the total credibility of those nodes.


This makes more sense - I was having trouble digesting the divide by 
credibility idea.
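
To check my reading of the corrected rule, here is a minimal sketch that
divides by node counts rather than by total credibility. The names
level_credibility and make_links, the adjacency-set representation, and the
extra node G are my own assumptions for illustration. It only covers levels
1 to 3 and does not try to answer the question of combining credibility
across levels.

```python
def make_links(edges):
    """Build a symmetric adjacency map from (a, b) pairs."""
    links = {}
    for a, b in edges:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def level_credibility(me, links):
    """Credibility for level 1-3 nodes, dividing by node counts."""
    peers = set(links.get(me, ()))
    cred = {p: 1.0 for p in peers}                    # level 1: 100%
    seen = peers | {me}

    # Level 2: share of our direct peers that are connected to the node.
    level2 = {n for p in peers for n in links.get(p, ()) if n not in seen}
    for n in level2:
        cred[n] = sum(1 for p in peers if n in links.get(p, ())) / len(peers)
    seen |= level2

    # Level 3: summed credibility of the level 2 nodes it connects to,
    # divided by the NUMBER of level 2 nodes (the "major correction").
    level3 = {n for m in level2 for n in links.get(m, ()) if n not in seen}
    for n in level3:
        total = sum(cred[m] for m in level2 if n in links.get(m, ()))
        cred[n] = total / len(level2)
    return cred


# Six direct peers; E touches three of them, G one, and F only touches E.
edges = [("me", f"P{i}") for i in range(1, 7)]
edges += [("E", "P1"), ("E", "P2"), ("E", "P3"), ("G", "P4"), ("F", "E")]
cred = level_credibility("me", make_links(edges))
print(cred["E"])   # 0.5, as in the quoted level 2 example
print(cred["F"])   # 0.25, i.e. 50% divided by the two level 2 nodes
```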

> The reason for this is that we don't want 4th level nodes to have 100%
> credibility just because there is only one 3rd level node, which happens
> to be connected to one 2nd level node, which happens to be connected to
> one 1st level node! The whole point of the algorithm is to find parts of
> the network that are probably fictitious.

I tend to use trust in place of credibility...

The further away a node is, the less we should trust it. So the simplest case
is:

A -> B -> C -> D

For D, from A's perspective:

D = 100 / f(ABCD) = 25

I think the number of paths matters. If one node lies and we use only the
best path (the max of the terms instead of the sum), we could easily end up
trusting a node more than is wise.

A -> B -> C -> D
A -> C -> D
A -> F -> E -> D

D = 100/f(ABCD) + 100/f(ACD) + 100/f(AFED) = 25 + 33 + 25 = 83
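
The arithmetic for the three paths can be sketched as follows. This is only
an illustration: path_trust and combined_trust are names I made up, paths
are written as strings of node letters, and f defaults to the number of
nodes in the path.

```python
def path_trust(path, f=len):
    """Trust contributed by a single path: 100 / f(path)."""
    return 100 / f(path)


def combined_trust(paths, combine=sum, f=len):
    """Combine per-path trust; pass combine=max to use only the best path."""
    return combine(path_trust(p, f) for p in paths)


paths = ["ABCD", "ACD", "AFED"]
total = combined_trust(paths)        # every path contributes
best = combined_trust(paths, max)    # best path only
print(round(total))  # 83
print(round(best))   # 33
```

Nothing in this sketch caps the sum at 100, so a node reachable over enough
paths would exceed it. Whether and how to cap is left open here.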

A second issue is what f(nodes) should do. In my examples I have used a
sum of the nodes in the path (that is, a count). Should hops be used instead?
f could just as easily be a square or a factorial if we are going to trust
nodes further away less... What makes sense statistically (no alchemy
wanted)? This choice also has a big influence on how deep we probe (HTL).

       sum   square   factorial   (f counts nodes)
B =     50       25          50   100 / f(AB)
C =     33       11          17   100 / f(ABC)
D =     25        6           4   100 / f(ABCD)

Adding D -> E and using hops:

       sum   square   factorial   (f counts hops)
B =    100      100         100   100 / f(AB)
C =     50       25          50   100 / f(ABC)
D =     33       11          17   100 / f(ABCD)
E =     25        6           4   100 / f(ABCDE)
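
For anyone who wants to play with other choices of f, here is a small sketch
that reproduces both tables. WEIGHTS, trust_row and trust_table are
hypothetical names, and values are rounded to whole percentages as in the
tables.

```python
from math import factorial

# Candidate weightings of n, where n is a node count or a hop count.
WEIGHTS = {
    "sum": lambda n: n,
    "square": lambda n: n * n,
    "factorial": factorial,
}


def trust_row(n):
    """Trust as a rounded percentage under each weighting."""
    return {name: round(100 / w(n)) for name, w in WEIGHTS.items()}


def trust_table(path, use_hops=False):
    """One row per node after the first node of a linear path."""
    rows = {}
    for hops in range(1, len(path)):
        n = hops if use_hops else hops + 1   # nodes = hops + 1
        rows[path[hops]] = trust_row(n)
    return rows


for node, row in trust_table("ABCD").items():
    print(node, row)
for node, row in trust_table("ABCDE", use_hops=True).items():
    print(node, row)
```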

A third issue: what do we do with nodes that are not currently connected?
If they are old I think we need to ignore them; if they are within the latest
set of good releases, maybe we should extend some trust?

Another issue.  Here we have assumed that a connection is good enough to
trust a node.  Is it?  It would probably be much better to extend trust to
nodes that return data normally in a statistical sense. This implies we need
to track the percentage of requests that return data (and the standard
deviation) and decide if a given node is really trustworthy. This probably
needs to be done with some sort of running average so we know a node is
trustworthy now.
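
One possible shape for that running average, purely as a sketch: an
exponentially weighted mean and variance of a 0/1 "returned data" outcome.
SuccessTracker, looks_normal, the smoothing factor alpha and the threshold k
are all assumptions of mine, not anything that exists in the node, and the
reference mean and deviation would have to come from our other peers.

```python
class SuccessTracker:
    """Exponentially weighted success rate and spread for one node."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha      # weight given to the newest request
        self.mean = None        # running fraction of requests returning data
        self.var = 0.0          # running variance of the 0/1 outcome

    def record(self, success):
        x = 1.0 if success else 0.0
        if self.mean is None:   # first observation seeds the average
            self.mean = x
            return
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)

    @property
    def stddev(self):
        return self.var ** 0.5


def looks_normal(tracker, peer_mean, peer_stddev, k=2.0):
    """True if the node is not far below what other peers achieve."""
    if tracker.mean is None:
        return False
    return tracker.mean >= peer_mean - k * peer_stddev


t = SuccessTracker(alpha=0.5)
for ok in (True, True, False):
    t.record(ok)
print(t.mean, t.stddev)   # 0.5 0.5
```

A small alpha forgets slowly and a large one reacts quickly, so the choice
decides how recent "trustworthy now" really is.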

> Should implement this post-0.7 and play with it a bit. It may be that the
> number of hops matters...
> > 
> > 
> > Okay, what is the point of all this?
> > 
> > It lets us select nodes which are reasonably likely to be real. This is
> > very VERY useful for premix routing, although obviously there are
> > statistical issues; we need to create a cell amongst which any node is
> > equally likely to send a request through any other node, or there are
> > statistical attacks. But we can use this sort of reasoning to assure
> > integrity within the cell.
> > 
> > It is also useful for any collaborative network activities such as
> > estimating the size of the network, or the distribution of link lengths.

It is also useful in a darknet if you are willing to let your node become a
little gray. If two nodes find each other credible and they both need more
connections (what is the optimal number of connections for a 0.7 node?) they
could decide to swap connection info... It would be interesting to see what
simulations say would happen with this, both with all nodes trustworthy and
with a few that lie.

Ed Tomlinson
