I'll go through your thoughts point by point.

> '    The calculation of the routing table has changed several times, two of 
> them were major changes in the protocol. After the initial Paper the first 
> big change was called the "Next Generation Routing" [7] and the last big step 
> was the introduction of a small world topology. [1] This newest attempt was 
> the creation of a p2p network which relies on personal contacts of people.'
>
> Not true. We always relied on a small world topology: The network is only 
> navigable if it is a small world network. However, the original Freenet 
> concept was purely opennet, and therefore relied to a very large degree 
> on "destination sampling" (connect to the data source when you get a 
> successful request). We understand this rather better now than we did thanks 
> to Oskar's recent work.
OK, good to know. Then only the routing table calculation has changed, which is
also a change in routing. :) And yes, Oskar's work is a really big step forward.
That is a very good base now, whereas the original paper had no theoretical base
at all. This will definitely help, also with proving more, which I tried, but I
was limited to very easy cases.
>
> With regards to random routing for a few steps ... unfortunately it doesn't 
> solve the problem (it's been proposed lots of times of course):
> - You can only do it for a few hops before getting into real routing. 
> Therefore your anonymity set is only those nodes which could have random 
> routed to you. And it can be a fairly uneven set.
> - Much worse: If two requests can be correlated by e.g. both being part of 
> the 
> same splitfile, both being post inserts from the same frost ID etc ... then 
> you can do *nasty* statistical attacks based on the proportion of that known 
> quantity that are routed through the attacker's node.
I definitely wouldn't support random routing, but on a ring it was very easy to
prove information-theoretic anonymity for it.
>
> Re section 4.2:
> - Are the "1" occurrences leaf nodes ?
> - What do the "2" and "3"'s look like topologically speaking?
I haven't really looked into the details, but the way I constructed it there are
no nodes with low degree at all; it was a highly connected graph. So in general I
can say nothing about these structures, but I might look into it: the simulation
was very simple and such ideas can be checked quickly. However, I think it will
be difficult to see similarities between the cases.

A node which sends two messages that are received by two Sybil nodes (not
necessarily different ones) can basically be pinpointed to the intersection of
those two source sets. That intersection can be near one or the other node, or
far away.
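To make the idea concrete, a toy sketch in Python (not my simulator; the node
names and sets are invented):

# Each captured message gives a set of candidate sources; the true
# sender must lie in the intersection of all of them.
candidates_msg1 = {"n03", "n07", "n12", "n15", "n21"}   # seen by Sybil A
candidates_msg2 = {"n07", "n09", "n15", "n30"}          # seen by Sybil B

suspects = candidates_msg1 & candidates_msg2
print(suspects)   # {'n07', 'n15'} -- two messages already shrink the set a lot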

>
> What exactly sucks about the tree? It would be helpful to understand this so 
> we can advise people not to abuse darknet by setting up such structures (e.g. 
> mirroring external authority structures). Clearly if you get a request from 
> below you in the tree, you can identify that the node is within that 
> subtree - is that the totality of the dangers of a tree topology?
A tree structure does not suck; in fact it prevents the correlation attacks from
doing any more harm than a single message does. Besides the point you mentioned,
I don't see any problem with tree structures (well, apart from the other obvious
fact that a tree easily gets disconnected).
>
> Theorem 7: Cool! On the circle, if the source is just below the left 
> attacker, 
> the left attacker will get requests which get very close to the exact 
> opposite of the request sender's own location. And you have the opposite with 
> the right hand attacker. The more blocks, the closer the requests will get to 
> the exact opposite point of the sender on the ring, and the more accurately 
> you can identify the source...
>
> Basically the operation is this (assuming pure greedy routing and no overload 
> etc):
> We have a request for location T. Our location is L. We want to find S, the 
> location of the source node. We can reasonably conclude that S is between T' 
> (exact opposite of T) and L (our location), within the shortest arc linking 
> the two points. If there are enough blocks, enough different T's, the point 
> at which we stop getting requests tells us where the source is on the circle.
>
> However:
> - Routing may not be purely greedy, because of e.g. backoff.

Yes, you are right; randomization effects are not accounted for in this paper at
all. The simulation also did (perfect) shortest-path routing, and additionally
the complete routing table was known to the attacker. Well, on the ring it is
obvious that you know the routing table, and this abstraction was used for
general graphs as well.
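Your arc argument can be written down directly. A toy sketch (plain Python, not
from the thesis; the attacker location, the observed targets and the assumed
source position are all invented, and pure greedy routing on the unit ring is
assumed):

def circ_dist(a, b):
    # wrap-around distance on the unit ring
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def candidate_arc(L, T):
    """The source S must lie between T' = T + 0.5 (the antipode of the
    target) and L (the attacker), on the shortest arc linking the two.
    Returned as a set of sample points on the ring."""
    T_prime = (T + 0.5) % 1.0
    arc_len = circ_dist(T_prime, L)
    points = [i / 1000.0 for i in range(1000)]
    return {p for p in points
            if abs(circ_dist(T_prime, p) + circ_dist(p, L) - arc_len) < 1e-9}

# Attacker at 0.30; the (unknown) source is assumed to sit near 0.25.
# Four observed request targets, e.g. blocks of one splitfile:
L = 0.30
observed_targets = [0.70, 0.72, 0.74, 0.745]

suspects = candidate_arc(L, observed_targets[0])
for T in observed_targets[1:]:
    suspects &= candidate_arc(L, T)
print(min(suspects), max(suspects))   # ~0.245 .. 0.30: a short arc ending at L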
> - The topology may not be close enough to a ring for this to work anyway. Of 
> course there's a question as to whether the network will work at all in that 
> case.
I did it on general graphs, and it's worse for them, since I cannot only pinpoint
two directions. Let's say I have 10 edges: I can divide the complete network into
at least 10 parts (possibly equal in size). Also, the source of the message can
send in 10 directions, and only a few of those starting directions will go
through my Sybil nodes. For single messages this does not give much, so I did a
step-by-step analysis. The first message that was captured normally said nothing
more than "it's in a group of 150 nodes". After another node captured another
message from the same file, the set size was often reduced significantly, since
the routing for the other message was so completely different that the
intersection of the two sets gave a much smaller set, normally in the region of
30-60 nodes.
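Roughly, the step-by-step narrowing looks like this (toy Python, not the
simulator from the thesis; the random graph, the number of Sybil nodes and the
keys are all made up, and routing is idealised as deterministic shortest paths
known to the attacker):

import random
from collections import deque

def bfs_path(adj, src, dst):
    # one deterministic shortest path from src to dst (stand-in for routing)
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    if dst not in prev:
        return []
    path, u = [], dst
    while u is not None:
        path.append(u)
        u = prev[u]
    return path[::-1]

def first_sybil_on_route(adj, src, dst, sybils):
    # the Sybil node that captures the message, or None
    for node in bfs_path(adj, src, dst)[1:]:
        if node in sybils:
            return node
    return None

def candidate_sources(adj, dst, sybil, sybils, n):
    # all nodes whose message for dst would first be captured at this Sybil
    return {s for s in range(n) if s not in sybils
            and first_sybil_on_route(adj, s, dst, sybils) == sybil}

random.seed(1)
n = 400                                  # made-up topology
adj = {i: set() for i in range(n)}
for i in range(n):
    for j in random.sample(range(n), 4):
        if j != i:
            adj[i].add(j); adj[j].add(i)

sybils = set(random.sample(range(n), 10))
source = random.choice([v for v in range(n) if v not in sybils])

suspects = set(range(n)) - sybils
for target in random.sample(range(n), 5):         # e.g. five blocks of one file
    captured_at = first_sybil_on_route(adj, source, target, sybils)
    if captured_at is None:
        continue                                  # this message was never seen
    suspects &= candidate_sources(adj, target, captured_at, sybils, n)

print(len(suspects), source in suspects)          # the set shrinks, the source stays inside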

> - Routing may be deliberately obscured by e.g. a few random hops from time to 
> time.
Yes.
> - Routing will not always stop when it reaches the closest node to the 
> target. 
> Probe requests do, but real requests and inserts have an HTL limit, currently 
> continuing for 10 hops after the last improvement in the closeness to the 
> target.
Longer routes don't affect the measurement at all; in my simulation I throw the
messages away after the first Sybil node has captured a message. If you do longer
routes, the only effect is that a potential Sybil node (which wasn't in range on
the short route) can see the message, and if the routing is known it will still
give you the source set, which will indeed be bigger.

In reality, since there are random effects, the attack should work with a set of
probabilities over the possible source nodes, probably throwing away nodes with
less than 1%. In the end, a case that I reported as a set size of 1 would then
look like -> (88% 5% 4% 1% 1% 1% [likelihood of being the source]).
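In code that would look roughly like this (illustrative only; the numbers are
invented to match the example above):

# keep a probability per candidate source instead of a hard set,
# and prune the long tail below the cutoff
scores = {"n07": 0.875, "n15": 0.05, "n03": 0.04,
          "n21": 0.01, "n09": 0.01, "n30": 0.01, "n42": 0.005}

cutoff = 0.01
kept = {node: p for node, p in scores.items() if p >= cutoff}
total = sum(kept.values())
posterior = {node: round(p / total, 3) for node, p in kept.items()}
print(posterior)   # n42 is dropped, the rest is renormalised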
> - If the source is a long way away, we will only receive a small fraction of 
> the requests it sends. How much does this limit accuracy? If you're too far 
> away you may not get any requests, or not enough to connect them together.
The likelihood is smaller, and if you only get one message it is a big set (at
least the number of hops), but if you get two differently routed messages from a
far node, it will obviously be identified easily, since the intersection of such
sets is probably small. However, this is only a feeling. And random effects do
affect long routes more than short ones, so in reality your assertion is probably
true. With perfect routing I would say the accuracy would be increased.


> - Finding the target's rough location doesn't instantly find the target. An 
> attacker would probably try to narrow down the search by connecting to nodes 
> closer to the location in question.
Yes. And remember that I have global knowledge, so I can also pinpoint nodes
which I am not directly connected to. But suppose I could say:

between address 0.73435445 and 0.73437930 there is a node which sent file A. Now
what? This doesn't even give me the IP! So I need to get near this node in any case.
> - It doesn't work on popular files which multiple requestors may be fetching 
> simultaneously
Short routes mean that only nearby Sybil nodes get information. But it still
works, since the routing will be the same; it will just stop earlier, and
therefore give less information.

Well, these are mostly my thoughts, not backed by many facts, but they are based
on what I have seen and searched for.
>
> It would be fascinating to do a more accurate attack simulation! How much 
> information can be gained about a requestor, and to what degree is this 
> constrained by the above factors? (Introducing them one at a time, of 
> course...)
>
> Another attack you can do, if you are close to the target, is take into 
> account the exact local topology and use the proportion of requests coming to 
> your node to identify which node it is.
>
> "Beware of high degree" is an interesting message. I'm not quite sure how you 
> support it. Generally we've assumed that reasonably high degree is a good 
> thing - certainly it will *significantly* reduce the number of hops needed to 
> find data, and it's quite helpful in getting small-world-like networks... 
> (Too few connections and you probably won't have the occasional long range 
> one). Are you saying high average degree is a problem, or that individual 
> nodes with degree far above the average (aka ubernodes) are a problem?
>
Well, it's based on some observations:

- One-connected topologies don't give away more information even if multiple
  messages are dependent.
- Two-connected topologies already give away more information.
- The three-connected graph I simulated was very interesting, since the
  correlation was very interesting as well (if a message was received).
- And if I think further: a full graph is practically a p2p network without
  anonymisation, since you request directly from the most likely node.

I think in a network with higher degree it will be easier to do correlation based
on these facts, but there are no hard proofs. To theoretically analyse a
three-connected graph would be so interesting, but so difficult that I don't even
have an idea how to do it.

There is a second effect: higher degree means the routes get shorter. If you do
not have many Sybil nodes, short routes mean you don't intercept many messages.
But if you intercept one, the anonymity set is very small. (That's the old
routing-versus-anonymity problem.)
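That trade-off shows up even in a quick toy experiment (sketch only; the random
graphs are a made-up stand-in for a real darknet topology, and routing is
idealised as shortest paths):

import random
from collections import deque

def build_graph(n, edges_per_node, seed):
    random.seed(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in random.sample(range(n), edges_per_node):
            if j != i:
                adj[i].add(j); adj[j].add(i)
    return adj

def route(adj, src, dst):
    # deterministic BFS shortest path (stand-in for greedy routing)
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    if dst not in prev:
        return []
    path, u = [], dst
    while u is not None:
        path.append(u); u = prev[u]
    return path[::-1]

n, target = 300, 0
for k in (2, 6):                                  # low vs. high degree
    adj = build_graph(n, k, seed=7)
    routes = [r for r in (route(adj, s, target) for s in range(1, n)) if r]
    avg_len = sum(len(r) - 1 for r in routes) / len(routes)
    # average number of sources whose route to this target passes through
    # a given node: a rough proxy both for how often a node intercepts
    # something and for how large the source set behind one interception is
    per_node = sum(max(len(r) - 2, 0) for r in routes) / n
    print("edges per node:", k,
          "avg route length:", round(avg_len, 2),
          "avg routes through a node:", round(per_node, 2))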

Ubernodes are more attackable in this sense: since they can route to so many
nodes, they will have shorter routes and the set sizes drop, even if no Sybil
node is near. Two intercepted messages will reveal quite a lot about such big
nodes. It's not an average-degree problem (well, maybe that too), but it is also
a problem for the individual node. These are the effects I haven't understood
completely, but they are probably the most interesting ones.

> As far as dependant vs independant messages go ... we have a few options:
> - Bundle all dependant messages into huge files and transfer them all at once 
> in a single request. This isn't very practical: transferring a 1GB file with 
> no multi-sourcing is going to take roughly forever. Having variable block 
> sizes up to some large limit might help a bit, but would suck for various 
> network/practical/code reasons, as it did on 0.5, and it wouldn't solve the 
> problem, only make correlation attacks a bit harder.
> - Bundle dependant requests together, and route them as a blob, keeping them 
> bundled for a number of random hops. This has performance issues, relies on 
> tunnels which may break if any node goes offline, doesn't work well with long 
> lived requests ...
> - Premix routing. Establish an explicit anonymity set of most of the nodes 
> within some number of hops, and use the darknet to work out which nodes 
> are "real" versus which are bogus Sybil's, or try to plug in a network-wide 
> mixnet layer like I2P for opennet.
> - Just live with it: you can identify your neighbours' traffic. This is 
> plainly unacceptable.
The last one isn't as unacceptable in the friends network, but it's certainly a
big problem for an opennet. And yes, it is very difficult.

What I learnt from the thesis is definitely that anonymity is very relative, and
that it is more difficult than it often looks at first sight.

It was interesting to hunt some of those questions down, especially since the
answers are not always that intuitive.

Sorry for the lengthy answer, but I had to describe the answers in detail because
some of those details are not mentioned in the paper at all. There is a good
reason for that: they are not yet well understood by me. I could have answered
"I don't know" to some of them, but I think you are also interested in the ideas
which are not fully understood and might be elaborated in the future.

Thanks for your questions, it was very good input, and I hope it gives me the
motivation to really redo the paper. I'll meet my professor this month, so he can
tell me what is so bad about the paper (not the ideas, but the writing) so that I
can improve it a lot.

Greets
Thomas


