The following is 0.9 stuff. 0.7.5 is a stabilisation release which should be 
out within weeks. 0.8's main features will be Freetalk, MHKs, Bloom filter 
sharing and related changes, resulting in significant gains to usability 
(Freetalk), data retention and speed. Also, bursting is generally more of an 
issue with faster connections, which are slowly being rolled out across the 
world's major cities and are already common in a few countries. Nevertheless, 
IMHO these issues are worth considering now.

The issue of bursts has come up a few times. Our current load management 
generally avoids sustained bursts because it is based on measuring the total 
load on the network and guesstimating a safe speed at which to send requests, 
using the average time taken to send a request and the probability of a request 
being rejected or timing out. However, the network is heterogeneous, conditions 
in one place are not the same as in another, and users have differing views on 
security. So perhaps we could improve performance with a new load management 
scheme which adapts better to local conditions - most likely based on token 
passing and queueing (for bulk requests; realtime-flagged requests would be 
queued minimally or not at all). This would likely be more varied from place to 
place, *and* more bursty from time to time.
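
To make the current scheme concrete, here is a minimal sketch (hypothetical 
names, not the actual Freenet code) of that style of estimator: a single global 
guess at a safe request rate, derived from the average time per request and the 
observed reject/timeout probability. The token-passing alternative is sketched 
under point 1 of the implementation changes below.

    // Minimal sketch (hypothetical, not the actual Freenet code) of the current
    // approach: one global safe request rate, estimated from the average time a
    // request takes and the observed probability of rejection or timeout.
    class GlobalRateEstimator {
        private double avgRequestSeconds = 5.0;   // running average time per request
        private double rejectOrTimeoutProb = 0.0; // running reject/timeout probability
        private static final double DECAY = 0.05; // weight given to each new sample

        void report(double seconds, boolean rejectedOrTimedOut) {
            avgRequestSeconds += DECAY * (seconds - avgRequestSeconds);
            rejectOrTimeoutProb +=
                DECAY * ((rejectedOrTimedOut ? 1.0 : 0.0) - rejectOrTimeoutProb);
        }

        // Requests per second we guess the network can absorb from us; backing off
        // as rejections rise is what keeps this scheme from sustaining bursts.
        double safeRequestsPerSecond() {
            return (1.0 / avgRequestSeconds) * (1.0 - rejectOrTimeoutProb);
        }
    }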

Recently there has been a consensus that Freenet cannot safely burst (with the 
exception of purely local traffic, CBR-padded links, and invisible links such 
as LAN connections, private wifi or sneakernet), because a burst is visible at 
the network level - it's an explosion in traffic levels that fans out from the
originator, becoming less severe on each hop. If an attacker can see the 
traffic levels, and also has a node close enough to identify the keys involved, 
he can tell who originated the burst and what they are fetching (or inserting). 
Thus we can never come close to Perfect Dark's speed, for example, because it 
relies heavily on bursting (as well as having severe lower limits for bandwidth 
and disk usage).
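
To illustrate why this is considered dangerous, here is a toy sketch (entirely 
hypothetical, not an implemented attack) of what such an observer could do: 
record the size of the traffic spike on every link it can see during the burst 
window; since the spike decays on each hop away from the originator, the 
largest spikes point towards the source.

    import java.util.*;

    // Toy sketch (hypothetical): rank observed links by the size of their traffic
    // spike during a burst window. The fan-out argument says the spike is largest
    // at the originator and decays on each hop outwards.
    class BurstObserver {
        // bytes seen on each observed link during the burst window, keyed by link id
        private final Map<String, Long> burstBytes = new HashMap<>();

        void record(String linkId, long bytes) {
            burstBytes.merge(linkId, bytes, Long::sum);
        }

        // Links sorted by spike size, biggest first: the top of this list is where
        // the observer would start looking for the originator.
        List<Map.Entry<String, Long>> suspectsByBurstSize() {
            List<Map.Entry<String, Long>> links = new ArrayList<>(burstBytes.entrySet());
            links.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
            return links;
        }
    }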

However, I am not convinced. I think we are trying to fly before we can walk 
here, and in any case there is a fundamental flaw in the argument.

THE FUNDAMENTAL FLAW:
A powerful passive attacker (which is required for the above attacks) can see 
the traffic flows. They cannot be easily disguised, even if we manage to 
obscure individual requests (which is obviously essential!). We can make 
downloads faster, or we can make them take longer. If they are faster, they 
show up more obviously in the traffic graph. If they are really slow, it is 
conceivable that noise in the number of requests succeeding on the node may 
cover for them. On the other hand, stretching the burst over a longer period 
gives the attacker more time to try to move towards the originator (on 
opennet), may leave the whole network smaller and overall usage lower because 
of behavioural effects, gives the attacker the same number of key-based 
samples, and increases the chance of downtime during the period (which can be 
seen unambiguously from traffic analysis).

Intersection attacks are possible: compare the times when a specific node is 
up and receiving data with the times when a specific request is on the 
network. If many nodes are continually receiving data (not just requesting 
stuff that has fallen out), this attack is significantly harder - hence the 
traditional view that every node will have a huge queue and be constantly 
downloading. The more nodes which appear, on traffic flows, to be request 
sources, the more anonymity a requester has; and the more nodes *nearby* which 
appear to be traffic sources, the more anonymity he has against an attacker 
relatively nearby.

On opennet, mobile-attacker adaptive search is so powerful that bursts 
probably make very little difference. On darknet, bursts may still be a 
genuine concern, if link traffic levels are observable. Tunnels obviously will 
help, but are likely to have a huge performance impact, and there will be big 
issues over securing them against traffic analysis.
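
The intersection attack boils down to a very simple comparison; a toy sketch 
(hypothetical, illustrative only):

    // Toy sketch of the intersection attack described above (hypothetical,
    // illustrative only): intersect the periods during which a suspect node was up
    // and receiving data with the periods during which the targeted request stream
    // was visible on the network.
    class IntersectionAttack {
        // windows are arrays of {startMillis, endMillis}
        static long overlapMillis(long[][] nodeReceiving, long[][] requestActive) {
            long overlap = 0;
            for (long[] n : nodeReceiving) {
                for (long[] r : requestActive) {
                    long start = Math.max(n[0], r[0]);
                    long end = Math.min(n[1], r[1]);
                    if (end > start) overlap += end - start;
                }
            }
            return overlap;
        }

        // Fraction of the request's active time during which the suspect was up and
        // receiving data. A score near 1.0, repeated over many observations, singles
        // out the originator - unless many nodes are constantly receiving data, in
        // which case many nodes score high and the attack is much harder.
        static double score(long[][] nodeReceiving, long[][] requestActive) {
            long total = 0;
            for (long[] r : requestActive) total += r[1] - r[0];
            if (total == 0) return 0.0;
            return (double) overlapMillis(nodeReceiving, requestActive) / total;
        }
    }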

OPENNET:
Assume we are trying to trace a request, or an insert with predictable 
content; in the case of an insert whose content is not identifiable until 
after the data has been announced, bursting may be something of a concern.

If an attacker is close enough to identify the data, whether or not he has 
traffic data, he can get a rough bearing from the locations of the keys, and he 
can move towards that location, in the standard adaptive search attack. He can 
do this without being particularly powerful; this attack basically breaks 
opennet IMHO. The attacker can be quite a long way away. Data from traffic 
bursts does help, but the adaptive search attack is so much easier that it may 
not be relevant. A purely passive attacker will find nothing out in any case; 
nodes are needed on the network for all of these attacks. And limits on path 
folding are set in terms of time as well as in terms of requests, so there may 
be some security benefit to a transfer being faster and taking less time.
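
The last point deserves a concrete illustration: because path-folding 
opportunities are limited by elapsed time as well as by request count, a 
faster transfer exposes the originator to fewer folds overall. A toy sketch 
(hypothetical names and constants, not the real opennet code) of such a dual 
limit:

    // Toy sketch (hypothetical, illustrative only): allow a path-folding
    // opportunity only after BOTH a minimum number of successful requests and a
    // minimum amount of elapsed time. With the time-based floor in place, a faster
    // transfer necessarily yields fewer folding opportunities overall.
    class PathFoldingLimiter {
        private static final int MIN_REQUESTS_BETWEEN_FOLDS = 10;
        private static final long MIN_MILLIS_BETWEEN_FOLDS = 60_000;

        private int requestsSinceLastFold = 0;
        private long lastFoldTime = System.currentTimeMillis();

        void onSuccessfulRequest() { requestsSinceLastFold++; }

        boolean mayFoldNow() {
            long now = System.currentTimeMillis();
            if (requestsSinceLastFold < MIN_REQUESTS_BETWEEN_FOLDS) return false;
            if (now - lastFoldTime < MIN_MILLIS_BETWEEN_FOLDS) return false;
            requestsSinceLastFold = 0;
            lastFoldTime = now;
            return true;
        }
    }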

DARKNET:
Some darknet connections will be inherently invisible (private wifi links, LAN 
connections, sneakernet) or unobservable (constant bitrate padding, hard stego 
with rates determined purely by the transport e.g. faking a VoIP stream or 
gaming session). On these links we can happily send as much traffic, local and 
remote, as is physically possible, subject to concerns about nodes getting too 
great a proportion of our traffic (which are less of a worry on darknet 
anyway). For the rest, it is conceivable that severely limiting the incoming 
bandwidth used by outgoing requests, so that it is obscured by variations in 
request success, may help, but I am not convinced: overall, the traffic levels 
will still "point to" the originator - it will be a weaker signal, but over a 
longer period. Also, we do not want darknet to be radically slower than
opennet, or nobody will use it: darknet must be at least as fast as opennet by 
default (with enough connections), and we must allow users to add security at a 
cost in performance if they need it, through steganographic and CBR transports, 
and maybe tunnels.
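
A CBR-padded link of the kind mentioned above is conceptually simple; here is 
a minimal sketch (hypothetical, not a real Freenet transport) of the sending 
side:

    import java.security.SecureRandom;
    import java.util.concurrent.*;

    // Minimal sketch (hypothetical, not a real Freenet transport) of the sending
    // side of a constant-bitrate padded link: every tick we send exactly one
    // fixed-size frame, filled with real data if any is queued and with random
    // padding otherwise, so an observer sees the same traffic level whether or not
    // Freenet has anything to say. A real transport would add a length header and
    // encrypt the whole frame so padding is indistinguishable from data.
    class ConstantBitrateSender {
        interface Link { void send(byte[] wire); }

        private static final int FRAME_BYTES = 1024;
        private static final long TICK_MILLIS = 100;  // 1024 bytes / 100ms ~= 10KB/s
        private final BlockingQueue<byte[]> outgoing = new LinkedBlockingQueue<>();
        private final SecureRandom random = new SecureRandom();

        void queue(byte[] frame) { outgoing.add(frame); }  // frame <= FRAME_BYTES

        void run(Link link) throws InterruptedException {
            while (true) {
                byte[] frame = outgoing.poll();
                byte[] wire = new byte[FRAME_BYTES];
                if (frame != null) System.arraycopy(frame, 0, wire, 0, frame.length);
                else random.nextBytes(wire);  // pure padding
                link.send(wire);
                Thread.sleep(TICK_MILLIS);
            }
        }
    }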

TUNNELS:
I don't think there is much point turning tunnels on on opennet, for example 
(Sybil is just way too easy), but really paranoid users can set MAXIMUM 
security level and use tunnels (provided that it is possible to prevent tunnels 
being obvious on traffic analysis!); of course this assumes that we actually 
implement tunnels, which may take quite some time if they carry a major 
performance cost.

IMPLEMENTATION AND DOCUMENTATION CHANGES:
1. We need a new load management scheme, most likely some variant on token 
passing: For bulk requests, we calculate our capacity for running requests, and 
when we are able to accept some, we ask some of our peers to send us requests. 
If they send too many we reject some. We queue requests waiting for an 
opportunity to forward them to a good node (hence optimising routing), with an 
eventual timeout if we don't manage to do so. Hence load propagates backwards, 
because if a request doesn't move forward, its slot is not made available for a 
new request. There are lots of parameters we can tune, for example protection 
against ubernodes (how much of our traffic we are happy for any individual 
connection to carry), burstiness (how much we allow nodes which haven't sent 
many requests recently to send now), and reciprocity (favouring nodes that 
have been useful to us). See the sketch after this list.

2. In general, bursting is not a big problem for security, certainly not on 
opennet. Load management should not specifically try to limit it unless we are 
on darknet and the user has a high security level and/or has indicated that 
they are prepared to sacrifice a significant amount of performance to improve 
security slightly.

3. Darknet users must (eventually) have the option for fully padded constant 
bitrate connections, and connections using steganography in such a way that the 
traffic flow levels are determined by the steganography (as in faking a VoIP 
call), and of course by network conditions, but not by the level of traffic 
that is actually available. Eventually we will need to devise means to use some 
of the surplus bandwidth for exchanging data pre-emptively etc.

4. It must be easy for darknet users to indicate to the node that a connection 
is unobservable, and to configure how much ubernode protection they need, 
probably via per-peer trust levels.

5. We should encourage users to have data constantly downloading, but we need 
to be realistic about this. More security comes mainly from more total 
downloaders at any given time, which IMHO results from a bigger and faster 
network.

6. Responses to offered keys, and data fetched from Bloom filters, when they 
are purely local, should generally be exempt from all kinds of limiting, 
because the only node able to see them is the one they are being fetched from. 
On opennet, this may make it possible for the peer to guess that the data is 
needed locally - but fetching the data over a longer period of time, or 
fetching it from the broader network while ignoring the node which already has 
it, will just increase our exposure; the latter in particular means we are 
potentially vulnerable to far more nodes.

7. Persistent passive requests are a good thing, and can introduce some 
uncertainty into such calculations as #6.

8. Tunnels are a good thing, but given the likely performance cost not an 
immediate priority.

9. Inserts of data which is not predictable by an attacker should be 
encouraged, where this is reasonably possible. There are strong arguments that 
healing splitfiles (which means reinserting predictable data) is critical to 
good performance; hopefully the need for healing will be reduced when we have 
Bloom filter sharing. If an insert is
indistinguishable, there is a good chance of its remaining reasonably 
anonymous, even on opennet; bursts are a threat, and can perhaps be covered 
relatively easily. In traffic terms, an insert is similar to an answered 
request, unless you can trace it across the network and show that it is longer 
than a request. So at NORMAL security level or higher, unless the user 
overrides a specific setting, we should seriously consider severely limiting 
the bandwidth usage of
inserts. In practice, we already do this, by treating our inserts just like any 
other requests from any other peer.

10. Favouritism: How much can we prefer our own requests to others, in terms of 
what is feasible for the network as a whole, and in terms of what is safe? At 
the moment, we don't favour our requests at all, except that we generally take 
into account the fact that they won't generate any output usage (which can be 
a big factor, since output bandwidth is usually far scarcer than input 
bandwidth).
IMHO this is an open question. If we favour our own requests too much, opennet 
peers will drop us, darknet peers will refuse to answer our requests out of 
reciprocity because we are not answering theirs (assuming we implement 
reciprocity, which IMHO is a good idea eventually), and the peers we do have 
will know that our requests are local. IMHO the above reasoning does allow us 
to use excess capacity for our own requests - this is bursting. In other 
words, as a starting point, if there are both requests from other peers and 
local requests to send, we treat them equally (implying, for example, that we 
don't consider far more of our queued requests than theirs); but if there are 
no remote requests to send and there is capacity to send more, we can send our 
local requests - unless we are on darknet, doing a non-identifiable insert, 
and are very worried about our security.
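
As promised under point 1, here is a minimal sketch (hypothetical names and 
constants, not a design) of the token-passing idea, incorporating point 10's 
rule that spare capacity may be spent on local requests:

    import java.util.*;

    // Minimal sketch (hypothetical, not a design) of token-passing load management
    // as in point 1, plus point 10's rule: remote and local requests compete
    // equally for slots, but spare capacity may be spent on local requests.
    class TokenPassingLoadManager {
        private final int capacity;  // how many requests we can run at once
        private int running = 0;
        private final Map<String, Integer> grantedTokens = new HashMap<>();
        private final Deque<Runnable> queuedRemote = new ArrayDeque<>();
        private final Deque<Runnable> queuedLocal = new ArrayDeque<>();

        TokenPassingLoadManager(int capacity) { this.capacity = capacity; }

        // When we have free slots, grant tokens to peers, capped per peer
        // (crude ubernode protection).
        synchronized Map<String, Integer> grantTokens(List<String> peers, int maxPerPeer) {
            Map<String, Integer> grants = new HashMap<>();
            int free = capacity - running;
            for (String peer : peers) {
                if (free <= 0) break;
                int grant = Math.min(maxPerPeer, free);
                grants.put(peer, grant);
                grantedTokens.merge(peer, grant, Integer::sum);
                free -= grant;
            }
            return grants;
        }

        // A request from a peer with no remaining tokens is rejected ("if they send
        // too many we reject some").
        synchronized boolean acceptRemote(String peer, Runnable request) {
            Integer tokens = grantedTokens.get(peer);
            if (tokens == null || tokens == 0) return false;
            grantedTokens.put(peer, tokens - 1);
            queuedRemote.add(request);
            return true;
        }

        synchronized void acceptLocal(Runnable request) { queuedLocal.add(request); }

        // Start queued requests: roughly alternate between remote and local while
        // both are waiting; once no remote requests remain, spare capacity goes to
        // local requests (this is the bursting of point 10).
        synchronized void startQueued() {
            while (running < capacity && !(queuedRemote.isEmpty() && queuedLocal.isEmpty())) {
                Runnable next;
                if (queuedRemote.isEmpty()) next = queuedLocal.poll();
                else if (queuedLocal.isEmpty()) next = queuedRemote.poll();
                else next = (running % 2 == 0) ? queuedRemote.poll() : queuedLocal.poll();
                running++;
                next.run();  // in reality: forward to the best peer, with an eventual timeout
            }
        }

        synchronized void requestFinished() { running--; }
    }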

DOCUMENTATION/ATTITUDE/GOALS CHANGES:

1. Opennet security sucks. Really, it does. It may be better than some of the 
alternatives, but it isn't vastly better. But IMHO we have the potential to 
achieve fairly interesting performance on opennet - not instant results, but 
fairly good throughput and reachability for rarer content.
2. Unpredictable inserts can be reasonably secure even on opennet, provided 
precautions are taken and the attacker isn't too powerful.
3. It is a good idea to have stuff downloading constantly, although downloading 
stuff you don't need will slow down the network at large. Bloom filter changes 
mean it won't be cached on your node, but it will be cached by reasonably 
nearby nodes.
4. Invisible connections and unobservable connections are a good thing.
5. If an attacker is nearby, there is very little you can do either on darknet 
or on opennet, unless we have tunnels. Freenet is really designed to protect 
against a distant attacker; it is unlikely to work well if an attacker has 
compromised a uniform 10% of the network - but that is rather difficult to 
achieve on darknet.
6. The more/bigger content you request, the harder it is to protect you.
7. If an attacker is not nearby, you are on darknet, and the data to be traced 
is identifiable (e.g. a large splitfile request or a reinsert), he will 
eventually be able to find you, but that will involve compromising a long chain 
of nodes either by electronic or social means.