A radical idea (something which I hope we won't need for 0.7.0!):

Problems:
- It is likely that most nodes, on both open- and dark-networks, will go up and down on a regular basis.
- It is likely that this will cause *significant* routing flux.
- This may well cause major problems related to data migration.
Related issues:
- We will need passive requests in 0.8 if not in 0.7. These will allow streams, efficient polling of multiple outboxes, freesites, and all manner of really cool things.
- It would be nice to be able to exploit external or high-level caches of data, as in the "permanence" suggestions...

So: as Oskar suggests, we could have a two-level lookup system. We have:
-- The local datastore. As now. This is checked first.
-- The local explicitly published content. Applications can register blocks of data with Fred, which they can serve over FCP. These will then be published to the network via conventional inserts.
-- A cache of passive requests. When we attempt a fetch of data from the network and it fails, we can leave passive requests behind. These form a chain from the requestor to the optimal nodes for the data.
-- A cache of known data locations. When we fetch a datum from the network successfully, we automatically subscribe to its location. We remember where we got it from - not just the node immediately downstream, but the ultimate location (or its proxy). We subscribe to that node; when it purges the data from its cache, it will tell us somewhere else to get the data from, or it will tell us that the data is no longer available. Then, when we lose the data from our own store, we still know where to get it. In order for a datum to get into the known data locations, it must have passed through our node at some point, in its entirety, and validated correctly. We know enough to find the node: its identity and its current location. We may keep several entries for each datum.
-- A cache of speculative data locations. These are the same as the last item, except that they are less trusted, they come after it in the lookup order, and they do not require that the data was ever successfully published. In other words, nodes can publish pointers without publishing the data. The catch is that the published pointer is linked to a node identity. We can offer various means of anonymizing these.
But if you do not serve the data you have promised, you will be discredited, as will your peers. If they are sufficiently pissed off with you (or with your next hop, if you are spoofing many nodes), they will stop relaying your speculative publishes.

So, in a typical request:
- We try the local datastore.
- We route the request.
- If it fails, we check the local explicitly published content.
- Then we check the known data locations. If we find an entry, we attempt to fetch the data through that mechanism. If that fails, we punish the source node, and everyone involved in it.
- If that fails, we check the speculative data locations.
- If all of the above fail, we fail the request. If any of the above succeed, we cache the data.

So, what is the practical effect of this?
1. We can greatly ameliorate the problems with routing flux. The known data locations system above will help a lot. We can automatically move the pointers on a swap if we need to, without expending massive amounts of bandwidth, but the data should migrate naturally without major problems.
2. We can find rare data much more easily.
3. We can do semi-permanence, transparent proxying, and so on (aka inverse passive requests). Applications: backup, transparent proxying, etc.
4. For really popular data we can automatically select the nearest node. It is open for discussion whether this is a good idea; it implies, for example, fetching the data immediately if we know where it is, which means the request won't go through the relevant specialized location. But we could make it do so; we could fetch the data from the nearest location and then insert it to the best location, for example.
5. There must be some synergy between the passive request mechanism and the known data locations mechanism. Hopefully they will reinforce each other and be similar in many ways.
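The request cascade above can be sketched as a toy model. All names here (lookup, node, route, network) are hypothetical illustrations, not Fred's actual API; the "network" dict simply stands in for a direct fetch from a remembered node.

```python
# Toy sketch of the tiered lookup cascade: datastore -> routed request ->
# published content -> known data locations -> speculative data locations.
# Any success is cached locally, as described above.

def lookup(key, node, route, network):
    """node holds the local tiers as dicts ('store', 'published',
    'known', 'speculative'); route(key) models a normal routed request;
    network maps node identities to their stores."""
    # 1. The local datastore, as now.
    if key in node['store']:
        return 'datastore', node['store'][key]

    # 2. Route the request across the network.
    data = route(key)
    tier = 'routed'

    # 3. On failure, check the locally, explicitly published content.
    if data is None:
        data = node['published'].get(key)
        tier = 'published'

    # 4. Then the known data locations: fetch directly from the node we
    #    remember holding the datum (on failure we would punish it).
    if data is None:
        owner = node['known'].get(key)
        data = network.get(owner, {}).get(key)
        tier = 'known-location'

    # 5. Finally the less-trusted speculative data locations.
    if data is None:
        owner = node['speculative'].get(key)
        data = network.get(owner, {}).get(key)
        tier = 'speculative'

    # 6. Fail the request, or cache any success locally.
    if data is None:
        return 'failed', None
    node['store'][key] = data
    return tier, data
```

Note that a real implementation would also leave a passive request behind on failure and subscribe to the ultimate location on success; the sketch only shows the read path.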
----- Forwarded message from Oskar Sandberg <oskar at freenetproject.org> -----

From: Oskar Sandberg <[email protected]>
To: Matthew Toseland <toad at amphibian.dyndns.org>
Subject: Re: Permanently immature networks

It could be that it is just a question of trying to keep the rate sufficiently low that data has time to migrate. Or you may have to consider a two-level lookup system, where one first looks up a pointer to the data which can easily be updated when the node that has the data changes position. I don't know.

// oskar

Matthew Toseland wrote:
>Won't it result in it being impossible to find data? It might be
>possible to find locations, but the part of the network specialized in a
>given location changes from hour to hour, and we therefore have little
>chance of finding data that is more than a few minutes old. Bloom filters
>can help to find data, but only within a certain radius...
>
>Maybe we can use Bloom filters in a different way to solve the problem.
>When we swap locations, we exchange filters. We keep the original swap
>path to fetch from the old node. These are only used as a last resort,
>after we get a DNF. That would rather increase the overheads of
>successful swaps though... the Bloom filters might be rather large, with
>large stores and small keys... We could only include stuff near our
>location, but that won't help much, and in any case how close is close
>enough? We could shrink the filters by allowing say 25% false
>positives...
>
>On Fri, Feb 03, 2006 at 01:55:53PM +0100, Oskar Sandberg wrote:
>
>>I think that a permanently immature network might not be so bad,
>>because it makes abuse harder. In fact, I was going to suggest that
>>nodes should drop their ID and pick a new one at random every so often.
>>
>>// oskar
>>
>>Matthew Toseland wrote:
>>
>>>How do we deal with nodes not generally being up 24x7? Won't this result
>>>in constantly fluctuating topology and therefore constantly fluctuating
>>>locations?
>>>Which is bad, because it trashes the datastores... we can
>>>maybe partially make up for this through Bloom filters...
>>>
>>>Another strategy might be to consider all connections, or all
>>>connections we've heard from in X period, when calculating whether a
>>>swap is worthwhile. This won't stop the imbalance we get from swaps not
>>>being routed through down nodes, but it might help?

----- End forwarded message -----

--
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.
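The shrinking-by-false-positives idea in the quoted thread can be quantified with the standard Bloom filter sizing formula: for an optimally configured filter, the size per stored key depends only on the false-positive rate p, namely m/n = -ln(p) / (ln 2)^2. A quick sketch (function and parameter names are mine, not from the thread):

```python
from math import ceil, log

def bloom_bits_per_key(p):
    """Bits per stored key for an optimally configured Bloom filter
    with false-positive rate p: m/n = -ln(p) / (ln 2)^2."""
    return -log(p) / (log(2) ** 2)

def filter_bytes(n_keys, p):
    """Total filter size in bytes for n_keys at false-positive rate p."""
    return ceil(n_keys * bloom_bits_per_key(p) / 8)

# For a store of 100,000 keys: roughly 117 KiB at 1% false positives,
# but only about 35 KiB at 25% - a bit over a 3x saving per swap, at
# the cost of one wasted last-resort fetch in every four lookups.
```

So tolerating 25% false positives does shrink the per-swap overhead substantially, though whether a quarter of last-resort fetches going nowhere is acceptable is exactly the trade-off the thread leaves open.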
