A radical idea (something which I hope we won't need for 0.7.0!):

Problems:
- It is likely that most nodes, on both open- and dark-networks, will go up and down on a regular basis.
- It is likely that this will cause *significant* routing flux.
- This may well cause major problems related to data migration.
Related issues:
- We will need passive requests in 0.8 if not in 0.7. These will allow streams, efficient polling of multiple outboxes, freesites, and all manner of really cool things.
- It would be nice to be able to exploit external or high-level caches of data, as in the "permanence" suggestions...

So: as Oskar suggests, we could have a two-level lookup system. We have:
-- The local datastore. As now. This is checked first.
-- The local explicitly published content. Applications can register blocks of data with Fred, which they can serve over FCP. These will then be published to the network via conventional inserts.
-- A cache of passive requests. When we attempt a fetch of data from the network and it fails, we can leave passive requests behind. These form a chain from the requestor to the optimal nodes for the data.
-- A cache of known data locations. When we fetch a datum from the network successfully, we automatically subscribe to its location. We remember where we got it from - not just the node immediately downstream, but the ultimate location (or its proxy). We subscribe to that node; when it purges the data from its cache, it will tell us somewhere else to get the data from, or it will tell us that the data is no longer available. Then, when we lose the data from our own store, we still know where to get it. In order for a datum to get into the known data locations, it must have passed through our node at some point, in its entirety, and validated correctly. We know enough to find the node: its identity and its current location. We may keep several entries for each datum.
-- A cache of speculative data locations. These are the same as the last item, except that they are less trusted, they come after it in the lookup order, and they do not require that the data was ever successfully published. In other words, nodes can publish pointers without publishing the data. The catch is that the published pointer is linked to a node identity. We can offer various means of anonymizing these.
But if you do not serve the data you have promised, you will be discredited, as will your peers. If they are sufficiently pissed off with you (or with your next hop, if you are spoofing many nodes), they will stop relaying your speculative publishes.

So, in a typical request:
- We try the local datastore.
- We route the request.
- If it fails, we check the local explicitly published content.
- Then we check the known data locations. If we find an entry, we attempt to fetch the data through that mechanism. If that fails, we punish the source node, and everyone involved in it.
- If that fails, we check the speculative data locations.
- If all of the above fail, we fail the request. If any of the above succeed, we cache the data.

So, what is the practical effect of this?
1. We can greatly ameliorate the problems with routing flux. The known data locations system above will help a lot. We can automatically move the pointers on a swap if we need to, without expending massive amounts of bandwidth, but the data should migrate naturally without major problems.
2. We can find rare data much more easily.
3. We can do semi-permanence, transparent proxying, and so on (aka inverse passive requests). Applications: backup, transparent proxying, etc.
4. For really popular data we can automatically select the nearest node. It is open for discussion whether this is a good idea; it implies, for example, fetching the data immediately if we know where it is, which means the request won't go through the relevant specialized location. But we could make it do so; we could fetch the data from the nearest location and then insert it to the best location, for example.
5. There must be some synergy between the passive request mechanism and the known data locations mechanism. Hopefully they will reinforce each other and be similar in many ways.
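The request cascade above can be sketched as a toy model. All names here (lookup, node, route, network) are hypothetical illustrations, not Fred's actual API; the "network" dict simply stands in for a direct fetch from a remembered node.

```python
# Toy sketch of the tiered lookup cascade: datastore -> routed request ->
# published content -> known data locations -> speculative data locations.
# Any success is cached locally, as described above.

def lookup(key, node, route, network):
    """node holds the local tiers as dicts ('store', 'published',
    'known', 'speculative'); route(key) models a normal routed request;
    network maps node identities to their stores."""
    # 1. The local datastore, as now.
    if key in node['store']:
        return 'datastore', node['store'][key]

    # 2. Route the request across the network.
    data = route(key)
    tier = 'routed'

    # 3. On failure, check the locally, explicitly published content.
    if data is None:
        data = node['published'].get(key)
        tier = 'published'

    # 4. Then the known data locations: fetch directly from the node we
    #    remember holding the datum (on failure we would punish it).
    if data is None:
        owner = node['known'].get(key)
        data = network.get(owner, {}).get(key)
        tier = 'known-location'

    # 5. Finally the less-trusted speculative data locations.
    if data is None:
        owner = node['speculative'].get(key)
        data = network.get(owner, {}).get(key)
        tier = 'speculative'

    # 6. Fail the request, or cache any success locally.
    if data is None:
        return 'failed', None
    node['store'][key] = data
    return tier, data
```

Note that a real implementation would also leave a passive request behind on failure and subscribe to the ultimate location on success; the sketch only shows the read path.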
----- Forwarded message from Oskar Sandberg <oskar at freenetproject.org> -----

From: Oskar Sandberg <[email protected]>
To: Matthew Toseland <toad at amphibian.dyndns.org>
Subject: Re: Permanently immature networks

It could be that it is just a question of trying to keep the rate sufficiently low that data has time to migrate. Or you may have to consider a two-level lookup system, where one first looks up a pointer to the data which can easily be updated when the node that has the data changes position. I don't know.

// oskar

Matthew Toseland wrote:
>Won't it result in it being impossible to find data? It might be
>possible to find locations, but the part of the network specialized in a
>given location changes from hour to hour, and we therefore have little
>chance of finding data that is more than a few minutes old. Bloom filters
>can help to find data, but only within a certain radius...
>
>Maybe we can use Bloom filters in a different way to solve the problem.
>When we swap locations, we exchange filters. We keep the original swap
>path to fetch from the old node. These are only used as a last resort,
>after we get a DNF. That would rather increase the overheads of
>successful swaps though... the Bloom filters might be rather large, with
>large stores and small keys... We could only include stuff near our
>location, but that won't help much, and in any case how close is close
>enough? We could shrink the filters by allowing say 25% false
>positives...
>
>On Fri, Feb 03, 2006 at 01:55:53PM +0100, Oskar Sandberg wrote:
>
>>I think that a permanently immature network might not be so bad,
>>because it makes abuse harder. In fact, I was going to suggest that
>>nodes should drop their ID and pick a new one at random every so often.
>>
>>// oskar
>>
>>Matthew Toseland wrote:
>>
>>>How do we deal with nodes not generally being up 24x7? Won't this result
>>>in constantly fluctuating topology and therefore constantly fluctuating
>>>locations?
>>>Which is bad, because it trashes the datastores... we can
>>>maybe partially make up for this through Bloom filters...
>>>
>>>Another strategy might be to consider all connections, or all
>>>connections we've heard from in X period, when calculating whether a
>>>swap is worthwhile. This won't stop the imbalance we get from swaps not
>>>being routed through down nodes, but it might help?

----- End forwarded message -----

--
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.
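The shrinking-by-false-positives idea in the quoted thread can be quantified with the standard Bloom filter sizing formula: for an optimally configured filter, the size per stored key depends only on the false-positive rate p, namely m/n = -ln(p) / (ln 2)^2. A quick sketch (function and parameter names are mine, not from the thread):

```python
from math import ceil, log

def bloom_bits_per_key(p):
    """Bits per stored key for an optimally configured Bloom filter
    with false-positive rate p: m/n = -ln(p) / (ln 2)^2."""
    return -log(p) / (log(2) ** 2)

def filter_bytes(n_keys, p):
    """Total filter size in bytes for n_keys at false-positive rate p."""
    return ceil(n_keys * bloom_bits_per_key(p) / 8)

# For a store of 100,000 keys: roughly 117 KiB at 1% false positives,
# but only about 35 KiB at 25% - a bit over a 3x saving per swap, at
# the cost of one wasted last-resort fetch in every four lookups.
```

So tolerating 25% false positives does shrink the per-swap overhead substantially, though whether a quarter of last-resort fetches going nowhere is acceptable is exactly the trade-off the thread leaves open.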
