For a while now I have been planning a project for a distributed database 
that integrates every kind of data.  I have primarily
been focusing my effort on the database schema aspect of it, trying to find 
more sophisticated ways of structuring data than the
concepts of traditional relational databases offer.  I had not been spending much
time pursuing information and ideas about the network infrastructure of the
system, although what I had in mind in that regard was something like Freenet.  Recently I
became familiar with the Freenet Project, and I am happy to see such a 
project in existence.  I share the same core values with
which it is being built.

However, whereas the Freenet Project aims at creating a simple document
storage and retrieval system, my project aims at
creating a distributed database.  There are many things that the latter could
do that the former cannot.  As a first example, you
can imagine the kind of database applications you use over the web - things 
like apartment finding, personal ads, restaurant and
entertainment info, etc.  The major way in which this project can be an
advancement over the web is to, at a certain layer of
abstraction, disassociate data from its location on the network.  So when 
you want to find a particular piece of data, you simply
ask for that data.  You don't have to figure out what website to look on, or 
go through pages and pages of Google search
results, or deal with different websites which all have unique, unfamiliar 
user-interfaces.  As for inserting data, you simply insert
it and it will reach individuals interested in that kind of data.  At the 
layer of abstraction the common user will deal with on a
daily basis, the data is simply "on the network"; it is not at a particular
location on the network.

As for e-commerce, companies can have their inventory in this distributed 
database freenet.  Orders can be put into the freenet
and propagated back to the companies via the general mechanisms which are 
part of the network infrastructure.  E-mail would
be defined via a simple database schema (To, From, Message Body, etc.) and 
propagate from sender to recipient via the same
general mechanisms of the network infrastructure.
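
To make the idea concrete, here is a minimal sketch of e-mail expressed as a
plain database schema, as the paragraph above suggests.  The field names and
the `EmailRecord` type are my own invention, not part of any existing design.

```python
from dataclasses import dataclass, field, asdict
import time

# Hypothetical record type: e-mail as ordinary schema'd data, so a message
# can propagate through the network like any other database entry.
@dataclass
class EmailRecord:
    to: str       # location-independent identifier of the recipient
    sender: str   # location-independent identifier of the sender
    subject: str
    body: str
    sent_at: float = field(default_factory=time.time)

msg = EmailRecord(to="id:recipient", sender="id:me",
                  subject="hello", body="Mail as plain data.")
record = asdict(msg)  # the row that would be inserted into the database
print(sorted(record.keys()))
```

The point of the sketch is only that nothing about e-mail is special: once
To, From, and Message Body are schema fields, the network's general insert
and propagation mechanisms carry the message.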

By only a tiny stretch of my original idea, the project can become a more 
general case of a document storage and retrieval
system.  As a first approximation you could think of it as a multi-media 
database; that is, a database in which some of the
entries are files.  In this case, you can imagine the example of the data 
associated with a particular publication also including the
publication itself (e.g. a file in PDF format).
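
A toy illustration of such a "multi-media" entry, with invented field names:
the record for a publication carries ordinary metadata fields plus the
document itself as a file-valued field.

```python
# Hypothetical multi-media database entry: one of the fields is a file.
publication = {
    "title": "Example Paper",
    "author": "A. Author",
    "format": "application/pdf",
    "document": b"%PDF-1.4 ...",  # the publication itself, stored as bytes
}
print(isinstance(publication["document"], bytes))
```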

In finding how my project and the Freenet Project might come together, I 
came up with four "layers of abstraction" that
describe such a system.  It is by no means a complete description of an 
architecture.  However, depending on what other
people think of it, it may be a starting point for further discussion.

I briefly read about "emergent systems" in a biology textbook and find it
very interesting.  It seems to me that an "emergent
system" is not a physically existing thing; rather, it is a tool of 
description for painting a "big picture" of a very complex system.  I
see the five/seven layers of network protocols as employing the same general 
method of description.  Though, the latter case is
slightly different in that we are creating the system itself, not just a 
description of it.

--
The first layer I will talk about is the lowest level protocols that can be 
used to communicate via the internet; that is, IP, UDP,
and TCP.  This is the basic means available to us, on top of which we can
implement something more desirable and complex.  If we had
the means of redefining this layer to suit the values of Freenet, that would
be ideal.  But I will assume that we do not have the
persuasive power to affect the make-up of this layer.

Layer 1 has some properties that we, as believers in the Freenet cause, 
don't like; namely, IP addresses and the potential for
eavesdroppers.  Layer 2 is here to deal with those properties.  Layer 2 
provides some corrective measures to layer 1 in order
to conceal the physical location of peers from each other, and to make an 
eavesdropper's success extremely unlikely.  Layer 2
exists to serve as a precursor to layer 3.  That is, with the problems of
anonymous message exchange and secure communication
channels being dealt with by layer 2, layer 3 can focus on its more 
high-level job.  So, the end result of creating layer 2 is a
more ideal version of the layer 1 network, on top of which we are in a 
better situation for creating layer 3.

The problem of creating layer 2 could basically be stated like this:  Create 
an environment in which messages can be passed
from sender to receiver such that neither party has knowledge of the other's 
physical location; additionally, there is little or no
chance that an eavesdropper succeeds in learning who is
communicating or the contents of the message being
sent.  From this arises the concept of a physical-location-independent 
identifier which machines use in order to talk to each
other.  (Note that a single machine could have more than one 
location-independent identifier if it wants.)
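
One plausible construction of such an identifier, offered purely as an
assumption of mine and not as part of the proposal: derive it by hashing a
public key, so the identifier reveals nothing about the machine's physical
address, and a machine can hold several identifiers simply by holding
several keys.

```python
import hashlib

# Hypothetical: a location-independent identifier as a truncated hash of a
# public key.  Key material here is a placeholder byte string.
def location_independent_id(public_key_bytes: bytes) -> str:
    return hashlib.sha256(public_key_bytes).hexdigest()[:16]

id_one = location_independent_id(b"machine-key-1")
id_two = location_independent_id(b"machine-key-2")  # same machine, second identity
print(id_one != id_two)
```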

Above are the design goals of layer 2.  I am not an expert in the area, so I 
am hesitant to share the rough idea I have of how
they could be achieved.  But from what I read of anonymous remailers, it 
seems like you could use the same basic notions
here.  That is, a node on the network communicates via anonymous 
remailer-type things.  Certain configuration information
would have to be set for a networked node, such as which anonymous remailers
are trustworthy for which types of data, and
how reliable they are.  So the two participants in a message exchange each have
a "line of defense": their list of anonymous
a "line of defense": their list of anonymous
remailer-type things through which the message passes.  (The configuration 
information itself could be shared between users at
layer 4.)
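
A toy sketch of the remailer idea: the sender wraps the message in one layer
per hop, and each remailer peels a single layer, learning only the next hop
and never both endpoints.  A real system would use layered encryption; plain
nested tuples stand in for ciphertext here, and the remailer names are
invented.

```python
# Wrap a message in one routing layer per hop, innermost layer last.
def wrap(message, route):
    packet = ("deliver", message)
    for hop in reversed(route):
        packet = ("forward", hop, packet)
    return packet

# Simulate the relay chain: each hop sees only its own layer.
def relay(packet):
    hops_seen = []
    while packet[0] == "forward":
        _, hop, packet = packet
        hops_seen.append(hop)
    return hops_seen, packet[1]

route = ["remailer-1", "remailer-2", "remailer-3"]  # invented names
hops, message = relay(wrap("hello", route))
print(hops, message)
```

With encryption in place of bare tuples, no single remailer could link
sender to receiver, which is exactly the "line of defense" described above.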

--
Layer 3 is the distributed-database layer.  It provides the functionality 
for inserting, updating, and retrieving data distributed
across the network.  Its goal is to make data independent from physical 
location for the sake of layer 4, while managing the
complexities of a distributed database.

Knowledge of where data on the network is located has to be maintained in 
some hierarchical fashion, as it is in DNS.
However, it will have to differ from DNS, which has "authoritative" sources
of data.  Instead, data will be spread across the
network at different locations.  Inserts and updates of data will propagate 
to all those locations.  Requests for data may be
fulfilled by any of those locations.
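
A minimal sketch of that replication model, with invented names and an
in-memory dictionary standing in for each node's store: there is no
authoritative copy, an insert reaches every subscribed location, and a
request may be answered by any of them.

```python
import random

# Hypothetical subscribed node holding one replica of the data.
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value

subscribers = [Node("a"), Node("b"), Node("c")]

def insert(key, value):
    for node in subscribers:           # propagate to all locations
        node.apply(key, value)

def request(key):
    node = random.choice(subscribers)  # any location may answer
    return node.store.get(key)

insert("apartment/123", {"city": "Boston", "rent": 900})
print(request("apartment/123")["city"])
```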

Our goal is to design a system in which a request for a piece of data will
return data that is as up-to-date as possible.  So, we
want routing which is (a) complete - all subscribed nodes are informed of 
inserts and updates, and (b) efficient - inserts and
updates reach all subscribed nodes as quickly as possible.
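
The two routing goals can be sketched with a simple flood over an invented
peer topology: a seen-set keeps each node from handling an update twice
(efficiency), and afterwards every subscribed node holds the update
(completeness).

```python
# Invented topology: each node lists its neighbors.
peers = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c"],
}
received = {}

def flood(node, update, seen=None):
    seen = seen if seen is not None else set()
    if node in seen:
        return                    # already handled: no duplicate work
    seen.add(node)
    received[node] = update       # the node applies the update once
    for neighbor in peers[node]:
        flood(neighbor, update, seen)

flood("a", "rent=950")
print(sorted(received))           # every subscribed node was reached
```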

--
While layer 4 "knows" what data it wants to receive, or what it wants to 
insert or update, it doesn't itself "know" how to get it.
It passes those requests onto layer 3 as if it were a single database.  As a 
first approximation, you can imagine layer 4 passing
SQL-like commands to layer 3.
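
As a first approximation of that boundary, layer 4 might hand command
strings to layer 3 as if the whole network were one database.  The
`execute()` stub and the command strings below are illustrative only; a
real layer 3 would parse the command, locate replicas, and propagate
inserts and updates.

```python
# Hypothetical layer-3 entry point: classify an SQL-like command by the
# kind of distributed operation it would trigger.
def execute(command: str) -> str:
    verb = command.split()[0].upper()
    return {"SELECT": "retrieve", "INSERT": "insert", "UPDATE": "update"}[verb]

print(execute("SELECT * FROM apartments WHERE city = 'Boston'"))
print(execute("INSERT INTO apartments (city, rent) VALUES ('Boston', 900)"))
```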



_______________________________________________
freenet-tech mailing list
[EMAIL PROTECTED]
http://lists.freenetproject.org/mailman/listinfo/tech
