While we've got the Freenet and I2P people together on one list I'd like
to put forward an idea that's been growing in my brain for some time, as
the taxi driver says. The idea is hypertext with location-independent,
verifiable identifiers, which I call the free-floating web. It's not a
new idea by any means, but it would be useful if we could agree on a
standard so that files can easily migrate between different anonymous
and censorship-resistant networks.
Goals:
* pseudonymous and anonymous publishing
* incremental verification
* plausible deniability
* mutable files which publishers can update
* immutable files which publishers can't be forced to update
Tools:
* one-way hashes
* block ciphers
* public-key signatures
1. Immutable files
Freenet's CHKs (content hash keys) offer immutability and plausible
deniability: a file is encrypted using its own hash as the key, the hash
of the encrypted file identifies it, and the hash of the unencrypted
file (which isn't revealed to relays) serves as the decryption key.
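Here's a rough sketch of the CHK idea in Python. The names make_chk and fetch_chk are made up for illustration, SHA-256 is just one possible hash, and keystream_xor is a toy stand-in for a real block cipher (the standard library doesn't ship one), so treat it as an illustration rather than Freenet's actual scheme.

```python
import hashlib

def keystream_xor(key, data):
    """Toy stand-in for a block cipher: XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

def make_chk(plaintext):
    """Return (identifier, key, ciphertext) for an immutable file."""
    key = hashlib.sha256(plaintext).digest()          # hash of the unencrypted file
    ciphertext = keystream_xor(key, plaintext)        # file encrypted with its hash
    identifier = hashlib.sha256(ciphertext).digest()  # all that relays need to see
    return identifier, key, ciphertext

def fetch_chk(identifier, key, ciphertext):
    """Verify the ciphertext against its identifier, then decrypt and re-check."""
    if hashlib.sha256(ciphertext).digest() != identifier:
        raise ValueError("ciphertext does not match identifier")
    plaintext = keystream_xor(key, ciphertext)
    if hashlib.sha256(plaintext).digest() != key:
        raise ValueError("wrong decryption key")
    return plaintext
```

A relay can run the first check in fetch_chk with only the identifier; the second check needs the key, which only someone holding the full hyperlink has.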
CHKs can be combined with hash trees to allow incremental verification
and parallel downloads. The file is encrypted with its hash, and the
encrypted blocks form the bottom layer of the hash tree. The network
representation of the file contains all the blocks of the hash tree in
depth-first order, so that each subtree occupies a contiguous range. For
example, if the hash tree looks like this:
              A1
      B1              B2
  C1      C2      C3      C4
D1  D2  D3  D4  D5  D6  D7  D8
then the network representation looks like this:
A1 B1 C1 D1 D2 C2 D3 D4 B2 C3 D5 D6 C4 D7 D8
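The ordering can be reproduced with a few lines of Python. This is only a sketch: depth_first and subtree_span are hypothetical helper names, and it assumes a complete binary tree given as rows of labels from the root down.

```python
def depth_first(rows):
    """Return the labels of a complete binary tree in depth-first (pre-order) order."""
    order = []

    def visit(level, index):
        order.append(rows[level][index])
        if level + 1 < len(rows):
            visit(level + 1, 2 * index)      # left child
            visit(level + 1, 2 * index + 1)  # right child

    visit(0, 0)
    return order

def subtree_span(rows, level, index):
    """Return (start, end) offsets, end exclusive, of a subtree in the serialised order."""
    start = depth_first(rows).index(rows[level][index])
    size = 2 ** (len(rows) - level) - 1      # nodes in a complete subtree
    return start, start + size

rows = [["A1"],
        ["B1", "B2"],
        ["C1", "C2", "C3", "C4"],
        ["D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8"]]
print(" ".join(depth_first(rows)))
# A1 B1 C1 D1 D2 C2 D3 D4 B2 C3 D5 D6 C4 D7 D8
```

With this layout, subtree_span(rows, 1, 1) gives (8, 15), i.e. the whole B2 subtree sits in one contiguous slice of the stream.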
The on-disk representation might be different - for example, the file
might be stored unencrypted in a shared folder, with the encryption key
and the rest of the hash tree stored in a separate metadata file
(convenience may be more important than plausible deniability for some
users).
The network representation is designed to allow parallel downloads -
each subtree can be requested and verified independently. Given the root
hash of the tree, the root hash of the subtree, and the starting and
ending offsets, a server can quickly find the requested blocks and if
necessary encrypt them as it reads them from disk. The client and relays
can verify each block as soon as it's received, using a hash from the
request message or from a previous block in the subtree.
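To make the verification step concrete, here's a minimal sketch in Python. It assumes things the proposal doesn't specify: SHA-256 as the hash, interior blocks that are simply the two child hashes concatenated, a power-of-two number of leaf blocks, and a receiver that knows how many interior levels the requested subtree has. build_stream and verify_stream are hypothetical names.

```python
import hashlib

HASH_LEN = 32  # SHA-256 digest size in bytes

def build_stream(leaves):
    """Return (root_hash, blocks in depth-first order) for a list of leaf blocks."""
    if len(leaves) == 1:
        return hashlib.sha256(leaves[0]).digest(), [leaves[0]]
    mid = len(leaves) // 2
    left_hash, left_blocks = build_stream(leaves[:mid])
    right_hash, right_blocks = build_stream(leaves[mid:])
    node = left_hash + right_hash  # interior block holds its children's hashes
    return hashlib.sha256(node).digest(), [node] + left_blocks + right_blocks

def verify_stream(expected_hash, levels, blocks):
    """Yield leaf blocks as they verify; raise ValueError on the first bad block.

    levels is the number of interior levels above the leaves in this subtree.
    """
    pending = [(expected_hash, levels)]  # stack of (hash, interior levels below)
    for block in blocks:
        if not pending:
            raise ValueError("unexpected extra block")
        expected, depth = pending.pop()
        if hashlib.sha256(block).digest() != expected:
            raise ValueError("block failed verification")
        if depth == 0:
            yield block  # verified leaf (encrypted data block)
        else:
            # The right child is pushed first so the left child is checked next.
            pending.append((block[HASH_LEN:], depth - 1))
            pending.append((block[:HASH_LEN], depth - 1))
    if pending:
        raise ValueError("stream ended early")
```

For an eight-leaf tree, verify_stream(root, 3, stream) checks the whole file, while verify_stream(subtree_hash, 2, stream[8:15]) checks just the right-hand half, which is what lets different subtrees be fetched from different servers in parallel.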
A hyperlink to an immutable document contains the root hash of the tree,
the hash of the unencrypted file, the hash function and the block
cipher. A request message contains the root hash, the hash function, and
optionally the subtree hash and the starting and ending offsets. Relays
and caches can verify the file but they can't decrypt it without the
hash of the unencrypted file.
2. Mutable files
Mutable files can be implemented using public-key signatures. The
publisher creates a redirect block which contains:
1. A file name, chosen by the publisher (each public key defines a
separate namespace)
2. A hyperlink to the latest (immutable) version of the file
Fields 1 and 2 are encrypted with a unique symmetric key to hide them
from relays. The following fields are unencrypted:
3. A monotonically-increasing version number, which could be a timestamp
4. The publisher's public key
5. The signature function
6. A signature of fields 1-2 (encrypted) and 3-5 using the publisher's
private key
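As a rough illustration of the redirect block, the Python sketch below signs fields 1-2 (already encrypted) together with fields 3-5 and lets a relay check the result. Everything specific here is an assumption: the dictionary layout, the length-prefixed serialisation, the "toy-rsa-sha256" label, and above all the signature scheme, which is textbook RSA with a tiny hard-coded key so that the example runs on the standard library alone. It is not secure and not a proposed format.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. Far too small to be secure;
# it only lets the example run without third-party libraries.
P, Q = 2**127 - 1, 2**89 - 1
PUBLIC_KEY = (P * Q, 65537)                      # (modulus, exponent)
PRIVATE_EXP = pow(65537, -1, (P - 1) * (Q - 1))  # needs Python 3.8+

def digest_int(data, modulus):
    """Hash the data and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus

def signed_bytes(block):
    """Serialise fields 1-2 (encrypted) and 3-5 with length prefixes."""
    n, e = block["public_key"]
    parts = [block["encrypted"], str(block["version"]).encode(),
             f"{n}:{e}".encode(), block["sig_function"].encode()]
    return b"".join(len(p).to_bytes(4, "big") + p for p in parts)

def make_redirect(encrypted_fields, version):
    """Publisher side: assemble the block and sign it with the private key."""
    block = {
        "encrypted": encrypted_fields,     # fields 1-2, already encrypted
        "version": version,                # field 3
        "public_key": PUBLIC_KEY,          # field 4
        "sig_function": "toy-rsa-sha256",  # field 5 (made-up label)
    }
    n, _ = PUBLIC_KEY
    block["signature"] = pow(digest_int(signed_bytes(block), n), PRIVATE_EXP, n)  # field 6
    return block

def relay_check(block):
    """Relay side: verify the signature without the symmetric key."""
    n, e = block["public_key"]
    return pow(block["signature"], e, n) == digest_int(signed_bytes(block), n)
```

A relay that calls relay_check learns the version number and that the block came from the holder of the private key, but the file name and hyperlink stay opaque to it.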
A hyperlink to a mutable document contains the hash of the public key,
the symmetric key, the hash function, the signature function, and
optionally the version number. A request message contains the hash of
the public key, the hash function, the signature function, and
optionally the minimum and maximum acceptable version numbers. Relays
and caches can verify the signature and version number, but they can't
read the file name or the hyperlink without the symmetric key.
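The version-range part of a request could be handled by a relay or cache along these lines. This is only a sketch: pick_redirect is a hypothetical name, redirect blocks are assumed to be dictionaries with a "version" entry, and signatures are assumed to have been verified already.

```python
def pick_redirect(blocks, min_version=None, max_version=None):
    """Return the newest block whose version lies in the requested range, or None."""
    candidates = [
        b for b in blocks
        if (min_version is None or b["version"] >= min_version)
        and (max_version is None or b["version"] <= max_version)
    ]
    return max(candidates, key=lambda b: b["version"], default=None)

cached = [{"version": 3}, {"version": 9}, {"version": 5}]
newest = pick_redirect(cached)                   # no bounds: highest version
bounded = pick_redirect(cached, max_version=6)   # highest version not above 6
```

Leaving both bounds out asks for the latest version the relay knows about; setting them lets a client pin an older version or insist on something newer than what it already has.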
3. Bundles
When linking to a file, authors can choose between "hard linking" to a
specific (immutable) version and "soft linking" to a mutable redirect
block. Hard links can't form reference cycles, because each hard link
contains a hash of its target and two files can't each contain the
other's hash. To solve this problem, files that link to one another can
be collected into a "bundle", which contains a directory (or manifest)
that maps names onto hard links. Links between files in the bundle use
names instead of hard links, and the entire bundle can be published as a
single immutable file. (FIXME: hard links into bundles must include a
name.)
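A bundle could look something like the Python sketch below. The JSON layout, the function names make_bundle and open_link, and the use of a bare SHA-256 hash as a "hard link" are all simplifying assumptions for illustration (a real hard link would also carry the decryption key and so on).

```python
import hashlib
import json

def make_bundle(files):
    """Pack files (name -> bytes) with a manifest; return (bundle_hash, bundle_bytes)."""
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    body = {"manifest": manifest,
            "files": {name: data.hex() for name, data in files.items()}}
    bundle = json.dumps(body, sort_keys=True).encode()  # deterministic encoding
    return hashlib.sha256(bundle).hexdigest(), bundle

def open_link(bundle_hash, bundle, name):
    """Follow a hard link into a bundle: the bundle's hash plus a file name."""
    if hashlib.sha256(bundle).hexdigest() != bundle_hash:
        raise ValueError("bundle does not match hard link")
    body = json.loads(bundle)
    data = bytes.fromhex(body["files"][name])
    if hashlib.sha256(data).hexdigest() != body["manifest"][name]:
        raise ValueError("file does not match manifest")
    return data

# Two pages that link to each other by name, which plain hard links can't express.
pages = {"a.html": b'<a href="b.html">next</a>',
         "b.html": b'<a href="a.html">back</a>'}
bundle_hash, bundle = make_bundle(pages)
```

open_link(bundle_hash, bundle, "a.html") returns the first page, and the relative link to b.html is resolved through the same manifest, so the cycle never has to be expressed as a pair of hashes.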
4. Spidering
From a single entry point, the entire web can be browsed without
needing to contact any specific server. This could be an advantage for
anonymity, because it prevents long-term intersection attacks. The
free-floating web can be spidered by search engines just like the world
wide web, which should help to address the problem of finding content.
Any thoughts? I realise that Freenet does most of this already, but if
possible I'd like to come up with a standard that can be used by
multiple networks, and Freenet's transition from 0.5 to 0.7 seems like
the right time to break compatibility if necessary. Do we also need a
standard format for authenticated streams?
Cheers,
Michael