On Wed, Oct 05, 2005 at 09:58:01PM +0100, Michael Rogers wrote:
> While we've got the Freenet and I2P people together on one list I'd like
> to put forward an idea that's been growing in my brain for some time, as
> the taxi driver says. The idea is hypertext with location-independent,
> verifiable identifiers, which I call the free-floating web.
CHKs :) But you need SSKs too, because otherwise you end up with
unresolvable circular dependencies in encoding, unless you don't mind it
being a strict hierarchy.

> It's not a new idea by any means, but it would be useful if we could
> agree on a standard so that files can easily migrate between different
> anonymous and censorship-resistant networks.

I propose Freenet 0.7 CHKs and SSKs. These are 32kB long, and padded, and
use AES and SHA-256, and some PK algorithm I haven't decided on yet
(probably DSA).

> Goals:
> pseudonymous and anonymous publishing

I.e. SSKs and CHKs.

> incremental verification

Not sure what you mean.

> plausible deniability

I.e. they are encrypted using a key which is kept in the URI.

> mutable files which publishers can update

Straightforward enough at this level.

> immutable files which publishers can't be forced to update
>
> Tools:
> one-way hashes
> block ciphers
> public-key signatures
>
> 1. Immutable files
>
> Freenet's CHKs offer immutability and plausible deniability: a file is
> encrypted with its hash, the hash of the encrypted file identifies it,
> and the hash of the unencrypted file (which isn't revealed to relays)
> can be used to decrypt it.

The problem is that Freenet CHKs are for a block rather than for a file. A
block is only 32kB, after compression. For a file, your CHK points to a
file full of metadata on the keys to fetch to reassemble the original
file. (See the per-block CHK sketch below.)

> CHKs can be combined with hash trees to allow incremental verification
> and parallel downloads. The file is encrypted with its hash, and the
> encrypted blocks form the bottom layer of the hash tree. The network
> representation of the file contains all the blocks of the hash tree in
> depth-first order, so that each subtree occupies a contiguous range. For
> example if the hash tree looks like this:
>
> A1
> B1 B2
> C1 C2 C3 C4
> D1 D2 D3 D4 D5 D6 D7 D8
>
> then the network representation looks like this:
>
> A1 B1 C1 D1 D2 C2 D3 D4 B2 C3 D5 D6 C4 D7 D8
>
> The on-disk representation might be different - for example, the file
> might be stored unencrypted in a shared folder, with the encryption key
> and the rest of the hash tree stored in a separate metadata file
> (convenience may be more important than plausible deniability for some
> users).

Hrrrm. Well, there are two ways to do this:

1. The CHK is the CHK of a list of sub-CHKs to fetch and reassemble
   (traditional Freenet way).
2. The CHK can be resolved to a list of sub-CHKs. These are then fetched
   and if, when combined, they produce the right data, then we have
   success; if not, and the sub-blocks verify, we have to discredit the
   manifest somehow.

The latter is what you are proposing, right? This would be difficult on
Freenet... unrequests have always seemed like a bad thing; Oskar once said
they would go in over his dead body. On the other hand, if there is state
kept from the original request, and only that is affected, it might be
possible to do it safely...

> The network representation is designed to allow parallel downloads -
> each subtree can be requested and verified independently. Given the root
> hash of the tree, the root hash of the subtree, and the starting and
> ending offsets, a server can quickly find the requested blocks and if
> necessary encrypt them as it reads them from disk. The client and relays
> can verify each block as soon as it's received, using a hash from the
> request message or from a previous block in the subtree.

You might want some redundancy. We use the Onion FEC codes (which are
based on Vandermonde matrices and are ultimately perfectly space-efficient
Reed-Solomon codes).
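To make the redundancy point concrete without dragging in a full
Vandermonde implementation, here is the smallest possible erasure code as
a stand-in, in Java: a single XOR parity block over k equal-sized data
blocks, so any k of the k+1 blocks recover the segment. The real codes
generalise this to any k of n; the class and method names are
illustrative, not Freenet's actual splitfile API.

/** Toy erasure code: k data blocks plus one XOR parity block, so any k
    of the k+1 blocks recover the segment. A stand-in for the
    Vandermonde-based Reed-Solomon codes mentioned above. */
public class XorParity {
    /** Compute one parity block over equal-sized data blocks. */
    public static byte[] encode(byte[][] blocks) {
        byte[] parity = new byte[blocks[0].length];
        for (byte[] b : blocks)
            for (int i = 0; i < b.length; i++)
                parity[i] ^= b[i];
        return parity;
    }

    /** Recover a single missing data block: the XOR of the surviving
        data blocks and the parity block equals the missing block. */
    public static byte[] recover(byte[][] survivors, byte[] parity) {
        byte[] missing = parity.clone();
        for (byte[] b : survivors)
            for (int i = 0; i < b.length; i++)
                missing[i] ^= b[i];
        return missing;
    }
}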
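And since the per-block CHK construction quoted earlier is simple enough
to show in full, here is a minimal sketch using the proposed SHA-256 and
AES. The CTR mode, the 128-bit key truncation and the fixed IV are
assumptions made for the sketch, not Freenet 0.7's actual block format.

import java.security.MessageDigest;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class ChkSketch {
    /** Result of encoding one block: routing key, decryption key, data. */
    public static class Chk {
        public final byte[] routingKey; // SHA-256 of ciphertext: identifies the block
        public final byte[] cryptoKey;  // SHA-256 of plaintext: decrypts it, kept in the URI
        public final byte[] ciphertext; // what relays and caches actually see
        Chk(byte[] r, byte[] k, byte[] c) { routingKey = r; cryptoKey = k; ciphertext = c; }
    }

    /** Encode a single block as a CHK. Illustrative only: real Freenet
        0.7 blocks are padded to 32kB. */
    public static Chk encode(byte[] plaintext) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] cryptoKey = sha.digest(plaintext);   // key derived from the content itself

        Cipher aes = Cipher.getInstance("AES/CTR/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE,
                 new SecretKeySpec(cryptoKey, 0, 16, "AES"), // truncate hash to a 128-bit key
                 new IvParameterSpec(new byte[16]));         // fixed IV: fine here, key is per-content
        byte[] ciphertext = aes.doFinal(plaintext);

        byte[] routingKey = sha.digest(ciphertext); // relays verify this without decrypting
        return new Chk(routingKey, cryptoKey, ciphertext);
    }
}

The point of the split is that a relay checks SHA-256(ciphertext) against
the routing key without ever holding cryptoKey, which is where the
plausible deniability comes from.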
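Likewise, a sketch of the depth-first serialisation from the quoted
example (A1 B1 C1 D1 D2 C2 D3 D4 B2 C3 D5 D6 C4 D7 D8): build the tree
bottom-up over the leaf block hashes, then walk it pre-order so every
subtree occupies one contiguous range. A power-of-two leaf count and the
Node layout are assumptions of the sketch, not a wire format.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class HashTreeLayout {
    static class Node {
        final byte[] hash;
        final Node left, right; // null for leaves (the D-level block hashes)
        Node(byte[] h, Node l, Node r) { hash = h; left = l; right = r; }
    }

    /** Build a balanced binary tree bottom-up from the leaf block hashes
        (each leaf would be the SHA-256 of one encrypted 32kB block). */
    static Node build(byte[][] leafHashes) throws NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        List<Node> level = new ArrayList<>();
        for (byte[] h : leafHashes) level.add(new Node(h, null, null));
        while (level.size() > 1) {       // pair nodes up until one root remains
            List<Node> parents = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                Node l = level.get(i);
                Node r = level.get(i + 1); // assumes a power-of-two leaf count
                sha.reset();
                sha.update(l.hash);
                sha.update(r.hash);
                parents.add(new Node(sha.digest(), l, r));
            }
            level = parents;
        }
        return level.get(0);
    }

    /** Pre-order walk: each subtree occupies a contiguous range of the
        output, matching the network representation in the example. */
    static void serialise(Node n, List<byte[]> out) {
        if (n == null) return;
        out.add(n.hash);
        serialise(n.left, out);
        serialise(n.right, out);
    }
}

Contiguity is what lets a server answer a subtree request with a single
sequential read, and lets the client verify each block as it arrives
against a hash it already holds.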
> A hyperlink to an immutable document contains the root hash of the tree,
> the hash of the unencrypted file, the hash function and the block
> cipher. A request message contains the root hash, the hash function, and
> optionally the subtree hash and the starting and ending offsets. Relays
> and caches can verify the file but they can't decrypt it without the
> hash of the unencrypted file.
>
> 2. Mutable files
>
> Mutable files can be implemented using public-key signatures. The
> publisher creates a redirect block which contains:
>
> 1. A file name, chosen by the publisher (each public key defines a
> separate namespace)
> 2. A hyperlink to the latest (immutable) version of the file
>
> Fields 1 and 2 are encrypted with a unique symmetric key to hide them
> from relays. The following fields are unencrypted:
>
> 3. A monotonically-increasing version number, which could be a timestamp
> 4. The publisher's public key
> 5. The signature function
> 6. A signature of fields 1-2 (encrypted) and 3-5 using the publisher's
> private key
>
> A hyperlink to a mutable document contains the hash of the public key,
> the symmetric key, the hash function, the signature function, and
> optionally the version number. A request message contains the hash of
> the public key, the hash function, the signature function, and
> optionally the minimum and maximum acceptable version numbers. Relays
> and caches can verify the signature and version number, but they can't
> read the file name or the hyperlink without the symmetric key.

Also you need to include the hash of the name (after encrypting it with
the symmetric key). And do you need to include the data needed for
decryption in the actual request? In the hyperlink, sure, but don't tell
the nodes/servers, or you lose plausible deniability. (See the
redirect-block signing sketch below.)

> 3. Bundles
>
> When linking to a file, authors can choose between "hard linking" to a
> specific (immutable) version and "soft linking" to a mutable redirect
> block. To solve the problem of reference cycles, files that link to one
> another can be collected into a "bundle", which contains a directory (or
> manifest) that maps names onto hard links. Links between files in the
> bundle use names instead of hard links, and the entire bundle can be
> published as a single immutable file. (FIXME: hard links into bundles
> must include a name.)

Right. These are manifests and ZIP file manifests. Which means, in the
first case, a big block of metadata that maps names to CHKs, and in the
latter case, a ZIP file with a load of files in it and some metadata
indicating content types (which are vital in an anonymous system IMHO).
(See the manifest sketch below.)

> 4. Spidering
>
> From a single entry point, the entire web can be browsed without
> needing to contact any specific server. This could be an advantage for
> anonymity, because it prevents long-term intersection attacks.

How so?

> The free-floating web can be spidered by search engines just like the
> world wide web, which should help to address the problem of finding
> content.
>
> Any thoughts? I realise that Freenet does most of this already, but if
> possible I'd like to come up with a standard that can be used by
> multiple networks, and the transition from 0.5 to 0.7 seems like the
> right time to break compatibility if necessary. Do we also need a
> standard format for authenticated streams?

It's an interesting idea... Let's deal with the above first; streams are
very experimental.
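As for the redirect block in section 2, a signing sketch: fields 1-2
encrypted under the symmetric key, fields 3-5 in the clear, field 6 a DSA
signature over the rest. The byte layout, the helper names and the
SHA1withDSA pairing are assumptions (the thread only settles on "probably
DSA" and SHA-256 for hashing).

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.security.KeyPair;
import java.security.Signature;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Sketch of the proposed redirect block. Field numbers follow the
    quoted proposal; the byte layout itself is an assumption. */
public class RedirectBlock {
    public static byte[] create(String name, String targetLink, long version,
                                byte[] symmetricKey, // 16, 24 or 32 bytes
                                KeyPair publisher) throws Exception {
        // Fields 1-2: file name and hyperlink, encrypted so relays can't read them.
        Cipher aes = Cipher.getInstance("AES/CTR/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE,
                 new SecretKeySpec(symmetricKey, "AES"),
                 new IvParameterSpec(new byte[16])); // fixed IV: sketch only
        byte[] encrypted = aes.doFinal((name + "\n" + targetLink).getBytes("UTF-8"));

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(encrypted.length);
        out.write(encrypted);                 // fields 1-2 (encrypted)
        out.writeLong(version);               // field 3: monotonic version number
        byte[] pub = publisher.getPublic().getEncoded();
        out.writeInt(pub.length);
        out.write(pub);                       // field 4: publisher's public key
        out.writeUTF("SHA1withDSA");          // field 5: signature function

        // Field 6: signature over fields 1-5, so relays can verify the
        // block and its version without ever seeing the symmetric key.
        Signature dsa = Signature.getInstance("SHA1withDSA");
        dsa.initSign(publisher.getPrivate());
        dsa.update(buf.toByteArray());
        byte[] sig = dsa.sign();
        out.writeInt(sig.length);
        out.write(sig);
        return buf.toByteArray();
    }
}

A relay verifying a fetched block hashes field 4 and compares it to the
key hash in the request, then checks field 6 over fields 1-5; per the
point above, the request should carry only the key hash, never the
symmetric key.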
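Finally, the first kind of manifest mentioned above (a block of metadata
mapping names onto CHKs) is easy to sketch. The entry format, including
the explicit content type, is an illustrative assumption.

import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of a bundle manifest: names mapped onto hard links, the whole
    map published as one immutable file. */
public class BundleManifest {
    /** One manifest entry: a hard link plus an explicit content type,
        declared by the publisher rather than guessed by the client. */
    public static class Entry {
        public final String chkUri;      // hard link to the immutable content
        public final String contentType;
        public Entry(String chkUri, String contentType) {
            this.chkUri = chkUri;
            this.contentType = contentType;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();

    public void put(String name, Entry e) { entries.put(name, e); }

    /** Resolve an intra-bundle link, which uses a name instead of a hard
        link; this is what breaks the reference cycles. */
    public Entry resolve(String name) {
        Entry e = entries.get(name);
        if (e == null) throw new IllegalArgumentException("no such entry: " + name);
        return e;
    }
}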
> Cheers,
> Michael

-- 
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.
