Some thoughts I had recently on the 0.7 datastore:
- Before we release 0.7.0, we will need a minor format change (the store
has a separate index for the header and data store parts; they should
have a unified index, which would halve memory usage).
- There are some major scaling issues too: with 32kB blocks, a 20GB
datastore will use a significant amount of RAM. 20GB / 32kB per block =
655,360 blocks. If each entry uses 100 bytes of RAM, that is roughly
64MB of RAM just for the datastore index! We will of course need to keep the
index on disk. We can keep a Bloom filter in RAM to indicate what we
do and do not have in cache. However, unless we split the datastore up
into one file per block as NativeFSDirectory does, we will need some
sort of on-disk data structure mapping keys to block numbers. The other
option is to use some sort of GPL'd java database library. We would
only need an on-disk indexed (btree?) flatfile database; there must be
implementations out there. We also have to deal with the LRU.
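The arithmetic above can be checked directly; a minimal sketch (the 100
bytes per index entry is the estimate from the text, not a measured
figure):

```java
public class StoreIndexMemory {
    public static void main(String[] args) {
        long storeBytes = 20L * 1024 * 1024 * 1024; // 20GB datastore
        int blockBytes = 32 * 1024;                 // 32kB blocks
        long blocks = storeBytes / blockBytes;      // number of blocks
        long indexBytes = blocks * 100;             // ~100 bytes of RAM per index entry
        System.out.printf("%d blocks, ~%.1fMiB of index RAM%n",
                blocks, indexBytes / (1024.0 * 1024.0));
    }
}
```

This works out to 655,360 blocks and about 62.5MiB of index, i.e. the
roughly 64MB quoted above.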
- It would be nice to keep temp files, and client cache, in the datastore.
This would give freenet a fixed space usage, and eliminate
out-of-disk-space and too-many-open-files errors. It would be nice if
they were not identifiable as temp files by an attacker with a seized
datastore; this would eliminate all information that is currently
leaked by temp files (they are encrypted and padded but nonetheless
they leak significant info). It would also be nice if aforesaid attacker
could not trivially obtain a list of keys in the store, without having a
list of known keys already and doing some computational work. We can do
this:
- We have a salt value, S. This is initialized on creation of the
datastore, and is random, and is say 32 bytes.
When we want to fetch a key from the datastore, we compute (where +
denotes concatenation):
D = H ( 0 + key + S )
K = H ( D + S )
We then fetch the file by K from the datastore, and decrypt it with
D. Either we leave validation to a higher layer (CHK verification
etc.), or we store a hash of the intended on-disk data in another
slot: that is, we keep a hash of the (encrypted) data for each key
in a corresponding slot in the datastore.
When we want to make a temp file, we simply create a series of fake
keys. We keep these only in RAM. The on-disk structure is not
affected. We will have to keep these blocks from being reallocated,
but that can be done in RAM; we are using the file for something
anyway, so the small memory overhead is irrelevant.
Every so often we can swap around a random pair of blocks in order
to erase any information produced by the datastore's currently
highly predictable growth behaviour.
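The salted key derivation above can be sketched like this. SHA-256 as H,
and the class and method names, are assumptions for illustration, not
the actual Freenet API:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedStoreKey {
    // D = H(0 + key + S): the per-entry decryption key.
    static byte[] decryptKey(byte[] routingKey, byte[] salt)
            throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update((byte) 0);   // the leading 0 from the scheme above
        md.update(routingKey);
        md.update(salt);
        return md.digest();
    }

    // K = H(D + S): the on-disk lookup key. Without S, an attacker with a
    // seized store cannot check a known key against it cheaply.
    static byte[] diskKey(byte[] d, byte[] salt)
            throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(d);
        md.update(salt);
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[32]; // S: random at store creation (zeroed for demo)
        byte[] key = "example-routing-key".getBytes(StandardCharsets.UTF_8);
        byte[] d = decryptKey(key, salt);
        byte[] k = diskKey(d, salt);
        System.out.println("D: " + d.length + " bytes, K: " + k.length + " bytes");
    }
}
```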
- At present, the datastore is a big file, where every block means
something; data is copied around as needed so there are no free
blocks. If the store is not full, and a block is added, we simply grow
the datastore by one block. The downside of this is that there is no
support for free blocks. If a block is deleted e.g. because it is
invalid, we have to move the data from the end of the file into its
place. And we cannot have the datastore initialized to a fixed size.
And while the datastore grows, keys are added in more or less
chronological order, which could give a lot of information away.
Strategies for dealing with these problems:
- Free block support. Keep a bitmap of free blocks. Use these first.
This is especially important if we use the datastore for temp files,
which is a good idea as explained above.
- Should we save the free blocks list to disk? Free blocks are
generally blocks which have been used for temp files and so on,
rather than real keys. We don't want to give this information away
to an attacker, especially combined with chronological information.
Which leads us to the second point.
- Preallocation. We can give freenet a completely fixed space usage,
which would be useful for some users, by creating a full size empty
datastore on initialization. This is also advantageous in that there
is no chronological information leaked, as we can allocate blocks
randomly from a large empty store. And we don't need free blocks
support. If a block is no longer needed, simply demote it to the
back end of the LRU. We don't need to overwrite blocks freed from
usage by temp files as they were encrypted with temporary keys
anyway. On creation, we can write random data, with a valid hash,
for each block, and insert it at a random point in the LRU list.
- Random block swaps. We can to some degree eliminate chronological
information leakage just by swapping blocks around from time to
time. I don't know if this would work however, or if it is just
giving a false sense of security.
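The random block swap in the last point can be sketched as follows.
This is an in-memory model only; a real implementation would have to
swap the on-disk blocks and their index entries together:

```java
import java.security.SecureRandom;
import java.util.Arrays;

public class BlockSwapper {
    // Swap the contents of two randomly chosen slots. Repeated over time,
    // this gradually decouples a block's position from its insertion order.
    static void swapRandomPair(byte[][] slots, SecureRandom rng) {
        int a = rng.nextInt(slots.length);
        int b = rng.nextInt(slots.length);
        byte[] tmp = slots[a];
        slots[a] = slots[b];
        slots[b] = tmp;
    }

    public static void main(String[] args) {
        byte[][] slots = { {1}, {2}, {3}, {4} };
        swapRandomPair(slots, new SecureRandom());
        // Same set of blocks, possibly in a different order.
        System.out.println(Arrays.deepToString(slots));
    }
}
```

A single swap leaks nothing new (both blocks are already encrypted),
and the set of stored blocks is unchanged; only positional information
is destroyed.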
--
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.