Some thoughts I had recently on the 0.7 datastore:
- Before we release 0.7.0, we will need a minor format change (the store
  has a separate index for the header and data store parts; they should
  have a unified index, which will halve memory usage).
- There are some major scaling issues too: with 32kB blocks, a 20GB
  datastore will use a significant amount of RAM. 20GB / 32kB per block =
  655,360 blocks. If each uses 100 bytes of RAM, that is roughly 65MB of
  RAM just for the datastore index! We will of course need to keep the
  index on disk. We can keep a Bloom filter in RAM to indicate what we
  do and do not have in cache. However, unless we split the datastore up
  into one file per block, as NativeFSDirectory does, we will need some
  sort of on-disk data structure mapping keys to block numbers. The other
  option is to use some sort of GPL'd java database library. We would
  only need an on-disk indexed (btree?) flatfile database; there must be
  implementations out there. We also have to deal with the LRU.
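
  The in-RAM Bloom filter above could be sketched roughly like this in
  Java. This is only an illustration, not anything in the codebase: the
  class and method names are mine, and deriving the k bit indices by
  slicing one SHA-256 digest is just one possible choice.

```java
import java.security.MessageDigest;
import java.util.BitSet;

// Hypothetical sketch of an in-RAM Bloom filter over datastore keys.
// A "false" answer is definite, so we only pay a disk seek on "true".
class StoreBloomFilter {
    private final BitSet bits;
    private final int size;    // number of bits in the filter
    private final int hashes;  // number of indices per key (max 8 here)

    StoreBloomFilter(int sizeInBits, int numHashes) {
        this.bits = new BitSet(sizeInBits);
        this.size = sizeInBits;
        this.hashes = numHashes;
    }

    // Derive several bit indices from one SHA-256 digest of the key.
    private int[] indices(byte[] key) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-256").digest(key);
        int[] idx = new int[hashes];
        for (int i = 0; i < hashes; i++) {
            int v = ((d[4 * i] & 0xff) << 24) | ((d[4 * i + 1] & 0xff) << 16)
                  | ((d[4 * i + 2] & 0xff) << 8) | (d[4 * i + 3] & 0xff);
            idx[i] = Math.floorMod(v, size);
        }
        return idx;
    }

    void add(byte[] key) throws Exception {
        for (int i : indices(key)) bits.set(i);
    }

    // false = definitely not in the store; true = probably in the store.
    boolean mightContain(byte[] key) throws Exception {
        for (int i : indices(key)) if (!bits.get(i)) return false;
        return true;
    }
}
```

  Note that a plain Bloom filter cannot handle deletions, which matters
  once blocks are evicted from the LRU; a counting filter, or periodic
  rebuilds from the on-disk index, would be needed for that.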
- It would be nice to keep temp files, and client cache, in the datastore.
  This would give freenet a fixed space usage, and eliminate
  out-of-disk-space and too-many-open-files errors. It would be nice if
  they were not identifiable as temp files by an attacker with a seized
  datastore; this would eliminate all information that is currently
  leaked by temp files (they are encrypted and padded but nonetheless
  they leak significant info). It would also be nice if aforesaid attacker
  could not trivially obtain a list of keys in the store, without having a
  list of known keys already and doing some computational work. We can do
  this:
  - We have a salt value, S. This is initialized on creation of the
    datastore, and is random, and is say 32 bytes.

    When we want to fetch a key from the datastore, we compute (where H
    is a secure hash function and + denotes concatenation):
    D = H ( 0 + key + S )
    K = H ( D + S )

    We then fetch the file by K from the datastore, and decrypt it with
    D. Either we leave validation to a higher layer (CHK verification
    etc), or we store a hash of the intended on-disk data in another
    slot.

    We keep a hash of the (encrypted) data for the key in the
    datastore in a corresponding slot.
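
    As a concrete sketch of the derivation, in Java, assuming H is
    SHA-256 and the leading 0 is a single domain-separator byte (both of
    those are my assumptions, not settled choices):

```java
import java.security.MessageDigest;

// Sketch of the salted key scheme: D decrypts the block on disk,
// K names it. Without S and a known key, K reveals nothing.
class SaltedKey {
    // D = H ( 0 + key + S ): derivable only from the routing key.
    static byte[] deriveD(byte[] key, byte[] salt) throws Exception {
        MessageDigest h = MessageDigest.getInstance("SHA-256"); // assumed H
        h.update((byte) 0); // assumed domain separator
        h.update(key);
        h.update(salt);
        return h.digest();
    }

    // K = H ( D + S ): the on-disk lookup key.
    static byte[] deriveK(byte[] d, byte[] salt) throws Exception {
        MessageDigest h = MessageDigest.getInstance("SHA-256");
        h.update(d);
        h.update(salt);
        return h.digest();
    }
}
```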

    When we want to make a temp file, we simply create a series of fake
    keys. We keep these only in RAM. The on-disk structure is not
    affected. We will have to keep these blocks from being reallocated,
    but that can be done in RAM; we are using the file for something
    anyway, so the small memory overhead is irrelevant.

    Every so often we can swap around a random pair of blocks in order
    to erase any information produced by the datastore's currently
    highly predictable growth behaviour.
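
    The block swap itself is mechanically trivial; a rough sketch
    (names and signatures are mine):

```java
import java.io.RandomAccessFile;
import java.security.SecureRandom;

// Sketch: swap two blocks in the store file to blur the chronological
// ordering produced by append-only growth.
class BlockSwapper {
    static void swapBlocks(RandomAccessFile store, int blockSize,
                           long a, long b) throws Exception {
        if (a == b) return; // nothing to do
        byte[] bufA = new byte[blockSize];
        byte[] bufB = new byte[blockSize];
        store.seek(a * blockSize); store.readFully(bufA);
        store.seek(b * blockSize); store.readFully(bufB);
        store.seek(a * blockSize); store.write(bufB);
        store.seek(b * blockSize); store.write(bufA);
        // The key->block index entries for both blocks must be updated
        // in the same step, or the store becomes inconsistent.
    }

    // Pick a random pair; called "every so often" by some maintenance task.
    static void swapRandomPair(RandomAccessFile store, int blockSize,
                               long numBlocks) throws Exception {
        SecureRandom rng = new SecureRandom();
        long a = Math.floorMod(rng.nextLong(), numBlocks);
        long b = Math.floorMod(rng.nextLong(), numBlocks);
        swapBlocks(store, blockSize, a, b);
    }
}
```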
- At present, the datastore is a big file, where every block means
  something; data is copied around as needed so there are no free
  blocks. If the store is not full and a block is added, we simply grow
  the datastore by one block. The downside is that there is no support
  for free blocks: if a block is deleted, e.g. because it is invalid, we
  have to move the data from the end of the file into its place; we
  cannot initialize the datastore to a fixed size; and while the
  datastore grows, keys are added in more or less chronological order,
  which could give a lot of information away.
  Strategies for dealing with these problems:
  - Free block support. Keep a bitmap of free blocks. Use these first.
    This is especially important if we use the datastore for temp files,
    which is a good idea as explained above.
    - Should we save the free blocks list to disk? Free blocks are
      generally blocks which have been used for temp files and so on,
      rather than real keys. We don't want to give this information away
      to an attacker. Especially combined with chronological info. Which
      leads us to the second point.
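
    The in-RAM side of the free-block bitmap is about as simple as it
    sounds; a hypothetical sketch (whether and how to persist it is the
    open question above):

```java
import java.util.BitSet;

// Sketch of an in-RAM free-block bitmap. One bit per block; ~80kB of
// RAM covers the 655,360 blocks of a 20GB store.
class FreeBlockMap {
    private final BitSet free;

    FreeBlockMap(int numBlocks) {
        this.free = new BitSet(numBlocks); // all bits clear = all in use
    }

    void markFree(int block) {
        free.set(block);
    }

    // Prefer reusing a free block; return -1 if there are none, so the
    // caller grows the store (or evicts from the LRU) instead.
    int allocate() {
        int block = free.nextSetBit(0);
        if (block >= 0) free.clear(block);
        return block;
    }
}
```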
  - Preallocation. We can give freenet a completely fixed space usage,
    which would be useful for some users, by creating a full size empty
    datastore on initialization. This is also advantageous in that there
    is no chronological information leaked, as we can allocate blocks
    randomly from a large empty store. And we don't need free blocks
    support. If a block is no longer needed, simply demote it to the
    back end of the LRU. We don't need to overwrite blocks freed from
    usage by temp files as they were encrypted with temporary keys
    anyway. On creation, we can write random data, with a valid hash,
    for each block, and insert it at a random point in the LRU list.
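
    A rough sketch of that initialization step (names are mine; writing
    the per-block hash slot is omitted here):

```java
import java.io.RandomAccessFile;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of preallocation: fill every slot with random bytes and enter
// the slots into the LRU in random order, so nothing on disk
// distinguishes "never used" from "real data", and there is no
// chronological structure to leak.
class Preallocator {
    static List<Integer> preallocate(RandomAccessFile store, int blockSize,
                                     int numBlocks) throws Exception {
        SecureRandom rng = new SecureRandom();
        byte[] block = new byte[blockSize];
        store.seek(0);
        for (int i = 0; i < numBlocks; i++) {
            rng.nextBytes(block); // indistinguishable from encrypted data
            store.write(block);
        }
        // Insert every block at a random point in the LRU, i.e. start
        // from a uniformly shuffled ordering.
        List<Integer> lru = new ArrayList<>();
        for (int i = 0; i < numBlocks; i++) lru.add(i);
        Collections.shuffle(lru, rng);
        return lru;
    }
}
```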
  - Random block swaps. We can, to some degree, eliminate chronological
    information leakage just by swapping blocks around from time to
    time. I don't know whether this would actually work, however, or
    whether it would just give a false sense of security.
-- 
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.