Assuming you use option 3 (BookKeeper), the following is probably way too over-simplified, but that's the idea:
All writers write to a BookKeeper ledger, and each of your actual datastore nodes keeps reading that ledger. Each record would be the serialized form of a DB write op; when the ledger reader reads a record, it deserializes it and applies it to the datastore it has, for example plain MySQL, BDB, or something like the LSM tree used by Cassandra (memtable + SSTable). Reads go directly to the datastore nodes themselves.

Would this work? It doesn't sound like a lot of work. (A rough sketch of the write/read path is at the bottom of this mail, below the quoted text.)

On Wed, Jul 13, 2011 at 3:02 AM, Simon Felix <[email protected]> wrote:
> Hello everyone
>
> What is the best way to build a distributed, shared storage system on top of
> ZooKeeper? I'm talking about block storage in the terabyte-range (i.e. store
> billions of 4k blocks). Consistency and Availability are important, as is
> throughput (both read & write). I need at least 50 MB/s with 3 nodes with
> two regular SATA drives each for my application.
>
> Some options I came up with:
> 1. Use ZooKeeper directly as a data store (Not recommended according to the
> docs - and it really leads to abysmally bad performance, I tested that)
> 2. Use Cassandra as data store
> 3. Use BookKeeper as write-ahead log and implement my own underlying store
> 4. Use ZooKeeper to create my own (probably buggy...) data store
>
> What would you recommend? Are there other options?
>
> Cheers,
> Simon
>
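Minimal sketch of what I mean, assuming the standard BookKeeper client API (createLedger/addEntry on the writer side, openLedger/readEntries on the reader side). LocalStore and applyWriteOp are just placeholders for whatever store each node actually keeps; error handling, ledger rollover and the single-writer-per-ledger detail are all ignored here:

import java.util.Enumeration;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.BookKeeper.DigestType;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;

public class LedgerBackedStore {

    private static final byte[] PASSWD = "secret".getBytes();

    // Placeholder for whatever the node's actual store is (MySQL, BDB, an LSM tree, ...).
    interface LocalStore {
        void applyWriteOp(byte[] serializedOp);
    }

    // Writer side: the log is a BookKeeper ledger.
    static LedgerHandle createLog(BookKeeper bk) throws Exception {
        return bk.createLedger(DigestType.CRC32, PASSWD);
    }

    // Serialize each DB write op and append it to the ledger.
    static long appendWriteOp(LedgerHandle lh, byte[] serializedOp) throws Exception {
        return lh.addEntry(serializedOp);          // entry id doubles as a log sequence number
    }

    // Datastore-node side: replay a (closed) ledger into the local store.
    // Tailing a ledger that is still being written would need openLedgerNoRecovery
    // instead, since openLedger recovers the ledger and fences the writer.
    static long replayFrom(BookKeeper bk, long ledgerId, long nextEntry, LocalStore store)
            throws Exception {
        LedgerHandle lh = bk.openLedger(ledgerId, DigestType.CRC32, PASSWD);
        long last = lh.getLastAddConfirmed();
        if (nextEntry <= last) {
            Enumeration<LedgerEntry> entries = lh.readEntries(nextEntry, last);
            while (entries.hasMoreElements()) {
                LedgerEntry e = entries.nextElement();
                store.applyWriteOp(e.getEntry());  // deserialize + apply the write op
            }
        }
        lh.close();
        return last + 1;                           // where to resume next time
    }
}

The entry ids give each node a simple cursor: a node that restarts just remembers the last entry it applied and resumes reading from there.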
