Re: Rv: Why not BerkeleyDB based object store?

Kinkie Tue, 25 Nov 2008 14:15:33 -0800

On Tue, Nov 25, 2008 at 10:23 PM, Pablo Rosatti
<[EMAIL PROTECTED]> wrote:
> Amazon uses BerkeleyDB for several critical parts of its website. The Chicago
> Mercantile Exchange uses BerkeleyDB for backup and recovery of its trading
> database. And Google uses BerkeleyDB to process Gmail and Google user
> accounts. Are you sure BerkeleyDB is not a good idea to replace the Squid
> filesystems, even COSS?


Squid3 uses a modular storage backend system, so you're more than
welcome to try to code it up and see how it compares.
Generally speaking, the needs of a data cache such as Squid are very
different from those of a general-purpose storage backend.
Among the key differences:
- the data in the cache has little or no value
  it's important to know whether a file was corrupted, but it can
  always be thrown away and fetched again from the origin server at a
  relatively low cost
- the workload is mostly writes
  a well-tuned forward proxy will have a hit rate of roughly 30%,
  which means about 7 writes for every 3 reads (a bit over 2 writes
  per read) on average
- data is stored in incremental chunks
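
As a rough sketch of the write/read arithmetic above (Python; the
function name is made up for illustration, and it assumes every miss is
cacheable and written to the store exactly once, while every hit costs
one read):

```python
def writes_per_read(hit_rate: float) -> float:
    """Cache writes per cache read for a given hit rate.

    Assumption: each miss causes one write, each hit causes one read.
    """
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in (0, 1]")
    return (1.0 - hit_rate) / hit_rate


# Print the ratio for a few plausible hit rates.
for rate in (0.25, 0.30, 0.50):
    print(f"hit rate {rate:.0%}: {writes_per_read(rate):.2f} writes per read")
```

Real traffic includes uncacheable responses, so the actual ratio will
differ; the point is only that writes outnumber reads at these hit rates.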

Given these characteristics, many of the mechanisms that database-like
systems provide, such as journaling and transactions, are a waste of
resources.
COSS is explicitly designed to handle a workload of this kind. I would
not trust any valuable data to it, but it's about as fast as it gets
for a cache.
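
To illustrate the general idea of a store tuned for this kind of
workload, here is a toy in-memory sketch in Python. It is NOT how COSS
is implemented; the class name, slot layout and CRC check are all
assumptions made up for the example. It only shows the trade-off: new
objects overwrite the oldest slot in a ring, there is no journal or
transaction, and a corrupted object is simply dropped and reported as a
miss.

```python
import zlib


class CyclicStore:
    """Toy fixed-size ring of slots; new objects overwrite the oldest."""

    def __init__(self, slots: int):
        self.slots = [None] * slots  # each slot: (key, crc32, payload)
        self.index = {}              # key -> slot number
        self.head = 0                # next slot to overwrite

    def put(self, key: str, payload: bytes) -> None:
        old = self.slots[self.head]
        # Forget the previous occupant, unless its key has since been
        # rewritten into a newer slot.
        if old is not None and self.index.get(old[0]) == self.head:
            del self.index[old[0]]
        self.slots[self.head] = (key, zlib.crc32(payload), payload)
        self.index[key] = self.head
        self.head = (self.head + 1) % len(self.slots)

    def get(self, key: str):
        pos = self.index.get(key)
        if pos is None:
            return None              # miss: caller refetches from origin
        _, crc, payload = self.slots[pos]
        if zlib.crc32(payload) != crc:
            del self.index[key]      # corrupted: drop it, treat as a miss
            return None
        return payload


store = CyclicStore(2)
store.put("a", b"first")
store.put("b", b"second")
store.put("c", b"third")             # overwrites the slot that held "a"
print(store.get("a"), store.get("b"), store.get("c"))
```

Losing "a" here is acceptable precisely because the object can be
fetched again from the origin server.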

IMHO BDB might be much more useful as a metadata storage engine, as
metadata has a very different access pattern than a general-purpose
cache store.
But if I had any time to devote to this, my priority would be
bringing 3.HEAD COSS up to speed with the work Adrian has done in
Squid-2.

-- 
    /kinkie
