I'm part of the core Federal Learning Registry dev team 
[http://www.learningregistry.org], and we're using CouchDB to store and 
replicate the contents of the registry within our network.

One of the questions that has come up as we start planning our initial 
production release is what the scalability strategy for CouchDB should be.  
We expect that, long term, an enormous amount of data from activity streams 
and metadata will be inserted into the network, and I'd like to know now what 
we need to work toward so there are no big surprises when we start getting 
close to hitting some limits.

As part of our infrastructure strategy, we've chosen Amazon Web Services EC2 & 
EBS as our hosting provider for the first rollout.  EBS currently has an upper 
limit of 1 TB per volume; other cloud or non-cloud solutions may have similar 
or different limitations, but right now I'm only concerned with how we might 
deal with this on EC2 and EBS.
1. Are there CouchDB limits that we are going to run into before we hit 1TB?
2. Is there a strategy for disk spanning to go beyond the 1 TB limit by 
incorporating multiple volumes, or do we need to leverage a solution like 
BigCouch, which seems to require us to spin up multiple CouchDBs and do some 
sort of sharding/partitioning of data?  I'm curious how queries that span 
shards/partitions work, or whether this is transparent (see the rough sketch 
below for what I imagine the sharding side might look like).
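For context on question 2, here's roughly how I picture the sharding working if 
we had to roll it ourselves: route each document to a standalone CouchDB node by 
hashing its _id, and fan view queries out to every node and merge the rows.  
This is only a sketch of my assumption (the node URLs, database name, and the 
merge step are all hypothetical), not a claim about how BigCouch actually does 
it, which is exactly what I'm asking about:

    import hashlib
    import requests  # any HTTP client would do; CouchDB is all HTTP/JSON

    # Hypothetical standalone CouchDB nodes acting as shards.
    SHARDS = [
        "http://couch-node-1:5984/registry",
        "http://couch-node-2:5984/registry",
        "http://couch-node-3:5984/registry",
    ]

    def shard_for(doc_id):
        """Pick a shard by hashing the document _id (simple modulo placement)."""
        h = int(hashlib.md5(doc_id.encode("utf-8")).hexdigest(), 16)
        return SHARDS[h % len(SHARDS)]

    def put_doc(doc_id, doc):
        """Write the document only to the shard that owns its _id."""
        return requests.put("%s/%s" % (shard_for(doc_id), doc_id), json=doc)

    def query_view(ddoc, view, **params):
        """Scatter a view query to every shard and merge the rows by key."""
        rows = []
        for shard in SHARDS:
            resp = requests.get("%s/_design/%s/_view/%s" % (shard, ddoc, view),
                                params=params)
            rows.extend(resp.json().get("rows", []))
        return sorted(rows, key=lambda r: r["key"])

Even with something like that, the part I can't picture without framework 
support is merging reduce results across shards, hence the BigCouch question.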

Thanks,

- Jim


Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International



