On 14 Jun 2009, at 23:03, Marcus Persson Lindqvist wrote:

Greetings all!

In short - what factors are involved in memory consumption for CouchDB for a
large (x * 1000+) number of databases? Any hints welcome.

Each database requires a file handle and at least one Erlang process
while it is open and in use. Views add more file handles and Erlang
processes. Both file handles and processes are cheap (processes even
more so than file handles).

CouchDB has a max_dbs_open setting that controls the number of
databases that are open at any one time. It is an LRU cache, so unused
databases drop out of the cache as new ones are opened. CouchDB
has been tested with ~1,000,000 databases in total and 20,000 open
databases at any time.

You may need to raise system limits (e.g. the per-process open file
limit) to accommodate a large number of file handles, and you might
want to increase the max_dbs_open setting.
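
For example, here is a minimal sketch of checking and raising that
limit at runtime over the /_config HTTP API, assuming a local node on
port 5984 with no admin password (on CouchDB 2.x+ the same calls live
under /_node/_local/_config instead); the OS file-handle limit
(ulimit -n) still has to be raised separately:

import json
import urllib.request

COUCH = "http://127.0.0.1:5984"  # assumed local node, no authentication

def get_config(section, key):
    # GET /_config/<section>/<key> returns the current value as a JSON string
    with urllib.request.urlopen("%s/_config/%s/%s" % (COUCH, section, key)) as resp:
        return json.loads(resp.read().decode("utf-8"))

def set_config(section, key, value):
    # PUT /_config/<section>/<key> stores the new value and returns the old one
    req = urllib.request.Request(
        "%s/_config/%s/%s" % (COUCH, section, key),
        data=json.dumps(value).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# "max_dbs_open" under [couchdb] is the key name as shipped in default.ini.
print("current limit:", get_config("couchdb", "max_dbs_open"))
print("previous value:", set_config("couchdb", "max_dbs_open", "5000"))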

There is also a small write buffer for each db that gets flushed every
second. Its size and the flush interval can be configured on a
per-server basis.

Cheers
Jan
--


I've recently started to dig into CouchDB a lot and am using it as the
primary storage of a backend-type application, to much success and
relaxation. It really saves a lot of pain not having to care much about
the details of a repository.

Now, however, my application is growing in data and I'm looking for some
pointers on what to expect in terms of memory consumption (my primary
bottleneck).

The data is highly segmented - I'm using about 4 different "classes" of
documents from X different "sources" (X is currently 200 but might grow
to 2000 or more), none of which need to know about the others. Aiming
for smaller btrees and such, I figured I would use a separate database
for each class/source combination, yielding 800 DBs at the moment.
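
For illustration, a minimal sketch of one way to lay out databases like
that, one per class/source pair, using the plain HTTP API; the class
and source names here are made up, and it assumes a local node with no
admin password:

import json
import urllib.error
import urllib.request

COUCH = "http://127.0.0.1:5984"  # assumed local node, no authentication

# Hypothetical class and source names, only to illustrate the naming scheme.
CLASSES = ["event", "metric", "log", "status"]
SOURCES = ["source_%03d" % i for i in range(200)]

def create_db(name):
    # PUT /<db> creates the database; CouchDB answers 412 if it already exists
    req = urllib.request.Request("%s/%s" % (COUCH, name), method="PUT")
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 412:  # already created on an earlier run
            return {"ok": True, "existing": True}
        raise

for cls in CLASSES:
    for src in SOURCES:
        create_db("%s_%s" % (cls, src))  # e.g. "event_source_042" -> 800 DBs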

And kudos to Couch for making it a breeze to implement; it was really
nice and smooth.

But now I'm starting to see some memory consumption growth and I'm
looking for pointers on how to think about this. What mechanisms
actually consume memory? What should one avoid? Is it better to use
fewer databases from this point of view?

What would be a reasonable memory footprint and how does one calculate
it? Currently it consumes about 300MB.

Each database is really just a pet store. I need to extract documents
in order. That's it. I'm currently doing this with a simple view. (Is
there any "trivial" built-in way of getting documents in reversed
insertion order, btw?)
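
As far as I know, _all_docs sorts by doc id rather than insertion
order, but if each document carries a timestamp, a one-line map view
gives both directions via descending=true (and if the doc ids are
themselves time-ordered, _all_docs?descending=true alone does it). A
minimal sketch, assuming a hypothetical created_at field, a design doc
_design/order, and a local node with no admin password:

import json
import urllib.parse
import urllib.request

COUCH = "http://127.0.0.1:5984"   # assumed local node, no authentication
DB = "event_source_042"           # hypothetical database name

# Hypothetical design doc: a map-only view keyed on an insertion timestamp.
DESIGN = {
    "_id": "_design/order",
    "views": {
        "by_created": {
            "map": "function(doc) { if (doc.created_at) emit(doc.created_at, null); }"
        }
    },
}

def put_design():
    # PUT the design doc once per database (a 409 means it is already there)
    req = urllib.request.Request(
        "%s/%s/_design/order" % (COUCH, DB),
        data=json.dumps(DESIGN).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def newest_first(limit=10):
    # descending=true walks the view btree from the highest key down, which is
    # reversed insertion order as long as created_at only ever increases.
    query = urllib.parse.urlencode(
        {"descending": "true", "limit": limit, "include_docs": "true"}
    )
    url = "%s/%s/_design/order/_view/by_created?%s" % (COUCH, DB, query)
    with urllib.request.urlopen(url) as resp:
        return [row["doc"] for row in json.loads(resp.read().decode("utf-8"))["rows"]]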

And yeah, the load for most databases is really low, so insert/output
performance could be compromised for lower memory consumption.

Any hints, tips or experiences?

Marcus
