Hello!
Just a random thought, but would this also work by replicating the
database and then deleting the old one?
That way the new database would stay available and would not be tied up
by compaction. Of course the database would then need to be switched
over and replicated once again to capture all the new changes.
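Roughly, against the HTTP API that would look something like the sketch
below (database names are made up, and Python with the requests library
is just one way to drive it):

    import requests

    COUCH = "http://localhost:5984"   # assumed server URL
    OLD, NEW = "mydb", "mydb_copy"    # hypothetical database names

    # one-shot replication into a freshly created target database
    requests.post(COUCH + "/_replicate",
                  json={"source": OLD, "target": NEW,
                        "create_target": True}).raise_for_status()

    # ...switch the application over to NEW, then catch up once more
    # and drop the old database...
    requests.post(COUCH + "/_replicate",
                  json={"source": OLD, "target": NEW}).raise_for_status()
    requests.delete(COUCH + "/" + OLD).raise_for_status()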
On February 6, 2016 00:06:08 Anik Das <[email protected]> wrote:
Thanks Dave,
That was a very comprehensive article. We are currently running compaction
nightly. That's helping for now.
Regards,
On Feb 5 2016, at 7:36 pm, Dave Cottlehuber <[email protected]> wrote:
On Tue, 2 Feb 2016, at 10:08 PM, Anik Das wrote:
> Hello All,
>
> We were developing an application where we had to insert approximately
> 600,000 documents into a database. The database had only one view
> (value emitted as null).
>
> It was not a batch insertion. After the insertion the database took up
> 3.5GB, to our surprise. I googled around and ran a compaction; after
> that the size dropped to 350MB.
>
> I am new to CouchDB and I'm unable to figure out what exactly
> happened.
>
> Anik Das
Welcome Anik :-)
Some quick points:
- we use a B-tree in CouchDB
- it's append-only
- to find a doc we walk down the tree from the root node
- the root node is always the last node in the .couch btree file
- adding or updating a doc requires appending (in order) the doc, any
changed intermediary nodes, and finally the new root node of the tree
- thus a single doc update needs to rewrite at least 2 nodes: the doc
itself + the new root
- as the tree gets wider (more leaf-node documents) it also slowly
gains levels
- this adds more intermediate nodes to be updated as we go along
<http://horicky.blogspot.co.at/2008/10/couchdb-implementation.html> is a
very nice but old picture of this.
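One way to watch that append-only growth from the outside, purely as an
illustration (local CouchDB assumed, database name made up, Python with
the requests library; the size field is "disk_size" on 1.x and
"sizes.file" on 2.x):

    import requests

    COUCH = "http://localhost:5984"   # assumed server URL
    DB = COUCH + "/growth_demo"       # hypothetical database

    def file_size():
        info = requests.get(DB).json()
        # CouchDB 1.x reports "disk_size"; 2.x nests it under "sizes"
        return info.get("disk_size") or info.get("sizes", {}).get("file")

    requests.put(DB)                  # create the database
    doc = requests.put(DB + "/doc1", json={"n": 0}).json()
    print("after insert:", file_size())

    for i in range(100):
        # each update appends the new doc plus fresh inner + root nodes
        doc = requests.put(DB + "/doc1",
                           json={"n": i, "_rev": doc["rev"]}).json()
    print("after 100 updates:", file_size())

Compacting afterwards brings the file back down, which is the same
effect you saw going from 3.5GB to 350MB.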
You should always plan to compact after a big upload or replication,
but a couple of things will ease the pain (see the sketch after this
list):

- use _bulk_docs (and do some testing for the optimum chunk size)
- upload docs in uuid order (don't rely on couch-generated uuids)

Both of these reduce the number of interim updates to the tree: the
first by only rewriting the upper nodes once per bulk update, the
second by adding data in sorted order, so fewer intermediary nodes need
updating.
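A minimal sketch of both suggestions together (database name, chunk
size, and payloads are made up; Python with the requests library,
adjust to taste):

    import uuid
    import requests

    COUCH = "http://localhost:5984"    # assumed server URL
    DB = COUCH + "/mydb"               # hypothetical database
    CHUNK = 1000                       # test to find your optimum chunk size

    # generate our own ids and sort them, so inserts hit the tree in order
    docs = [{"_id": uuid.uuid4().hex, "value": None} for _ in range(600000)]
    docs.sort(key=lambda d: d["_id"])

    for i in range(0, len(docs), CHUNK):
        # one _bulk_docs request per chunk, not one request per document
        r = requests.post(DB + "/_bulk_docs", json={"docs": docs[i:i + CHUNK]})
        r.raise_for_status()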
Most people run compaction through a cron job or a similar out-of-hours
scheduling tool.
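As a sketch of what such a job might call (database and design-doc
names are made up; Python with the requests library), compaction is
just a couple of POSTs:

    import requests

    COUCH = "http://localhost:5984"            # assumed server URL
    DB = "mydb"                                # hypothetical database
    DDOC = "mydesign"                          # hypothetical design doc

    headers = {"Content-Type": "application/json"}

    # compact the database file itself
    requests.post(COUCH + "/" + DB + "/_compact",
                  headers=headers).raise_for_status()
    # compact the view index built by one design document
    requests.post(COUCH + "/" + DB + "/_compact/" + DDOC,
                  headers=headers).raise_for_status()

Both return 202 immediately; the compaction itself runs in the
background.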
A+
Dave