Hi Willem,

Good question. CouchDB uses a fully copy-on-write storage engine: every 
update, including updates to the internal B-tree nodes, appends new data to 
the file rather than overwriting in place, so any writes to a database will 
necessarily increase the file size until compaction reclaims the space. 
Looking at your info I don’t see a heavy source of updates, so it is a little 
puzzling.
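
If you want to see where the growth is concentrated, compare sizes.file (bytes
on disk) with sizes.active (live data) in each database’s info; a large gap is
space that compaction can reclaim. A rough sketch, assuming the node answers
unauthenticated requests on localhost:5984:

# list every database, then print its on-disk and live-data sizes
curl -s localhost:5984/_all_dbs | tr -d '[]"' | tr ',' '\n' | while read db; do
  curl -s "localhost:5984/$db" | grep -o '"file":[0-9]*\|"active":[0-9]*' | xargs echo "$db"
done

The exact pipeline doesn’t matter; any script that walks /_all_dbs and reads
the sizes object from each database’s info will show the same thing.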

Adam


> On May 2, 2019, at 12:53 PM, Willem Bison <wil...@nappkin.nl> wrote:
> 
> Hi Adam,
> 
> I ran "POST _compact" on the DB mentioned in my post and 'disk_size' went
> from 729884227 (yes, it had grown that much in 1 hour!?) to 1275480.
> 
> Wow.
> 
> I disabled compacting because I thought it was useless in our case since
> the dbs and the docs are so small. I do wonder how it is possible for a db
> to grow so much when it's being deleted several times a week. What is all
> the 'air'?
> 
> On Thu, 2 May 2019 at 18:31, Adam Kocoloski <kocol...@apache.org> wrote:
> 
>> Hi Willem,
>> 
>> Compaction would certainly reduce your storage space. You have such a
>> small number of documents in these databases that it would be a fast
>> operation.  Did you try it and run into issues?
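>> 
>> For reference, a one-off compaction is a single request per database, along
>> these lines (it needs a JSON Content-Type header, plus admin credentials if
>> the node requires them; <dbname> is a placeholder):
>> 
>> curl -X POST -H 'Content-Type: application/json' localhost:5984/<dbname>/_compact
>> 
>> The request returns immediately and compaction runs in the background; the
>> compact_running flag in the database info shows whether it is still going.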
>> 
>> Changing cluster.q shouldn’t affect the overall storage consumption.
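>> 
>> (If you simply want fewer shard files per database, q can also be set per
>> database at creation time, e.g. something like
>> curl -X PUT 'localhost:5984/<dbname>?q=1', or by lowering the [cluster] q
>> default before creating new databases. Databases keep the q they were
>> created with.)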
>> 
>> Adam
>> 
>>> On May 2, 2019, at 12:15 PM, Willem Bison <wil...@nappkin.nl> wrote:
>>> 
>>> Hi,
>>> 
>>> Our CouchDB 2.3.1 standalone server (AWS Ubuntu 18.04) is using a lot of
>>> disk space, so much that it regularly fills the disk and crashes.
>>> 
>>> The server contains approximately 100 databases, each with a reported
>>> (Fauxton) size of less than 2.5 MB and fewer than 250 docs. Yesterday the
>>> 'shards' folders combined exceeded a total of 14 GB, causing the server to
>>> crash.
>>> 
>>> The server is configured with
>>> cluster.n = 1 and
>>> cluster.q = 8
>>> because that was suggested during setup.
>>> 
>>> As I write this, the 'shards' folders look like this:
>>> /var/lib/couchdb/shards# du -hs *
>>> 869M 00000000-1fffffff
>>> 1.4G 20000000-3fffffff
>>> 207M 40000000-5fffffff
>>> 620M 60000000-7fffffff
>>> 446M 80000000-9fffffff
>>> 458M a0000000-bfffffff
>>> 400M c0000000-dfffffff
>>> 549M e0000000-ffffffff
>>> 
>>> One of the largest files is this:
>>> curl localhost:5984/xxxxxxx_1590
>>> {
>>>   "db_name": "xxxxxxx_1590",
>>>   "purge_seq": "0-g1AAAAFTeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFNSApBMqv___39WIgMedXksQJKhAUgBlc4nRu0DiFoC5iYpgOy3J9L-BRAz9-NXm8iQJE_YYgeQxfFEWnwAYvF9oNosADncXo4",
>>>   "update_seq": "3132-g1AAAAFWeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8____sxI18ChKUgCSSfYgdUkMDNw1-JQ6gJTGg42UxacuAaSuHqxOAo-6PBYgydAApIBK52clchNUuwCidn9Wog5BtQcgau9nJQoTVPsAohboXsksAJuwX9Y",
>>>   "sizes": {
>>>       "file": 595928643,
>>>       "external": 462778,
>>>       "active": 1393380
>>>   },
>>>   "other": {
>>>       "data_size": 462778
>>>   },
>>>   "doc_del_count": 0,
>>>   "doc_count": 74,
>>>   "disk_size": 595928643,
>>>   "disk_format_version": 7,
>>>   "data_size": 1393380,
>>>   "compact_running": false,
>>>   "cluster": {
>>>       "q": 8,
>>>       "n": 1,
>>>       "w": 1,
>>>       "r": 1
>>>   },
>>>   "instance_start_time": "0"
>>> }
>>> 
>>> curl localhost:5984/xxxxxxx_1590/_local_docs
>>> {"total_rows":null,"offset":null,"rows":[
>>> {"id":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","key":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","value":{"rev":"0-3022"}},
>>> {"id":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","key":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","value":{"rev":"0-18"}},
>>> {"id":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","key":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","value":{"rev":"0-3749"}},
>>> {"id":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","key":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","value":{"rev":"0-172"}}
>>> ]}
>>> 
>>> Each database push/pull replicates with a small number of clients (< 10).
>>> Most of the documents contain orders that are short-lived. We throw away
>>> all dbs 3 times a week as a brute-force purge.
>>> Compacting has been disabled because it takes too much CPU and was
>>> considered useless in our case (small dbs, purging).
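>>> 
>>> For context: if we were to turn automatic compaction back on, my
>>> understanding is that the 2.x compaction daemon is configured in local.ini
>>> roughly like this (the thresholds below are only illustrative):
>>> 
>>> [compaction_daemon]
>>> check_interval = 300
>>> min_file_size = 131072
>>> 
>>> [compactions]
>>> _default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}]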
>>> 
>>> I read this:
>>> https://github.com/apache/couchdb/issues/1621
>>> but I'm not sure how it helps me.
>>> 
>>> These are my questions:
>>> How is it possible that such a small db occupies so much space?
>>> What can I do to reduce this?
>>> Would changing 'cluster.q' have any effect, or would the same number of
>>> bytes be used in fewer folders? (Am I correct in assuming that cluster.q > 1
>>> is pointless in a standalone configuration?)
>>> 
>>> Thanks!
>>> Willem
>> 
>> 
