Hi Willem,

Good question. CouchDB has a 100% copy-on-write storage engine, including for all updates to B-tree nodes, so any update to the database will necessarily increase the file size until compaction runs. Looking at your info I don't see a heavy source of updates, so it is a little puzzling.
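For reference, the `sizes` block in your `GET /dbname` output is the tell: `sizes.file` is bytes on disk while `sizes.active` is live data, so the gap between them is what compaction can reclaim. A quick sketch in plain POSIX shell, using the numbers from your `xxxxxxx_1590` database (the compaction call at the end is the standard `POST /dbname/_compact` endpoint and needs a running server, so it is shown commented out):

```shell
#!/bin/sh
# Values copied from the GET /xxxxxxx_1590 response in this thread:
file=595928643    # sizes.file   - bytes the shard files occupy on disk
active=1393380    # sizes.active - bytes of live data after compaction

# Integer percentage of the file that compaction would reclaim:
frag=$(( (file - active) * 100 / file ))
echo "reclaimable: ${frag}%"

# To actually reclaim it, trigger compaction per database:
#   curl -X POST -H 'Content-Type: application/json' \
#        http://localhost:5984/xxxxxxx_1590/_compact
```

This prints `reclaimable: 99%`, i.e. virtually the whole file is superseded B-tree nodes and old document bodies.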
Adam

> On May 2, 2019, at 12:53 PM, Willem Bison <wil...@nappkin.nl> wrote:
>
> Hi Adam,
>
> I ran "POST compact" on the DB mentioned in my post and 'disk_size' went
> from 729884227 (yes, it had grown that much in 1 hour!?) to 1275480.
>
> Wow.
>
> I disabled compacting because I thought it was useless in our case, since
> the db's and the docs are so small. I do wonder how it is possible for a db
> to grow so much when it's being deleted several times a week. What is all
> the 'air'?
>
> On Thu, 2 May 2019 at 18:31, Adam Kocoloski <kocol...@apache.org> wrote:
>
>> Hi Willem,
>>
>> Compaction would certainly reduce your storage space. You have such a
>> small number of documents in these databases that it would be a fast
>> operation. Did you try it and run into issues?
>>
>> Changing cluster.q shouldn't affect the overall storage consumption.
>>
>> Adam
>>
>>> On May 2, 2019, at 12:15 PM, Willem Bison <wil...@nappkin.nl> wrote:
>>>
>>> Hi,
>>>
>>> Our CouchDB 2.3.1 standalone server (AWS Ubuntu 18.04) is using a lot of
>>> disk space, so much so that it regularly causes a full disk and a crash.
>>>
>>> The server contains approximately 100 databases, each with a reported
>>> (Fauxton) size of less than 2.5 MB and fewer than 250 docs. Yesterday the
>>> 'shards' folders combined exceeded a total of 14 GB, causing the server
>>> to crash.
>>>
>>> The server is configured with
>>> cluster.n = 1 and
>>> cluster.q = 8
>>> because that was suggested during setup.
>>>
>>> When I write this the 'shards' folders look like this:
>>>
>>> /var/lib/couchdb/shards# du -hs *
>>> 869M 00000000-1fffffff
>>> 1.4G 20000000-3fffffff
>>> 207M 40000000-5fffffff
>>> 620M 60000000-7fffffff
>>> 446M 80000000-9fffffff
>>> 458M a0000000-bfffffff
>>> 400M c0000000-dfffffff
>>> 549M e0000000-ffffffff
>>>
>>> One of the largest files is this:
>>>
>>> curl localhost:5984/xxxxxxx_1590
>>> {
>>>   "db_name": "xxxxxxx_1590",
>>>   "purge_seq": "0-g1AAAAFTeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFNSApBMqv___39WIgMedXksQJKhAUgBlc4nRu0DiFoC5iYpgOy3J9L-BRAz9-NXm8iQJE_YYgeQxfFEWnwAYvF9oNosADncXo4",
>>>   "update_seq": "3132-g1AAAAFWeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8____sxI18ChKUgCSSfYgdUkMDNw1-JQ6gJTGg42UxacuAaSuHqxOAo-6PBYgydAApIBK52clchNUuwCidn9Wog5BtQcgau9nJQoTVPsAohboXsksAJuwX9Y",
>>>   "sizes": {
>>>     "file": 595928643,
>>>     "external": 462778,
>>>     "active": 1393380
>>>   },
>>>   "other": {
>>>     "data_size": 462778
>>>   },
>>>   "doc_del_count": 0,
>>>   "doc_count": 74,
>>>   "disk_size": 595928643,
>>>   "disk_format_version": 7,
>>>   "data_size": 1393380,
>>>   "compact_running": false,
>>>   "cluster": {
>>>     "q": 8,
>>>     "n": 1,
>>>     "w": 1,
>>>     "r": 1
>>>   },
>>>   "instance_start_time": "0"
>>> }
>>>
>>> curl localhost:5984/xxxxxxx_1590/_local_docs
>>> {"total_rows":null,"offset":null,"rows":[
>>> {"id":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","key":"_local/189d9109518d1a2167b06ca9639af5f2ba16f0a5","value":{"rev":"0-3022"}},
>>> {"id":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","key":"_local/7b3e0d929201afcea44b237b5b3e86b35ff924c6","value":{"rev":"0-18"}},
>>> {"id":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","key":"_local/7da4a2aaebc84d01ba0e2906ac0fcb82d96bfe05","value":{"rev":"0-3749"}},
>>> {"id":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","key":"_local/9619b06f20d26b076e4060d050dc8e3bde878920","value":{"rev":"0-172"}}
>>> ]}
>>>
>>> Each database push/pull replicates with a small number of clients (< 10).
>>> Most of the documents contain orders that are short-lived. We throw away
>>> all db's 3 times a week as a brute-force purge.
>>> Compacting has been disabled because it takes too much CPU and was
>>> considered useless in our case (small db's, purging).
>>>
>>> I read this:
>>> https://github.com/apache/couchdb/issues/1621
>>> but I'm not sure how it helps me.
>>>
>>> These are my questions:
>>> How is it possible that such a small db occupies so much space?
>>> What can I do to reduce this?
>>> Would changing 'cluster.q' have any effect, or would the same amount of
>>> bytes be used in fewer folders? (Am I correct in assuming that
>>> cluster.q > 1 is pointless in a standalone configuration?)
>>>
>>> Thanks!
>>> Willem
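One option short of disabling compaction entirely: CouchDB 2.x ships a compaction daemon that can be told to compact only databases above a fragmentation threshold, and only within a time window, which keeps the CPU cost off peak hours. A sketch of the relevant `local.ini` settings (the percentages and window here are illustrative, not recommendations; check the section names against the `default.ini` shipped with your 2.3.1 install):

```ini
[compaction_daemon]
; How often (seconds) to scan databases for fragmentation
check_interval = 3600
; Ignore files smaller than this (bytes) - tiny dbs aren't worth compacting
min_file_size = 131072

[compactions]
; Compact any db whose file is >70% reclaimable, but only overnight
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "23:00"}, {to, "04:00"}]
```

With thresholds like these, a 2.5 MB database that balloons to hundreds of megabytes of dead space would be picked up on the next nightly pass instead of growing until the disk fills.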