With the following settings, the size came down to 14GB:

checkpoint_after = 524288000
doc_buffer_size = 52428800

{
  "db_name": "database2",
  "doc_count": 12986513,
  "doc_del_count": 0,
  "update_seq": 12986513,
  "purge_seq": 0,
  "compact_running": false,
  "disk_size": 15156363386,
  "data_size": 8034864363,
  "instance_start_time": "1422213492581804",
  "disk_format_version": 6,
  "committed_update_seq": 12986513
}

ratio disksize/datasize = 1.88

When I further increase the settings by a factor of 10, the size reduces to 8.9GB:

checkpoint_after = 5242880000
doc_buffer_size = 524288000

{
  "db_name": "database2",
  "doc_count": 12986513,
  "doc_del_count": 0,
  "update_seq": 12986513,
  "purge_seq": 0,
  "compact_running": false,
  "disk_size": 9518403706,
  "data_size": 8027906267,
  "instance_start_time": "1422302560319114",
  "disk_format_version": 6,
  "committed_update_seq": 12986513
}

ratio disksize/datasize = 1.18

My other database did even better, with a ratio of 1.00052:

{
  "db_name": "database1",
  "doc_count": 13337224,
  "doc_del_count": 0,
  "update_seq": 13337224,
  "purge_seq": 0,
  "compact_running": false,
  "disk_size": 6897811578,
  "data_size": 6894215476,
  "instance_start_time": "1422302561058460",
  "disk_format_version": 6,
  "committed_update_seq": 13337224
}

Q: How does one determine the optimum checkpoint_after and doc_buffer_size?

If this depends on the document size, then these values should be
configurable per database.

-Sharath

On Mon, Jan 26, 2015 at 5:31 AM, Sharath <sharat...@gmail.com> wrote:

> Thanks - I've set the following values:
> checkpoint_after = 524288000
> doc_buffer_size = 52428800
>
> and started the compact process. Have to wait for a bit.
>
> -Sharath
>
> On Sun, Jan 25, 2015 at 5:59 PM, Alexander Shorin <kxe...@gmail.com> wrote:
>
>> Ok, so far, this looks exactly like what I have for my hashes databases:
>>
>> data_size: 557537537
>> disk_size: 1542664311
>> doc_count: 1298255
>> doc_del_count: 18
>> avg doc size: ~350 bytes
>>
>> While the disk_size/data_size ratio is about 3, this database is
>> uncompactable: CouchDB isn't able to get it to 500MB size, leaving it
>> at 1.5GB.
>> This looks like some "specifics" of the underlying database
>> format, which isn't able to rationally allocate a huge number of tiny
>> documents... But CouchDB provides two interesting options to
>> configure database compaction: doc_buffer_size and checkpoint_after.
>>
>> http://docs.couchdb.org/en/latest/config/compaction.html#database_compaction
>>
>> By default they have the following values:
>>
>> checkpoint_after = 5242880
>> doc_buffer_size = 524288
>>
>> And this makes my hashes database stop at the 1.5GB point. If I
>> multiply them both by 10, the database size after compaction will be
>> ~900MB - yay! If I do this again, with the resulting config:
>>
>> checkpoint_after = 524288000
>> doc_buffer_size = 52428800
>>
>> then the database sizes will be much better:
>>
>> disk_size: 633688183
>> data_size: 556759808
>>
>> Almost no overhead! Why does this happen? Paul or Robert may correct me,
>> but it seems that most of the wasted space after compaction is
>> consumed by checkpoint headers and btree rebalancing. Asking CouchDB to
>> make compaction checkpoints less often and to use a bigger buffer for
>> docs allows it to build the resulting btree in the new database file in
>> a more optimized way. The downside of such a configuration: if your
>> compaction fails, it has to restart from further back, and a bigger
>> buffer requires more memory.
>>
>> Try playing with these options and see how they affect your
>> databases.
>>
>> P.S. This issue is eventually solved for the upcoming 2.0 with the default config.
>> --
>> ,,,^..^,,,
>>
>>
>> On Sun, Jan 25, 2015 at 9:52 AM, Sharath <sharat...@gmail.com> wrote:
>> > yes the databases were recently compacted - both the databases run as
>> > insert only (no deletion for either).
>> > database2 completed compaction about 4 hours ago and I've triggered
>> > compaction again (so what you see below for database2 could be misleading)
>> >
>> > database1:
>> > {
>> > "db_name":"database1",
>> > "doc_count":13337224,
>> > "doc_del_count":0,
>> > "update_seq":13337224,
>> > "purge_seq":0,
>> > "compact_running":false,
>> > "disk_size":8574615674,
>> > "data_size":6896805847,
>> > "instance_start_time":"1422157234994080",
>> > "disk_format_version":6,
>> > "committed_update_seq":13337224
>> > }
>> >
>> > database2:
>> > {
>> > "db_name":"database2",
>> > "doc_count":12982621,
>> > "doc_del_count":0,
>> > "update_seq":12982621,
>> > "purge_seq":0,
>> > "compact_running":true,
>> > "disk_size":31587352698,
>> > "data_size":8026729752,
>> > "instance_start_time":"1422157235289671",
>> > "disk_format_version":6,
>> > "committed_update_seq":12982621
>> > }
>> >
>> > -Sharath
>> >
>> > On Sun, Jan 25, 2015 at 5:40 PM, Alexander Shorin <kxe...@gmail.com> wrote:
>> >
>> >> Hm...are you sure that database was recently compacted? How many
>> >> deleted documents in these databases?
>> >> --
>> >> ,,,^..^,,,
>> >>
>> >>
>> >> On Sun, Jan 25, 2015 at 9:27 AM, Sharath <sharat...@gmail.com> wrote:
>> >> > Hi Alexander,
>> >> >
>> >> > CouchDB version: 1.61
>> >> >
>> >> > database1: "disk_size":8574615674,"data_size":6896805847
>> >> > database2: "disk_size":31587352698,"data_size":8026729752
>> >> >
>> >> > -Sharath
>> >> >
>> >> > On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin <kxe...@gmail.com> wrote:
>> >> >
>> >> >> Hi Sharath,
>> >> >>
>> >> >> What is your CouchDB version?
>> >> >> Could you provide data_size and disk_size values from database info for both?
>> >> >> curl http://localhost:5984/db1
>> >> >> curl http://localhost:5984/db2
>> >> >> --
>> >> >> ,,,^..^,,,
>> >> >>
>> >> >>
>> >> >> On Sun, Jan 25, 2015 at 7:11 AM, Sharath <sharat...@gmail.com> wrote:
>> >> >> > Hi All,
>> >> >> >
>> >> >> > recently moved to couchdb and find my databases taking a lot of diskspace
>> >> >> >
>> >> >> > I have two database both with json documents (no attachments) - however the
>> >> >> > sizes vary by a lot
>> >> >> >
>> >> >> > database1 size 8.0GB number of documents: 13337224
>> >> >> > database2 size 29.4 GB number of documents: 12981148
>> >> >> >
>> >> >> > both the databases have been compacted
>> >> >> >
>> >> >> > each document in database1 is 487 bytes long (including _id and _rev)
>> >> >> > each document in database2 is 564 bytes long (including _id and _rev)
>> >> >> >
>> >> >> > database1 should be ~6.1GB (only data without compression) [487 * 13337224 / 1024 /1024]
>> >> >> > database2 should be ~6.9GB (only data without compression) [564 * 12981148 / 1024 /1024]
>> >> >> >
>> >> >> > I'm curious why the database file takes 29 GB.
>> >> >> >
>> >> >> > unfortunately I cannot post the document as this is prod data.
>> >> >> >
>> >> >> > CouchDb is running on my mac 10.10.1 with default configuration.
>> >> >> >
>> >> >> > database1 was populated by a bulk upload from a mysql extract and database
>> >> >> > 2 was populated by individual document inserts (put) database compaction
>> >> >> > was let to complete (took ~30hr on database 2)
>> >> >> >
>> >> >> > is there a command that compacts superfluous data? or am i missing anything?
>> >> >> >
>> >> >> >
>> >> >> > thanks!
>> >> >> >
>> >> >> > -Sharath
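
For reference, the disk_size/data_size ratios quoted in this thread can be recomputed from the database info documents. The following is only a minimal Python sketch: the helper name overhead_ratio and the labels in the runs table are made up for illustration, and the numbers are copied from the info documents posted above (which use the CouchDB 1.x field names disk_size and data_size).

```python
def overhead_ratio(db_info):
    """Return disk_size / data_size for a database info document (a dict)."""
    return db_info["disk_size"] / db_info["data_size"]


# Sizes copied from the info documents quoted in this thread.
# The labels are illustrative only.
runs = {
    "database2, before tuning": {
        "disk_size": 31587352698, "data_size": 8026729752},
    "database2, checkpoint_after=524288000 doc_buffer_size=52428800": {
        "disk_size": 15156363386, "data_size": 8034864363},
    "database2, checkpoint_after=5242880000 doc_buffer_size=524288000": {
        "disk_size": 9518403706, "data_size": 8027906267},
    "database1, checkpoint_after=5242880000 doc_buffer_size=524288000": {
        "disk_size": 6897811578, "data_size": 6894215476},
}

for label, info in runs.items():
    # A ratio close to 1.0 means almost no on-disk overhead after compaction.
    print(f"{label}: ratio = {overhead_ratio(info):.5f}")
```

The same function can be applied to the parsed JSON returned by the curl commands above, which makes it easier to compare compaction settings run by run.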