With the following settings, the database size came down to 14 GB:

checkpoint_after = 524288000
doc_buffer_size = 52428800

{
    "db_name": "database2",
    "doc_count": 12986513,
    "doc_del_count": 0,
    "update_seq": 12986513,
    "purge_seq": 0,
    "compact_running": false,
    "disk_size": 15156363386,
    "data_size": 8034864363,
    "instance_start_time": "1422213492581804",
    "disk_format_version": 6,
    "committed_update_seq": 12986513
}

ratio disk_size/data_size = 1.89

When I increase both settings by a further factor of 10, the size drops
to 8.9 GB:

checkpoint_after = 5242880000
doc_buffer_size = 524288000

{
    "db_name": "database2",
    "doc_count": 12986513,
    "doc_del_count": 0,
    "update_seq": 12986513,
    "purge_seq": 0,
    "compact_running": false,
    "disk_size": 9518403706,
    "data_size": 8027906267,
    "instance_start_time": "1422302560319114",
    "disk_format_version": 6,
    "committed_update_seq": 12986513
}

ratio disk_size/data_size = 1.19

My other database did even better, with a ratio of 1.00052:

{
    "db_name": "database1",
    "doc_count": 13337224,
    "doc_del_count": 0,
    "update_seq": 13337224,
    "purge_seq": 0,
    "compact_running": false,
    "disk_size": 6897811578,
    "data_size": 6894215476,
    "instance_start_time": "1422302561058460",
    "disk_format_version": 6,
    "committed_update_seq": 13337224
}
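
For reference, the ratios in this message are simply disk_size divided by
data_size from each database info document. A minimal Python sketch of that
arithmetic, using the byte counts quoted above; the helper name
overhead_ratio and the dictionary labels are made up for illustration:

```python
# Sizes in bytes, copied from the database info documents quoted above.
INFO = {
    "database2 (checkpoint_after = 524288000)": {
        "disk_size": 15156363386,
        "data_size": 8034864363,
    },
    "database2 (checkpoint_after = 5242880000)": {
        "disk_size": 9518403706,
        "data_size": 8027906267,
    },
    "database1": {
        "disk_size": 6897811578,
        "data_size": 6894215476,
    },
}


def overhead_ratio(info):
    """Return disk_size / data_size for a database info dict."""
    return info["disk_size"] / info["data_size"]


for label, info in INFO.items():
    print(f"{label}: {overhead_ratio(info):.5f}")
```

A ratio close to 1 means the compacted file holds little beyond the
document data itself.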

Q: How does one determine the optimum checkpoint_after and doc_buffer_size?
If the optimum depends on the document size, then these values should be
configurable per database.
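
Since both options are server-wide, one possible workaround is to change
them through the HTTP config API just before compacting a given database.
The following is an untested Python sketch. It assumes a CouchDB 1.x server
at http://localhost:5984 without admin authentication, and it assumes the
compactor reads both values when a compaction starts. The function names
are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB 1.x, no auth


def config_request(key, value, base_url=BASE_URL):
    """Build a PUT request that sets one [database_compaction] option."""
    url = f"{base_url}/_config/database_compaction/{key}"
    # The 1.x config API takes the new value as a JSON string.
    body = json.dumps(str(value)).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


def compact_request(db_name, base_url=BASE_URL):
    """Build the POST request that starts compaction of one database."""
    url = f"{base_url}/{db_name}/_compact"
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def compact_with_settings(db_name, checkpoint_after, doc_buffer_size):
    """Apply both settings, then trigger compaction (needs a running server)."""
    requests = [
        config_request("checkpoint_after", checkpoint_after),
        config_request("doc_buffer_size", doc_buffer_size),
        compact_request(db_name),
    ]
    for req in requests:
        with urllib.request.urlopen(req) as resp:
            print(req.get_method(), req.full_url, resp.status)


# Example call, left commented out because it needs a live server:
# compact_with_settings("database2", 5242880000, 524288000)
```

This only changes which values are active when the compaction is
triggered. It does not answer how to find the optimum for a given
document size.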

-Sharath

On Mon, Jan 26, 2015 at 5:31 AM, Sharath <sharat...@gmail.com> wrote:

> Thanks - I've set the following values:
> checkpoint_after = 524288000
> doc_buffer_size = 52428800
>
> and started the compact process. Have to wait for a bit.
>
> -Sharath
>
> On Sun, Jan 25, 2015 at 5:59 PM, Alexander Shorin <kxe...@gmail.com>
> wrote:
>
>> OK, so far this looks exactly like what I have for my hashes database:
>>
>> data_size: 557537537
>> disk_size: 1542664311
>> doc_count: 1298255
>> doc_del_count: 18
>> avg doc size: ~350 bytes
>>
>> While the disk_size/data_size ratio is nearly 3, this database is
>> uncompactable: CouchDB isn't able to get it down to 500MB, leaving it
>> at 1.5GB. This looks like a quirk of the underlying database format,
>> which isn't able to allocate space efficiently for a huge number of
>> tiny documents. But CouchDB provides two interesting options to
>> configure database compaction: doc_buffer_size and checkpoint_after.
>>
>> http://docs.couchdb.org/en/latest/config/compaction.html#database_compaction
>>
>> By default they have the following values:
>>
>> checkpoint_after = 5242880
>> doc_buffer_size = 524288
>>
>> And this makes my hashes database stop at the 1.5GB point. If I
>> multiply them both by 10, the database size after compaction is
>> ~900MB - yay! If I do this again with the resulting config:
>>
>> checkpoint_after = 524288000
>> doc_buffer_size = 52428800
>>
>> Then the database sizes are much better:
>>
>> disk_size: 633688183
>> data_size: 556759808
>>
>> Almost no overhead! Why does this happen? Paul or Robert may correct me,
>> but it seems that most of the space wasted after compaction is
>> consumed by checkpoint headers and btree rebalancing. Asking CouchDB to
>> make compaction checkpoints less often and to use a bigger buffer for
>> docs allows it to build the resulting btree in the new database file in
>> a more optimized way. The downside of such a configuration: if your
>> compaction fails, it has to restart from further back, and a bigger
>> buffer size requires more memory.
>>
>> Try playing with these options and see how they affect your
>> databases.
>>
>> P.S. This issue is solved in the upcoming 2.0 with the default config.
>> --
>> ,,,^..^,,,
>>
>>
>> On Sun, Jan 25, 2015 at 9:52 AM, Sharath <sharat...@gmail.com> wrote:
>> > Yes, the databases were recently compacted - both databases run as
>> > insert only (no deletions for either).
>> > database2 completed compaction about 4 hours ago and I've triggered
>> > compaction again (so what you see below for database2 could be
>> > misleading)
>> >
>> > database1:
>> > {
>> >    "db_name":"database1",
>> >    "doc_count":13337224,
>> >    "doc_del_count":0,
>> >    "update_seq":13337224,
>> >    "purge_seq":0,
>> >    "compact_running":false,
>> >    "disk_size":8574615674,
>> >    "data_size":6896805847,
>> >    "instance_start_time":"1422157234994080",
>> >    "disk_format_version":6,
>> >    "committed_update_seq":13337224
>> > }
>> >
>> > database2:
>> > {
>> >    "db_name":"database2",
>> >    "doc_count":12982621,
>> >    "doc_del_count":0,
>> >    "update_seq":12982621,
>> >    "purge_seq":0,
>> >    "compact_running":true,
>> >    "disk_size":31587352698,
>> >    "data_size":8026729752,
>> >    "instance_start_time":"1422157235289671",
>> >    "disk_format_version":6,
>> >    "committed_update_seq":12982621
>> > }
>> >
>> > -Sharath
>> >
>> > On Sun, Jan 25, 2015 at 5:40 PM, Alexander Shorin <kxe...@gmail.com>
>> > wrote:
>> >
>> >> Hm... are you sure that the database was recently compacted? How many
>> >> deleted documents are in these databases?
>> >> --
>> >> ,,,^..^,,,
>> >>
>> >>
>> >> On Sun, Jan 25, 2015 at 9:27 AM, Sharath <sharat...@gmail.com> wrote:
>> >> > Hi Alexander,
>> >> >
>> >> > CouchDB version: 1.6.1
>> >> >
>> >> > database1: "disk_size":8574615674,"data_size":6896805847
>> >> > database2: "disk_size":31587352698,"data_size":8026729752
>> >> >
>> >> > -Sharath
>> >> >
>> >> > On Sun, Jan 25, 2015 at 4:55 PM, Alexander Shorin <kxe...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Hi Sharath,
>> >> >>
>> >> >> What is your CouchDB version?
>> >> >> Could you provide data_size and disk_size values from database info
>> >> >> for both?
>> >> >> curl http://localhost:5984/db1
>> >> >> curl http://localhost:5984/db2
>> >> >> --
>> >> >> ,,,^..^,,,
>> >> >>
>> >> >>
>> >> >> On Sun, Jan 25, 2015 at 7:11 AM, Sharath <sharat...@gmail.com>
>> >> >> > wrote:
>> >> >> > Hi All,
>> >> >> >
>> >> >> > I recently moved to CouchDB and find my databases taking a lot of
>> >> >> > disk space.
>> >> >> >
>> >> >> > I have two databases, both with JSON documents (no attachments);
>> >> >> > however, the sizes vary a lot
>> >> >> >
>> >> >> > database1      size 8.0GB    number of documents: 13337224
>> >> >> > database2      size 29.4 GB    number of documents: 12981148
>> >> >> >
>> >> >> > both the databases have been compacted
>> >> >> >
>> >> >> > each document in database1 is 487 bytes long (including _id and
>> >> >> > _rev)
>> >> >> > each document in database2 is 564 bytes long (including _id and
>> >> >> > _rev)
>> >> >> >
>> >> >> > database1 should be ~6.1GB (only data without compression)
>> >> >> > [487 * 13337224 / 1024 / 1024]
>> >> >> > database2 should be ~6.9GB (only data without compression)
>> >> >> > [564 * 12981148 / 1024 / 1024]
>> >> >> >
>> >> >> > I'm curious why the database file takes 29 GB.
>> >> >> >
>> >> >> > unfortunately I cannot post the document as this is prod data.
>> >> >> >
>> >> >> > CouchDb is running on my mac 10.10.1 with default configuration.
>> >> >> >
>> >> >> > database1 was populated by a bulk upload from a MySQL extract and
>> >> >> > database2 was populated by individual document inserts (PUT).
>> >> >> > Database compaction was allowed to complete (it took ~30hr on
>> >> >> > database2).
>> >> >> >
>> >> >> > Is there a command that compacts superfluous data? Or am I missing
>> >> >> > anything?
>> >> >> >
>> >> >> >
>> >> >> > thanks!
>> >> >> >
>> >> >> > -Sharath
>> >> >>
>> >>
>>
>
>
