On Thu, Mar 8, 2012 at 1:21 PM, Matthieu Rakotojaona <[email protected]> wrote:
> Hello everyone,
>
> I discovered CouchDB a few months ago, and decided to dive in just
> recently. I don't want to be long, but CouchDB is amazing. True offline
> mode/replication, JSON over HTTP, MVCC, MapReduce and other concepts
> widened my horizon of how to solve a problem, and I'm really grateful.
>
> There is a point, though, that I find sad: the documentation available
> on the interwebs is somewhat scarce. Sure, you can figure things out
> yourself because CouchDB is so easy, but there's a particular point
> that I found to be especially undocumented: compaction.
>
> Basically all I could find was:
>
> * If you want to compact your db:
>
>     POST /db/_compact
>
> * If you want to compact your design:
>
>     POST /db/_compact/designname
>
>   (which seems to say that you can only compact all your views at once
>   or none, but not a particular one)

Slightly more specific: compaction for views is done for all the views in
the specified design document. Also, view compaction is in general much
more efficient than database compaction.
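For reference, this is roughly how I poke at it from a script. It's an
untested sketch that assumes Python with the requests library, a node on
http://localhost:5984, a database named "db" and a design doc named
"designname"; the data_size field only shows up on builds that report it:

    import time
    import requests

    COUCH = "http://localhost:5984"   # adjust host/port/auth as needed
    DB = "db"                         # hypothetical database name
    DDOC = "designname"               # design doc name, without the _design/ prefix

    def compact_and_wait(db):
        # Trigger database compaction; CouchDB answers 202 and does the
        # work in the background.
        requests.post("%s/%s/_compact" % (COUCH, db),
                      headers={"Content-Type": "application/json"})
        # Poll the db info doc until compact_running flips back to false.
        while requests.get("%s/%s" % (COUCH, db)).json().get("compact_running"):
            time.sleep(1)

    def compact_views(db, ddoc):
        # Compacts *all* the views in the given design document, as noted above.
        requests.post("%s/%s/_compact/%s" % (COUCH, db, ddoc),
                      headers={"Content-Type": "application/json"})

    compact_and_wait(DB)
    compact_views(DB, DDOC)
    info = requests.get("%s/%s" % (COUCH, DB)).json()
    print(info.get("disk_size"), info.get("data_size"))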
> * Although it's apparently a deliberate design choice (automatic
>   compaction is seen as unneeded), a number of people run compaction
>   with cron jobs

There's an auto compactor in trunk now.

> * The real effect of a compaction (i.e. the real amount of space you
>   are going to get back) seems to be unknown by many people. Someone (I
>   don't remember your name, but thank you) came up with a patch to
>   display the data_size, which is the real size of your data on disk;
>   this looks hackish.

Which part looks hackish?

> And here comes the initial purpose of my mail. I just added a few
> documents to my db (1.7M+) and found that disk_size gives me ~2.5 GB,
> while data_size is around 660 MB. From what I read, a compaction is
> supposed to leave you with data_size ~= disk_size; yet, after numerous
> compactions, it doesn't shrink a bit.

I bet you have random document ids, which will indeed cause the database
file to end up with a significant amount of garbage left after
compaction. I'll describe why below.

> I suppose the problem is exactly the same with views; I'm building them
> at the moment, so I will test it later.

Technically yes, but in general no. More below.

> I also would like to understand the process of compaction. All I could
> see was:
>
> 1. couchdb parses the entire DB, fetching only the last (or the few
>    last, from parameters) revision of each document
> 2. it assembles them in a db.compact.couch file
> 3. when finished, db.compact.couch replaces db.couch

In broad strokes. Currently, CouchDB compacts like so:

1. Iterate over docs in order of the update sequence
2. Read each document from the id btree
3. Write the doc to both the update-sequence and id indexes in the
   compaction file
4. When finished, delete the .couch file and rename .couch.compact -> .couch

It's a bit more complicated than that due to buffering of docs to improve
throughput and whatnot, but those are the important details.

The issue is twofold. First, reading the docs in order of the update
sequence and then fetching them through the id btree means we're
incurring a btree lookup per doc. There's a patch in BigCouch that
addresses this by duplicating a record in both trees. It's been shown to
give significant speedups for both compaction and replication, at the
expense of storing more data (basically it has two copies of the revision
tree, but importantly it does not duplicate the actual JSON body of the
document). While not directly size related in itself, it leads us to the
second issue: namely, that writing both indexes simultaneously introduces
garbage into the .compact file when the order of document ids in the
update sequence is random. I.e., if you wrote the same documents to two
databases, one with ids that were monotonically increasing (say,
"%0.20d" % i) and the other with random document ids, and then compacted
both, the database with random ids would use significantly more disk
space after compaction (and would also take longer to compact). The issue
here is that when we update the id tree with random doc ids we end up
rewriting more of the internal nodes (append-only storage), which causes
more garbage to accumulate.
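If you want to see this on your own hardware, the experiment is easy to
reproduce. A rough, untested sketch (Python with the requests library
again; database names, batch size and doc count are just made up for
illustration):

    import json
    import uuid
    import requests

    COUCH = "http://localhost:5984"
    N = 200000  # enough writes to make the difference visible

    def load(db, make_id):
        requests.put("%s/%s" % (COUCH, db))  # create the database
        for start in range(0, N, 1000):
            docs = [{"_id": make_id(i), "value": i}
                    for i in range(start, start + 1000)]
            requests.post("%s/%s/_bulk_docs" % (COUCH, db),
                          headers={"Content-Type": "application/json"},
                          data=json.dumps({"docs": docs}))

    # Same documents, two id schemes.
    load("seq_ids", lambda i: "%0.20d" % i)        # monotonically increasing ids
    load("rand_ids", lambda i: uuid.uuid4().hex)   # effectively random ids

    # Compact both databases (see the earlier sketch for a compact-and-wait
    # helper), then compare GET /seq_ids and GET /rand_ids: the random-id
    # database keeps a noticeably larger disk_size after compaction, and
    # takes longer to compact.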
Although, all hope is not lost. There's a second set of two patches in
BigCouch that I wrote to address this specifically. The first patch
changes the compactor to use a temporary file for the id btree. Then,
just before compaction finishes, this tree is streamed back into the
.compact file (in sorted order, so that internal garbage is minimized).
This helps tremendously for databases with random document ids (sorted
ids are already ~optimal for this scheme). The second patch in the set
uses an external merge sort on the temporary file, which helps speed up
the compaction. Depending on the dataset, these improvements can yield
massive gains in post-compaction data sizes as well as in the time
required for compaction. I plan on pulling these back into CouchDB in the
coming months as we work on merging BigCouch back into CouchDB, so
hopefully by the end of summer they'll be in master for everyone to
enjoy.

As to views, they don't really require these improvements because their
indexes are always streamed in sorted order, so it's both fast and
close-ish to optimal. Although somewhere I had a patch that changed the
index builds to be actually optimal, based on ideas from Filipe, but as I
recall it wasn't a super huge win so I didn't actually commit it.

> So I wondered:
> * Can you launch a compaction, halt it and continue it later?

While you can resume compaction, there's no API for pausing or canceling
one. There's actually a really neat way in Erlang to do this that we've
mentioned occasionally adding to the active tasks API, but no one has
gotten around to adding it.

> * If yes, can you move the temporary db.compact.couch file somewhere
>   else and link to it so that couchdb thinks nothing has changed?

I'm not sure what you mean here.

> Thank you,
>
> --
> Matthieu RAKOTOJAONA
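P.S. Until something along those lines is exposed, the closest you can
get from the outside is watching a running compaction through
/_active_tasks. A rough, untested sketch (Python with the requests
library; the exact field names in the task objects vary between releases,
so treat them as assumptions):

    import time
    import requests

    COUCH = "http://localhost:5984"

    def watch_compactions():
        # _active_tasks lists running compactions, indexers, replications, ...
        while True:
            tasks = requests.get("%s/_active_tasks" % COUCH).json()
            compactions = [t for t in tasks
                           if "compaction" in str(t.get("type", "")).lower()]
            if not compactions:
                break
            for t in compactions:
                print(t.get("database") or t.get("task"),
                      t.get("progress") or t.get("status"))
            time.sleep(5)

    watch_compactions()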
