On Sep 6, 2012, at 7:18 AM, Tim Tisdall wrote:

> I had a database of about 10.8gb with almost 15 million records which
> was fully compacted. I had to back it up by dumping all the JSON and
> then restoring it by inserting it back in. After it was done and I
> compacted it, the database was only 8.8gb! I shed 2gb by dropping
> the revision stubs still in the database. This is likely because each
> record had about 6 revisions (so around 90 million stubs). All of this
> is understandable, but 2gb isn't really negligible when running on a
> virtualized instance of 35gb. The problem, though, is that the method
> I used to dump to JSON and place it back into CouchDB took almost
> 12hrs!
>
> Is there a way to drop all of the revision stubs and reset the
> documents' revision tags back to "1-" values? I know this would
> completely break any kind of replication, but in this instance I am
> not doing any.
>
> The best method I can think of is to insert each record into a new DB
> (not through replication, though, because that takes the stubs over
> with it), then go through the _changes from when I started and re-copy
> those over to make sure everything is up-to-date. This would save me
> from having things down for 12hrs, but I have no idea how long this
> process would take.
>
> Suggestions?
You may find http://wiki.apache.org/couchdb/Purge_Documents interesting.
However, since it can only purge leaf revisions, you may still need some
creative application, and I'm not sure what you'd gain over a scripted copy
to a different database.

What are your uptime/consistency needs? Must the document ids be preserved?

hth,
-nvw
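P.S. For illustration only, here is a minimal sketch of what a _purge call
looks like over HTTP, assuming a local CouchDB at http://localhost:5984, a
hypothetical database name "mydb", and the Python "requests" library. The
endpoint and request shape are from the wiki page above; everything else is
an assumption.

    import requests

    COUCH = "http://localhost:5984"   # assumption: local CouchDB
    DB = "mydb"                       # assumption: your database name

    def purge(doc_id, revs):
        # POST /db/_purge with {"<doc_id>": ["<rev>", ...]} removes those
        # revisions from the database entirely.  Purging a document's only
        # leaf revision removes the document itself, which is why _purge
        # alone doesn't get you "reset every doc back to a 1- revision".
        resp = requests.post(
            "%s/%s/_purge" % (COUCH, DB),
            json={doc_id: revs},
            headers={"Content-Type": "application/json"},
        )
        resp.raise_for_status()
        return resp.json()  # e.g. {"purge_seq": ..., "purged": {doc_id: [...]}}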
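And a rough sketch of the "copy into a fresh database, then catch up via
_changes" approach you describe, under the same assumptions (local CouchDB,
hypothetical database names "mydb" and "mydb_clean", the "requests" library,
a batch size of 1000). It pages through _all_docs, strips _rev so every copy
starts at a 1- revision, and returns the update_seq recorded before the copy
so a later pass over _changes?since=<that value> can re-copy anything written
in the meantime. Attachments are not handled in this sketch.

    import json
    import requests

    COUCH = "http://localhost:5984"   # assumption: local CouchDB
    SOURCE = "mydb"                   # assumption: source database name
    TARGET = "mydb_clean"             # assumption: target database name
    BATCH = 1000                      # page size for _all_docs

    def copy_without_history():
        requests.put("%s/%s" % (COUCH, TARGET))  # create target (409 if it exists)

        # Remember where the source was, so a later pass over
        # /SOURCE/_changes?since=<start_seq> can re-copy anything
        # written while this loop was running.
        start_seq = requests.get("%s/%s" % (COUCH, SOURCE)).json()["update_seq"]

        startkey = None
        while True:
            params = {"include_docs": "true", "limit": BATCH + 1}
            if startkey is not None:
                params["startkey"] = json.dumps(startkey)
            rows = requests.get("%s/%s/_all_docs" % (COUCH, SOURCE),
                                params=params).json()["rows"]

            docs = []
            for row in rows[:BATCH]:
                doc = row["doc"]
                doc.pop("_rev", None)          # drop history: new DB starts at 1-
                doc.pop("_attachments", None)  # attachments not handled here
                docs.append(doc)
            if docs:
                requests.post("%s/%s/_bulk_docs" % (COUCH, TARGET),
                              json={"docs": docs},
                              headers={"Content-Type": "application/json"})

            if len(rows) <= BATCH:
                break
            startkey = rows[BATCH]["key"]      # first row of the next page

        return start_seq

How long it runs depends mostly on document size and batch size, but it keeps
the source database live the whole time, which seems to be the main point.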