If your docids get repeated, then you could very well use an LRU
cache to get what you want. To make sure you're not missing updates,
issuing a HEAD request and checking the ETag would probably be best.
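A minimal sketch of that idea, in case it helps. Names, the cache size,
and the base URL are all made up; it also swaps the separate HEAD for a
conditional GET (If-None-Match with the cached ETag), which gets the
same freshness check in one round trip when the server answers 304:

```python
# Hedged sketch: LRU cache plus conditional GETs. A 304 response means
# the cached revision is still current; anything else refreshes the cache.
import json
import urllib.error
import urllib.request
from collections import OrderedDict

class LRUCache:
    """Tiny LRU cache mapping docid -> (etag, doc)."""
    def __init__(self, maxsize=1000):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, docid):
        if docid not in self._data:
            return None
        self._data.move_to_end(docid)       # mark as most recently used
        return self._data[docid]

    def put(self, docid, etag, doc):
        self._data[docid] = (etag, doc)
        self._data.move_to_end(docid)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

def get_doc(cache, base_url, docid):
    """Return the doc, revalidating any cached copy against its ETag."""
    req = urllib.request.Request("%s/%s" % (base_url, docid))
    cached = cache.get(docid)
    if cached is not None:
        req.add_header("If-None-Match", cached[0])
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        if e.code == 304:                   # cached copy still current
            return cached[1]
        raise
    doc = json.loads(resp.read())
    cache.put(docid, resp.headers.get("ETag"), doc)
    return doc
```

With that, get_doc(cache, "http://127.0.0.1:5874/db_name", "42") only
transfers the document body when the cached revision has gone stale.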
Also, if you have access to some number of document ids, you can
fetch multiple documents simultaneously by POSTing a {"keys":
[docid1, docid2, docid3, ...]} to
http://127.0.0.1:5874/db_name/_all_docs?include_docs=true
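Something along these lines, roughly. The function names and the batch
size are illustrative, and the URL is just the one from this message:

```python
# Hedged sketch of the bulk fetch: POST batches of ids to
# _all_docs?include_docs=true instead of one GET per document.
import json
import urllib.request

def chunks(seq, size):
    """Split a list of ids into request-sized batches."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def fetch_docs(docids, base_url="http://127.0.0.1:5874/db_name"):
    """POST {"keys": [...]} to _all_docs and return {docid: doc}."""
    body = json.dumps({"keys": list(docids)}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/_all_docs?include_docs=true",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    rows = json.loads(urllib.request.urlopen(req).read())["rows"]
    return {row["id"]: row["doc"] for row in rows if "doc" in row}

# e.g.: for batch in chunks(all_ids, 1000): docs = fetch_docs(batch)
```

Batching like this turns a million GETs into a thousand POSTs, which
should help a lot with the per-request overhead.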
HTH,
Paul Davis
On Wed, Apr 1, 2009 at 3:34 PM, Manjunath Somashekhar
<[email protected]> wrote:
>
> Hi all,
>
> Buoyed by the response I got to my previous mail (Suggestions on View
> performance optimization/improvement), I am asking another question about
> optimizing document lookup based on _id.
>
> Let us say we have a db containing a million documents, each with an _id
> generated by us [1..1000000]. If we have to get all the documents one by
> one (assuming the search/lookup code will get random inputs of [1..1000000]),
> what would work best?
>
> As of now, what we are doing is a simple lookup like:
>
> def getDocById(self, id):
>     return self.db[id]
>
> Doing a million lookups like this takes about 50-60 mins on my laptop.
> Is there a better way of doing the same? I thought of fetching a bunch of
> keys in one go, caching them (LRU style), and looking up the cache first
> before hitting the db, but given that the input 'id' varies randomly over
> [1..1000000], it has not been a great success.
>
> Any thoughts? Ideas? Suggestions?
>
> Environment details:
> Couchdb - 0.9.0a757326
> Erlang - 5.6.5
> Linux kernel - 2.6.24-23-generic #1 SMP Mon Jan 26 00:13:11 UTC 2009 i686
> GNU/Linux
> Ubuntu distribution
> Centrino Dual core, 4GB RAM laptop
>
> Thanks
> Manju
>