On 28-Oct-08, at 5:36 AM, Jérôme Etévé wrote:

: Hi all,
: In my code, I'd like to keep a subset of my 14M docs which is around
: 100k large.
: What is, in your opinion, the best option in terms of speed and
: memory usage?
: Some basic thinking tells me BitDocSet should be the fastest for
: lookup, but it takes ~14M * sizeof(int) in memory.
: The doc of HashDocSet says it can be a better choice if there are few
: docs in the set. What does 'few' mean in this context?

It's relative to the total size of your index. If you have a million docs,
but you are dealing with DocSets that are only going to contain 10 docs,
then both the speed and the memory usage of a HashDocSet are going to be
better than a BitDocSet sized to the whole index.
: Some basic thinking tells me BitDocSet should be the fastest for
: lookup, but it takes ~14M * sizeof(int) in memory.

BitDocSet does not take ~14M * sizeof(int) in memory. It takes at most
one bit per doc, i.e. 14M/8 bytes ~= 1.75MB.
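Putting the two replies together, the arithmetic can be sketched as follows. This is a back-of-envelope sketch, not Solr's actual implementation: it assumes a hash entry costs roughly one int (4 bytes) per member doc id and ignores the hash table's load factor and object overhead.

```java
public class DocSetSizes {
    public static void main(String[] args) {
        long maxDoc = 14_000_000L;   // total docs in the index
        long members = 100_000L;     // docs actually in the subset

        // A bit set's cost is fixed: one bit per doc in the index,
        // no matter how many bits are set.
        long bitSetBytes = maxDoc / 8;                  // 1_750_000 bytes ~= 1.75 MB

        // A hash set's cost grows with membership: ~4 bytes per member id
        // (ignoring load factor and per-object overhead).
        long hashSetBytes = members * Integer.BYTES;    // 400_000 bytes ~= 400 KB

        // "Few" is relative: the break-even point is where the two costs meet.
        long breakEven = bitSetBytes / Integer.BYTES;   // 437_500 docs, ~3% of the index

        System.out.println("BitDocSet:  " + bitSetBytes + " bytes");
        System.out.println("HashDocSet: " + hashSetBytes + " bytes");
        System.out.println("break-even: " + breakEven + " member docs");
    }
}
```

By this rough measure a 100k subset sits well under the break-even point of a 14M-doc index, which is why the hash-based set is the usual suggestion for sets this small relative to the index.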