On 28-Oct-08, at 5:36 AM, Jérôme Etévé wrote:
Hi all,
In my code, I'd like to keep a subset of my 14M docs that is around
100k documents in size.
What is, in your opinion, the best option in terms of speed and
memory usage?
Some basic reasoning tells me the BitDocSet should be the fastest for
lookup, but it takes ~14M bits (one bit per possible doc id) in memory,
whereas the HashDocSet takes only ~100k * sizeof(int), but has a
slightly slower lookup.
The HashDocSet documentation says "It can be a better choice if there
are few docs in the set." What does 'few' mean in this context?
Solr, by default, ships with a configuration that builds filters as
HashDocSets when the set size is < 3000, and as BitDocSets otherwise.
That threshold is tunable in solrconfig.xml. With 14M docs you might
find it helps to raise it slightly, say to 5000-6000. In my testing,
anything higher than that is a net loss.
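
If memory serves, the knob in solrconfig.xml looks like this (a sketch
based on the stock example config; double-check the element and the
loadFactor default against the example config shipped with your
version):

    <!-- filter sets smaller than maxSize are built as HashDocSets,
         larger ones as BitDocSets -->
    <HashDocSet maxSize="6000" loadFactor="0.75"/>

Here maxSize is raised to 6000 per the suggestion above.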
-Mike