On 28-Oct-08, at 5:36 AM, Jérôme Etévé wrote:

Hi all,

 In my code, I'd like to keep a subset of my 14M docs that is around
100k docs in size.

What, in your opinion, is the best option in terms of speed and memory usage?

Some basic thought tells me the BitDocSet should be the fastest for
lookup, but it takes ~14M bits in memory (one bit per doc in the
index, so about 1.75 MB), whereas the HashDocSet takes just
~100k * sizeof(int) (about 400 KB), at the cost of a somewhat slower lookup.
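
A back-of-envelope sketch of that tradeoff (the one-bit-per-indexed-doc
and one-int-per-member figures are assumptions about the underlying
representations, and load-factor overhead is ignored):

// Rough memory estimate for the two DocSet flavours.
// Assumptions (not measured): BitDocSet costs ~1 bit per doc in the
// index regardless of set size; HashDocSet costs ~1 int per doc in
// the set, ignoring hash load-factor overhead.
public class DocSetMemory {
    public static void main(String[] args) {
        long maxDoc = 14_000_000L;  // docs in the index
        long setSize = 100_000L;    // docs in the filter

        long bitDocSetBytes = maxDoc / 8;               // one bit per doc
        long hashDocSetBytes = setSize * Integer.BYTES; // one int per member

        System.out.printf("BitDocSet:  ~%,d KB%n", bitDocSetBytes / 1024);
        System.out.printf("HashDocSet: ~%,d KB%n", hashDocSetBytes / 1024);
    }
}

This prints roughly 1,708 KB for the BitDocSet and 390 KB for the
HashDocSet, so the hash wins on memory here but not by an enormous margin.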

The doc of HashDocSet says "It can be a better choice if there are few
docs in the set". What does 'few' mean in this context?

Solr, by default, ships with a configuration that builds filters as a HashDocSet when the set size is below 3000, and as a BitDocSet otherwise. This threshold is tunable in solrconfig.xml. With 14M docs you might find it helps to raise it slightly, say to 5000-6000; in my testing, anything higher than that is a net loss.
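
If memory serves, the stock example solrconfig.xml exposes this threshold as a HashDocSet element in the <query> section; something like the following would raise it to 5000 (the loadFactor value shown is, I believe, the shipped default):

<query>
  <!-- Filters with fewer than maxSize docs are kept as a HashDocSet;
       anything larger falls back to a BitDocSet. -->
  <HashDocSet maxSize="5000" loadFactor="0.75"/>
</query>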

-Mike
