On 11/4/2016 8:22 AM, Vincenzo D'Amore wrote:
> Given 2 collections, A and B:
>
> - Collection A has 5 M documents with an attribute active: true/false.
> - Collection B has only 2.5 M documents, but all of the documents have
> attribute active:true
> - in either case, A or B, I only search documents that have
> active:true
>
> Which one performs faster?

This is not backed by knowledge of how the code internals operate, just
things I've pieced together from my own experience and other things said
on the list in response to past questions.

Assuming you have the available memory to effectively cache both
indexes, five million documents is chump change to Solr.  If you don't
have that memory, it might present a performance issue.

Because query performance is largely dependent on the number of terms
that Solr must look through, and the active field probably has at most
two (true and false; documents without the field contribute no term),
that part of your query will generally be very fast with ANY number of
documents.

If you search for all documents and filter on the active field, the
difference between the two will probably be so small a human being would
never notice it, but it probably would be a difference that you'd be
able to measure.
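
To make that case concrete, here is a rough sketch of the two requests
being compared: a match-all query with a filter on the active field for
collection A, and a plain match-all for collection B, where every
document is already active. The host, port, and the collection names
"A" and "B" are assumptions for illustration; q, fq, and rows are the
standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr install; adjust for your setup.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(collection, query, active_only=True, rows=10):
    """Build a /select URL, optionally filtering on active:true."""
    params = [("q", query), ("rows", rows)]
    if active_only:
        # fq restricts the result set without affecting scoring.
        params.append(("fq", "active:true"))
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


# Collection A holds active and inactive documents, so it needs the filter.
url_a = build_select_url("A", "*:*")
# Collection B only holds active documents, so the filter is optional there.
url_b = build_select_url("B", "*:*", active_only=False)
```

Both URLs should describe the same logical set of 2.5 M active
documents, which is why I'd expect the difference to be tiny.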

Where you *might* notice a difference is when you do a "real" query
against other fields in the index, and filter on the active field.
For those fields, the term count will usually track with the document
count, so the smaller collection may be noticeably faster for this kind
of query.
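
If you want to measure rather than guess, something like the following
sketch would do. It is only a generic timing helper, not anything
Solr-specific: median_latency_ms and its parameters are names I made up,
and run_query stands for whatever callable sends one request to one of
your collections.

```python
import statistics
import time


def median_latency_ms(run_query, repeats=20, warmup=3):
    """Return the median wall-clock time of run_query() in milliseconds."""
    # Warm-up calls are not timed, so one-off startup costs are excluded.
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Run it once per collection with the same query and filter, and compare
the two medians. Keep in mind that Solr caches filter and query results,
so repeating the identical request mostly measures the caches; varying
the query terms between calls gives a more honest number. The QTime
value in the response header is also worth comparing, since it excludes
network time.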

Thanks,
Shawn
