Seconding Shawn. If your queries will always target only the active
documents, this is what happens at a high level:

A) You run your query plus a filter query that retrieves only the
active documents.
The filter query results will be cached.
Solr will run the main query over the entire document space, and then
intersect its results with the filtered documents.

B) You run your query over the entire (smaller) document space, with no
filter query needed.

So option B will be faster, possibly not massively, because Solr does
fewer calculations.
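
To make the two options concrete, here is a rough sketch of what the
requests might look like. It only builds the /select URLs and does not
contact Solr. The host, the collection names (collectionA, collectionB),
the title field and the select_url helper are all made up for
illustration; only q and fq are standard Solr parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL; adjust to your own installation.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(collection, query, filter_query=None):
    """Build a /select URL; fq is added only when a filter is given."""
    params = [("q", query)]
    if filter_query is not None:
        params.append(("fq", filter_query))
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


# Option A: the 5M-document collection, restricted to active docs via fq.
url_a = select_url("collectionA", "title:solr", "active:true")

# Option B: the 2.5M-document collection, which holds only active docs,
# so no filter query is needed.
url_b = select_url("collectionB", "title:solr")

print(url_a)
print(url_b)
```

If you want real numbers, time both requests against your own data
after the caches are warm, since the difference depends on your index
and hardware.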

Cheers

On Fri, Nov 4, 2016 at 2:45 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/4/2016 8:22 AM, Vincenzo D'Amore wrote:
> > Given 2 collections, A and B:
> >
> > - Collection A has 5 M documents with an attribute active: true/false.
> > - Collection B has only 2.5 M documents, but all the documents have
> > attribute active:true
> > - in either case, A or B, I can only search documents that have
> > active:true
> >
> > Which one performs faster?
>
> This is not backed by knowledge of how the code internals operate, just
> things I've pieced together from my own experience and other things said
> on the list in response to past questions.
>
> Assuming you have the available memory to effectively cache both
> indexes, five million documents is chump change to Solr.  If you don't
> have that memory, it might present a performance issue.
>
> Because query performance is largely dependent on the number of terms
> that Solr must look through, and the active field probably has at most
> three (true, false, and field not present), that part of your query will
> generally be very fast with ANY number of documents.
>
> If you search for all documents and filter on the active field, the
> difference between the two will probably be so small a human being would
> never notice it, but it probably would be a difference that you'd be
> able to measure.
>
> Where you *might* notice a difference is when you do a "real" query
> against other fields in the index, and filter on the active field.
> That's when the document count will usually track with the term count.
> The smaller collection may be noticeably faster for this kind of query.
>
> Thanks,
> Shawn
>
>


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
