Yep, all that sounds right.
An additional optimization counts terms for the documents *not* in the
set when the base set is over half the size of the index.
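
To illustrate the idea, here is a minimal sketch, not the actual Solr code. It assumes the total per-term document counts for the whole index are already known, so the counts for the base set can be obtained by subtracting the terms seen in the excluded documents. The names `facet_counts`, `doc_terms`, and `total_counts` are hypothetical.

```python
from collections import Counter


def facet_counts(doc_terms, base_set, total_counts):
    """Count terms over base_set, walking whichever side is smaller.

    doc_terms:    dict of doc id -> list of term numbers (hypothetical layout,
                  each term listed at most once per document)
    base_set:     set of doc ids to facet over
    total_counts: Counter of term number -> doc count over the whole index
                  (assumed to be precomputed)
    """
    num_docs = len(doc_terms)
    if len(base_set) * 2 > num_docs:
        # Base set is over half the index: start from the index-wide totals
        # and subtract the terms of the documents NOT in the set.
        counts = Counter(total_counts)
        for doc, terms in doc_terms.items():
            if doc not in base_set:
                for term in terms:
                    counts[term] -= 1
        return {term: n for term, n in counts.items() if n > 0}

    # Otherwise count the terms of the documents in the set directly.
    counts = Counter()
    for doc in base_set:
        counts.update(doc_terms[doc])
    return dict(counts)


doc_terms = {0: [0, 1], 1: [0], 2: [0, 2], 3: [1]}
total_counts = Counter(t for terms in doc_terms.values() for t in terms)

large = facet_counts(doc_terms, {0, 1, 2}, total_counts)  # complement path
small = facet_counts(doc_terms, {3}, total_counts)        # direct path
```

Both paths give the same counts; the complement path simply touches fewer documents when the base set is large.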

-Yonik
http://www.lucidimagination.com


On Tue, Jun 9, 2009 at 1:01 PM, Michael Ludwig <m...@as-guides.com> wrote:
> Yonik,
>
> from your initial comment for SOLR-475:
>
> | * To save space and speed up faceting, any term that matches enough
> | * documents will not be un-inverted... it will be skipped while
> | * building the un-inverted field structure, and will use a set
> | * intersection method during faceting.
>
> Does this mean that frequently occurring terms (which we can use for
> faceting in 1.3 without problems) are handled exactly as they were
> before, by allocating a slot in the filter cache upon request, while
> those zillions of pesky little fringe terms outside the mainstream,
> for which allocating a slot in the filter cache would be overkill
> (and possibly cause inefficient contention, eviction, and, hence,
> a performance penalty) are now handled by the new structure mapping
> documents to term numbers?
>
> So doing faceting for a given set of documents would result in (a) doing
> set intersection using those filter query results that have been set up
> (for the terms occurring in many documents), and (b) collecting all the
> pesky little terms from the new structure mapping documents to term
> numbers?
>
> So basically, depending on expediency, you (a) know the facets and count
> the documents which display them, or you (b) take the documents and see
> what facets they have?
>
> Michael Ludwig
>
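
The two paths described in the quoted message, (a) set intersection for terms that match many documents and (b) a per-document walk over the remaining terms, can be sketched roughly as follows. This is an illustration under stated assumptions, not the SOLR-475 implementation: the cutoff `max_df`, the function names, and the plain dict/set layout are all hypothetical, and terms are kept as strings here instead of term numbers.

```python
from collections import Counter, defaultdict


def build_uninverted(postings, max_df):
    """Split terms into 'big' terms and an un-inverted doc -> terms map.

    postings: dict of term -> list of doc ids containing it
    max_df:   hypothetical cutoff; terms matching more docs are not un-inverted
    """
    big_terms = {}
    doc_to_terms = defaultdict(list)
    for term, docs in postings.items():
        if len(docs) > max_df:
            # Frequent term: keep its doc set for set intersection.
            big_terms[term] = set(docs)
        else:
            # Rare term: record it under each document that contains it.
            for doc in docs:
                doc_to_terms[doc].append(term)
    return big_terms, dict(doc_to_terms)


def facet(big_terms, doc_to_terms, base_set):
    """Facet counts for base_set using both paths."""
    counts = Counter()
    # (a) Known facets: intersect each big term's doc set with the base set.
    for term, docs in big_terms.items():
        n = len(docs & base_set)
        if n:
            counts[term] = n
    # (b) Take the documents and see which of the rare terms they have.
    for doc in base_set:
        counts.update(doc_to_terms.get(doc, ()))
    return dict(counts)


postings = {"common": [0, 1, 2, 3], "rare_a": [0], "rare_b": [2, 3]}
big_terms, doc_to_terms = build_uninverted(postings, max_df=2)
result = facet(big_terms, doc_to_terms, {0, 2})
```

With this split, only the frequent terms need a cached doc set each, while the many rare terms share a single structure.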
