Yep, all that sounds right. An additional optimization counts terms for the documents *not* in the set when the base set is over half the size of the index.
-Yonik
http://www.lucidimagination.com

On Tue, Jun 9, 2009 at 1:01 PM, Michael Ludwig <m...@as-guides.com> wrote:
> Yonik,
>
> from your initial comment for SOLR-475:
>
> | * To save space and speed up faceting, any term that matches enough
> | * documents will not be un-inverted... it will be skipped while
> | * building the un-inverted field structure, and will use a set
> | * intersection method during faceting.
>
> Does this mean that frequently occurring terms (which we can use for
> faceting in 1.3 without problems) are handled exactly as they were
> before, by allocating a slot in the filter cache upon request, while
> those zillions of pesky little fringe terms outside the mainstream,
> for which allocating a slot in the filter cache would be overkill
> (and possibly cause inefficient contention, eviction, and, hence,
> a performance penalty) are now handled by the new structure mapping
> documents to term numbers?
>
> So doing faceting for a given set of documents would result in (a) doing
> set intersection using those filter query results that have been set up
> (for the terms occurring in many documents), and (b) collecting all the
> pesky little terms from the new structure mapping documents to term
> numbers?
>
> So basically, depending on expediency, you (a) know the facets and count
> the documents which display them, or you (b) take the documents and see
> what facets they have?
>
> Michael Ludwig
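
For readers following the thread, here is a minimal sketch of the two counting paths discussed above, plus the "count the documents *not* in the set" optimization. It is an illustration only, not Solr's actual code: the function `facet_counts`, its parameters, and the sample data are all made up for this example, plain Python sets stand in for cached filters, and it assumes the per-term totals for the rare terms (`term_totals`) are already known.

```python
from collections import Counter


def facet_counts(base_docs, all_docs, doc_terms, frequent_term_docs, term_totals):
    """Hypothetical hybrid facet counter (illustrative, not Solr's implementation).

    base_docs          -- set of doc ids matching the query
    all_docs           -- set of every doc id in the index
    doc_terms          -- doc id -> list of rare terms (the "un-inverted" structure)
    frequent_term_docs -- frequent term -> set of doc ids (stand-in for a cached filter)
    term_totals        -- rare term -> number of docs containing it (assumed precomputed)
    """
    counts = Counter()

    # (a) Frequent terms: know the facet, count documents by set intersection.
    for term, docs in frequent_term_docs.items():
        counts[term] = len(base_docs & docs)

    # (b) Rare terms: walk documents and collect the terms they carry.
    # If the base set covers more than half the index, walk the complement
    # instead and subtract from the known totals.
    use_complement = 2 * len(base_docs) > len(all_docs)
    docs_to_walk = (all_docs - base_docs) if use_complement else base_docs

    seen = Counter()
    for doc in docs_to_walk:
        seen.update(doc_terms.get(doc, ()))

    if use_complement:
        for term, total in term_totals.items():
            counts[term] = total - seen[term]
    else:
        counts.update(seen)

    # Drop terms that do not occur in the base set at all.
    return {term: n for term, n in counts.items() if n > 0}


# Made-up sample index: six documents, one frequent term, three rare terms.
all_docs = {0, 1, 2, 3, 4, 5}
frequent_term_docs = {"book": {0, 1, 2, 3, 4}}
doc_terms = {0: ["rare1"], 1: ["rare2"], 2: ["rare1"], 5: ["rare3"]}
term_totals = {"rare1": 2, "rare2": 1, "rare3": 1}

small = facet_counts({0, 1}, all_docs, doc_terms, frequent_term_docs, term_totals)
large = facet_counts({0, 1, 2, 3}, all_docs, doc_terms, frequent_term_docs, term_totals)
```

With the small base set the rare terms are counted by walking the two matching documents directly; with the large one only the two non-matching documents are walked, and the result is the same as a direct count would give.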