Re: Highest frequency terms for a subset of documents

Ofer Fort Thu, 21 Apr 2011 06:45:17 -0700

Not sure I fully understand.
If "facet.method=enum steps over all terms in the index for that field",
then what does setting q=field:subset do? If I set q=*:*, then how
do I get the frequency only on my subset?
Ofer


On Thu, Apr 21, 2011 at 4:40 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Thu, Apr 21, 2011 at 9:24 AM, Ofer Fort <o...@tra.cx> wrote:
> > Another strange behavior is that the QTime seems pretty stable, no matter
> > how many objects match my query. 200K and 20K both take about 17s.
> > I would have guessed that, since the time is spent going over all the terms
> > of all the subset documents, more documents would mean more time.
>
> facet.method=enum steps over all terms in the index for that field...
> that takes time regardless of how many documents are in the base set.
>
> There are also short-circuit methods that avoid looking at the docs
> for a term if its docfreq is low enough that it couldn't possibly
> make it into the priority queue.  Because of this, it can actually be
> faster to facet on a larger base set (try *:* as the base query).
>
> Actually, it might be interesting to see the query time if you set
> facet.mincount equal to the number of docs in the base set - that will
> test pretty much just the time to enumerate over the terms without
> doing any set intersections at all.  Be careful not to set mincount
> greater than the number of docs in the base set though - Solr will
> short-circuit that too and skip enumeration altogether.
>
> The work on the bulkpostings branch should definitely speed up your
> case even more - but I have no idea when it will "land" on trunk.
>
>
> -Yonik
> http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
> 25-26, San Francisco
>
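
The behavior described in the quoted message can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Solr's actual implementation: the function name enum_facet, the in-memory term_postings dict, and the sample index are all made up for illustration. It walks every term of the field regardless of the base set, skips a term when its docfreq cannot reach the priority queue, and counts how many set intersections were really performed.

```python
import heapq


def enum_facet(term_postings, base_set, limit=2, mincount=1):
    """Toy enum-style faceting (hypothetical helper, not Solr code).

    term_postings maps each term of the field to the set of doc ids containing it.
    Returns (top terms as (count, term) pairs, number of intersections performed).
    """
    # No term can match more docs than the base set holds, so skip enumeration.
    if mincount > len(base_set):
        return [], 0

    top = []            # min-heap of (count, term), at most `limit` entries
    intersections = 0
    for term, postings in term_postings.items():   # every term, whatever the base set
        docfreq = len(postings)
        # A term's facet count can never exceed its docfreq.
        if docfreq < mincount:
            continue
        # Queue is full and this term cannot beat its smallest entry.
        if len(top) == limit and docfreq <= top[0][0]:
            continue
        count = len(postings & base_set)
        intersections += 1
        if count < mincount:
            continue
        if len(top) < limit:
            heapq.heappush(top, (count, term))
        elif count > top[0][0]:
            heapq.heapreplace(top, (count, term))
    return sorted(top, reverse=True), intersections


# Made-up sample data: five terms over six documents.
index = {
    "solr": {1, 2, 3, 4, 5, 6},
    "lucene": {1, 2, 3, 4},
    "facet": {2, 3},
    "enum": {5},
    "rare": {6},
}
all_docs = {1, 2, 3, 4, 5, 6}   # stands in for q=*:*
subset = {5, 6}                 # stands in for q=field:subset

print(enum_facet(index, all_docs))
print(enum_facet(index, subset))
print(enum_facet(index, all_docs, mincount=len(all_docs)))
```

In this toy data the larger base set fills the queue with high counts early, so later low-docfreq terms are skipped without any intersection, while the small subset forces more intersections. Passing mincount equal to the base set size leaves little besides the term enumeration itself, which mirrors the experiment suggested above.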
