Re: Unique terms without faceting

Otis Gospodnetic Thu, 11 Oct 2012 07:42:33 -0700

Hi,

Are you looking for
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/HighFreqTerms.html
?


Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Oct 11, 2012 at 4:40 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote:
>> I know that you can use a facet query to get the unique terms for a
>> field taking account of any q or fq parameters but for our use case the
>> counts are not needed. So is there a more efficient way of finding
>> just unique terms for a field?
>
> Short answer: Not at this moment.
>
>
> If the number of unique terms is large (millions), a fair amount of
> temporary memory could be saved by keeping track of matched terms
> with a single bit (boolean) per term instead of the full int counter
> used by standard faceting. Reduced memory requirements mean less
> garbage collection and faster processing due to better cache
> utilization. So yes, there is a more efficient way.
>
> Guessing from your other posts, you are building a social network and
> need to query on surnames and similar large fields. The question is, of
> course, how large the payoff will be and whether it is worth the investment
> in development hours. I would suggest hacking the current faceting code to
> use OpenBitSet instead of int[] and doing performance tests on that.
> PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts
> seem to be the right places to look in Solr 4.
>
> Regards,
> Toke Eskildsen, State and University Library, Denmark
>
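
The bit-per-term idea from the quoted reply can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical in-memory layout (a per-document array of term ordinals plus an ordinal-to-term lookup) standing in for an uninverted field; these arrays, the class name, and the method names are illustrative only and are not Solr's actual faceting code. It also uses the JDK's java.util.BitSet rather than Lucene's OpenBitSet so it runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sketch (not Solr code): collect the unique terms of a field
 * for the documents matching q/fq by marking term ordinals in a BitSet,
 * instead of incrementing an int[] of per-term counts.
 */
public class UniqueTermsSketch {

    /**
     * @param docToOrds    hypothetical layout: docToOrds[doc] holds the term ordinals in that doc
     * @param matchingDocs ids of the documents that matched the query and filters
     * @param ordToTerm    hypothetical ordinal -> term text lookup
     */
    static List<String> uniqueTerms(int[][] docToOrds, int[] matchingDocs, String[] ordToTerm) {
        // One bit per unique term instead of one 32-bit int counter.
        BitSet seen = new BitSet(ordToTerm.length);
        for (int doc : matchingDocs) {
            for (int ord : docToOrds[doc]) {
                seen.set(ord); // record presence only; no counting
            }
        }
        List<String> terms = new ArrayList<>(seen.cardinality());
        // Walk the set bits in ascending ordinal order.
        for (int ord = seen.nextSetBit(0); ord >= 0; ord = seen.nextSetBit(ord + 1)) {
            terms.add(ordToTerm[ord]);
        }
        return terms;
    }

    /** Payload of an int[] with one counter per term (array header ignored). */
    static long intCountsBytes(long numTerms) {
        return numTerms * Integer.BYTES;
    }

    /** Payload of a bit set with one bit per term, rounded up to whole 64-bit words. */
    static long bitSetBytes(long numTerms) {
        return (numTerms + 63) / 64 * Long.BYTES;
    }

    public static void main(String[] args) {
        String[] ordToTerm = {"andersen", "hansen", "jensen", "nielsen", "pedersen"};
        int[][] docToOrds = {
            {2},    // doc 0
            {0, 3}, // doc 1
            {2, 4}, // doc 2
            {1},    // doc 3
        };
        int[] matchingDocs = {0, 2}; // pretend these matched q/fq
        System.out.println(uniqueTerms(docToOrds, matchingDocs, ordToTerm)); // [jensen, pedersen]

        long n = 10_000_000L;
        System.out.println(intCountsBytes(n) + " vs " + bitSetBytes(n)); // 40000000 vs 1250000
    }
}
```

For 10 million unique terms, the counter array needs 40 MB while the bit set needs about 1.25 MB, a 32x reduction in that temporary structure. The time spent visiting matching documents is unchanged, so, as the reply says, the real payoff would have to be measured.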
