Hi, are you looking for http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/HighFreqTerms.html ?
Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Oct 11, 2012 at 4:40 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote:
>> I know that you can use a facet query to get the unique terms for a
>> field taking into account any q or fq parameters, but for our use case
>> the counts are not needed. So is there a more efficient way of finding
>> just the unique terms for a field?
>
> Short answer: Not at this moment.
>
>
> If the number of unique terms is large (millions), a fair amount of
> temporary memory could be spared by just keeping track of matched terms
> with a boolean vs. the full int for standard faceting. Reduced memory
> requirements mean less garbage collection and faster processing due to
> better cache utilization. So yes, there is a more efficient way.
>
> Guessing from your other posts, you are building a social network and
> need to query on surnames and similar large fields. The question is, of
> course, how large the payoff will be and whether it is worth the
> investment in development hours. I would suggest hacking the current
> faceting code to use OpenBitSet instead of int[] and doing performance
> tests on that. PerSegmentSingleValuedFaceting.SegFacet and
> UnInvertedField.getCounts seem to be the right places to look in Solr 4.
>
> Regards,
> Toke Eskildsen, State and University Library, Denmark
>
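Toke's point about tracking matched terms "with a boolean vs. the full int" can be illustrated with a small, self-contained Java sketch. It is not Solr code and does not reproduce PerSegmentSingleValuedFaceting or UnInvertedField. The arrays docToTermOrd (term ordinal per document, -1 for no value) and ordToTerm (term text per ordinal), along with the class and method names, are hypothetical stand-ins for the per-segment structures a single-valued field provides. java.util.BitSet is used in place of Lucene's OpenBitSet so the example compiles without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: counting vs. presence-only collection of terms for a
// single-valued field. Not Solr's implementation.
final class UniqueTermsSketch {

    // Standard-faceting style: one int counter (4 bytes) per unique term.
    static int[] countTerms(int[] docToTermOrd, int[] matchingDocs, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc : matchingDocs) {
            int ord = docToTermOrd[doc];
            if (ord >= 0) {            // -1 marks a document without a value
                counts[ord]++;
            }
        }
        return counts;
    }

    // Presence-only variant: one bit per unique term, no counts kept.
    static BitSet markTerms(int[] docToTermOrd, int[] matchingDocs, int numTerms) {
        BitSet seen = new BitSet(numTerms);
        for (int doc : matchingDocs) {
            int ord = docToTermOrd[doc];
            if (ord >= 0) {
                seen.set(ord);
            }
        }
        return seen;
    }

    // Resolve the marked ordinals back to term text, in ordinal order.
    static List<String> uniqueTerms(BitSet seen, String[] ordToTerm) {
        List<String> result = new ArrayList<>();
        for (int ord = seen.nextSetBit(0); ord >= 0; ord = seen.nextSetBit(ord + 1)) {
            result.add(ordToTerm[ord]);
        }
        return result;
    }

    // Payload size of an int[] with one counter per term (object header ignored).
    static long intArrayBytes(long numTerms) {
        return numTerms * Integer.BYTES;
    }

    // Payload size of a bit set backed by 64-bit words (object header ignored).
    static long bitSetBytes(long numTerms) {
        return (numTerms + 63) / 64 * Long.BYTES;
    }

    public static void main(String[] args) {
        String[] ordToTerm = {"andersen", "hansen", "jensen", "nielsen"};
        int[] docToTermOrd = {2, 0, 2, -1, 3, 0};
        int[] matchingDocs = {0, 2, 3, 4};

        BitSet seen = markTerms(docToTermOrd, matchingDocs, ordToTerm.length);
        System.out.println(uniqueTerms(seen, ordToTerm)); // prints [jensen, nielsen]

        long terms = 10_000_000L;
        System.out.println(intArrayBytes(terms) + " bytes as int[]");  // 40000000
        System.out.println(bitSetBytes(terms) + " bytes as bit set");  // 1250000
    }
}
```

With 10 million unique terms, the counter array needs 40,000,000 bytes per request while the bit set needs 1,250,000 bytes, a 32x reduction in the temporary structure. Whether that translates into a measurable speedup depends on the rest of the faceting path, which is why Toke suggests performance tests rather than assuming a gain.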