Robert Young wrote:
Thanks, that is very helpful. So, is there a way to find out the
total number of distinct tokens, regardless of which field they're
associated with? And to find which are most popular?


Nothing standard does that... the semantics of what it would mean get a little weird: it would be a histogram of values regardless of the analyzer. How would you know how to search for a value once you got the result? Unless everything has the same analyzer, I guess.

Perhaps consider using a copyField to copy the relevant values into another field - then you can get the top tokens across all these fields with luke.
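
As a rough illustration of what "top tokens regardless of field" would compute, here is a sketch in plain Java. It does not call Solr or Lucene: the class name, the sample() data, and both helper methods are hypothetical, and the nested map simply stands in for per-field term statistics. It also assumes every field uses the same analyzer, which is the caveat raised above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TokenHistogram {
    /** Made-up index: field name -> (token text -> document frequency). */
    static Map<String, Map<String, Integer>> sample() {
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        Map<String, Integer> title = new TreeMap<>();
        title.put("solr", 3);
        title.put("lucene", 1);
        Map<String, Integer> body = new TreeMap<>();
        body.put("solr", 5);
        body.put("lucene", 2);
        body.put("index", 2);
        index.put("title", title);
        index.put("body", body);
        return index;
    }

    /** Sums frequencies per token text, ignoring which field it came from. */
    static Map<String, Integer> mergeFields(Map<String, Map<String, Integer>> index) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> field : index.values()) {
            for (Map.Entry<String, Integer> e : field.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    /** Returns the n token texts with the highest summed frequency. */
    static List<String> topTokens(Map<String, Integer> merged, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(merged.entrySet());
        // Highest count first; ties broken alphabetically so the order is stable.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Integer> merged = mergeFields(sample());
        System.out.println("distinct tokens: " + merged.size());
        System.out.println("top 2: " + topTokens(merged, 2));
    }
}
```

Note that summing per-field document frequencies counts a document twice when the token occurs in both fields. Copying everything into a single field avoids that, since the index then keeps one frequency per token for that field.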

ryan



Cheers
Rob

On Jan 8, 2008 5:04 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
numTerms counts the unique terms (field:value pairs) in the index.  The
source is:

         TermEnum te = reader.terms();
         int numTerms = 0;
         while (te.next()) {
           numTerms++;
         }
         indexInfo.add("numTerms", numTerms );

"distinct" is a similar calculation, but for each field.
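
To make the difference between numTerms and distinct concrete, here is a small sketch in plain Java. It does not use the Lucene API: the Term class, the sample() data, and the method names are all made up for illustration, and the list stands in for whatever reader.terms() would enumerate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TermCounts {
    /** Hypothetical stand-in for a Lucene term: a (field, text) pair. */
    static final class Term {
        final String field;
        final String text;
        Term(String field, String text) { this.field = field; this.text = text; }
    }

    /** Made-up dictionary; "solr" appears under two fields, so it is two terms. */
    static List<Term> sample() {
        return Arrays.asList(
            new Term("body", "index"),
            new Term("body", "solr"),
            new Term("title", "solr"));
    }

    /** Same idea as the numTerms loop: one count per unique field:value pair. */
    static int numTerms(List<Term> dictionary) {
        int numTerms = 0;
        for (Term t : dictionary) {
            numTerms++;
        }
        return numTerms;
    }

    /** The per-field variant: one counter per field name. */
    static Map<String, Integer> distinctPerField(List<Term> dictionary) {
        Map<String, Integer> distinct = new TreeMap<>();
        for (Term t : dictionary) {
            distinct.merge(t.field, 1, Integer::sum);
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<Term> dictionary = sample();
        System.out.println("numTerms=" + numTerms(dictionary));
        System.out.println("distinct=" + distinctPerField(dictionary));
    }
}
```

With this sample data, numTerms is 3 even though there are only two different token texts, because the same text under two fields is two terms.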

ryan



Robert Young wrote:
Hi,

In the response for the LukeRequestHandler, what do the different
fields mean? Some of them are obvious but some are less so. Is
numTerms the total number of terms or the total number of unique terms
(i.e. the dictionary)? If it is the former, how can I find the size of
the dictionary across all fields? I'm assuming that distinct in the
specific field sections is the number of unique terms in that field.
Is this correct?

Thanks
Rob



