Hi Mike,

Thanks for clarifying what has been a bit of a black box to me.

A couple of questions, to increase my understanding, if you don't mind.

If I am only using fields with multiValued="false" and a type of "string"
or "integer" (untokenized), does Solr automatically use approach 2? Or is
this something I have to actively configure?

And is approach 2 better than 1? Or vice versa? Or is the answer "it
depends"? :-)

If, as I suspect, the answer is "it depends", are there any general
guidelines on when to use one approach or the other?

Thanks,

Tom

On 9/6/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>
>
> On 6-Sep-07, at 3:25 PM, Mike Klaas wrote:
>
> >
> > There are essentially two facet computation strategies:
> >
> > 1. cached bitsets: a bitset for each term is generated and
> > intersected with the query result bitset.  This is more general and
> > performs well up to a few thousand terms.
> >
> > 2. field enumeration: cache the field contents, and generate counts
> > using this data.  Relatively independent of #unique terms, but
> > requires at most a single facet value per field per document.
> >
> > So, if you factor author into Primary author/Secondary author,
> > where each is guaranteed to only have one value per doc, this could
> > greatly accelerate your faceting.  There are probably fewer unique
> > subjects, so strategy 1 is likely fine.
> >
> > To use strategy 2, just make sure that multiValued="false" is set
> > for those fields in schema.xml.
>
> I forgot to mention that strategy 2 also requires a single token for
> each doc (see
> http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3)
>
> -Mike
>
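
To make the two strategies in the quoted message concrete, here is a toy
Python sketch. It is not Solr's implementation: the `docs` mapping, the
`query_result` set, and both function names are hypothetical, and plain
Python sets stand in for bitsets. It assumes every document has exactly one
untokenized value for the faceted field, which is the precondition Mike
gives for strategy 2.

```python
from collections import Counter

# Hypothetical toy data: doc id -> the single author value for that doc.
docs = {0: "smith", 1: "jones", 2: "smith", 3: "lee", 4: "jones", 5: "smith"}

# Hypothetical set of doc ids matched by the current query.
query_result = {0, 2, 3, 4}


def facet_by_bitsets(docs, result):
    """Strategy 1 (sketch): one doc-id set per term, intersected with the result set."""
    term_sets = {}
    for doc_id, term in docs.items():
        term_sets.setdefault(term, set()).add(doc_id)
    counts = {}
    # One intersection per unique term, so the work grows with the number of terms.
    for term, ids in term_sets.items():
        hits = len(ids & result)
        if hits:
            counts[term] = hits
    return counts


def facet_by_enumeration(docs, result):
    """Strategy 2 (sketch): look up the cached field value of each matching doc."""
    # One lookup per matching doc, largely independent of the number of unique terms.
    return dict(Counter(docs[doc_id] for doc_id in result))


counts_1 = facet_by_bitsets(docs, query_result)
counts_2 = facet_by_enumeration(docs, query_result)
```

Both functions return the same counts on this data. The difference is where
the work goes: the first loops over every unique term, the second over
every matching document, which is why the second only works when each
document contributes a single value.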
