Hi Mike, Thanks for clarifying what has been a bit of a black box to me.
A couple of questions, to improve my understanding, if you don't mind.

If I am only using fields with multiValued="false" and a type of "string" or "integer" (untokenized), does Solr automatically use approach 2? Or is this something I have to actively configure?

And is approach 2 better than 1, or vice versa? Or is the answer "it depends"? :-)

If, as I suspect, the answer is "it depends", are there any general guidelines on when to use one approach or the other?

Thanks,
Tom

On 9/6/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>
> On 6-Sep-07, at 3:25 PM, Mike Klaas wrote:
>
> > There are essentially two facet computation strategies:
> >
> > 1. cached bitsets: a bitset for each term is generated and
> > intersected with the query result bitset. This is more general and
> > performs well up to a few thousand terms.
> >
> > 2. field enumeration: cache the field contents, and generate counts
> > using this data. Relatively independent of #unique terms, but
> > requires at most a single facet value per field per document.
> >
> > So, if you factor author into Primary author/Secondary author,
> > where each is guaranteed to only have one value per doc, this could
> > greatly accelerate your faceting. There are probably fewer unique
> > subjects, so strategy 1 is likely fine.
> >
> > To use strategy 2, just make sure that multiValued="false" is set
> > for those fields in schema.xml
>
> I forgot to mention that strategy 2 also requires a single token for
> each doc (see
> http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3)
>
> -Mike
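
For anyone reading this thread later, here is a toy Python sketch of how I picture the two strategies quoted above. It is not Solr's code: the sample data and the function names are made up for illustration, and plain Python sets stand in for bitsets.

```python
from collections import Counter

# Hypothetical corpus: doc id -> the single (untokenized) author value.
docs = {0: "smith", 1: "jones", 2: "smith", 3: "lee", 4: "jones", 5: "smith"}

# Hypothetical query result: ids of the documents that matched.
query_hits = {0, 2, 3, 4}


def facet_by_bitsets(docs, hits):
    """Strategy 1 (sketch): one doc-id set per term, intersected with the hits.

    The work grows with the number of unique terms, since every term's
    set has to be intersected with the query result.
    """
    term_sets = {}
    for doc_id, term in docs.items():
        term_sets.setdefault(term, set()).add(doc_id)
    counts = {}
    for term, ids in term_sets.items():
        n = len(ids & hits)
        if n:
            counts[term] = n
    return counts


def facet_by_field_values(docs, hits):
    """Strategy 2 (sketch): look up the cached field value of each hit.

    The work grows with the number of matching docs, not unique terms,
    but it only works because each doc has exactly one value.
    """
    return dict(Counter(docs[doc_id] for doc_id in hits))


print(facet_by_bitsets(docs, query_hits))
print(facet_by_field_values(docs, query_hits))
```

Both functions give the same counts on this data; the difference is only in what the cost scales with, which is how I read the "few thousand terms" remark.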