Andrew Nagy, ditto on what Yonik said.  Here is some further elaboration:

I am doing much the same thing (faceting on Author etc.). When my Author field 
was defined as a solr.TextField, even using solr.KeywordTokenizerFactory so it 
wasn't actually tokenized, the faceting code chose the QueryFilter approach, 
and faceting on Author for 100k+ document took about 4 seconds.

When I changed the field to "string" e.g. solr.StrField, the faceting code 
recognized it as untokenized and used the FieldCache approach.  Times have 
dropped to about 120ms for the first query (when the FieldCache is generated) 
and < 10ms for subsequent queries returning a few thousand results.  Quite a 
difference.

The strategy must be chosen on a field-by-field basis.  While QueryFilter is 
excellent for fields with a small set of enumerated values such as Language or 
Format, it is inappropriate for large value sets such as Author.

Unfortunately which strategy will be chosen is currently undocumented and 
control is a bit oblique:  If the field is tokenized or multivalued or Boolean, 
the FilterQuery method will be used; otherwise the FieldCache method.  I expect 
I or others will improve that shortly.

- J.J.

At 2:58 PM -0500 12/8/06, Yonik Seeley wrote:
>Right, if any of these are tokenized, then you could make them
>non-tokenized (use "string" type).  If they really need to be
>tokenized (author for example), then you could use copyField to make
>another copy to a non-tokenized field that you can use for faceting.
>
>After that, as Hoss suggests, run a single faceting query with all 4
>fields and look at the filterCache statistics.  Take the "lookups"
>number and multiply it by, say, 1.5 to leave some room for future
>growth, and use that as your cache size.  You probably want to bump up
>both initialSize and autowarmCount as well.
>
>The first query will still be slow.  The second should be relatively fast.
>You may hit an OOM error.  Increase the JVM heap size if this happens.
>
>-Yonik

Reply via email to