Beyond what Erick said, I'll add that it is often better to "do this from the 
outside" and send in multiple actual end-user displayable facet values.  When 
you send in a field like "Water -- Irrigation ; Water -- Sewage", that is what 
will get stored (if you have it set to stored), but what you might rather want 
is each individual value stored, which can only be done by the indexer sending 
in multiple values, not through just tokenization.
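
As a rough illustration of doing this "from the outside", here is a minimal Python sketch of what an indexing client might do before sending the document. It assumes ";" and "--" are the only delimiters in the feed; the function name, the document id, and the field name "subject_facet" are hypothetical and not taken from any real schema.

```python
import re


def split_facet_values(raw):
    """Split a raw feed value on ';' and '--', trim whitespace,
    and drop duplicates while keeping first-seen order."""
    seen = set()
    values = []
    for part in re.split(r"\s*(?:;|--)\s*", raw):
        part = part.strip()
        if part and part not in seen:
            seen.add(part)
            values.append(part)
    return values


# Hypothetical document as the client would build it: the facet field
# carries a list, so each entry arrives as a separate value of a
# multi-valued field rather than as one delimited string.
doc = {
    "id": "doc-1",  # hypothetical id
    "subject_facet": split_facet_values("Water -- Irrigation ; Water -- Sewage"),
}
```

Because the splitting and de-duplication happen before indexing, the facet field can stay a plain multi-valued solr.StrField and needs no tokenizer or filter of its own.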

        Erik

On Jan 27, 2011, at 09:09 , Dennis Schafroth wrote:

> Hi, 
> 
> I'm fairly new to SOLR coding and am looking for hints on how to implement 
> (if it has not been done already) a PatternTokenizer that would index this 
> into multi-valued fields of solr.StrField for faceting. For example:
> 
> Water -- Irrigation ; Water -- Sewage
> 
> should be tokenized into 
> 
> Water
> Irrigation
> Sewage
> 
> in multi-valued non-tokenized fields, for performance reasons. I could do it 
> from the outside, but I would like to use this as an opportunity to learn 
> about SOLR.
> 
> It "works" as I want with the PatternTokenizerFactory when I am using 
> solr.TextField, but not when I am using the non-tokenized solr.StrField. But 
> according to what I have read, facet performance is better on non-tokenized fields. 
> We need better performance on our faceted searches on these multi-value 
> fields.  (25 million documents, three multi-valued facets)
> 
> I would also need a filter that removes duplicate values, as the feeds 
> contain redundant data, as shown above.
> 
> Can anyone point me in the right direction?
> 
> cheers, 
> :-Dennis
