I have a Solr index where certain facet fields should contain only one
or more items from a limited list of values. To enforce this
restriction at index time I have been looking at using a
KeepWordFilterFactory. It seems it ought to work as I have it
implemented, and it does work when tested through the admin
analysis page, but when I index a document with that filter in place,
values that ought to be filtered out aren't. (I am running the Solr 1.4
release.)
I've added a new field type in schema.xml:
<fieldType name="formatFacet" class="solr.StrField"
           sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.KeepWordFilterFactory"
            words="format_facet.txt" ignoreCase="false" />
  </analyzer>
</fieldType>
I placed a file, format_facet.txt, in the conf directory containing:
Book
Online
Microform
Journal/Magazine
Musical Score
Musical Recording
Thesis/Dissertation
Video
Streaming Video
Software/Multimedia
Photographs
Cassette
I referenced this new field type with a field declaration in schema.xml:
<field name="format_facet" type="formatFacet" indexed="true"
stored="true" multiValued="true" />
I also have this dynamic field, but it seems irrelevant:
<dynamicField name="*_facet" type="string" indexed="true"
stored="true" multiValued="true" omitNorms="true" />
I restarted the Jetty server running Solr and submitted a Solr add
document containing:
format_facet=format_facet(1.0)={[Video, Streaming Video, Online,
Gooberhead, Book of the Month]}
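Expressed as an XML update message, that document would look roughly
like the sketch below. The id field and its value are hypothetical
stand-ins for whatever uniqueKey the schema defines; only the
format_facet values are taken from the document described above.

```xml
<add>
  <doc>
    <!-- "id" and "example-1" are hypothetical placeholders for the schema's uniqueKey -->
    <field name="id">example-1</field>
    <!-- one <field> element per value of the multiValued format_facet field -->
    <field name="format_facet">Video</field>
    <field name="format_facet">Streaming Video</field>
    <field name="format_facet">Online</field>
    <field name="format_facet">Gooberhead</field>
    <field name="format_facet">Book of the Month</field>
  </doc>
</add>
```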
Of these values, only Video, Streaming Video and Online ought to end up
in the index; however, all five values end up as format_facet values for
the Solr item in question:
<arr name="format_facet">
  <str>Video</str>
  <str>Streaming Video</str>
  <str>Online</str>
  <str>Gooberhead</str>
  <str>Book of the Month</str>
</arr>
I must be missing something fairly basic, since this doesn't seem
especially complicated.
Thanks in advance for any assistance,
-Bob Haschart