I have a Solr index where certain facet fields should contain only one or more items from a limited list of values. To enforce this restriction at index time I have been looking at using a KeepWordFilterFactory. It seems it ought to work as I have it implemented, and it does appear to work when tested through the admin analysis page, but when I index a document with that filter in place, values that ought to be filtered out are not. (I am running the Solr 1.4 release.)

I've added a new field type in schema.xml:

<fieldType name="formatFacet" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="format_facet.txt" ignoreCase="false"/>
  </analyzer>
</fieldType>

I placed a file format_facet.txt in the conf directory containing:

Book
Online
Microform
Journal/Magazine
Musical Score
Musical Recording
Thesis/Dissertation
Video
Streaming Video
Software/Multimedia
Photographs
Cassette

I referenced this new field type with a field declaration in schema.xml:

<field name="format_facet" type="formatFacet" indexed="true" stored="true" multiValued="true" />

I also have this dynamic field, but it seems irrelevant:

<dynamicField name="*_facet" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" />


I restarted the Jetty server running Solr and submitted a Solr add document containing:

format_facet=format_facet(1.0)={[Video, Streaming Video, Online, Gooberhead, Book of the Month]}
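
In Solr's XML update format, an add message carrying these values would look roughly like the sketch below. The id field and its value are placeholders assumed for illustration, and the actual document was not necessarily posted in this form.

```xml
<add>
  <doc>
    <!-- "id" and "example-1" are hypothetical; substitute the schema's unique key -->
    <field name="id">example-1</field>
    <!-- one <field> element per value of the multiValued format_facet field -->
    <field name="format_facet">Video</field>
    <field name="format_facet">Streaming Video</field>
    <field name="format_facet">Online</field>
    <field name="format_facet">Gooberhead</field>
    <field name="format_facet">Book of the Month</field>
  </doc>
</add>
```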

Of these values, only Video, Streaming Video and Online ought to end up in the index; however, all five values end up as format_facet values for the Solr item in question:

<arr name="format_facet">
   <str>Video</str>
   <str>Streaming Video</str>
   <str>Online</str>
   <str>Gooberhead</str>
   <str>Book of the Month</str>
</arr>


I must be missing something fairly basic, since this doesn't seem especially complicated.

Thanks in advance for any assistance,

-Bob Haschart
