We have user-entered item listings that have a title and contain HTML in
their descriptions. I would like to index the full descriptions (minus the
HTML, which I'm stripping out via the DIH HTMLStripTransformer) so I can
search across them as well as perform highlighting/excerpting.

Can someone recommend a good fieldType and field for this need? The
following is what I've been using up to this point for both fields (title
and description).

   <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="1"
                splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
   </fieldType>

Should I be using the DIH HTMLStripTransformer or HTMLStripCharFilterFactory
to remove the HTML? Which one is faster?
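
For reference, my understanding is that the char filter alternative would be
declared inside the analyzer, ahead of the tokenizer, roughly as below. This
is only a sketch: the type name "text_html" and the field definition are
placeholders I made up, and the filter chain is abbreviated. As far as I can
tell, a char filter only affects the indexed tokens, so the stored value (and
therefore any highlighted excerpt) would still contain the original markup,
whereas the DIH transformer strips it before the document reaches Solr.

   <!-- hypothetical type name; filter chain shortened for illustration -->
   <fieldType name="text_html" class="solr.TextField">
      <analyzer>
        <!-- strips markup from the character stream before tokenizing -->
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
   </fieldType>

   <!-- stored="true" so the field can be returned and highlighted -->
   <field name="description" type="text_html" indexed="true" stored="true"/>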

Any suggestions on my fieldType? 

Thanks a lot!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-guidance-on-schema-type-tp846923p846923.html
Sent from the Solr - User mailing list archive at Nabble.com.
