We have user-entered item listings that have a title and contain HTML in their descriptions. I would like to index the full descriptions (minus the HTML, which I'm stripping out via the DIH HTMLStripTransformer) so I can search across them as well as perform highlighting/excerpting.
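For context, the DIH side of this looks roughly like the sketch below. The entity name, query, and column names are placeholders for illustration, not our real config; the relevant parts are the transformer attribute on the entity and stripHTML="true" on the field.

```xml
<!-- data-config.xml (sketch; entity name, query and columns are placeholders) -->
<entity name="item"
        transformer="HTMLStripTransformer"
        query="SELECT id, title, description FROM items">
  <field column="title" name="title"/>
  <!-- stripHTML="true" removes the markup before the value reaches the index -->
  <field column="description" name="description" stripHTML="true"/>
</entity>
```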
Can someone recommend a good fieldType and field definition for this need? The following is what I've been using up to this point for both fields (title and description):

  <fieldType name="text" class="solr.TextField" omitNorms="false">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="1" catenateAll="1"
              splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

Should I be using the DIH HTMLStripTransformer or HTMLStripCharFilterFactory to remove the HTML? Which one is faster? Any suggestions on my fieldType?

Thanks a lot!
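For comparison, the analysis-time alternative would look something like the sketch below. The fieldType name text_html is made up for illustration and the filter chain is trimmed to the minimum; the point is only that the charFilter is declared before the tokenizer.

```xml
<!-- schema.xml (sketch; "text_html" is a hypothetical name) -->
<fieldType name="text_html" class="solr.TextField" omitNorms="false">
  <analyzer>
    <!-- char filters run on the raw text before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One difference worth noting: a charFilter only changes what gets indexed. The stored value still contains the original HTML, so highlighted snippets would include markup, whereas the DIH transformer strips the value before it is both indexed and stored.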