OK, that complicates things a bit. I would still try to go for a solution where you store the rich text in Solr, but make sure you tokenize it correctly.
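To illustrate why tokenizing the stored rich text "correctly" matters: if the markup is skipped inside the analysis chain rather than stripped before indexing, each token's offsets still point into the original rich text. Below is a rough Python sketch of that idea only. It is not Solr code. The `{...}` markup syntax and the names `tokens_with_offsets` and `offsets_for` are made up for the example, and it masks markup with spaces instead of deleting it, which is a simplification of what a real char filter does.

```python
import re

# Hypothetical markup: anything in curly braces, e.g. "{b}" or "{/b}".
# The real rich text format is unknown, so this pattern is an assumption.
MARKUP = re.compile(r"\{[^}]*\}")
WORD = re.compile(r"\w+")


def tokens_with_offsets(rich_text):
    """Yield (term, start, end) with offsets into the ORIGINAL rich text."""
    # Blank out markup with spaces of equal length so that character
    # positions in the masked text line up with the original text.
    masked = MARKUP.sub(lambda m: " " * len(m.group()), rich_text)
    for m in WORD.finditer(masked):
        yield m.group().lower(), m.start(), m.end()


def offsets_for(rich_text, query_term):
    """Return (start, end) offsets only for tokens matching the query term."""
    wanted = query_term.lower()
    return [(s, e) for term, s, e in tokens_with_offsets(rich_text) if term == wanted]


doc = "{b}Solr{/b} stores the {i}rich{/i} text; solr highlights it."
for start, end in offsets_for(doc, "solr"):
    # Each slice is taken from the untouched rich text, markup included.
    print(start, end, doc[start:end])
```

Because the offsets refer to the unmodified rich text, a custom viewer could use them directly to mark the hits, and only the searched term's offsets are returned.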
If the format is relatively simple, you could use either a regexp pattern tokenizer
https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-SimplifiedRegularExpressionPatternTokenizer
or perhaps, before tokenization, use a pattern replace char filter to strip out the parts of the rich text that should not be indexed
https://cwiki.apache.org/confluence/display/solr/CharFilterFactories#CharFilterFactories-solr.PatternReplaceCharFilterFactory

I assume that you have some process for converting the rich text to plain text before indexing, so if you can replicate that process using Solr's charfilters, tokenizers and filters then that would allow you to use the highlighter to get the rich text back.

HTH,
Bjarle

2017-03-30 10:39 GMT+02:00 forest_soup <tanglin0...@gmail.com>:

> Unfortunately the rich text is not an html/xml/doc/pdf or any other popular
> rich text format. And we would like to show the highlighted text in the
> doc's own specific viewer. That's why I'm eagerly want the offset.
>
> The /tvrh(term vector component) and tv.offsets/tv.positions can give us
> such info, but they returns all terms' data instead of the being searched
> ones. So we are still seeking ways to filter the results.
>
> Any ideas?
>
> Thanks!
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Is-there-a-way-to-retrieve-the-a-term-s-
> position-offset-in-Solr-tp4326931p4327623.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>