Hi Chris, thank you for replying. My content field in the schema is
stored=true and indexed=false because I am copying the content field
into the text field, which is indexed=true by default.
My question was that I am able to search the HTML documents I had
fed to Solr, but as the results
: Hi everyone, I am new to Solr and cannot find a way to get back
: the original HTML document with the hits highlighted in it. What
: configuration do I need, and where, to instruct SolrCell/Tika not to
: strip the tags of the HTML document in the content field?
I _think_
Thank you for your help, Jack. I just wanted to know if there is any
ready-made solution for this, because I really don't know about extracting
meta information.
Awaiting your reply.
Thank you
On Tue, Feb 19, 2013 at 12:48 PM, Jack Krupansky j...@basetechnology.com wrote:
Use the standard update
Look at HTMLStripCharFilter, which accepts HTML as its source text. The
stored value keeps all the HTML tags, but the tags are stripped off
before tokenization into terms. So, you can search for the actual
text terms, but the HTML will still be in the returned field value for
Thank you for replying, sir!
I have two questions related to this:
1) Which request handler do I have to use in this case? The
'ExtractingRequestHandler' strips the HTML content by default, and the
default handler 'UpdateRequestHandler' does not accept HTML content.
2) How can I
Use the standard update handler and pass the entire HTML page as literal
text in a Solr XML document for the field that has the HTML strip filter,
but be sure to escape the HTML syntax (angle brackets, ampersands, etc.).
You'll have to process meta information yourself.
-- Jack Krupansky
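
For anyone following along, here is a rough sketch of the escaping step Jack describes: wrapping a whole HTML page as literal text inside a Solr XML add document. It is only an illustration, not a ready-made solution. The function name build_solr_add_xml is made up for this example, and the field names "id" and "content" are assumptions that have to match your own schema. It relies on Python's standard xml.etree.ElementTree to escape angle brackets and ampersands when the text is serialized.

```python
import xml.etree.ElementTree as ET


def build_solr_add_xml(doc_id, html, id_field="id", html_field="content"):
    """Build a Solr XML <add> message carrying raw HTML as literal field text.

    id_field and html_field are assumed names; adjust them to your schema.
    ElementTree escapes <, > and & in element text on serialization, so the
    HTML page travels as plain text rather than as nested XML elements.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name=id_field).text = doc_id
    ET.SubElement(doc, "field", name=html_field).text = html
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    page = "<html><body><p>Tom &amp; Jerry</p></body></html>"
    print(build_solr_add_xml("doc1", page))
```

The resulting string would then be sent to the standard update handler in whatever way you normally post documents. Meta information (title, author, and so on) is not handled here and would still need to be extracted separately, as noted above.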