Re: How can I instruct Solr/Solr Cell to output the original HTML document which was fed to it?

2013-03-05 Thread Divyanand Tiwari
Hi Chris, thank you for replying. My content field in the schema is stored=true and indexed=false because I am copying the content field into the text field, which is indexed=true by default. I have a question: I am able to search the HTML documents I fed to Solr, but as the results

Re: How can I instruct Solr/Solr Cell to output the original HTML document which was fed to it?

2013-02-21 Thread Chris Hostetter
: Hi everyone, I am new to Solr and cannot find a way to get back the original HTML document with the hits highlighted in it. What configuration can I use, and where, to instruct Solr Cell/Tika not to strip the tags of the HTML document in the content field? I _think_

Re: How can I instruct Solr/Solr Cell to output the original HTML document which was fed to it?

2013-02-19 Thread Divyanand Tiwari
Thank you for your help, Jack. I just wanted to know if there is any ready-made solution for this, because I really don't know how to extract meta information. Awaiting your reply. Thank you. On Tue, Feb 19, 2013 at 12:48 PM, Jack Krupansky j...@basetechnology.com wrote: Use the standard update

Re: How can I instruct Solr/Solr Cell to output the original HTML document which was fed to it?

2013-02-18 Thread Jack Krupansky
Look at HTMLStripCharFilter, which accepts HTML as its source text: all of the HTML tags are preserved in the stored value, but they are stripped off before the text is tokenized into terms. So you can search for the actual text terms, but the HTML will still be in the returned field value for
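As a rough illustration of the split Jack describes (not Solr's actual code), the Python sketch below keeps the raw HTML as the stored value and derives the index terms from the tag-free text only. In a Solr schema the real mechanism is a `<charFilter class="solr.HTMLStripCharFilterFactory"/>` inside the field type's analyzer. `analyze` and `_TextExtractor` are hypothetical names; Python's `html.parser` stands in for the char filter and a lowercase `\w+` split stands in for the field's tokenizer, so entity handling, script/style content and word boundaries around tags will differ from real Solr analysis.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the character data between tags (rough stand-in for the char filter)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def analyze(raw_html: str) -> tuple[str, list[str]]:
    """Return (stored_value, indexed_terms) for one field value."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # Simplification: text pieces are joined with spaces, then split into
    # lowercase word tokens, loosely mimicking a tokenizer plus lowercase filter.
    text = " ".join(parser.parts)
    terms = [t.lower() for t in re.findall(r"\w+", text)]
    # The stored value is the untouched input, tags and all.
    return raw_html, terms


stored, terms = analyze("<p>Hello <b>Solr</b> world</p>")
```

A query for `solr` matches because the term is in `terms`, while the value handed back to the client is still the full `<p>...<b>Solr</b>...</p>` markup.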

Re: How can I instruct Solr/Solr Cell to output the original HTML document which was fed to it?

2013-02-18 Thread Divyanand Tiwari
Thank you for replying, sir! I have two questions about this: 1) In this case, which request handler do I have to use? 'ExtractingRequestHandler' strips the HTML content by default, and the default handler 'UpdateRequestHandler' does not accept HTML content. 2) How can I

Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?

2013-02-18 Thread Jack Krupansky
Use the standard update handler and pass the entire HTML page as literal text in a Solr XML document for the field that has the HTML strip filter, but be sure to escape the HTML syntax (angle brackets, ampersands, etc.). You'll have to process the meta information yourself. -- Jack Krupansky
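A minimal Python sketch of this suggestion, assuming a schema with a uniqueKey field `id` and a hypothetical stored field `content_html` whose field type uses the HTML strip char filter. `build_add_xml` is a hypothetical helper; it only builds the XML update message, and the standard library's `escape` takes care of the angle brackets and ampersands mentioned above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(doc_id: str, html: str, html_field: str = "content_html") -> str:
    """Wrap a raw HTML page in a Solr XML <add> message.

    escape() turns &, < and > into entities, so the page travels as literal
    text inside <field> instead of being parsed as XML markup.
    """
    return (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'
        f"<field name={quoteattr(html_field)}>{escape(html)}</field>"
        "</doc></add>"
    )


page = "<p>Fish &amp; chips <b>here</b></p>"
message = build_add_xml("page-1", page)

# Parsing the message back recovers the original HTML unchanged.
fields = {f.get("name"): f.text for f in ET.fromstring(message).iter("field")}
```

The resulting string would then be POSTed to the update handler (for example `/solr/update` on a default Solr 4 install) with `Content-Type: text/xml`, followed by a commit. Meta information such as the page `<title>` is not extracted here; as noted above, that part is up to you.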