Readers for extracting textual info from pd/doc/excel for indexing the actual content

2013-01-25 Thread saisantoshi
View this message in context: http://lucene.472066.n3.nabble.com/Readers-for-extracting-textual-info-from-pd-doc-excel-for-indexing-the-actual-content-tp4036379.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

2013-01-27 Thread Adrien Grand
Have you tried using the PDFParser [1] and the OfficeParser [2] classes from Tika? This question seems more appropriate for the Tika user mailing list [3].

[1] http://tika.apache.org/1.3/api/org/apache/tika/parser/pdf/PDFParser.html#parse(java.io.InputStream, org.xml.sax.ContentHandler, or
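A minimal sketch of how one might pick between the two Tika parser classes named above, based on a file's extension. The class `ParserChooser`, the method `parserClassFor`, and the extension table are hypothetical and only for illustration; the fully qualified names are those of Tika's `PDFParser` and `OfficeParser` (the latter handles the legacy binary Office formats such as .doc and .xls). The sketch only returns class names, so it runs without Tika on the classpath; actually extracting text would require the Tika jars.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class ParserChooser {
    // Hypothetical extension table: maps a lower-case file extension to the
    // fully qualified name of the Tika parser class suggested in the thread.
    private static final Map<String, String> PARSER_BY_EXTENSION = new HashMap<>();
    static {
        PARSER_BY_EXTENSION.put("pdf", "org.apache.tika.parser.pdf.PDFParser");
        PARSER_BY_EXTENSION.put("doc", "org.apache.tika.parser.microsoft.OfficeParser");
        PARSER_BY_EXTENSION.put("xls", "org.apache.tika.parser.microsoft.OfficeParser");
    }

    // Returns the parser class name for a file name, or empty if the file has
    // no extension or the extension is not in the table above.
    static Optional<String> parserClassFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(PARSER_BY_EXTENSION.get(extension));
    }

    public static void main(String[] args) {
        String[] names = {"report.pdf", "notes.DOC", "budget.xls", "readme.txt"};
        for (String name : names) {
            System.out.println(name + " -> "
                    + parserClassFor(name).orElse("(no parser in this sketch)"));
        }
    }
}
```

Choosing by extension is only a convenience for this sketch; it is an assumption that file names are trustworthy, which may not hold for uploaded or renamed documents.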

Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

2013-01-27 Thread Jack Krupansky
y Solr itself to see how it works: http://wiki.apache.org/solr/ExtractingRequestHandler
-- Jack Krupansky

-----Original Message-----
From: Adrien Grand
Sent: Sunday, January 27, 2013 12:53 PM
To: java-user@lucene.apache.org
Subject: Re: Readers for extracting textual info from pd/doc/excel for

Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

2013-01-27 Thread saisantoshi
framework good enough, or is there a better library? Any issues or experiences with using the Tika framework?

Thanks,
Sai.

View this message in context: http://lucene.472066.n3.nabble.com/Readers-for-extracting-textual-info-from-pd-doc-excel-for-indexing-the-actual-content-tp4036379p4036557.html

Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

2013-01-27 Thread Jack Krupansky
it, and Solr is based on Lucene.
-- Jack Krupansky

-----Original Message-----
From: saisantoshi
Sent: Sunday, January 27, 2013 2:09 PM
To: java-user@lucene.apache.org
Subject: Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

We are not using Solr and

Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

2013-01-28 Thread VIGNESH S

Re: Readers for extracting textual info from pd/doc/excel for indexing the actual content

2013-02-05 Thread saisantoshi
-textual-info-from-pd-doc-excel-for-indexing-the-actual-content-tp4036379p4038642.html