On Fri, Nov 20, 2009 at 9:13 PM, javaxmlsoapdev vika...@yahoo.com wrote:
Did you extend DIH to do this work? Can you share code samples? I have a
similar requirement where I need to index database records, and each record
has a column with document path so need to create another index for
both index separately) in parallel
with reading some metadata of the documents from the database as well. I have all
sorts of different document formats to index. I am on Solr 1.4.0. Any
pointers would be appreciated.
Thanks,
--
View this message in context:
http://old.nabble.com/How-to-use
ExtractingRequestHandler to extract the content of the PDF?
Thanks!
Khai Doan
--
View this message in context:
http://old.nabble.com/How-to-use-DataImportHandler-with-ExtractingRequestHandler--tp25267745p26443544.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hi Khai,
A few weeks ago, I was facing the same problem.
In my case, this workaround helped (assuming you're using Solr 1.3):
For each row, extract the content from the corresponding PDF file using
a parser library of your choice (I suggest Apache PDFBox or Apache Tika
in case you need to
Hi all,
My name is Khai. I have a table in a relational database. I have
successfully used DataImportHandler to import this data into Apache Solr.
However, one of the columns stores the location of a PDF file. How can I
configure DataImportHandler to use ExtractingRequestHandler to extract the
Unfortunately, DIH is not yet integrated with ExtractingRequestHandler.
See https://issues.apache.org/jira/browse/SOLR-1358
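
Until that integration exists, one possible workaround is to read the rows yourself and post each file to ExtractingRequestHandler, passing the database columns as `literal.<field>` request parameters. The following is only a rough sketch under several assumptions: the handler is mapped at `/update/extract` (as in the Solr 1.4 example configuration), the table is called `documents` with columns `id`, `title` and `path`, and the Solr schema has matching `id` and `title` fields. All of those names are hypothetical, so adjust them to your own setup.

```python
import mimetypes
import urllib.parse
import urllib.request

# Assumed handler location; adjust it to match your solrconfig.xml.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(base_url, doc_id, metadata, commit=False):
    """Build an extract request URL that carries row data as literal fields."""
    params = [("literal.id", str(doc_id))]
    # Each database column becomes a literal.<field> parameter.
    for field, value in sorted(metadata.items()):
        params.append(("literal." + field, str(value)))
    if commit:
        params.append(("commit", "true"))
    return base_url + "?" + urllib.parse.urlencode(params)


def post_file(url, path):
    """Send the raw file bytes to Solr; text extraction happens server-side."""
    # Guess the MIME type from the file name, falling back to a generic type.
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as handle:
        request = urllib.request.Request(
            url, data=handle.read(), headers={"Content-Type": content_type}
        )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_rows(connection, base_url=SOLR_EXTRACT_URL):
    """Walk the (hypothetical) documents table and post each referenced file."""
    cursor = connection.cursor()
    cursor.execute("SELECT id, title, path FROM documents")
    for doc_id, title, path in cursor.fetchall():
        post_file(build_extract_url(base_url, doc_id, {"title": title}), path)
```

`index_rows` accepts any DB-API connection. Nothing is committed per document here; you would probably send a single commit once the loop has finished rather than setting `commit=True` on every request.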
On Thu, Sep 3, 2009 at 5:34 AM, Khai Doan khaitd...@gmail.com wrote:
Hi all,
My name is Khai. I have a table in a relational database. I have
successfully used