In that case you'll have to write an indexing client that (probably)
uses Tika to parse the PDF file and some kind of XML parser to parse the
metadata XML, and that combines the two into Solr documents that you send to
Solr. Here's a skeletal program with some extra stuff in there for
database connectivity,
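The sketch below shows only the "combine" step in Python, using only the standard library. It assumes that the PDF text has already been extracted (for example by Tika), and that each metadata XML file is a flat list of child elements whose tag names match Solr field names. The sample XML, the field names, the `content` field, and the collection URL are all hypothetical, so adapt them to your own files and schema.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical metadata layout; the tag names must match fields in your Solr schema.
SAMPLE_METADATA = """<metadata>
  <id>doc-001</id>
  <title>Annual Report</title>
  <author>Jane Doe</author>
  <pdf>reports/annual.pdf</pdf>
</metadata>"""


def parse_metadata(xml_text: str) -> dict:
    """Map each direct child of the root element to a field name/value pair."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def build_solr_doc(metadata: dict, pdf_text: str) -> dict:
    """Merge the XML metadata and the extracted PDF text into one Solr document."""
    doc = dict(metadata)
    # "content" is a hypothetical field name for the PDF body text.
    doc["content"] = pdf_text
    return doc


def to_update_payload(docs: list) -> str:
    """Serialize documents as a JSON array, a shape Solr's /update handler accepts."""
    return json.dumps(docs)


def send_to_solr(payload: str, update_url: str) -> int:
    """POST the payload to a Solr update URL, e.g. .../solr/<collection>/update?commit=true."""
    req = urllib.request.Request(
        update_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Because each PDF and its metadata end up in the same Solr document, one query can match either the metadata fields or the body text and return a single result per PDF. In a real client you would loop over the XML files, pass the Tika output for the file named in `<pdf>` to `build_solr_doc`, and send the documents in batches rather than one at a time.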
Hi Erick,
Thanks for your reply.
Yes, the XML files contain metadata about the PDF files. I need to search across
both the XML and PDF files and show search results from both sources.
Regards,
On Wed, Oct 26, 2016 at 1:47 AM, Erick Erickson wrote:
First you need to define the problem:
what do you mean by "combine"? Do the XML files
contain, say, metadata about an associated PDF file?
Or are these entirely orthogonal documents that
you need to index into the same collection?
Best,
Erick
On Tue, Oct 25, 2016 at 4:18 PM,
Hi,
I am new to Apache Solr and am developing a search project. The source data is
coming from two sources:
1) XML Files
2) PDF Files
I need to combine these two sources for search, but I couldn't find an example
of combining them. Any help is appreciated.
Regards,