Re: Combine Data from PDF + XML

2016-10-26 Thread Erick Erickson
In that case you'll have to write an indexing client that (probably) uses Tika to parse the PDF file and some kind of XML parser to parse the metadata XML, then combines the two into Solr documents that you send to Solr. Here's a skeletal program with some extra stuff in there for database connectivity,
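The skeletal program referred to above is cut off in this archive view. As a stand-in, here is a minimal Python sketch of the combining step only, not the program from the original message. It assumes a flat metadata XML (one child element per field), uses the hypothetical field names `id` and `content`, and takes the PDF text extractor as a plain callable (`extract_text`, a placeholder for Tika or any other extractor) so the sketch stays self-contained. Sending the result to Solr is left out; the JSON array produced at the end is the shape one would typically post to Solr's update handler.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def parse_metadata(xml_text: str) -> Dict[str, str]:
    """Turn each child element of the metadata root into one field.

    Assumes a flat layout such as <metadata><title>..</title></metadata>;
    nested or repeated elements would need extra handling.
    """
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def build_solr_doc(
    doc_id: str,
    xml_text: str,
    pdf_path: str,
    extract_text: Callable[[str], str],
) -> Dict[str, str]:
    """Merge XML metadata and extracted PDF text into one document dict.

    `extract_text` is a placeholder for the real extractor (e.g. Tika).
    The field names "id" and "content" are assumptions; they must match
    whatever the Solr schema actually defines.
    """
    doc = parse_metadata(xml_text)
    doc["id"] = doc_id
    doc["content"] = extract_text(pdf_path)
    return doc


def to_update_payload(docs: List[Dict[str, str]]) -> str:
    """Serialize documents as a JSON array for an update request."""
    return json.dumps(docs)


if __name__ == "__main__":
    sample_xml = "<metadata><title>Annual Report</title><author>J. Doe</author></metadata>"
    # Fake extractor so the example runs without any PDF library installed.
    fake_extractor = lambda path: "extracted text of " + path
    doc = build_solr_doc("doc-1", sample_xml, "report.pdf", fake_extractor)
    print(to_update_payload([doc]))
```

Because each PDF and its metadata end up in a single Solr document, one query can match on either the metadata fields or the extracted text and return the same hit.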

Re: Combine Data from PDF + XML

2016-10-26 Thread tesm...@gmail.com
Hi Erick, Thanks for your reply. Yes, the XML files contain metadata about the PDF files. I need to search across both the XML and PDF files and show search results from both sources. Regards, On Wed, Oct 26, 2016 at 1:47 AM, Erick Erickson wrote: > First you need to define

Re: Combine Data from PDF + XML

2016-10-25 Thread Erick Erickson
First you need to define the problem: what do you mean by "combine"? Do the XML files contain, say, metadata about an associated PDF file? Or are these entirely orthogonal documents that you need to index into the same collection? Best, Erick On Tue, Oct 25, 2016 at 4:18 PM,

Combine Data from PDF + XML

2016-10-25 Thread tesm...@gmail.com
Hi, I am new to Apache Solr and am developing a search project. The source data comes from two sources: 1) XML files 2) PDF files. I need to combine these two sources for search, but couldn't find an example of combining them. Any help is appreciated. Regards,