In that case you'll have to write an indexing client that (probably)
uses Tika to parse the PDF file and some kind of XML parser to parse the
metadata XML, then combines the two into Solr documents that you send to
Solr. Here's a skeletal SolrJ program. It has some extra stuff in there
for database connectivity, but you should be able to chop that out
pretty easily.

https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
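
For the XML half and the combining step, a minimal sketch using only the
JDK might look like the following. The metadata layout (a flat root
element whose child elements are field names), the class name
MetadataCombiner, and the "content" field name are all assumptions for
illustration, not anything from your data. pdfText stands in for
whatever text Tika extracts from the PDF, and the resulting map would
still need to be copied into a SolrInputDocument and sent with SolrJ.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: merges one metadata XML record with the text of
// its PDF into a single field map (one map per Solr document).
class MetadataCombiner {

    // metadataXml: assumed to be a flat record such as
    //   <metadata><id>doc-1</id><title>Report</title></metadata>
    // pdfText: the body text already extracted from the PDF (e.g. by Tika).
    static Map<String, Object> combine(String metadataXml, String pdfText) {
        Map<String, Object> fields = new LinkedHashMap<>();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document xml =
                builder.parse(new InputSource(new StringReader(metadataXml)));

            // Each direct child element of the root becomes one field.
            NodeList children = xml.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(),
                               child.getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse metadata XML", e);
        }

        // Assumed field name for the PDF body; it must match your schema.
        fields.put("content", pdfText);
        return fields;
    }
}
```

Because the metadata and the PDF text end up in the same document, a
single query can match on either one, which is what you described
wanting.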

Best,
Erick


On Wed, Oct 26, 2016 at 1:47 PM, tesm...@gmail.com <tesm...@gmail.com> wrote:
> Hi Erick,
>
> Thanks for your reply.
>
> Yes, XML files contain metadata about PDF files. I need to search from both
> XML and PDF files and to show search results from both sources.
>
>
> Regards,
>
> On Wed, Oct 26, 2016 at 1:47 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> First you need to define the problem....
>>
>> What do you mean by "combine"? Do the XML files
>> contain, say, metadata about an associated PDF file?
>>
>> Or are these entirely orthogonal documents that
>> you need to index into the same collection?
>>
>> Best,
>> Erick
>>
>> On Tue, Oct 25, 2016 at 4:18 PM, tesm...@gmail.com <tesm...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I am new to Apache Solr.  Developing a search project. The source data is
>> > coming from two sources:
>> >
>> > 1) XML Files
>> >
>> > 2) PDF Files
>> >
>> >
>> > I need to combine these two sources for search.  I couldn't find an example
>> > of combining these two sources. Any help is appreciated.
>> >
>> >
>> > Regards,
>>
