What is the best way of Indexing different formats of documents?

sangeetha.subraman...@gtnexus.com Tue, 07 Apr 2015 03:50:04 -0700

Hi,

I am new to Solr and come from a database background. We have a 
requirement to index files of different formats (X12, EDIFACT, CSV, XML).
The input files can be of any format, and we need to run content-based 
searches on them.


From the web I understand that we can use the Tika processor to extract the 
content and store it in Solr. What I want to know is whether there is a better 
approach for indexing files in Solr. Can we index documents by streaming them 
directly from the application? If so, what are the disadvantages compared with 
DIH, which fetches from the database? Could someone share some insight on 
this? Are there any web links I can refer to for ideas? Please help.

Thanks
Sangeetha
