Simpler possibly, but not necessarily reliable. If you do everything inside Solr's DIH, with Tika under the hood extracting the data from Excel, a malformed Excel file could kill Tika and bring down your entire Solr cluster. It is far better to do the extraction outside of Solr, as this blog post describes: https://lucidworks.com/post/indexing-with-solrj/

If you want to see what Tika does to your Excel examples, this is quite a neat way to experiment: https://okfnlabs.org/projects/tika-server/
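
As a rough illustration of experimenting with a Tika server from a script, here is a minimal Python sketch. It assumes a tika-server instance running locally on its default port 9998 and uses the server's PUT /tika endpoint; the URL and the helper names are assumptions for illustration, not something taken from the linked page.

```python
import urllib.request

# Assumed address of a locally running tika-server; adjust as needed.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data, url=TIKA_URL):
    """Build (but do not send) a PUT request asking Tika for plain text."""
    return urllib.request.Request(
        url,
        data=data,
        headers={"Accept": "text/plain"},
        method="PUT",
    )


def extract_text(path, url=TIKA_URL, timeout=30):
    """Send one file to the Tika server and return the extracted text.

    Tika runs in its own process here, so a malformed file can only
    fail this request; it cannot take Solr down with it.
    """
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling extract_text("contacts.xlsx") (a hypothetical file name) should return whatever text Tika pulls out of the spreadsheet, which is a quick way to judge whether the output is clean enough to index.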

Cheers

Charlie

On 26/07/2019 09:44, Vipul Bahuguna wrote:
Hi Charlie,

Thanks for your suggestion, but I will have thousands of these files
coming from different sources. It would become very tedious if I had to
first convert them to CSV and then process them line by line.

I was hoping there could be a simpler way to achieve this using DIH,
which I thought could be configured to read and ingest MS Excel (xlsx)
files.

I am not too sure what the configuration file would look like.

Any pointers are welcome. Thanks!

On Fri, 26 Jul, 2019, 1:56 PM Charlie Hull, <char...@flax.co.uk> wrote:

Convert the Excel file to a CSV and then write a teeny script to go
through it line by line and submit to Solr over HTTP? Tika would
probably work but it's a lot of heavy lifting for what seems to me like
a simple problem.
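
A minimal Python sketch of such a script follows. It assumes Solr is running at localhost:8983 with a core called "contacts" (a hypothetical name) and that the CSV header row already matches the field names in the schema; it posts the rows as a JSON array to the core's /update handler.

```python
import csv
import json
import urllib.request

# Assumed Solr address and hypothetical core name; adjust for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/contacts/update?commit=true"


def rows_to_docs(csv_lines):
    """Turn CSV lines into one dict per row, keyed by the header row."""
    return [dict(row) for row in csv.DictReader(csv_lines)]


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a JSON update request for Solr."""
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_csv(path, url=SOLR_UPDATE_URL):
    """Read one CSV file and submit all of its rows to Solr over HTTP."""
    with open(path, newline="", encoding="utf-8") as f:
        docs = rows_to_docs(f)
    with urllib.request.urlopen(build_update_request(docs, url)) as response:
        return response.status
```

For thousands of files, index_csv could be called in a loop over a directory, with a try/except around each call so that one bad file is logged and skipped rather than stopping the run.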

Cheers

Charlie

On 26/07/2019 09:19, Vipul Bahuguna wrote:
Hi Guys - can anyone suggest how to achieve this?
I have understood how to insert JSON documents. So one alternative that
comes to my mind is that I can convert the rows in my Excel file to JSON
format, with the header of my Excel file becoming the JSON keys
(corresponding to the fields I have defined in my managed-schema.xml).
And then each cell in the Excel file will become the value of this field.
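
The header-to-key mapping described above could be sketched in Python roughly as follows. The rows are assumed to have already been read from the spreadsheet as lists of cell values; the sample data row is made up, and the "first_name" style of field naming is an assumption that would need to match the actual schema.

```python
import json
import re


def to_field_name(header):
    """Normalise a header such as 'First Name' to 'first_name'.

    This naming convention is an assumption; match it to your schema.
    """
    return re.sub(r"[^0-9a-z]+", "_", header.strip().lower()).strip("_")


def rows_to_json_docs(rows):
    """Treat rows[0] as the header; each later row becomes one document."""
    header, *data = rows
    fields = [to_field_name(h) for h in header]
    return [dict(zip(fields, row)) for row in data]


# Made-up sample data using the four column headers from this message.
sample_rows = [
    ["First Name", "Last Name", "Phone", "Website Link"],
    ["Ada", "Lovelace", "555-0100", "https://example.org/ada"],
]
print(json.dumps(rows_to_json_docs(sample_rows), indent=2))
```

The resulting list of dicts is in the shape Solr's JSON update handler accepts for adding documents.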

However, I am sure there must be a better way of directly ingesting the
Excel file to achieve the same. I was trying to read about DIH and
Apache Tika, but I am not very sure how the configuration works.

My sample Excel file has 4 columns, namely:
1. First Name
2. Last Name
3. Phone
4. Website Link

I want to index these fields into Solr in such a way that all these
columns become my Solr schema fields, so that later I can search based
on these fields.
Any suggestions, please?

thanks !

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk



