I know there are conversations about an Elasticsearch plugin; however, I
recently needed to take some data that was accessible in Drill (stored as
a Parquet table) and move it into an Elasticsearch index.  There are about
1 million rows in the source data.

I was learning the technologies here, so I thought I could use the REST
API in Drill and then push the data into Elasticsearch.

I found that, using Drill, I couldn't run a single query and then batch
the results into Elasticsearch: too many rows were returned, there were
timeouts, etc. So I used LIMIT 1000 with OFFSET.  This allowed the ES side
of things to work well, but it required running essentially the same query
in Drill many times (roughly 1,000 pages for 1 million rows), wasting time
and resources.
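
For reference, the paging approach described above can be sketched roughly
as follows. This is only an illustration, not the exact script I ran: the
table name `dfs.tmp.source_table`, the index name `my_index`, and both
helper functions are hypothetical. It assumes each page's SQL is POSTed to
Drill's REST endpoint (`/query.json`) and that the returned rows are sent to
Elasticsearch's `_bulk` endpoint as newline-delimited JSON; older
Elasticsearch versions may also expect a `_type` in the action line.

```python
import json


def paged_queries(table, total_rows, page_size=1000):
    """Yield one Drill SQL statement per page, using LIMIT/OFFSET.

    Assumption: without an ORDER BY, page contents may not be stable
    between runs, so a real job would likely want a sort key.
    """
    for offset in range(0, total_rows, page_size):
        yield f"SELECT * FROM {table} LIMIT {page_size} OFFSET {offset}"


def to_bulk_body(rows, index):
    """Turn a list of row dicts into an Elasticsearch _bulk NDJSON body.

    Each document is preceded by an "index" action line, and the body
    ends with a trailing newline, as the bulk format requires.
    """
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names; 1 million rows at 1000 per page is 1000 queries.
    queries = list(paged_queries("dfs.tmp.source_table", 1_000_000))
    print(len(queries), "queries; first:", queries[0])
    print(to_bulk_body([{"id": 1, "name": "example"}], "my_index"))
```

Each page re-runs the whole query plan in Drill, which is where the wasted
time and resources come from.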

I guess I am just looking for ideas: how would YOU get data from Parquet
files into Elasticsearch? I have Drill and Spark at the ready, but want to
be able to handle it as efficiently as possible.  Ideally, if we had a
well-written ES plugin, I could write a query that inserted into an index
and streamed the rows in... but barring that, what other methods have
people used?

Thanks

John
