I know there have been conversations about an Elasticsearch plugin; however, I recently needed to take some data that was accessible in Drill (stored as a Parquet table) and move it into an Elasticsearch index. There are about 1 million rows in the source data.
I was learning the technologies as I went, so I thought I could use Drill's REST API to query the data and then push it into Elasticsearch. I found that with Drill I couldn't run a single query and then batch the results into Elasticsearch: too many rows were returned, requests timed out, etc. So I used LIMIT 1000 with OFFSET. This let the Elasticsearch side work well, but it required running essentially the same query in Drill many times, wasting time and resources.

I guess I am just looking for ideas: how would YOU get data from Parquet files into Elasticsearch? I have Drill and Spark at the ready, but I want to handle this as efficiently as possible. Ideally, if we had a well-written Elasticsearch plugin, I could write a query that inserted into an index and streamed the data in... but barring that, what other methods have people used?

Thanks,
John
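For reference, here is a minimal sketch of the LIMIT/OFFSET paging approach described above, using only the Python standard library. It assumes Drill's REST endpoint `POST /query.json` (body `{"queryType": "SQL", "query": ...}`, response with a `rows` list) and Elasticsearch's `_bulk` API (NDJSON action/source line pairs). The URLs, the table and index names, and the helper functions (`paged_query`, `drill_rows`, `bulk_body`, `es_bulk`, `copy_table`) are hypothetical and for illustration only. Note that every page re-runs the query in Drill, which is exactly the cost described in the post, and that paging without an ORDER BY is not guaranteed to be stable.

```python
import json
import urllib.request

# Hypothetical endpoints: Drill's web server commonly listens on 8047 and
# Elasticsearch on 9200; adjust these for your own cluster.
DRILL_URL = "http://localhost:8047/query.json"
ES_BULK_URL = "http://localhost:9200/_bulk"


def paged_query(table, page_size, page):
    """Build one LIMIT/OFFSET page of the source query.

    Without an ORDER BY on a unique key, page boundaries are not
    guaranteed to be stable between executions.
    """
    return f"SELECT * FROM {table} LIMIT {page_size} OFFSET {page * page_size}"


def drill_rows(sql):
    """Run one SQL query through Drill's REST API and return its rows."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    req = urllib.request.Request(
        DRILL_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("rows", [])


def bulk_body(index, rows):
    """Serialize rows as an Elasticsearch _bulk NDJSON payload."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(row))                            # document source
    # The _bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


def es_bulk(payload):
    """Send one NDJSON payload to _bulk and fail loudly on item errors."""
    req = urllib.request.Request(
        ES_BULK_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    if result.get("errors"):
        raise RuntimeError("one or more bulk items failed")
    return result


def copy_table(table, index, page_size=1000):
    """Page through the Drill table and bulk-index each page."""
    page = 0
    while True:
        rows = drill_rows(paged_query(table, page_size, page))
        if not rows:
            break  # no more pages
        es_bulk(bulk_body(index, rows))
        page += 1
```

Calling something like `copy_table("dfs.tmp.`source_table`", "my_index")` (both names hypothetical) would issue roughly 1,000 Drill queries for 1 million rows at a page size of 1000, which is why this approach gets expensive as the table grows.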