On Tue, Feb 21, 2017 at 10:32 PM, John Omernik <[email protected]> wrote:
> I guess I am just looking for ideas: how would YOU get data from Parquet
> files into Elastic Search? I have Drill and Spark at the ready, but want to
> be able to handle it as efficiently as possible. Ideally, if we had a well
> written ES plugin, I could write a query that inserted into an index and
> streamed stuff in... but barring that, what other methods have people used?

My traditional method has been to use Python's version of the ES bulk load
API. This runs ES pretty hard, but you would need more to saturate a really
large ES cluster.

Often I export a JSON file using whatever tool (Drill would work) and then
run the Python loader on that file. That avoids questions about Python
reading obscure formats. I think Python is now able to read and write
Parquet, but that is pretty new stuff, so I would stay old school there.

I don't think that you need a lot of sophistication in the loader.
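For what it's worth, here is a minimal sketch of that workflow, assuming the
Drill export is newline-delimited JSON and using the elasticsearch-py bulk
helper; the index name, type, and file path are just placeholders:

    import json

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch(["localhost:9200"])

    def actions(path, index):
        # Yield one bulk action per JSON line in the exported file.
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line:
                    yield {
                        "_index": index,
                        "_type": "doc",   # only needed on older ES versions
                        "_source": json.loads(line),
                    }

    ok, errors = bulk(es, actions("export.json", "myindex"), chunk_size=1000)
    print("indexed %d docs" % ok)

Tuning chunk_size (and running a few of these in parallel against different
files) is usually all the sophistication the loader needs.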
