On Tue, Feb 21, 2017 at 10:32 PM, John Omernik <[email protected]> wrote:
> I guess I am just looking for ideas: how would YOU get data from Parquet
> files into Elastic Search? I have Drill and Spark at the ready, but want to
> be able to handle it as efficiently as possible. Ideally, if we had a well
> written ES plugin, I could write a query that inserted into an index and
> streamed stuff in... but barring that, what other methods have people used?

My traditional method has been to use Python's version of the ES bulk load
API. This runs ES pretty hard, but you would need more to saturate a really
large ES cluster.

Often I export a JSON file using whatever tool (Drill would work) and then
run the Python loader on that file. That avoids questions about Python
reading obscure formats. I think Python is now able to read and write
Parquet, but that is pretty new stuff, so I would stay old school there.

I don't think that you need a lot of sophistication in the loader.
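For what it's worth, here is a minimal sketch of that workflow, assuming the
Drill export is newline-delimited JSON and using the elasticsearch-py bulk
helper; the index name, type, and file path are just placeholders:

    import json

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch(["localhost:9200"])

    def actions(path, index):
        # Yield one bulk action per JSON line in the exported file.
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line:
                    yield {
                        "_index": index,
                        "_type": "doc",   # only needed on older ES versions
                        "_source": json.loads(line),
                    }

    ok, errors = bulk(es, actions("export.json", "myindex"), chunk_size=1000)
    print("indexed %d docs" % ok)

Tuning chunk_size (and running a few of these in parallel against different
files) is usually all the sophistication the loader needs.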
