Re: Spark batch with Druid

2019-02-06 Thread Gian Merlino
Ah, you're right. I misread the original question. In that case, also try checking out: https://github.com/implydata/druid-hadoop-inputformat, an unofficial Druid InputFormat. Spark can use that to read Druid data into an RDD - check the example in the README. It's also unofficial and, currently, ...
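
A minimal sketch of that InputFormat approach, assuming the unofficial implydata/druid-hadoop-inputformat jar is on Spark's classpath; the InputFormat class name and configuration keys shown here are placeholders rather than the project's confirmed API, so check its README for the real ones:

    // Sketch only: DruidInputFormat and the druid.* keys are placeholder names
    // standing in for whatever the project's README actually documents.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.NullWritable
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("druid-inputformat").getOrCreate()

    val conf = new Configuration()
    conf.set("druid.datasource", "wikipedia")            // placeholder key
    conf.set("druid.intervals", "2019-01-01/2019-02-01") // placeholder key

    // newAPIHadoopRDD is standard Spark: it hands the Configuration to the
    // given InputFormat and exposes the resulting key/value pairs as an RDD.
    val rows = spark.sparkContext.newAPIHadoopRDD(
      conf,
      classOf[DruidInputFormat],              // placeholder class name
      classOf[NullWritable],                  // assumed key type
      classOf[java.util.Map[String, AnyRef]]  // assumed value type, one row per record
    )
    println(rows.count())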

Re: Spark batch with Druid

2019-02-06 Thread Julian Jaffe
I think this question is going the other way (i.e. how to read data out of Druid into Spark, as opposed to into Druid). For that, the quickest and dirtiest approach is probably to use Spark's JSON support to parse a Druid response. You may also be able to repurpose some code from ...
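
A quick sketch of that quick-and-dirty route, assuming the query response has already been fetched from the Broker; the broker URL, query file, and output path are illustrative:

    // e.g. fetch a native scan query result from the Broker first:
    //   curl -X POST -H 'Content-Type: application/json' \
    //        -d @scan-query.json http://broker.example.com:8082/druid/v2 \
    //        > /tmp/druid-response.json
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("parse-druid-response").getOrCreate()

    // A native query response is a single JSON array, hence multiLine;
    // Spark's JSON reader infers a schema from the saved response.
    val df = spark.read.option("multiLine", true).json("/tmp/druid-response.json")
    df.printSchema()
    df.show(5)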

Re: Spark batch with Druid

2019-02-06 Thread Gian Merlino
Hey Rajiv, there's an unofficial Druid/Spark adapter at: https://github.com/metamx/druid-spark-batch. If you want to stick to official things, then the best approach would be to use Spark to write data to HDFS or S3 and then ingest it into Druid using Druid's Hadoop-based or native batch ingestion ...
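
A sketch of that official route, with Spark landing newline-delimited JSON on S3 for a subsequent Druid batch ingestion job to pick up; the bucket, paths, and column names are illustrative assumptions:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("stage-for-druid").getOrCreate()

    // Illustrative source data; any DataFrame works here.
    val events = spark.read.parquet("s3a://my-bucket/raw/events/")

    // Newline-delimited JSON is a format Druid's batch ingestion reads natively.
    events
      .select("timestamp", "page", "user", "added", "deleted")
      .write
      .mode("overwrite")
      .json("s3a://my-bucket/druid-staging/events/")

    // Next step (outside Spark): submit a Hadoop-based or native batch ingestion
    // spec to Druid whose input points at s3://my-bucket/druid-staging/events/.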

Spark batch with Druid

2019-02-06 Thread Rajiv Mordani
Is there a best practice for how to load data from Druid for use in a Spark batch job? I asked this question on the user alias but got no response, hence reposting here. - Rajiv