Elasticsearch-Hadoop Data Locality

2014-12-31 Thread Elliott Bradshaw
I'm trying to get a Spark job running that pulls several million documents from an Elasticsearch cluster for some analytics that cannot be done via aggregations. It was my understanding that es-hadoop maintained data locality when the Spark cluster was running alongside the Elasticsearch

Re: Elasticsearch-Hadoop Data Locality

2014-12-31 Thread Costin Leau
For the record, which Spark and es-hadoop versions are you using? For each shard in your index, es-hadoop creates one Spark task which gets informed of the whereabouts of the underlying shard. So in your case, you would end up with 20 tasks/workers, one per shard, streaming data back to the
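
The one-task-per-shard mapping described above can be illustrated with a small, self-contained Python model. This is only a sketch and does not call es-hadoop or Spark: the `Shard` and `Task` classes, the `plan_tasks` and `local_fraction` helpers, and the node names are all hypothetical, and the "preferred host" is treated as a hint that a scheduler may or may not honor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    shard_id: int
    node: str  # host that holds this shard (hypothetical name)


@dataclass(frozen=True)
class Task:
    shard_id: int
    preferred_host: str  # locality hint only; not a guarantee


def plan_tasks(shards):
    """Create exactly one task per shard, tagged with the shard's host."""
    return [Task(s.shard_id, s.node) for s in shards]


def local_fraction(tasks, assignment):
    """Fraction of tasks that ran on their preferred host.

    `assignment` maps shard_id -> host the task actually ran on.
    """
    if not tasks:
        return 0.0
    local = sum(1 for t in tasks if assignment.get(t.shard_id) == t.preferred_host)
    return local / len(tasks)


# Assumed layout: 20 shards spread round-robin over 4 made-up nodes.
nodes = ["node-a", "node-b", "node-c", "node-d"]
shards = [Shard(i, nodes[i % len(nodes)]) for i in range(20)]
tasks = plan_tasks(shards)

# Every task scheduled where its shard lives.
ideal = {t.shard_id: t.preferred_host for t in tasks}
# Every task scheduled on a single node, ignoring the hints.
skewed = {t.shard_id: "node-a" for t in tasks}
```

Comparing `local_fraction(tasks, ideal)` with `local_fraction(tasks, skewed)` gives a rough sense of how much of the read would be node-local in each case, which is the kind of number worth checking when locality does not seem to be respected.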