On Fri, Jun 13, 2014 at 1:55 PM, Albert Chu wrote:
> 1) How is this data process-local? I *just* copied it into HDFS. No
> spark worker or executor should have loaded it.
>
Yeah, I thought that PROCESS_LOCAL meant the data was already in the JVM on
the worker node, but I do see the same thing.
There is probably a subtlety in how tasks are classified as process-local
versus node-local that I'm missing.
I'm running a basic test, as follows:
1) Copy a large text file from the local file system into HDFS using
hadoop fs -copyFromLocal
2) Run Spark's wordcount exa