Re: process local vs node local subtlety question/issue

2014-06-13 Thread Nicholas Chammas

On Fri, Jun 13, 2014 at 1:55 PM, Albert Chu wrote:

> 1) How is this data process-local? I *just* copied it into HDFS. No
> spark worker or executor should have loaded it.

Yeah, I thought that PROCESS_LOCAL meant the data was already in the JVM on the worker node, but I do see the same thing

process local vs node local subtlety question/issue

2014-06-13 Thread Albert Chu

There is probably a subtlety in the difference between running tasks with process-local data and with node-local data that I'm missing. I'm running the following basic test:

1) Copy a large text file from the local file system into HDFS using hadoop fs -copyFromLocal
2) Run Spark's wordcount exa