I am playing with some data using a (stand-alone) spark-shell (Spark version
1.6.0), launched simply with `spark-shell`. The flow is simple, a bit like cp:
basically moving 100k local files (the max size is 190k) to S3. Memory is
configured as below:


export SPARK_DRIVER_MEMORY=8192M    # heap for the driver JVM
export SPARK_WORKER_CORES=1         # cores a standalone worker offers to executors
export SPARK_WORKER_MEMORY=8192M    # memory a standalone worker offers to executors
export SPARK_EXECUTOR_CORES=4       # cores requested per executor
export SPARK_EXECUTOR_MEMORY=2048M  # heap per executor
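
The copy itself is roughly the kind of thing sketched below (simplified; the
local path and bucket name here are placeholders, not my real ones):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Read each small file as a (path, PortableDataStream) pair.
val files = sc.binaryFiles("file:///data/small-files")

files.foreachPartition { iter =>
  // One FileSystem handle per partition, created on the executor.
  val fs = FileSystem.get(new URI("s3n://my-bucket"), new Configuration())
  iter.foreach { case (srcPath, stream) =>
    val out = fs.create(new Path("s3n://my-bucket/out/" + new Path(srcPath).getName))
    out.write(stream.toArray())
    out.close()
  }
}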


But the total time spent moving those files to S3 was roughly 30 minutes. The
resident memory I found is roughly 3.820g (checked with top -p <pid>). It
seems to me there is still room to speed this up, though this is only for
testing purposes. So I would like to know: are there any other parameters I
can change to improve spark-shell's performance? Is the memory setup above
correct?
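
To make the question concrete, the kind of knob I am wondering about is the
parallelism of the copy, e.g. forcing more partitions so more tasks run at
once (the value 64 is just a guess on my part, not something I have measured):

// Same placeholder path as above; minPartitions is a guessed value.
val files = sc.binaryFiles("file:///data/small-files", minPartitions = 64)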


Thanks. 
