I am playing with some data using the standalone `spark-shell` (Spark version 1.6.0). The flow is simple, a bit like `cp`: moving 100k local files (the max size is 190k) to S3. Memory is configured as below:
```
export SPARK_DRIVER_MEMORY=8192M
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=8192M
export SPARK_EXECUTOR_CORES=4
export SPARK_EXECUTOR_MEMORY=2048M
```

However, moving those files to S3 took roughly 30 minutes in total. The resident memory I observed is roughly 3.820g (checked with `top -p <pid>`). This suggests to me that there is still room to speed it up, even though this is only for testing purposes. So I would like to know whether there are any other parameters I can change to improve spark-shell's performance. Is the memory setup above correct? Thanks.
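For reference, here is how I could alternatively pass the equivalent driver/executor settings directly on the command line instead of via environment variables (the values below simply mirror the exports above; the `SPARK_WORKER_*` variables have no command-line equivalent here since they configure the standalone worker daemon, not the shell):

```shell
# Hypothetical equivalent invocation; values mirror the exports above.
spark-shell \
  --driver-memory 8192M \
  --executor-memory 2048M \
  --conf spark.executor.cores=4
```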