Hi, Hanson. Perhaps I'm digressing here. If I'm wrong or mistaken, please correct me.
SPARK_WORKER_* settings configure the whole cluster, so it's fine to put those global variables in spark-env.sh. SPARK_DRIVER_* and SPARK_EXECUTOR_*, however, are per-application settings (for your code), so it's usually better to pass them to spark-shell directly, like:

```bash
spark-shell --driver-memory 8G --executor-cores 4 --executor-memory 2G
```

Tuning the application configuration is a good place to start, and passing the values to spark-shell directly makes them easier to test. You can also sanity-check what the shell actually picked up; see the sketch below the quoted message.

For more details see:
+ `spark-shell -h`
+ http://spark.apache.org/docs/latest/submitting-applications.html
+ http://spark.apache.org/docs/latest/spark-standalone.html

On Mon, Apr 17, 2017 at 6:18 PM, Richard Hanson <rhan...@mailbox.org> wrote:
> I am playing with some data using (stand-alone) spark-shell (Spark version
> 1.6.0) by executing `spark-shell`. The flow is simple, a bit like cp:
> basically moving 100k local files (max size 190k) to S3. Memory is
> configured as below:
>
> export SPARK_DRIVER_MEMORY=8192M
> export SPARK_WORKER_CORES=1
> export SPARK_WORKER_MEMORY=8192M
> export SPARK_EXECUTOR_CORES=4
> export SPARK_EXECUTOR_MEMORY=2048M
>
> But the total time spent moving those files to S3 was roughly 30 minutes, and
> the resident memory is roughly 3.820g (checked with top -p <pid>).
> It seems to me there is still room to speed this up, though it is only
> for testing purposes. So I would like to know: are there any other parameters I
> can change to improve spark-shell's performance? Is the memory setup above
> correct?
>
> Thanks.
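A minimal sketch of the sanity check mentioned above: once the shell is up, print the effective configuration. This assumes nothing beyond a standard spark-shell session, where `sc` is the SparkContext the shell creates for you; `spark.driver.memory`, `spark.executor.cores` and `spark.executor.memory` are the property names the flags above map to.

```scala
// Inside spark-shell: list the effective configuration, sorted by key,
// to confirm the --driver-memory / --executor-* flags were applied.
sc.getConf.getAll.sortBy(_._1).foreach { case (k, v) => println(s"$k = $v") }
```

If `spark.executor.memory` shows the value you passed on the command line, the flag made it through; if not, something earlier in the configuration chain (spark-env.sh or spark-defaults.conf) is likely interfering.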