Re: Big performance difference between "client" and "cluster" deployment mode; is this expected?

2014-12-31 Thread Enno Shioji
Hi Tathagata,

It's a standalone cluster. The submit commands are:

== CLIENT
spark-submit --class com.fake.Test \
  --deploy-mode client --master spark://fake.com:7077 \
  fake.jar

== CLUSTER
spark-submit --class com.fake.Test \
  --deploy-mode cluster --master spark://fake.com:7077 \
  s3n://fake.ja

Re: Big performance difference between "client" and "cluster" deployment mode; is this expected?

2014-12-31 Thread Tathagata Das
What are your spark-submit commands in both cases? Is it Spark Standalone or YARN (both support client and cluster)? Accordingly, what is the number of executors/cores requested? TD On Wed, Dec 31, 2014 at 10:36 AM, Enno Shioji wrote: > Also the job was deployed from the master machine in the clust

Re: Big performance difference between "client" and "cluster" deployment mode; is this expected?

2014-12-31 Thread Enno Shioji
Oh sorry, that was an edit mistake. The code is essentially:

val msgStream = kafkaStream
  .map { case (k, v) => v }
  .map(DatatypeConverter.printBase64Binary)
  .saveAsTextFile("s3n://some.bucket/path", classOf[LzoCodec])

I.e. there is essentially no original code (I was callin

Re: Big performance difference between "client" and "cluster" deployment mode; is this expected?

2014-12-31 Thread Enno Shioji
Also the job was deployed from the master machine in the cluster. On Wed, Dec 31, 2014 at 6:35 PM, Enno Shioji wrote: > Oh sorry that was a edit mistake. The code is essentially: > > val msgStream = kafkaStream > .map { case (k, v) => v} > .map(DatatypeConverter.printBase64B

Re: Big performance difference between "client" and "cluster" deployment mode; is this expected?

2014-12-31 Thread Sean Owen
-dev, +user A decent guess: Does your 'save' function entail collecting data back to the driver? And are you running this from a machine that's not in your Spark cluster? Then in client mode you're shipping data back to a less-nearby machine than in cluster mode. That could explain the b