Hi Tathagata,
It's a standalone cluster. The submit commands are:
== CLIENT
spark-submit --class com.fake.Test \
--deploy-mode client --master spark://fake.com:7077 \
fake.jar
== CLUSTER
spark-submit --class com.fake.Test \
--deploy-mode cluster --master spark://fake.com:7077 \
s3n://fake.jar
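For reference, neither command above pins executor resources, so the standalone defaults apply (the app takes all available cores unless capped). If you wanted to request them explicitly, it would look something like this (a sketch reusing the placeholder class and master URL from above; the numbers are made up):

```shell
spark-submit --class com.fake.Test \
  --deploy-mode client --master spark://fake.com:7077 \
  --total-executor-cores 8 \
  --executor-memory 4g \
  fake.jar
```

`--total-executor-cores` caps the total cores across the app in standalone mode; without it, the default is unlimited.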
What's your spark-submit command in both cases? Is it Spark Standalone or
YARN (both support client and cluster modes)? Accordingly, how many
executors/cores are requested?
TD
On Wed, Dec 31, 2014 at 10:36 AM, Enno Shioji wrote:
> Also the job was deployed from the master machine in the cluster.
Oh sorry, that was an edit mistake. The code is essentially:
val msgStream = kafkaStream
.map { case (k, v) => v}
.map(DatatypeConverter.printBase64Binary)
.saveAsTextFile("s3n://some.bucket/path", classOf[LzoCodec])
I.e. there is essentially no original code (I was callin
Also the job was deployed from the master machine in the cluster.
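As an aside (my assumption, not something stated in the thread): `saveAsTextFile` with a codec is an RDD method, so on a DStream the per-batch write would usually go through `foreachRDD` with a time-stamped path, so that successive batches don't overwrite each other. A sketch, assuming `msgStream` is the DStream above and `LzoCodec` is hadoop-lzo's `com.hadoop.compression.lzo.LzoCodec`:

```scala
import javax.xml.bind.DatatypeConverter
import com.hadoop.compression.lzo.LzoCodec  // from hadoop-lzo, assumed on the classpath

// Each batch lands in its own S3 directory, keyed by batch time,
// so successive batches don't clobber one another.
msgStream
  .map { case (_, v) => v }
  .map(DatatypeConverter.printBase64Binary)
  .foreachRDD { (rdd, time) =>
    rdd.saveAsTextFile(s"s3n://some.bucket/path/${time.milliseconds}", classOf[LzoCodec])
  }
```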
On Wed, Dec 31, 2014 at 6:35 PM, Enno Shioji wrote:
> Oh sorry, that was an edit mistake. The code is essentially:
>
> val msgStream = kafkaStream
> .map { case (k, v) => v}
> .map(DatatypeConverter.printBase64Binary)
-dev, +user
A decent guess: does your 'save' function entail collecting data back
to the driver? And are you running this from a machine that's not in
your Spark cluster? Then in client mode you're shipping data back to a
less-nearby machine, compared with cluster mode. That could explain
the b
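For what it's worth, the collect-to-driver shape being guessed at here would look something like this (hypothetical, not the code from this thread; `writeToS3` is a made-up helper):

```scala
// Anti-pattern: funnels every record through the driver's network link,
// which is slow when the driver (client mode) sits outside the cluster.
val everything = rdd.collect()
everything.foreach(record => writeToS3(record))

// Distributed alternative: each executor writes its own partition directly.
rdd.saveAsTextFile("s3n://some.bucket/path")
```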