numStreams is 5 in my case.
List<JavaPairDStream<byte[], byte[]>> kafkaStreams = new
ArrayList<>(numStreams);
for (int i = 0; i < numStreams; i++) {
  kafkaStreams.add(KafkaUtils.createStream(sc, byte[].class,
      byte[].class, DefaultDecoder.class, DefaultDecoder.class, kafkaConf,
      topicMap,
Hello Guys,
I've re-partitioned my kafkaStream so that it gets evenly distributed among
the executors, and the results are better.
Still, from the executors page it seems that all 8 cores of only 1 executor
are getting used while the other executors are using just 1 core.
Is this the correct
Hi Mukesh,
How are you creating your receivers? Could you post the (relevant) code?
-kr, Gerard.
On Wed, Jan 21, 2015 at 9:42 AM, Mukesh Jha me.mukesh@gmail.com wrote:
Thanks Sandy, it was the issue with the number of cores.
Another issue I was facing is that tasks are not getting distributed evenly
among all executors and are running at the NODE_LOCAL locality level, i.e.
all the tasks are running on the same executor where my kafkaReceiver(s)
are running even
That is kind of expected due to data locality. Though you should see
some tasks running on the other executors as the data gets replicated to
other nodes, which can therefore run tasks based on locality. You have
two solutions:
1. kafkaStream.repartition() to explicitly repartition the received
data
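The repartition step suggested above can be sketched as follows. This is a minimal illustration, not code from the thread: it assumes `kafkaStreams` is the list of receiver streams built in the earlier snippet, `jssc` is the JavaStreamingContext, and the partition count of 40 is just an assumed value (roughly number of executors times cores per executor).

```java
// Assumption: kafkaStreams is the List<JavaPairDStream<byte[], byte[]>>
// built in the receiver loop earlier in the thread, and jssc is the
// JavaStreamingContext ("sc" in that snippet).

// Union the per-receiver streams into a single DStream.
JavaPairDStream<byte[], byte[]> unified = jssc.union(
    kafkaStreams.get(0), kafkaStreams.subList(1, kafkaStreams.size()));

// Repartition so processing tasks spread across all executors instead of
// staying NODE_LOCAL on the receiver nodes. 40 is an assumed value:
// approximately (number of executors) * (cores per executor).
JavaPairDStream<byte[], byte[]> repartitioned = unified.repartition(40);
```

This trades a shuffle of the received blocks for better cluster utilization, which is usually a win when the receivers are concentrated on a few nodes.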
Hi Mukesh,
Based on your spark-submit command, it looks like you're only running with
2 executors on YARN. Also, how many cores does each machine have?
-Sandy
On Mon, Dec 29, 2014 at 4:36 AM, Mukesh Jha me.mukesh@gmail.com wrote:
Hello Experts,
I'm benchmarking Spark on YARN (
Sorry Sandy, the command is just for reference, but I can confirm that there
are 4 executors and a driver, as shown in the Spark UI page.
Each of these machines is an 8-core box with ~15G of RAM.
On Mon, Dec 29, 2014 at 11:23 PM, Sandy Ryza sandy.r...@cloudera.com
wrote:
And this is with Spark version 1.2.0.
On Mon, Dec 29, 2014 at 11:43 PM, Mukesh Jha me.mukesh@gmail.com
wrote:
Are you setting --num-executors to 8?
On Mon, Dec 29, 2014 at 10:13 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
*oops, I mean are you setting --executor-cores to 8
On Mon, Dec 29, 2014 at 10:15 AM, Sandy Ryza sandy.r...@cloudera.com
wrote:
Nope, I am setting 5 executors with 2 cores each. Below is the command
that I'm using to submit in YARN mode. This starts up 5 executors and
a driver as per the Spark application master UI.
spark-submit --master yarn-cluster --num-executors 5 --driver-memory 1024m
--executor-memory 1024m
When running in standalone mode, each executor will be able to use all 8
cores on the box. When running on YARN, each executor will only have
access to 2 cores. So the comparison doesn't seem fair, no?
-Sandy
On Mon, Dec 29, 2014 at 10:22 AM, Mukesh Jha me.mukesh@gmail.com
wrote:
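To give the YARN executors the same parallelism as the standalone run, the submit command from earlier in the thread could be extended with --executor-cores, as Sandy suggests. This is a sketch: the value 8 matches the 8-core boxes described in the thread, the memory values are the ones from the original command, and the trailing arguments are omitted as they were in the original.

```shell
spark-submit --master yarn-cluster \
  --num-executors 5 \
  --executor-cores 8 \
  --driver-memory 1024m \
  --executor-memory 1024m \
  ...
```

With --executor-cores unset, YARN-mode executors default to 1 core (2 in this thread's command), so each executor runs far fewer concurrent tasks than a standalone worker that can use the whole box.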
Makes sense. I've also tried it in standalone mode, where all 3 workers and
the driver were running on the same 8-core box, and the results were similar.
Anyway, I will share the results in YARN mode with 8-core YARN containers.
On Mon, Dec 29, 2014 at 11:58 PM, Sandy Ryza sandy.r...@cloudera.com
wrote: