Koeninger [mailto:c...@koeninger.org]
Sent: Monday, March 14, 2016 9:39 PM
To: Mukul Gupta
Cc: user@spark.apache.org
Subject: Re: Kafka + Spark streaming, RDD partitions not processed in parallel
So what's happening here is that print() uses take(). Take() will try to
satisfy the request using on
efore.
Following is the link to repository:
https://github.com/guptamukul/sparktest.git
From: Cody Koeninger
Sent: 11 March 2016 23:04
To: Mukul Gupta
Cc: user@spark.apache.org
Subject: Re: Kafka + Spark streaming, RDD partitions not processed in parallel
w Function, String>() {
@Override
public String call(Tuple2 arg0) throws Exception {
Thread.sleep(7000);
return arg0._2;
}
});
processed.print(90);
try {
jssc.start();
jssc.awaitTermination();
} catch (Exception e) {
} finally {
jssc.close();
}
}
}
_____
Hi All,I was running the following test:*Setup*9 VM runing spark workers with
1 spark executor each.1 VM running kafka and spark master.Spark version is
1.6.0Kafka version is 0.9.0.1Spark is using its own resource manager and is
not running over YARN.*Test*I created a kafka topic with 3 partition.