RE: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-14 Thread Mukul Gupta
Koeninger [mailto:c...@koeninger.org] Sent: Monday, March 14, 2016 9:39 PM To: Mukul Gupta Cc: user@spark.apache.org Subject: Re: Kafka + Spark streaming, RDD partitions not processed in parallel So what's happening here is that print() uses take(). Take() will try to satisfy the request using on

Re: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-13 Thread Mukul Gupta
efore. Following is the link to repository: https://github.com/guptamukul/sparktest.git From: Cody Koeninger Sent: 11 March 2016 23:04 To: Mukul Gupta Cc: user@spark.apache.org Subject: Re: Kafka + Spark streaming, RDD partitions not processed in parallel

Re: Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-11 Thread Mukul Gupta
w Function, String>() { @Override public String call(Tuple2 arg0) throws Exception { Thread.sleep(7000); return arg0._2; } }); processed.print(90); try { jssc.start(); jssc.awaitTermination(); } catch (Exception e) { } finally { jssc.close(); } } } _____

Kafka + Spark streaming, RDD partitions not processed in parallel

2016-03-10 Thread Mukul Gupta
Hi All,I was running the following test:*Setup*9 VM runing spark workers with 1 spark executor each.1 VM running kafka and spark master.Spark version is 1.6.0Kafka version is 0.9.0.1Spark is using its own resource manager and is not running over YARN.*Test*I created a kafka topic with 3 partition.