Spark streaming job unable to launch more executors

2020-09-18 Thread Vibhor Banga (Engineering - VS)
Hi all, We have a Spark streaming job which reads from two Kafka topics with 10 partitions each, and we run the streaming job with 3 concurrent microbatches (so 20 partitions in total and a concurrency of 3). We have the following question: in our processing DAG, we do an rdd.persist() at one stage,
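The preview above is cut off, but the thread title concerns scaling executors for a streaming job. As general background (not the thread's eventual answer): with a direct Kafka stream, each microbatch produces one task per topic partition, so executors beyond the active task count sit idle. A hedged sketch of the configuration knobs usually involved — the property names are standard Spark settings, the values are illustrative, and spark.streaming.concurrentJobs is an undocumented setting:

```properties
# spark-defaults.conf (illustrative values)
# Fixed executor count, used when dynamic allocation is off:
spark.executor.instances             10
# Or let Spark scale executors with load:
spark.dynamicAllocation.enabled      true
spark.dynamicAllocation.maxExecutors 20
# Number of microbatches processed concurrently (undocumented setting):
spark.streaming.concurrentJobs       3
```

With 20 total Kafka partitions as described above, raising executor counts beyond what 20 tasks (times 3 concurrent batches) can occupy would not add throughput.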

Writing data to HBase using Spark

2014-06-10 Thread Vibhor Banga
Hi, I am reading data from an HBase table into an RDD, and then using foreach on that RDD I do some processing on every Result of the HBase table. After this processing I want to store the processed data back into another HBase table. How can I do that? If I use standard Hadoop and HBase classes to
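One common approach to the question above is to map the processed records to Put objects and write them with TableOutputFormat via saveAsNewAPIHadoopDataset. This is a sketch only — the table, column family, and MyObject conversion below are made-up placeholders, and the exact HBase API varies by version:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// processedRDD: JavaPairRDD<byte[], MyObject> produced by the earlier map stage
Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf);
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "output_table"); // hypothetical table

JavaPairRDD<ImmutableBytesWritable, Put> puts = processedRDD.mapToPair(t -> {
    Put put = new Put(t._1());                                   // row key
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),      // hypothetical family/qualifier
                  Bytes.toBytes(t._2().toString()));
    return new Tuple2<>(new ImmutableBytesWritable(t._1()), put);
});
puts.saveAsNewAPIHadoopDataset(job.getConfiguration());
```

An alternative is foreachPartition with a per-partition HBase connection, which avoids the MapReduce output format entirely.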

Re: Using Spark on Data size larger than Memory size

2014-06-07 Thread Vibhor Banga
wrote: Clearly there will be an impact on performance, but frankly it depends on what you are trying to achieve with the dataset. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Sat, May 31, 2014 at 11:45 AM, Vibhor Banga

Serialization problem in Spark

2014-06-05 Thread Vibhor Banga
Hi, I am trying to do something like the following in Spark: JavaPairRDD<byte[], MyObject> eventRDD = hBaseRDD.map(new PairFunction<Tuple2<ImmutableBytesWritable, Result>, byte[], MyObject>() { @Override public Tuple2<byte[], MyObject> call(Tuple2<ImmutableBytesWritable, Result
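The usual cause of a serialization failure in code shaped like the above is that an anonymous inner PairFunction, created inside an instance method, captures the enclosing (non-serializable) driver class. The mechanism can be reproduced with plain JDK serialization, no Spark required — all class names below are illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {

    // A function interface that is itself Serializable, like Spark's PairFunction.
    interface SerFn extends Serializable {
        int apply(int x);
    }

    // Mimics a driver-side class that is NOT Serializable.
    static class Driver {
        int offset = 10;

        // Anonymous inner class created in an instance method: it captures
        // Driver.this (needed to read `offset`), so serializing it drags the
        // whole non-serializable Driver along.
        SerFn fromInstanceMethod() {
            return new SerFn() {
                @Override public int apply(int x) { return x + offset; }
            };
        }

        // Anonymous class created in a static method captures only the local
        // `offset`, not any enclosing instance, so it serializes cleanly.
        static SerFn fromStaticFactory(final int offset) {
            return new SerFn() {
                @Override public int apply(int x) { return x + offset; }
            };
        }
    }

    // True if `o` survives Java serialization, false on NotSerializableException.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("instance-method fn serializes: "
                + serializes(new Driver().fromInstanceMethod()));   // false
        System.out.println("static-factory fn serializes:  "
                + serializes(Driver.fromStaticFactory(10)));        // true
    }
}
```

The same reasoning suggests the usual fixes in Spark: define the function as a named static nested (or top-level) class, or copy the needed fields into local variables before constructing the anonymous function.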

Re: Serialization problem in Spark

2014-06-05 Thread Vibhor Banga
Any inputs on this will be helpful. Thanks, -Vibhor On Thu, Jun 5, 2014 at 3:41 PM, Vibhor Banga vibhorba...@gmail.com wrote: Hi, I am trying to do something like the following in Spark: JavaPairRDD<byte[], MyObject> eventRDD = hBaseRDD.map(new PairFunction<Tuple2<ImmutableBytesWritable, Result

Re: Using Spark on Data size larger than Memory size

2014-05-31 Thread Vibhor Banga
Some inputs would be really helpful. Thanks, -Vibhor On Fri, May 30, 2014 at 7:51 PM, Vibhor Banga vibhorba...@gmail.com wrote: Hi all, I am planning to use Spark with HBase, where I generate an RDD by reading data from an HBase table. I want to know, in the case when the size of the HBase

Using Spark on Data size larger than Memory size

2014-05-30 Thread Vibhor Banga
Hi all, I am planning to use Spark with HBase, where I generate an RDD by reading data from an HBase table. I want to know: in the case when the HBase table grows larger than the RAM available in the cluster, will the application fail, or will there only be an impact on performance?
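As general Spark behavior (not this thread's specific resolution): a dataset larger than cluster memory does not fail outright. Uncached partitions are streamed and recomputed from the source as needed, and cached partitions follow the chosen StorageLevel. A hedged fragment, with the RDD construction elided:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

// hbaseRows: a JavaRDD<Result> built from newAPIHadoopRDD over the HBase table.
// MEMORY_ONLY silently drops partitions that don't fit (they get recomputed);
// MEMORY_AND_DISK spills them to local disk instead.
JavaRDD<org.apache.hadoop.hbase.client.Result> cached =
    hbaseRows.persist(StorageLevel.MEMORY_AND_DISK());
```

So the practical answer to the question above is usually "performance impact, not failure", provided individual partitions still fit in a task's working memory.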

Re: Problem using Spark with HBase

2014-05-30 Thread Vibhor Banga
, -Vibhor On Wed, May 28, 2014 at 11:34 PM, Mayur Rustagi mayur.rust...@gmail.com wrote: Try this.. Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi https://twitter.com/mayur_rustagi On Wed, May 28, 2014 at 7:40 PM, Vibhor Banga vibhorba...@gmail.com

Problem using Spark with HBase

2014-05-28 Thread Vibhor Banga
Hi all, I am facing issues while using Spark with HBase. I am getting a NullPointerException at org.apache.hadoop.hbase.TableName.valueOf (TableName.java:288). Can someone please help resolve this issue? What am I missing? I am using the following snippet of code - Configuration config =
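The preview cuts off before the code, but an NPE inside TableName.valueOf when building an HBase RDD is often a sign that the input table name was never set (so a null reaches HBase), or that hbase-site.xml is not on the classpath. A hedged sketch of the usual read-side setup — the table name is hypothetical, and `sc` is assumed to be a JavaSparkContext:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.api.java.JavaPairRDD;

Configuration config = HBaseConfiguration.create();   // picks up hbase-site.xml from the classpath
config.set(TableInputFormat.INPUT_TABLE, "my_table"); // hypothetical table name; must be non-null

JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD =
    sc.newAPIHadoopRDD(config, TableInputFormat.class,
                       ImmutableBytesWritable.class, Result.class);
```

A mismatch between the HBase client jars on the Spark executors and the server version can produce similarly opaque errors, so that is worth checking as well.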

Re: Problem using Spark with HBase

2014-05-28 Thread Vibhor Banga
Anyone who has used Spark this way or has faced a similar issue, please help. Thanks, -Vibhor On Wed, May 28, 2014 at 6:03 PM, Vibhor Banga vibhorba...@gmail.com wrote: Hi all, I am facing issues while using Spark with HBase. I am getting a NullPointerException