Spark Streaming from Kafka: dealing with the initial heavy load

2017-03-17 Thread sagarcasual .
Hi, we have Spark 1.6.1 streaming from a Kafka (0.10.1) topic using the direct approach. The streaming part works fine, but when we initially start the job we have to deal with a really huge Kafka message backlog, millions of messages; that first batch runs for over 40 hours, and after 12 hours or so
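A common way to keep that first batch bounded in Spark 1.6 is to cap the per-partition ingest rate and enable backpressure. A minimal sketch, assuming a direct stream; the app name, rate, and batch interval below are illustrative:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    // Cap how many records each Kafka partition contributes per second, so the
    // first batch cannot swallow the entire backlog at once.
    SparkConf conf = new SparkConf()
        .setAppName("kafka-stream")
        .set("spark.streaming.kafka.maxRatePerPartition", "10000") // records/sec/partition, tune to your throughput
        .set("spark.streaming.backpressure.enabled", "true");      // lets later batches adapt automatically
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

With maxRatePerPartition set, the first batch reads roughly rate x partitions x batch-interval records instead of the whole backlog; backpressure alone does not help there, since it only kicks in after the first batches complete.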

Re: Error while Spark 1.6.1 streaming from Kafka-2.11_0.10.0.1 cluster

2016-09-23 Thread sagarcasual .
> The change to Spark 2.0 was really straightforward for the one or two > jobs I switched over, for what it's worth. > > On Fri, Sep 23, 2016 at 12:31 AM, sagarcasual . > wrote: > > Also, you mentioned the streaming-kafka-0-10 connector; what connector > is > > th

Fwd: Error while Spark 1.6.1 streaming from Kafka-2.11_0.10.0.1 cluster

2016-09-22 Thread sagarcasual .
: '1.6.1' For Spark 2.0 with Kafka 0.10.0.1, do I need a different Kafka connector dependency? On Thu, Sep 22, 2016 at 2:21 PM, sagarcasual . wrote: > Hi Cody, > Thanks for the response. > One thing I forgot to mention is that I am using the Direct Approach (no > receive
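For reference: the 1.6 line uses the old 0.8-consumer connector (spark-streaming-kafka_2.11), while Spark 2.0 ships a separate artifact for the new 0.10 consumer API. A hedged one-liner, assuming a Gradle build and Scala 2.11; the version shown is illustrative:

    // build.gradle (assumed)
    compile 'org.apache.spark:spark-streaming-kafka-0-10_2.11:2.0.0'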

Error while Spark 1.6.1 streaming from Kafka-2.11_0.10.0.1 cluster

2016-09-22 Thread sagarcasual .
Hello, I am trying to stream data out of a Kafka cluster (2.11_0.10.0.1) using Spark 1.6.1. I am receiving the following error, and I confirmed that the topic to which I am trying to connect exists and has data. Any idea what the cause could be? kafka.common.UnknownTopicOrPartitionException at sun.reflect
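For comparison, a minimal Spark 1.6 direct-stream sketch (the 0.8-style connector, which still speaks to 0.10 brokers); broker addresses and the topic name are placeholders, and an existing JavaStreamingContext jssc is assumed. UnknownTopicOrPartitionException often means the brokers listed in metadata.broker.list are not the ones hosting the topic as spelled:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;
    import kafka.serializer.StringDecoder;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    // A typo in either the broker list or the topic name produces
    // UnknownTopicOrPartitionException even when the topic exists elsewhere.
    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", "broker1:9092,broker2:9092");
    Set<String> topics = Collections.singleton("my-topic");

    JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, topics);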

Sending extraJavaOptions for Spark 1.6.1 on Mesos 0.28.2 in cluster mode

2016-09-19 Thread sagarcasual .
Hello, I have my Spark application running in cluster mode on CDH with extraJavaOptions. However, when I attempt to run the same application with Apache Mesos, it does not recognize the properties below at all, and the code that reads them returns null. --conf spark.driver.extraJavaOptions=-Dsome.ur
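One hedged workaround sketch, assuming the option only goes missing on Mesos: read the JVM system property when extraJavaOptions was applied, and fall back to an ordinary spark.* conf key, which SparkConf forwards on every cluster manager. The names some.url and spark.myapp.some.url are placeholders:

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf();
    // Present only if -Dsome.url survived extraJavaOptions handling.
    String someUrl = System.getProperty("some.url");
    if (someUrl == null) {
        // Fallback: submit with --conf spark.myapp.some.url=... ; any spark.*
        // key is propagated to the driver regardless of the cluster manager.
        someUrl = conf.get("spark.myapp.some.url", null);
    }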

Spark 1.6.1 on Apache Mesos: Log4j2 could not find a logging implementation

2016-09-19 Thread sagarcasual .
Hello, I am trying to run Spark 1.6.1 on Apache Mesos. I have log4j-core and log4j-api 2.6.2 as part of my uber jar, yet I am still getting the following error while starting my Spark app: ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleL

NoSuchFieldError: INSTANCE when specifying a user-defined httpclient jar

2016-09-17 Thread sagarcasual .
Hello, I am using the Spark 1.6.1 distribution on a Cloudera CDH 5.7.0 cluster. When I run my fat jar (Spark jar) and it makes a call to HttpClient, it gets the classic NoSuchFieldError: INSTANCE, which usually happens when the httpclient on the classpath is older than the anticipated httpclient
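A quick diagnostic sketch for this class of conflict: ask the JVM which jar the colliding class was actually loaded from. BasicLineFormatter is just one example of a class whose INSTANCE field first appeared in the httpclient/httpcore 4.3 line:

    public class HttpClientProbe {
        public static void main(String[] args) throws Exception {
            // Prints the jar that actually supplied the class; if it is a
            // CDH/Spark bundled jar rather than your fat jar, the older
            // httpclient on the cluster classpath is winning.
            Class<?> clazz = Class.forName("org.apache.http.message.BasicLineFormatter");
            System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());
        }
    }

Typical remedies are relocating (shading) org.apache.http inside the uber jar, or experimenting with spark.driver.userClassPathFirst / spark.executor.userClassPathFirst.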

Access application-jar name within main method.

2016-09-08 Thread sagarcasual .
Hello, I am running Spark 1.6.1 and would like to access the application jar name within my main() method. At the moment I am using the following code to get the jar name: String sparkJarName = new java.io.File(MySparkProcessor.class.getProtectionDomain() .getCodeSource() .getLocation()
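A hedged completion of that fragment: take the path portion of the code-source URL and keep just the file name. MySparkProcessor is the class from the original snippet; this works when the class is loaded from the application jar rather than an exploded classes directory:

    // Resolve the jar file the main class was loaded from, then keep its name.
    String sparkJarName = new java.io.File(
        MySparkProcessor.class.getProtectionDomain()
            .getCodeSource()
            .getLocation()
            .getPath())
        .getName();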

Re: Pausing Spark Kafka streaming (direct) or excluding/including some partitions on the fly per batch

2016-09-02 Thread sagarcasual .
...write some code. The 0.10 integration makes the > underlying Kafka consumer pluggable, so you may be able to wrap a > consumer to do what you need. > > On Fri, Sep 2, 2016 at 12:28 PM, sagarcasual . > wrote: > > Hello, this is for > > Pausing spark kafka streaming (direct)

Pausing Spark Kafka streaming (direct) or excluding/including some partitions on the fly per batch

2016-09-02 Thread sagarcasual .
Hello, this is for: Pausing Spark Kafka streaming (direct) or excluding/including some partitions on the fly per batch. I have the following code that creates a direct stream using the Kafka connector for Spark: final JavaInputDStream msgRecords = KafkaU
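A hedged completion of the truncated call, using the 1.6 fromOffsets overload: by deciding which TopicAndPartition keys go into fromOffsets on each (re)start, partitions can effectively be included or excluded per run. The topic, partition, and offset values are placeholders, and jssc/kafkaParams are assumed from the surrounding code:

    import java.util.HashMap;
    import java.util.Map;
    import kafka.common.TopicAndPartition;
    import kafka.message.MessageAndMetadata;
    import kafka.serializer.StringDecoder;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    // Explicit starting offsets double as a partition filter: omit a partition
    // here and this run simply will not consume it.
    Map<TopicAndPartition, Long> fromOffsets = new HashMap<>();
    fromOffsets.put(new TopicAndPartition("my-topic", 0), 0L);

    final JavaInputDStream<String> msgRecords = KafkaUtils.createDirectStream(
        jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        String.class, kafkaParams, fromOffsets,
        mmd -> mmd.message()); // messageHandler: keep only the payload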

Maintain Kafka offsets externally as Spark Streaming processes records.

2016-05-24 Thread sagarcasual .
In Spark Streaming, consuming Kafka using KafkaUtils.createDirectStream, there are examples of the Kafka offset ranges. However: 1. I would like to periodically maintain the offset level so that, if needed, I can reprocess items from an offset. Is there any way I can retrieve the offset of a message in rd
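The documented pattern for this with the direct stream is to cast each batch's RDD to HasOffsetRanges; a sketch, assuming a direct stream named msgRecords as above:

    import org.apache.spark.streaming.kafka.HasOffsetRanges;
    import org.apache.spark.streaming.kafka.OffsetRange;

    // Must be the first operation applied to the direct stream, before any
    // shuffle, or the RDD can no longer be cast to HasOffsetRanges.
    msgRecords.foreachRDD(rdd -> {
        OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
        for (OffsetRange r : ranges) {
            // Persist (topic, partition, fromOffset, untilOffset) externally;
            // feed them back later as fromOffsets to reprocess from any point.
            System.out.println(r.topic() + " " + r.partition() + " "
                + r.fromOffset() + " -> " + r.untilOffset());
        }
    });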