Re: Spark streaming with Kafka

2020-11-03 Thread Kevin Pis
Hi, this is my Word Count demo. https://github.com/kevincmchen/wordcount MohitAbbi wrote on Wed, Nov 4, 2020, 3:32 AM: > Hi, > > Can you please share the correct versions of the JAR files which you used to > resolve the issue? I'm also facing the same issue. > > Thanks

Re: Spark streaming with Kafka

2020-11-03 Thread MohitAbbi
Hi, Can you please share the correct versions of the JAR files which you used to resolve the issue? I'm also facing the same issue. Thanks

Re: Spark Streaming with Kafka and Python

2020-08-12 Thread Sean Owen
What supports Python in (Kafka?) 0.8? I don't think Spark ever had a specific Python-Kafka integration. But you have always been able to use it to read DataFrames as in Structured Streaming. Kafka 0.8 support is deprecated (gone in 3.0), but 0.10 means 0.10+, which works with the latest 2.x. What is the

Re: Spark Streaming with Kafka and Python

2020-08-12 Thread German Schiavon
Hey, Maybe I'm missing some restriction with EMR, but have you tried to use Structured Streaming instead of Spark Streaming? https://spark.apache.org/docs/2.4.5/structured-streaming-kafka-integration.html Regards On Wed, 12 Aug 2020 at 14:12, Hamish Whittal wrote: > Hi folks, > > Thought I wo
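
To make the suggestion concrete, a minimal Structured Streaming read from Kafka looks roughly like the sketch below (it assumes the spark-sql-kafka-0-10 connector is on the classpath; the broker list, topic name, and checkpoint path are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-structured").getOrCreate()

    // Read records from Kafka as a streaming DataFrame
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092,host2:9092")   // placeholder brokers
      .option("subscribe", "my-topic")                              // placeholder topic
      .load()

    // Kafka keys/values arrive as binary; cast them to strings before processing
    val values = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Write to the console sink as a quick smoke test
    val query = values.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/kafka-demo")  // placeholder path
      .start()

    query.awaitTermination()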

Re: Spark streaming with Kafka

2020-07-02 Thread dwgw
Hi, I was able to correct the issue. The issue was due to the wrong version of a JAR file I had used. I removed those JAR files, copied the correct versions, and the error has gone away. Regards

Re: Spark streaming with Kafka

2020-07-02 Thread Jungtaek Lim
I can't reproduce it. Could you please make sure you're running spark-shell from the official Spark 3.0.0 distribution? Please try changing the directory and using a relative path like "./spark-shell". On Thu, Jul 2, 2020 at 9:59 PM dwgw wrote: > Hi > I am trying to stream kafka topic from spark shel

Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-21 Thread shyla deshpande
Thanks TD. On Tue, Mar 14, 2017 at 4:37 PM, Tathagata Das wrote: > This setting allows multiple spark jobs generated through multiple > foreachRDD to run concurrently, even if they are across batches. So output > op2 from batch X, can run concurrently with op1 of batch X+1 > This is not safe bec

Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-14 Thread Tathagata Das
This setting allows multiple spark jobs generated through multiple foreachRDD to run concurrently, even if they are across batches. So output op2 from batch X can run concurrently with op1 of batch X+1. This is not safe because it breaks the checkpointing logic in subtle ways. Note that this was ne
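
For context, the setting being discussed is an ordinary SparkConf entry and the "output operations" are the foreachRDD calls; a rough sketch, with a socket stream standing in for Kafka and the setting left at its default of 1 in line with the warning above:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("concurrent-jobs-demo")
      // Values > 1 let output ops of different batches overlap, which, as noted
      // above, can break the checkpointing logic in subtle ways.
      .set("spark.streaming.concurrentJobs", "1")

    val ssc = new StreamingContext(conf, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)   // stand-in for a Kafka stream

    // Two independent output operations (the "op1" / "op2" mentioned above)
    lines.foreachRDD(rdd => println(s"op1 saw ${rdd.count()} records"))
    lines.foreachRDD(rdd => println(s"op2 saw ${rdd.count()} records"))

    ssc.start()
    ssc.awaitTermination()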

Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-14 Thread shyla deshpande
Thanks TD for the response. Can you please provide more explanation? I am having multiple streams in the spark streaming application (Spark 2.0.2 using DStreams). I know many people using this setting, so your explanation will help a lot of people. Thanks On Fri, Mar 10, 2017 at 6:24 PM, Tathag

Re: spark streaming with kafka source, how many concurrent jobs?

2017-03-10 Thread Tathagata Das
That config is not safe. Please do not use it. On Mar 10, 2017 10:03 AM, "shyla deshpande" wrote: > I have a spark streaming application which processes 3 kafka streams and > has 5 output operations. > > Not sure what should be the setting for spark.streaming.concurrentJobs. > > 1. If the concurr

Re: Spark Streaming with Kafka

2016-12-12 Thread Anton Okolnychyi
thanks for all your replies, now I have a complete picture. 2016-12-12 16:49 GMT+01:00 Cody Koeninger : > http://spark.apache.org/docs/latest/streaming-kafka-0-10- > integration.html#creating-a-direct-stream > > Use a separate group id for each stream, like the docs say. > > If you're doing mul

Re: Spark Streaming with Kafka

2016-12-12 Thread Cody Koeninger
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#creating-a-direct-stream Use a separate group id for each stream, like the docs say. If you're doing multiple output operations, and aren't caching, spark is going to read from kafka again each time, and if some of those re
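
Put together, the pattern described here (one consumer group id per stream, plus a cache() before running several output operations) could look roughly like this with the streaming-kafka-0-10 API; broker addresses, topics, and group ids are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val ssc = new StreamingContext(new SparkConf().setAppName("two-kafka-streams"), Seconds(10))

    def kafkaParams(groupId: String): Map[String, Object] = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",                // placeholder broker list
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> groupId,
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // One stream per topic, each with its own consumer group id
    val streamA = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("topicA"), kafkaParams("group-a")))
    val streamB = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("topicB"), kafkaParams("group-b")))

    // Cache before running several output operations, so Kafka is read only once per batch
    val valuesA = streamA.map(_.value()).cache()
    valuesA.foreachRDD(rdd => println(s"topicA count: ${rdd.count()}"))
    valuesA.foreachRDD(rdd => rdd.take(5).foreach(println))

    streamB.map(_.value()).print()

    ssc.start()
    ssc.awaitTermination()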

Re: Spark Streaming with Kafka

2016-12-11 Thread Oleksii Dukhno
Hi Anton, What is the command you run your spark app with? Why not work with the data instead of the stream in your second-stage operation? Can you provide logs showing the issue? ConcurrentModificationException is not a spark issue; it means that you use the same Kafka consumer instance from more than o

Re: Spark Streaming with Kafka

2016-12-11 Thread Anton Okolnychyi
Sorry, I forgot to mention that I was using Spark 2.0.2, Kafka 0.10, and nothing custom. I will try to restate the initial question. Let's consider an example. 1. I create a stream and subscribe to a certain topic. val stream = KafkaUtils.createDirectStream(...) 2. I extract the actual data f

Re: Spark Streaming with Kafka

2016-12-11 Thread Timur Shenkao
Hi, Usual general questions are: -- what is your Spark version? -- what is your Kafka version? -- do you use "standard" Kafka consumer or try to implement something custom (your own multi-threaded consumer)? The freshest docs https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.ht

Re: Spark Streaming with Kafka

2016-05-24 Thread Rasika Pohankar
Hi firemonk9, Sorry, it's been too long but I just saw this. I hope you were able to resolve it. FWIW, we were able to solve this with the help of the Low Level Kafka Consumer, instead of the inbuilt Kafka consumer in Spark, from here: https://github.com/dibbhatt/kafka-spark-consumer/. Regards

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread Cody Koeninger
If by smaller block interval you mean the value in seconds passed to the streaming context constructor, no. You'll still get everything from the starting offset until now in the first batch. On Thu, Feb 18, 2016 at 10:02 AM, praveen S wrote: > Sorry.. Rephrasing : > Can this issue be resolved b

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Sorry, rephrasing: Can this issue be resolved by having a smaller block interval? Regards, Praveen On 18 Feb 2016 21:30, "praveen S" wrote: > Can having a smaller block interval only resolve this? > > Regards, > Praveen > On 18 Feb 2016 21:13, "Cody Koeninger" wrote: > >> Backpressure won't h

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Can having a smaller block interval only resolve this? Regards, Praveen On 18 Feb 2016 21:13, "Cody Koeninger" wrote: > Backpressure won't help you with the first batch, you'd need > spark.streaming.kafka.maxRatePerPartition > for that > > On Thu, Feb 18, 2016 at 9:40 AM, praveen S wrote: > >>

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread Cody Koeninger
Backpressure won't help you with the first batch, you'd need spark.streaming.kafka.maxRatePerPartition for that On Thu, Feb 18, 2016 at 9:40 AM, praveen S wrote: > Have a look at > > spark.streaming.backpressure.enabled > Property > > Regards, > Praveen > On 18 Feb 2016 00:13, "Abhishek Anand"
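
For reference, both knobs mentioned in this thread are plain configuration entries; a sketch with illustrative values only:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("kafka-catchup")
      // Caps records per second per Kafka partition; this is what bounds the very
      // first (catch-up) batch after downtime
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")   // illustrative value
      // Backpressure tunes the rate of later batches from observed scheduling delay,
      // but does nothing for the first batch
      .set("spark.streaming.backpressure.enabled", "true")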

Re: Spark Streaming with Kafka Use Case

2016-02-18 Thread praveen S
Have a look at spark.streaming.backpressure.enabled Property Regards, Praveen On 18 Feb 2016 00:13, "Abhishek Anand" wrote: > I have a spark streaming application running in production. I am trying to > find a solution for a particular use case when my application has a > downtime of say 5 hour

Re: Spark Streaming with Kafka DirectStream

2016-02-17 Thread Cody Koeninger
You can print whatever you want wherever you want; it's just a question of whether it's going to show up in the driver's or the various executors' logs. On Wed, Feb 17, 2016 at 5:50 AM, Cyril Scetbon wrote: > I don't think we can print an integer value in a spark streaming process > As opposed to a

Re: Spark Streaming with Kafka DirectStream

2016-02-17 Thread Cyril Scetbon
I don't think we can print an integer value in a spark streaming process, as opposed to a spark job. I think I can print the content of an rdd but not debug messages. Am I wrong? Cyril Scetbon > On Feb 17, 2016, at 12:51 AM, ayan guha wrote: > > Hi > > You can always use RDD properties, whi

Re: Spark Streaming with Kafka Use Case

2016-02-17 Thread Cody Koeninger
Just use a kafka rdd in a batch job or two, then start your streaming job. On Wed, Feb 17, 2016 at 12:57 AM, Abhishek Anand wrote: > I have a spark streaming application running in production. I am trying to > find a solution for a particular use case when my application has a > downtime of say
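
A rough sketch of that suggestion with the 0.8 direct API's KafkaUtils.createRDD, which reads a fixed offset range as an ordinary batch RDD before the streaming job is started (the broker, topic, offsets, and output path are placeholders; in practice the offset ranges would come from wherever you track offsets):

    import kafka.serializer.StringDecoder
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

    val sc = new SparkContext(new SparkConf().setAppName("kafka-backfill"))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")   // placeholder broker

    // One entry per topic-partition: read offsets [from, until) as a batch
    val ranges = Array(
      OffsetRange("my-topic", 0, 0L, 100000L),   // placeholder topic and offsets
      OffsetRange("my-topic", 1, 0L, 100000L)
    )

    val backfill = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, ranges)

    backfill.map(_._2).saveAsTextFile("/tmp/kafka-backfill")   // placeholder output path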

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread ayan guha
Hi You can always use RDD properties, which already has partition information. https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/performance_optimization/how_many_partitions_does_an_rdd_have.html On Wed, Feb 17, 2016 at 2:36 PM, Cyril Scetbon wrote: > Your understanding i
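
A quick way to check the partition count being discussed is from inside the job itself; a sketch, assuming stream is the direct Kafka DStream already created:

    // Each batch of a direct stream yields one RDD whose partition count should
    // match the Kafka topic's partition count (5 in this thread's example)
    stream.foreachRDD { rdd =>
      println(s"Partitions in this batch: ${rdd.partitions.length}")
    }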

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread Cyril Scetbon
Your understanding is the right one (having re-read the documentation). Still wondering how I can verify that 5 partitions have been created. My job is reading from a topic in Kafka that has 5 partitions and sends the data to E/S. I can see that when there is one task to read from Kafka there ar

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread ayan guha
I have a slightly different understanding. Direct stream generates 1 RDD per batch, however, number of partitions in that RDD = number of partitions in kafka topic. On Wed, Feb 17, 2016 at 12:18 PM, Cyril Scetbon wrote: > Hi guys, > > I'm making some tests with Spark and Kafka using a Python sc

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-12 Thread p pathiyil
Thanks Sebastian. I was indeed trying out FAIR scheduling with a high value for concurrentJobs today. It does improve the latency seen by the non-hot partitions, even if it does not provide complete isolation. So it might be an acceptable middle ground. On 12 Feb 2016 12:18, "Sebastian Piu" wrot
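
For reference, the fair scheduling tried here is enabled through configuration; a sketch where the allocation file, pool name, and concurrentJobs value are illustrative only:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("fair-streaming")
      .set("spark.scheduler.mode", "FAIR")
      .set("spark.streaming.concurrentJobs", "4")                            // illustrative value
      // Optional: a fairscheduler.xml can define named pools and their weights
      .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")  // placeholder path

    // Jobs submitted from a given thread can then be routed to a named pool:
    // sc.setLocalProperty("spark.scheduler.pool", "hot-partition-pool")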

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread Sebastian Piu
Have you tried using the fair scheduler and queues? On 12 Feb 2016 4:24 a.m., "p pathiyil" wrote: > With this setting, I can see that the next job is being executed before > the previous one is finished. However, the processing of the 'hot' > partition eventually hogs all the concurrent jobs. If there

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread p pathiyil
With this setting, I can see that the next job is being executed before the previous one is finished. However, the processing of the 'hot' partition eventually hogs all the concurrent jobs. If there was a way to restrict jobs to be one per partition, then this setting would provide the per-partitio

RE: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread Diwakar Dhanuskodi
Hi, Did you try another implementation of DirectStream where you give only the topic? It would read all topic partitions in parallel under a batch interval. You need not create the union explicitly. Original message From: p pathiyil Date:11

Re: Spark Streaming with Kafka: Dealing with 'slow' partitions

2016-02-11 Thread Cody Koeninger
The real way to fix this is by changing partitioning, so you don't have a hot partition. It would be better to do this at the time you're producing messages, but you can also do it with a shuffle / repartition during consuming. There is a setting to allow another batch to start in parallel, but t
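
The "shuffle / repartition during consuming" option is a one-liner on the stream; a sketch, where stream is the existing Kafka DStream, the partition count is illustrative, and process() is a hypothetical per-batch handler:

    // Spread records from the hot Kafka partition across more Spark partitions.
    // This costs a shuffle per batch but evens out per-task processing time.
    val balanced = stream.repartition(20)     // illustrative partition count
    balanced.foreachRDD(rdd => process(rdd))  // process() is a hypothetical handler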

Re: Spark Streaming with Kafka - batch DStreams in memory

2016-02-02 Thread Cody Koeninger
It's possible you could (ab)use updateStateByKey or mapWithState for this. But honestly it's probably a lot more straightforward to just choose a reasonable batch size that gets you a reasonable file size for most of your keys, then use filecrush or something similar to deal with the hdfs small fi
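
If you do go the stateful route, a minimal updateStateByKey sketch that accumulates values per key across batches could look like the following; it assumes pairs is a DStream[(String, String)] and ssc is the streaming context, and it deliberately omits the flush and expiry policy a real job would need (the simpler batch-size approach above is usually the better option):

    // State grows without bound here; a real job would flush and clear it periodically.
    ssc.checkpoint("/tmp/checkpoints")   // placeholder path, required for stateful ops

    def accumulate(newValues: Seq[String], current: Option[Seq[String]]): Option[Seq[String]] =
      Some(current.getOrElse(Seq.empty[String]) ++ newValues)

    val accumulated = pairs.updateStateByKey(accumulate _)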

Re: spark streaming with kafka reset offset

2015-07-14 Thread Tathagata Das
Of course, exactly-once receiving is not the same as exactly-once. In the case of a direct kafka stream, the data may actually be pulled multiple times. But even if the data of a batch is pulled twice because of some failure, the final result (that is, transformed data accessed through foreachRDD) will always

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
Thanks TD. As for 1), if timing is not guaranteed, how are exactly-once semantics supported? It feels like exactly-once receiving is not necessarily exactly-once processing. Chen On Tue, Jul 14, 2015 at 10:16 PM, Tathagata Das wrote: > > > On Tue, Jul 14, 2015 at 6:42 PM, Chen Song wrote: >

Re: spark streaming with kafka reset offset

2015-07-14 Thread Tathagata Das
On Tue, Jul 14, 2015 at 6:42 PM, Chen Song wrote: > Thanks TD and Cody. I saw that. > > 1. By doing that (foreachRDD), does KafkaDStream checkpoints its offsets > on HDFS at the end of each batch interval? > The timing is not guaranteed. > 2. In the code, if I first apply transformations and a

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
Thanks TD and Cody. I saw that. 1. By doing that (foreachRDD), does KafkaDStream checkpoints its offsets on HDFS at the end of each batch interval? 2. In the code, if I first apply transformations and actions on the directKafkaStream and then use foreachRDD on the original KafkaDStream to commit o

Re: spark streaming with kafka reset offset

2015-07-14 Thread Tathagata Das
Relevant documentation - https://spark.apache.org/docs/latest/streaming-kafka-integration.html, towards the end. directKafkaStream.foreachRDD { rdd => val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges // offsetRanges.length = # of Kafka partitions being consumed ... } On Tue,

Re: spark streaming with kafka reset offset

2015-07-14 Thread Cody Koeninger
You have access to the offset ranges for a given rdd in the stream by typecasting to HasOffsetRanges. You can then store the offsets wherever you need to. On Tue, Jul 14, 2015 at 5:00 PM, Chen Song wrote: > A follow up question. > > When using createDirectStream approach, the offsets are checkp
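
Expanding on that, a sketch of reading the ranges off each batch and handing them to your own store; directKafkaStream is the stream from TD's snippet above, and saveOffsets is a hypothetical function standing in for whatever storage you use:

    import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

    directKafkaStream.foreachRDD { rdd =>
      val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // Process the batch first, then record where it ended, one range per Kafka partition
      ranges.foreach { r =>
        saveOffsets(r.topic, r.partition, r.untilOffset)   // hypothetical storage call
      }
    }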

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
A follow-up question. When using the createDirectStream approach, the offsets are checkpointed to HDFS and understood by the Spark Streaming job. Is there a way to expose the offsets via a REST api to end users? Or alternatively, is there a way to have offsets committed to Kafka Offset Manager s

Re: spark streaming with kafka reset offset

2015-06-30 Thread Cody Koeninger
You can't use different versions of spark in your application vs your cluster. For the direct stream, it's not 60 partitions per executor, it's 300 partitions, and executors work on them as they are scheduled. Yes, if you have no messages you will get an empty partition. It's up to you whether i

Re: spark streaming with kafka reset offset

2015-06-30 Thread Shushant Arora
Is this 3 the number of parallel consumer threads per receiver, meaning in total we have 2*3=6 consumers in the same consumer group consuming from all 300 partitions? 3 is just parallelism on the same receiver, and the recommendation is to use 1 per receiver since consuming from kafka is not cpu bound but rather NIC(netwo

Re: spark streaming with kafka reset offset

2015-06-29 Thread Shushant Arora
1. Here you are basically creating 2 receivers and asking each of them to consume 3 kafka partitions each. - In 1.2 we have high-level consumers, so how can we restrict the number of kafka partitions to consume from? Say I have 300 kafka partitions in the kafka topic and, as above, I gave 2 receivers and 3 ka

Re: spark streaming with kafka reset offset

2015-06-29 Thread ayan guha
Hi, Let me take a shot at your questions. (I am sure people like Cody and TD will correct me if I am wrong) 0. This is an exact copy from the similar question in the mail thread from Akhil D: Since you set local[4] you will have 4 threads for your computation, and since you are having 2 receivers, you are le
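
For reference, the receiver-based setup being discussed (2 receivers, 3 consumer threads each, then an explicit union) looks roughly like this in the 1.2-era API; ssc is assumed to exist, and the zookeeper address, group id, and topic name are placeholders:

    import org.apache.spark.streaming.kafka.KafkaUtils

    val numReceivers = 2
    val threadsPerReceiver = 3   // consumer threads inside each receiver, not extra Spark parallelism

    val kafkaStreams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(ssc, "zk1:2181", "my-group", Map("my-topic" -> threadsPerReceiver))
    }
    val unified = ssc.union(kafkaStreams)
    unified.count().print()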

Re: spark streaming with kafka reset offset

2015-06-29 Thread Cody Koeninger
3. You need to use your own method, because you need to set up your job. Read the checkpoint documentation. 4. Yes, if you want to checkpoint, you need to specify a url to store the checkpoint at (s3 or hdfs). Yes, for the direct stream checkpoint it's just offsets, not all the messages. On Sun

Re: spark streaming with kafka reset offset

2015-06-28 Thread Shushant Arora
A few doubts: In 1.2 streaming, when I use a union of streams, my streaming application sometimes hangs and nothing gets printed on the driver. [Stage 2:> (0 + 2) / 2] What does the 0 + 2 / 2 here signify? 1. Does no of streams in topicsMap.put("testSparkPartitio

Re: spark streaming with kafka reset offset

2015-06-27 Thread Dibyendu Bhattacharya
Hi, There is another option to try for Receiver Based Low Level Kafka Consumer which is part of Spark-Packages ( http://spark-packages.org/package/dibbhatt/kafka-spark-consumer) . This can be used with WAL as well for end to end zero data loss. This is also Reliable Receiver and Commit offset to

Re: spark streaming with kafka reset offset

2015-06-27 Thread Tathagata Das
In the receiver-based approach, if the receiver crashes for any reason (receiver crashed or executor crashed) the receiver should get restarted on another executor and should start reading data from the offset present in the zookeeper. There is some chance of data loss which can be alleviated using Wr

Re: spark streaming with kafka reset offset

2015-06-26 Thread Cody Koeninger
Read the spark streaming guide and the kafka integration guide for a better understanding of how the receiver-based stream works. Capacity planning is specific to your environment and what the job is actually doing; you'll need to determine it empirically. On Friday, June 26, 2015, Shushant Arora

Re: spark streaming with kafka reset offset

2015-06-26 Thread Shushant Arora
In 1.2, how do I handle offset management after the stream application starts in each job? Should I commit the offset manually after job completion? And what is the recommended number of consumer threads? Say I have 300 partitions in the kafka cluster. Load is ~1 million events per second. Each event is ~500 bytes.

Re: spark streaming with kafka reset offset

2015-06-26 Thread Cody Koeninger
The receiver-based kafka createStream in spark 1.2 uses zookeeper to store offsets. If you want finer-grained control over offsets, you can update the values in zookeeper yourself before starting the job. createDirectStream in spark 1.3 is still marked as experimental, and subject to change. Tha

Re: spark streaming with kafka jar missing

2015-06-23 Thread Shushant Arora
Thanks a lot. It worked after keeping all versions the same: 1.2.0. On Wed, Jun 24, 2015 at 2:24 AM, Tathagata Das wrote: > Why are you mixing spark versions between streaming and core?? > Your core is 1.2.0 and streaming is 1.4.0. > > On Tue, Jun 23, 2015 at 1:32 PM, Shushant Arora > wrote: > >>

Re: spark streaming with kafka jar missing

2015-06-23 Thread Tathagata Das
Why are you mixing spark versions between streaming and core?? Your core is 1.2.0 and streaming is 1.4.0. On Tue, Jun 23, 2015 at 1:32 PM, Shushant Arora wrote: > It throws exception for WriteAheadLogUtils after excluding core and > streaming jar. > > Exception in thread "main" java.lang.NoClass

Re: spark streaming with kafka jar missing

2015-06-23 Thread Shushant Arora
It throws exception for WriteAheadLogUtils after excluding core and streaming jar. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/util/WriteAheadLogUtils$ at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:84) at org

Re: spark streaming with kafka jar missing

2015-06-23 Thread Tathagata Das
You must not include spark-core and spark-streaming in the assembly. They are already present in the installation, and the presence of multiple versions of spark may throw off the classloaders in weird ways. So make the assembly marking those dependencies as scope=provided. On Tue, Jun 23, 20
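
In sbt terms, that advice amounts to marking the Spark artifacts as provided so they stay out of the assembly, while the Kafka integration (which is not part of the Spark installation) is bundled; a sketch, with the 1.4.0 version from this thread used purely as an illustration:

    // build.sbt (sketch): Spark artifacts provided by the cluster, Kafka integration bundled
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"            % "1.4.0" % "provided",
      "org.apache.spark" %% "spark-streaming"       % "1.4.0" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka" % "1.4.0"   // goes into the assembly
    )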

Re: Spark streaming with kafka

2015-05-29 Thread Akhil Das
Just after receiving the data from kafka, you can do a dstream.count().print() to see that spark and kafka are not the problem; after that, the next step would be to identify where the problem is. You can do the same count and print on each of the dstreams that you are creating (by transforming), and finally,
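
That check is a one-liner per stage; a sketch, where kafkaStream and transformed stand in for the input DStream and one of its derived streams:

    kafkaStream.count().print()   // confirms records arrive from Kafka each batch
    transformed.count().print()   // repeat after each transformation to see where data disappears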

Re: Re: spark streaming with kafka

2015-04-15 Thread Akhil Das
> *From:* Akhil Das > *Date:* 2015-04-15 19:12 > *To:* Shushant Arora > *CC:* user > *Subject:* Re: spark streaming with kafka > Once you start your streaming application to read from Kafka, it will > launch receivers on the executor nodes. And you can see them on the > st

Re: spark streaming with kafka

2015-04-15 Thread Shushant Arora
So receivers will be fixed for every run of the streaming interval job. Say I have set the stream Duration to 10 minutes; then after each 10 minutes a job will be created, and the same executor nodes, say in your case (spark-akhil-slave2.c.neat-axis-616.internal and spark-akhil-slave1.c.neat-axis-616.internal), wi

Re: spark streaming with kafka

2015-04-15 Thread Akhil Das
Once you start your streaming application to read from Kafka, it will launch receivers on the executor nodes. And you can see them on the streaming tab of your driver ui (runs on 4040). These receivers will be fixed till the end of your pipeline (unless it's crashed etc.) Y

Re: Spark streaming with Kafka- couldnt find KafkaUtils

2015-04-07 Thread Felix C
Or you could build an uber jar ( you could google that ) https://eradiating.wordpress.com/2015/02/15/getting-spark-streaming-on-kafka-to-work/ --- Original Message --- From: "Akhil Das" Sent: April 4, 2015 11:52 PM To: "Priya Ch" Cc: user@spark.apache.org, "dev"

Re: Spark streaming with Kafka- couldnt find KafkaUtils

2015-04-04 Thread Akhil Das
How are you submitting the application? Use a standard build tool like maven or sbt to build your project; it will download all the dependency jars. When you submit your application (if you are using spark-submit, then use the --jars option to add those jars which are causing the classNotFoundException). I

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-31 Thread Ted Yu
Can you show us the output of DStream#print() if you have it ? Thanks On Tue, Mar 31, 2015 at 2:55 AM, Nicolas Phung wrote: > Hello, > > @Akhil Das I'm trying to use the experimental API > https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/scala/org/apache/spark/examples/s

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-31 Thread Nicolas Phung
Hello, @Akhil Das I'm trying to use the experimental API https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Ted Yu
Nicolas: See if there was occurrence of the following exception in the log: errs => throw new SparkException( s"Couldn't connect to leader for topic ${part.topic} ${part.partition}: " + errs.mkString("\n")), Cheers On Mon, Mar 30, 2015 at 9:40 AM, Cody Koeninge

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Cody Koeninger
This line at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.close( KafkaRDD.scala:158) is the attempt to close the underlying kafka simple consumer. We can add a null pointer check, but the underlying issue of the consumer being null probably indicates a problem earlier. Do you see

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Akhil Das
Did you try this example? https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/KafkaWordCount.scala I think you need to create a topic set with # partitions to consume. Thanks Best Regards On Mon, Mar 30, 2015 at 9:35 PM, Nicolas Phu

Re: Spark Streaming with Kafka

2015-01-21 Thread Dibyendu Bhattacharya
You can probably try the Low Level Consumer option with Spark 1.2 https://github.com/dibbhatt/kafka-spark-consumer This Consumer can recover from any underlying failure of Spark Platform or Kafka and either retry or restart the receiver. This is being working nicely for us. Regards, Dibyendu O

Re: Spark Streaming with Kafka

2015-01-20 Thread firemonk9
Hi, I am having similar issues. Have you found any resolution? Thank you

RE: Spark Streaming with Kafka

2015-01-19 Thread Shao, Saisai
Hi, Could you please describe your monitored symptoms a little more specifically, like are there any exceptions you monitored with Kafka or Zookeeper, and what other exceptions from the Spark side did you monitor? With only this short description one can hardly find any clue. Thanks Jerry From: Eduardo Alfai

Re: Spark Streaming with Kafka is failing with Error

2014-11-18 Thread Tobias Pfeiffer
Hi, do you have some logging backend (log4j, logback) on your classpath? This seems a bit like there is no particular implementation of the abstract `log()` method available. Tobias

Re: Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-09-08 Thread Matt Narrell
I came across this: https://github.com/xerial/sbt-pack Until i found this, I was simply using the sbt-assembly plugin (sbt clean assembly) mn On Sep 4, 2014, at 2:46 PM, Aris wrote: > Thanks for answering Daniil - > > I have SBT version 0.13.5, is that an old version? Seems pretty up-to-da

Re: Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-09-04 Thread Aris
Thanks for answering, Daniil - I have SBT version 0.13.5; is that an old version? Seems pretty up-to-date. It turns out I figured out a way around this entire problem: just use 'sbt package', and when using bin/spark-submit, pass it the "--jars" option and GIVE IT ALL THE JARS from the local ivy2 c

Re: Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow

2014-09-02 Thread Daniil Osipov
What version of sbt are you using? There is a bug in early version of 0.13 that causes assembly to be extremely slow - make sure you're using the latest one. On Fri, Aug 29, 2014 at 1:30 PM, Aris wrote: > Hi folks, > > I am trying to use Kafka with Spark Streaming, and it appears I cannot do >

Re: Spark Streaming with Kafka NoClassDefFoundError

2014-07-13 Thread Tathagata Das
Also, the reason the spark-streaming-kafka is not included in the spark assembly is that we do not want dependencies of external systems like kafka (which itself probably has a complex dependency tree) to cause conflicts with the core spark's functionality and stability. TD On Sun, Jul 13, 2014 a

Re: Spark Streaming with Kafka NoClassDefFoundError

2014-07-13 Thread Tathagata Das
In case you still have issues with duplicate files in uber jar, here is a reference sbt file with assembly plugin that deals with duplicates https://github.com/databricks/training/blob/sparkSummit2014/streaming/scala/build.sbt On Fri, Jul 11, 2014 at 10:06 AM, Bill Jay wrote: > You may try to
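
The duplicate-file failures mentioned here are typically handled with a merge strategy in the sbt-assembly settings; a common sketch (assemblyMergeStrategy is the newer key, older sbt-assembly releases use mergeStrategy in assembly, and the exact cases depend on which files collide in your build):

    // build.sbt (sketch, sbt-assembly plugin)
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard   // drop conflicting manifests/signatures
      case x                             => MergeStrategy.first     // otherwise keep the first copy found
    }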

Re: Spark Streaming with Kafka NoClassDefFoundError

2014-07-11 Thread Bill Jay
You may try to use this one: https://github.com/sbt/sbt-assembly I had an issue of duplicate files in the uber jar file. But I think this library will assemble dependencies into a single jar file. Bill On Fri, Jul 11, 2014 at 1:34 AM, Dilip wrote: > A simple > sbt assembly > is not work

Re: Spark Streaming with Kafka NoClassDefFoundError

2014-07-11 Thread Dilip
A simple sbt assembly is not working. Is there any other way to include particular jars with assembly command? Regards, Dilip On Friday 11 July 2014 12:45 PM, Bill Jay wrote: I have met similar issues. The reason is probably because in Spark assembly, spark-streaming-kafka is not included.

Re: Spark Streaming with Kafka NoClassDefFoundError

2014-07-11 Thread Bill Jay
I have met similar issues. The reason is probably because in Spark assembly, spark-streaming-kafka is not included. Currently, I am using Maven to generate a shaded package with all the dependencies. You may try to use sbt assembly to include the dependencies in your jar file. Bill On Thu, Jul 1

Re: Spark Streaming with Kafka NoClassDefFoundError

2014-07-10 Thread Dilip
Hi Akhil, Can you please guide me through this? Because the code I am running already has this in it: SparkContext sc = new SparkContext(); sc.addJar("/usr/local/spark/external/kafka/target/scala-2.10/spark-streaming-kafka_2.10-1.1.0-SNAPSHOT.jar"); Is there something I am mis

Re: Spark Streaming with Kafka NoClassDefFoundError

2014-07-10 Thread Akhil Das
Easiest fix would be adding the kafka jars to the SparkContext while creating it. Thanks Best Regards On Fri, Jul 11, 2014 at 4:39 AM, Dilip wrote: > Hi, > > I am trying to run a program with spark streaming using Kafka on a stand > alone system. These are my details: > > Spark 1.0.0 hadoop2 >
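
One way to do that is to hand the jar paths to the SparkConf before the context is created; a sketch, where the jar path is the one from Dilip's message above and purely illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("kafka-streaming")
      // Ship the Kafka integration jar (plus Kafka's own jars) to the executors
      .setJars(Seq("/usr/local/spark/external/kafka/target/scala-2.10/spark-streaming-kafka_2.10-1.1.0-SNAPSHOT.jar"))
    val sc = new SparkContext(conf)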