Re: Spark Streaming and Kafka MultiNode Setup - Data Locality

2015-09-21 Thread Adrian Tanase
We do - using Spark streaming, Kafka, HDFS all collocated on the same nodes. Works great so far. Spark picks up the location information and reads data from the partitions hosted by the local broker, showing up as NODE_LOCAL in the UI. You also need to look at the locality options
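
The "locality options" mentioned here are presumably scheduler settings such as spark.locality.wait. A minimal sketch (Scala), with an illustrative wait value — Spark 1.x takes milliseconds for this key:

    import org.apache.spark.SparkConf

    // Give the scheduler longer to find a NODE_LOCAL slot before falling back
    // to ANY, so reads stay on the executor collocated with the kafka broker.
    val conf = new SparkConf()
      .setAppName("collocated-kafka-app")   // illustrative name
      .set("spark.locality.wait", "6000")   // default is 3000 ms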

Re: Spark Streaming and Kafka MultiNode Setup - Data Locality

2015-09-21 Thread Cody Koeninger
The direct stream already uses the kafka leader for a given partition as the preferred location. I don't run kafka on the same nodes as spark, and I don't know anyone who does, so that situation isn't particularly well tested. On Mon, Sep 21, 2015 at 1:15 PM, Ashish Soni

Re: spark streaming 1.3 kafka topic error

2015-08-31 Thread Cody Koeninger
You can't set it to less than 1. Just set it to max int if that's really what you want to do. On Mon, Aug 31, 2015 at 6:00 AM, Shushant Arora wrote: > Say if my cluster takes a long time to rebalance for some reason > intermittently. So to handle that can I have
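
The setting under discussion appears to be the direct stream's spark.streaming.kafka.maxRetries; since -1 is not accepted, "infinite" is approximated with Int.MaxValue. A minimal sketch under that assumption:

    import org.apache.spark.SparkConf

    // Retry (practically) forever on transient leader/rebalance failures
    // instead of killing the batch after the default single retry.
    val conf = new SparkConf()
      .set("spark.streaming.kafka.maxRetries", Int.MaxValue.toString)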

Re: spark streaming 1.3 kafka topic error

2015-08-31 Thread Shushant Arora
Say my cluster intermittently takes a long time to rebalance for some reason. To handle that, can I have infinite retries instead of killing the app? What should the value of retries be? Will (-1) work, or something else? On Thu, Aug 27, 2015 at 6:46 PM, Cody Koeninger

Re: spark streaming 1.3 kafka topic error

2015-08-27 Thread Ahmed Nawar
Dears, I need to commit the DB transaction for each partition, not for each row. The below didn't work for me: rdd.mapPartitions(partitionOfRecords => { DBConnectionInit() val results = partitionOfRecords.map(..) DBConnection.commit() }) Best regards, Ahmed Atef Nawwar Data Management

Re: spark streaming 1.3 kafka topic error

2015-08-27 Thread Cody Koeninger
Map is lazy. You need an actual action, or nothing will happen. Use foreachPartition, or do an empty foreach after the map. On Thu, Aug 27, 2015 at 8:53 AM, Ahmed Nawar ahmed.na...@gmail.com wrote: Dears, I need to commit the DB transaction for each partition, not for each row. The below
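
A minimal sketch of the pattern Cody points at, written against a DStream; DBConnectionInit and insert are hypothetical stand-ins for the poster's own DB wrapper:

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { partitionOfRecords =>
        val conn = DBConnectionInit()                    // hypothetical: one connection per partition
        partitionOfRecords.foreach(r => insert(conn, r)) // hypothetical per-row insert
        conn.commit()                                    // commit once per partition, not per row
        conn.close()
      }
    }

Unlike mapPartitions, foreachPartition is an action, so the work actually runs.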

Re: spark streaming 1.3 kafka buffer size

2015-08-27 Thread Cody Koeninger
: what's the default buffer in spark streaming 1.3 for kafka messages? Say in this run it has to fetch messages from offset 1 to 1. Will it fetch all in one go, or does it internally fetch messages in batches of a few messages? Is there any setting to configure the number of offsets fetched in one batch?

spark streaming 1.3 kafka buffer size

2015-08-26 Thread Shushant Arora
What's the default buffer in spark streaming 1.3 for kafka messages? Say in this run it has to fetch messages from offset 1 to 1. Will it fetch all in one go, or does it internally fetch messages in batches of a few messages? Is there any setting to configure the number of offsets fetched in one batch?

Re: spark streaming 1.3 kafka buffer size

2015-08-26 Thread Cody Koeninger
See http://kafka.apache.org/documentation.html#consumerconfigs — fetch.message.max.bytes in the kafka params passed to the constructor. On Wed, Aug 26, 2015 at 10:39 AM, Shushant Arora shushantaror...@gmail.com wrote: what's the default buffer in spark streaming 1.3 for kafka messages? Say
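
A minimal sketch of where that parameter goes, assuming the Spark 1.3 Scala API; broker hosts, topic and the size cap are illustrative:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map(
      "metadata.broker.list"    -> "broker1:9092,broker2:9092", // illustrative brokers
      "fetch.message.max.bytes" -> (8 * 1024 * 1024).toString   // per-fetch cap, 8 MB here
    )
    // ssc is an existing StreamingContext
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))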

spark streaming 1.3 kafka topic error

2015-08-26 Thread Shushant Arora
Hi, my streaming application gets killed with the below error: 15/08/26 21:55:20 ERROR kafka.DirectKafkaInputDStream: ArrayBuffer(kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException, kafka.common.NotLeaderForPartitionException,

Re: spark streaming 1.3 kafka buffer size

2015-08-26 Thread Shushant Arora
...@gmail.com wrote: what's the default buffer in spark streaming 1.3 for kafka messages? Say in this run it has to fetch messages from offset 1 to 1. Will it fetch all in one go, or does it internally fetch messages in batches of a few messages? Is there any setting to configure the number of offsets

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Shushant Arora
The exception comes when the client also has many connections to another external server. So I think the exception is due to a client-side issue only; on the server side there is no issue. I want to understand: is the executor (simple consumer) not making a new connection to the kafka broker at the start of each

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Shushant Arora
Trying the consumer without external connections, or with a low number of external connections, it works fine. So the doubt is how the socket got closed: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed. On Sat, Aug 22, 2015 at 7:24 PM, Akhil Das

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Shushant Arora
Trying the consumer without external connections, or with a low number of external connections, it works fine. So the doubt is how the socket got closed: 15/08/21 08:54:54 ERROR executor.Executor: Exception in task 262.0 in stage 130.0 (TID 16332) java.io.EOFException: Received -1 when reading

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Akhil Das
Can you try some other consumer and see if the issue still exists? On Aug 22, 2015 12:47 AM, Shushant Arora shushantaror...@gmail.com wrote: The exception comes when the client also has many connections to another external server. So I think the exception is due to a client-side issue

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Dibyendu Bhattacharya
I think you can also give this consumer a try in your environment: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer. It has been running fine for topics with a large number of Kafka partitions (> 200) like yours without any issue; no issue with connections, as this consumer re-uses

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Cody Koeninger
To be perfectly clear, the direct kafka stream will also recover from any failures, because it does the simplest thing possible - fail the task and let spark retry it. If you're consistently having socket closed problems on one task after another, there's probably something else going on in your

Re: spark streaming 1.3 kafka error

2015-08-21 Thread Shushant Arora
It comes at the start of each task when new data is inserted in kafka (the data inserted is very little). The kafka topic has 300 partitions; the data inserted is ~10 MB. Tasks fail, their retries succeed, and after a certain number of failed tasks it kills the job. On Sat, Aug 22, 2015 at 2:08 AM,

Re: spark streaming 1.3 kafka error

2015-08-21 Thread Cody Koeninger
Sounds like that's happening consistently, not an occasional network problem? Look at the Kafka broker logs. Make sure you've configured the correct kafka broker hosts / ports (note that the direct stream does not use the zookeeper host / port). Make sure that host / port is reachable from your driver

Re: spark streaming 1.3 kafka error

2015-08-21 Thread Akhil Das
That looks like you are choking your kafka machine. Do a top on the kafka machines and see the workload; it may be that you are spending too much time on disk I/O etc. On Aug 21, 2015 7:32 AM, Cody Koeninger c...@koeninger.org wrote: Sounds like that's happening consistently, not an

spark streaming 1.3 kafka error

2015-08-21 Thread Shushant Arora
Hi, I'm getting the below error in spark streaming 1.3 while consuming from kafka using the direct kafka stream. A few tasks fail in each run. What is the reason for / solution to this error? 15/08/21 08:54:54 ERROR executor.Executor: Exception in task 262.0 in stage 130.0 (TID 16332)

Re: Spark Streaming: Change Kafka topics on runtime

2015-08-14 Thread Cody Koeninger
There's a long recent thread in this list about stopping apps; the subject was stopping spark stream app at 1 second. I wouldn't run repeated rdds, no. I'd take a look at subclassing, personally (you'll have to rebuild the streaming kafka project since a lot is private), but if topic changes don't

Re: Spark Streaming: Change Kafka topics on runtime

2015-08-14 Thread Nisrina Luthfiyati
Hi Cody, by start/stopping, do you mean the streaming context or the app entirely? From what I understand, once a streaming context has been stopped it cannot be restarted, but I also haven't found a way to stop the app programmatically. The batch duration will probably be around 1-10 seconds. I

Spark Streaming: Change Kafka topics on runtime

2015-08-13 Thread Nisrina Luthfiyati
Hi all, I want to write a Spark Streaming program that listens to Kafka for a list of topics. The list of topics that I want to consume is stored in a DB and might change dynamically. I plan to periodically refresh this list of topics in the Spark Streaming app. My question is: is it possible to

Re: Spark Streaming: Change Kafka topics on runtime

2015-08-13 Thread Cody Koeninger
The current kafka stream implementation assumes the set of topics doesn't change during operation. You could either take a crack at writing a subclass that does what you need; stop/start; or if your batch duration isn't too small, you could run it as a series of RDDs (using the existing

Re: spark streaming get kafka individual message's offset and partition no

2015-07-28 Thread Cody Koeninger
to KafkaUtils.createDirectStream to get access to all of the MessageAndMetadata, including partition and offset, on a per-message basis. On Tue, Jul 28, 2015 at 7:48 AM, Shushant Arora shushantaror...@gmail.com wrote: Hi I am processing kafka messages using spark streaming 1.3. I am using
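
A minimal sketch of the overload Cody refers to, assuming the Spark 1.3 Scala API; the topic, broker and starting offsets are illustrative (in practice fromOffsets comes from your own offset store):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.DefaultDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // illustrative
    val fromOffsets = Map(TopicAndPartition("mytopic", 0) -> 0L)    // illustrative
    // The messageHandler sees MessageAndMetadata, so topic, partition and
    // offset travel with every record.
    val stream = KafkaUtils.createDirectStream[Array[Byte], Array[Byte],
        DefaultDecoder, DefaultDecoder, (String, Int, Long, Array[Byte])](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[Array[Byte], Array[Byte]]) =>
        (mmd.topic, mmd.partition, mmd.offset, mmd.message()))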

spark streaming get kafka individual message's offset and partition no

2015-07-28 Thread Shushant Arora
Hi I am processing kafka messages using spark streaming 1.3. I am using mapPartitions function to process kafka message. How can I access offset no of individual message getting being processed. JavaPairInputDStreambyte[], byte[] directKafkaStream =KafkaUtils.createDirectStream

Re: spark streaming with kafka reset offset

2015-07-14 Thread Cody Koeninger
we use simple Kafka API that does not use Zookeeper and offsets tracked only by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka, and so each record is received by Spark Streaming effectively exactly once despite failures
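
On the context.checkpoint question raised downthread, a minimal sketch of the usual pattern, with an illustrative HDFS path; getOrCreate rebuilds the context from the checkpoint after a driver restart:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///user/app/checkpoints"   // illustrative path
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => {
      val ssc = new StreamingContext(conf, Seconds(5))   // conf assumed to exist
      ssc.checkpoint(checkpointDir)
      // ... build the kafka direct stream and the processing graph here ...
      ssc
    })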

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
writes and read is not strictly consistent? So we use simple Kafka API that does not use Zookeeper and offsets tracked only by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka, and so each record is received by Spark

Re: spark streaming with kafka reset offset

2015-07-14 Thread Tathagata Das
. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka, and so each record is received by Spark Streaming effectively exactly once despite failures. So we have to call context.checkpoint(hdfsdir)? Or is it an implicit checkpoint location? Meaning, is hdfs used for small data (just

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
not use Zookeeper and offsets tracked only by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka, and so each record is received by Spark Streaming effectively exactly once despite failures. So we have to call context.checkpoint

Re: spark streaming with kafka reset offset

2015-07-14 Thread Chen Song
in zookeeper, is it because of zookeeper is not efficient for high writes and read is not strictly consistent? So we use simple Kafka API that does not use Zookeeper and offsets tracked only by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming

Re: spark streaming with kafka reset offset

2015-07-14 Thread Tathagata Das
is not efficient for high writes and read is not strictly consistent? So we use simple Kafka API that does not use Zookeeper and offsets tracked only by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka, and so each record

Re: spark streaming with kafka reset offset

2015-07-14 Thread Tathagata Das
between Spark Streaming and Zookeeper/Kafka, and so each record is received by Spark Streaming effectively exactly once despite failures. So we have to call context.checkpoint(hdfsdir)? Or is it an implicit checkpoint location? Meaning, is hdfs used for small data (just offset?) On Sat

Re: spark streaming with kafka reset offset

2015-06-30 Thread Shushant Arora
that does not use Zookeeper and offsets tracked only by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka, and so each record is received by Spark Streaming effectively exactly once despite failures. So we have to call

Re: spark streaming with kafka reset offset

2015-06-30 Thread Cody Koeninger
in zookeeper, is it because of zookeeper is not efficient for high writes and read is not strictly consistent? So we use simple Kafka API that does not use Zookeeper and offsets tracked only by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming

Re: spark streaming with kafka reset offset

2015-06-29 Thread Cody Koeninger
is not efficient for high writes and read is not strictly consistent? So we use simple Kafka API that does not use Zookeeper and offsets tracked only by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka, and so each record is received

Re: spark streaming with kafka reset offset

2015-06-29 Thread ayan guha
. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka, and so each record is received by Spark Streaming effectively exactly once despite failures. So we have to call context.checkpoint(hdfsdir)? Or is it an implicit checkpoint location? Meaning, is hdfs used for small data (just offset

Re: spark streaming with kafka reset offset

2015-06-29 Thread Shushant Arora
, is it because of zookeeper is not efficient for high writes and read is not strictly consistent? So we use simple Kafka API that does not use Zookeeper and offsets tracked only by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka

Re: spark streaming with kafka reset offset

2015-06-28 Thread Shushant Arora
tracked only by Spark Streaming within its checkpoints. This eliminates inconsistencies between Spark Streaming and Zookeeper/Kafka, and so each record is received by Spark Streaming effectively exactly once despite failures. So we have to call context.checkpoint(hdfsdir)? Or is it implicit

Re: spark streaming with kafka reset offset

2015-06-27 Thread Tathagata Das
In the receiver-based approach, if the receiver crashes for any reason (the receiver crashed or the executor crashed), the receiver should get restarted on another executor and should start reading data from the offset present in zookeeper. There is some chance of data loss, which can be alleviated using
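
A minimal sketch of the alleviation TD alludes to, assuming the write ahead log added for receivers in Spark 1.2; the checkpoint path is illustrative:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    // The WAL needs a reliable checkpoint directory to write into.
    ssc.checkpoint("hdfs:///user/app/checkpoints")   // illustrative path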

Re: spark streaming with kafka reset offset

2015-06-27 Thread Dibyendu Bhattacharya
Hi, there is another option to try: the receiver-based low-level Kafka consumer, which is part of Spark-Packages ( http://spark-packages.org/package/dibbhatt/kafka-spark-consumer ). This can be used with the WAL as well for end-to-end zero data loss. This is also a reliable receiver and commits offsets to

Re: spark streaming with kafka reset offset

2015-06-26 Thread Cody Koeninger
The receiver-based kafka createStream in spark 1.2 uses zookeeper to store offsets. If you want finer-grained control over offsets, you can update the values in zookeeper yourself before starting the job. createDirectStream in spark 1.3 is still marked as experimental, and subject to change.
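
For contrast, a minimal sketch of the receiver-based stream Cody describes; note it is configured with a ZooKeeper quorum (where the offsets live), unlike the direct stream. Hosts, group id and thread count are illustrative:

    import org.apache.spark.streaming.kafka.KafkaUtils

    val stream = KafkaUtils.createStream(
      ssc,
      "zk1:2181,zk2:2181",     // zookeeper quorum; offsets are stored here
      "my-consumer-group",     // consumer group id
      Map("mytopic" -> 2))     // topic -> number of receiver threads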

Re: spark streaming with kafka reset offset

2015-06-26 Thread Cody Koeninger
Read the spark streaming guide and the kafka integration guide for a better understanding of how the receiver-based stream works. Capacity planning is specific to your environment and what the job is actually doing; you'll need to determine it empirically. On Friday, June 26, 2015, Shushant Arora

Re: spark streaming with kafka reset offset

2015-06-26 Thread Shushant Arora
In 1.2, how do I handle offset management after the stream application starts, in each job? Should I commit the offset manually after job completion? And what is the recommended number of consumer threads? Say I have 300 partitions in the kafka cluster. Load is ~1 million events per second. Each event is ~500 bytes.

spark streaming with kafka reset offset

2015-06-26 Thread Shushant Arora
I am using spark streaming 1.2. If processing executors crash, will the receiver reset the offset back to the last processed offset? If the receiver itself crashes, is there a way to reset the offset without restarting the streaming application, other than smallest or largest? Is spark streaming 1.3

Re: spark streaming with kafka jar missing

2015-06-23 Thread Tathagata Das
, 2015 at 11:56 AM, Shushant Arora shushantaror...@gmail.com wrote: Hi, while using spark streaming (1.2) with kafka, I am getting the below error; receivers are getting killed but jobs get scheduled at each stream interval. 15/06/23 18:42:35 WARN TaskSetManager: Lost task 0.1 in stage 18.0 (TID

spark streaming with kafka jar missing

2015-06-23 Thread Shushant Arora
Hi, while using spark streaming (1.2) with kafka, I am getting the below error; receivers are getting killed but jobs get scheduled at each stream interval. 15/06/23 18:42:35 WARN TaskSetManager: Lost task 0.1 in stage 18.0 (TID 82, ip(XX)): java.io.IOException: Failed to connect to ip

Re: spark streaming with kafka jar missing

2015-06-23 Thread Tathagata Das
...@gmail.com wrote: Hi, while using spark streaming (1.2) with kafka, I am getting the below error; receivers are getting killed but jobs get scheduled at each stream interval. 15/06/23 18:42:35 WARN TaskSetManager: Lost task 0.1 in stage 18.0 (TID 82, ip(XX)): java.io.IOException: Failed

Re: spark streaming with kafka jar missing

2015-06-23 Thread Shushant Arora
, Shushant Arora shushantaror...@gmail.com wrote: Hi, while using spark streaming (1.2) with kafka, I am getting the below error; receivers are getting killed but jobs get scheduled at each stream interval. 15/06/23 18:42:35 WARN TaskSetManager: Lost task 0.1 in stage 18.0 (TID 82, ip(XX

Re: spark streaming with kafka jar missing

2015-06-23 Thread Shushant Arora
make the assembly, marking those dependencies as scope=provided. On Tue, Jun 23, 2015 at 11:56 AM, Shushant Arora shushantaror...@gmail.com wrote: Hi, while using spark streaming (1.2) with kafka, I am getting the below error; receivers are getting killed but jobs get scheduled
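
A minimal sbt sketch of the scope=provided arrangement; versions are illustrative. spark-streaming-kafka is deliberately not marked provided, since the cluster does not ship it — which is exactly the missing jar this thread hit:

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"            % "1.2.1" % "provided",
      "org.apache.spark" %% "spark-streaming"       % "1.2.1" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka" % "1.2.1"  // must go into the assembly
    )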

Re: How to monitor Spark Streaming from Kafka?

2015-06-02 Thread Ruslan Dautkhanov
/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume that Receiver-based streaming is used? Then Note that one disadvantage of this approach (Receiverless Approach, #2

How to monitor Spark Streaming from Kafka?

2015-06-01 Thread dgoldenberg
Hi, what are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume that Receiver-based streaming is used? Then Note that one disadvantage of this approach

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Cody Koeninger
. TD On Mon, Jun 1, 2015 at 2:23 PM, dgoldenberg dgoldenberg...@gmail.com wrote: Hi, what are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Tathagata Das
are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume that Receiver-based streaming is used? Then Note that one disadvantage of this approach (Receiverless

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Otis Gospodnetic
/22/monitoring-stream-processing-tools-cassandra-kafka-and-spark/ Otis On Mon, Jun 1, 2015 at 5:23 PM, dgoldenberg dgoldenberg...@gmail.com wrote: Hi, what are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Dmitry Goldenberg
are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume that Receiver-based streaming is used? Then Note that one disadvantage of this approach

Spark streaming with kafka

2015-05-28 Thread boci
Hi guys, I'm using spark streaming with kafka... On my local machine (started as a java application without using spark-submit) it works: it connects to kafka and does the job (*). I tried to put it into a spark docker container (hadoop 2.6, spark 1.3.1, tried spark-submit with local[5] and yarn-client too) but I'm

Re: Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread Tathagata Das
as not to overwhelm the Spark consumers? What would be some of the ways to throttle the streamed messages so that the consumers don't run out of memory? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-from-Kafka-no-receivers-and-spark-streaming
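
A minimal sketch of the two rate knobs this thread asks about; the values are illustrative, and the per-partition key applies only to the direct stream on Spark versions that support it:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // receiver-based streams: max records per second, per receiver
      .set("spark.streaming.receiver.maxRate", "10000")
      // direct (receiverless) streams: max records per second, per kafka partition
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")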

Re: Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread Dmitry Goldenberg
the Spark consumers? What would be some of the ways to throttle the streamed messages so that the consumers don't run out of memory? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-from-Kafka-no-receivers-and-spark-streaming-receiver

Re: Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread Ted Yu
-Streaming-from-Kafka-no-receivers-and-spark-streaming-receiver-maxRate-tp23061.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread dgoldenberg
? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-from-Kafka-no-receivers-and-spark-streaming-receiver-maxRate-tp23061.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark Streaming to Kafka

2015-05-20 Thread twinkle sachdeva
integrated with consuming messages from Kafka, I thought of asking the forum: is there any implementation available for pushing data to Kafka from Spark Streaming too? Any link(s) will be helpful. Thanks and Regards, Twinkle

Re: Spark Streaming to Kafka

2015-05-19 Thread Saisai Shao
: is there any implementation available for pushing data to Kafka from Spark Streaming too? Any link(s) will be helpful. Thanks and Regards, Twinkle

Spark Streaming to Kafka

2015-05-19 Thread twinkle sachdeva
Hi, as Spark streaming is being nicely integrated with consuming messages from Kafka, I thought of asking the forum: is there any implementation available for pushing data to Kafka from Spark Streaming too? Any link(s) will be helpful. Thanks and Regards, Twinkle

Re: spark streaming with kafka

2015-04-15 Thread Akhil Das
.) You can say, each receiver will run on a single core. Thanks Best Regards On Wed, Apr 15, 2015 at 3:46 PM, Shushant Arora shushantaror...@gmail.com wrote: Hi, I want to understand the flow of spark streaming with kafka. In spark streaming, are the executor nodes the same at each run of the streaming interval

Re: spark streaming with kafka

2015-04-15 Thread Shushant Arora
on a single core. Thanks Best Regards On Wed, Apr 15, 2015 at 3:46 PM, Shushant Arora shushantaror...@gmail.com wrote: Hi, I want to understand the flow of spark streaming with kafka. In spark streaming, are the executor nodes the same at each run of the streaming interval, or at each stream interval

spark streaming with kafka

2015-04-15 Thread Shushant Arora
Hi, I want to understand the flow of spark streaming with kafka. In spark streaming, are the executor nodes the same at each run of the streaming interval, or does the cluster manager assign new executor nodes at each stream interval for processing that batch input? If yes, then at each batch interval new executors

Re: Re: spark streaming with kafka

2015-04-15 Thread Akhil Das
...@sigmoidanalytics.com Date: 2015-04-15 19:12 To: Shushant Arora shushantaror...@gmail.com CC: user user@spark.apache.org Subject: Re: spark streaming with kafka Once you start your streaming application to read from Kafka, it will launch receivers on the executor nodes. And you can see them

Re: Spark streaming with Kafka- couldnt find KafkaUtils

2015-04-07 Thread Felix C
Or you could build an uber jar ( you could google that ) https://eradiating.wordpress.com/2015/02/15/getting-spark-streaming-on-kafka-to-work/ --- Original Message --- From: Akhil Das ak...@sigmoidanalytics.com Sent: April 4, 2015 11:52 PM To: Priya Ch learnings.chitt...@gmail.com Cc: user

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-06 Thread Neelesh
Somewhat agree on subclassing and its issues. It looks like the alternative in spark 1.3.0 is to create a custom build. Is there an enhancement filed for this? If not, I'll file one. Thanks! -neelesh On Wed, Apr 1, 2015 at 12:46 PM, Tathagata Das t...@databricks.com wrote: The challenge of

Spark streaming with Kafka- couldnt find KafkaUtils

2015-04-05 Thread Priya Ch
with the following exception: java.lang.ClassNotFoundException: org/apache/spark/streaming/kafka/KafkaUtils. I am using the spark-1.2.1 version. When I checked the source files of streaming, the source files related to kafka are missing. Are these not included in the spark-1.3.0 and spark-1.2.1 versions

Re: Spark streaming with Kafka- couldnt find KafkaUtils

2015-04-05 Thread Akhil Das
from the IDE, the application runs fine. But when I submit the same to the spark cluster in standalone mode, I end up with the following exception: java.lang.ClassNotFoundException: org/apache/spark/streaming/kafka/KafkaUtils. I am using the spark-1.2.1 version. When I checked the source files

Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Neelesh
With receivers, it was pretty obvious which code ran where - each receiver occupied a core and ran on the workers. However, with the new kafka direct input streams, it's hard for me to understand where the code that's reading from the kafka brokers runs. Does it run on the driver (I hope not), or does

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Neelesh
Thanks Cody, that was really helpful. I have a much better understanding now. One last question - Kafka topics are initialized once in the driver; is there an easy way of adding/removing topics on the fly? KafkaRDD#getPartitions() seems to be computed only once, with no way of refreshing them.

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Cody Koeninger
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md The kafka consumers run in the executors. On Wed, Apr 1, 2015 at 11:18 AM, Neelesh neele...@gmail.com wrote: With receivers, it was pretty obvious which code ran where - each receiver occupied a core and ran on the

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Cody Koeninger
If you want to change topics from batch to batch, you can always just create a KafkaRDD repeatedly. The streaming code as it stands assumes a consistent set of topics though. The implementation is private, so you can't subclass it without building your own spark. On Wed, Apr 1, 2015 at 1:09 PM,
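
A minimal sketch of the "create a KafkaRDD repeatedly" route, assuming the Spark 1.3 batch API; the topic and offset range are illustrative, and in practice both would be recomputed per batch from your own bookkeeping:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

    // topic, partition, fromOffset, untilOffset
    val ranges = Array(OffsetRange("mytopic", 0, 0L, 1000L))
    // sc and kafkaParams as elsewhere in this thread
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, ranges)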

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Cody Koeninger
As I said in the original ticket, I think the implementation classes should be exposed so that people can subclass and override compute() to suit their needs. Just adding a function from Time => Set[TopicAndPartition] wouldn't be sufficient for some of my current production use cases. compute()

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Neelesh
Thanks Cody! On Wed, Apr 1, 2015 at 11:21 AM, Cody Koeninger c...@koeninger.org wrote: If you want to change topics from batch to batch, you can always just create a KafkaRDD repeatedly. The streaming code as it stands assumes a consistent set of topics though. The implementation is

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Tathagata Das
We should be able to support that use case in the direct API. It may be as simple as allowing the users to pass in a function that returns the set of topic+partitions to read from. That is, a function (Time) => Set[TopicAndPartition]. This gets called every batch interval before the offsets are

Re: Spark Streaming 1.3 Kafka Direct Streams

2015-04-01 Thread Tathagata Das
The challenge of opening up these internal classes to the public (even with the Developer API tag) is that it prevents us from making non-trivial changes without breaking API compatibility for all those who had subclassed. It's a tradeoff that is hard to optimize. That's why we favor exposing more optional

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-31 Thread Nicolas Phung
Hello, @Akhil Das I'm trying to use the experimental API https://github.com/apache/spark/blob/master/examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-31 Thread Ted Yu
Can you show us the output of DStream#print() if you have it ? Thanks On Tue, Mar 31, 2015 at 2:55 AM, Nicolas Phung nicolas.ph...@gmail.com wrote: Hello, @Akhil Das I'm trying to use the experimental API

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Ted Yu
. On Mon, Mar 30, 2015 at 11:05 AM, Nicolas Phung nicolas.ph...@gmail.com wrote: Hello, I'm using spark-streaming-kafka 1.3.0 with the new consumer Approach 2: Direct Approach (No Receivers) ( http://spark.apache.org/docs/latest/streaming-kafka-integration.html). I'm using the following

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Akhil Das
Phung nicolas.ph...@gmail.com wrote: Hello, I'm using spark-streaming-kafka 1.3.0 with the new consumer Approach 2: Direct Approach (No Receivers) ( http://spark.apache.org/docs/latest/streaming-kafka-integration.html). I'm using the following code snippets : // Create direct kafka stream

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Cody Koeninger
AM, Nicolas Phung nicolas.ph...@gmail.com wrote: Hello, I'm using spark-streaming-kafka 1.3.0 with the new consumer Approach 2: Direct Approach (No Receivers) ( http://spark.apache.org/docs/latest/streaming-kafka-integration.html). I'm using the following code snippets : // Create direct

Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Nicolas Phung
Hello, I'm using spark-streaming-kafka 1.3.0 with the new consumer Approach 2: Direct Approach (No Receivers) ( http://spark.apache.org/docs/latest/streaming-kafka-integration.html ). I'm using the following code snippets: // Create direct kafka stream with brokers and topics val messages

spark streaming from kafka real time + batch processing in java

2015-02-06 Thread Mohit Durgapal
I want to write a spark streaming consumer for kafka in java. I want to process the data in real-time as well as store the data in hdfs in year/month/day/hour/ format. I am not sure how to achieve this. Should I write separate kafka consumers, one for writing data to HDFS and one for spark

Re: spark streaming from kafka real time + batch processing in java

2015-02-06 Thread Andrew Psaltis
), it will easily put the data into the directory structure you are after. On Fri, Feb 6, 2015 at 12:19 AM, Mohit Durgapal durgapalmo...@gmail.com wrote: I want to write a spark streaming consumer for kafka in java. I want to process the data in real-time as well as store the data in hdfs in year/month/day
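
A minimal sketch of one stream serving both needs: process each batch in real time and also land it in HDFS under a year/month/day/hour layout. The base path and date format are illustrative:

    import java.text.SimpleDateFormat
    import java.util.Date

    stream.foreachRDD { (rdd, time) =>
      // real-time processing of the batch would go here
      val fmt = new SimpleDateFormat("yyyy/MM/dd/HH")
      val dir = "hdfs:///data/events/" + fmt.format(new Date(time.milliseconds))
      rdd.saveAsTextFile(dir + "/batch-" + time.milliseconds)  // unique subdir per batch
    }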

Re: spark streaming from kafka real time + batch processing in java

2015-02-06 Thread Charles Feduke
(in which case refer to #1). On Fri Feb 06 2015 at 6:16:39 AM Mohit Durgapal durgapalmo...@gmail.com wrote: I want to write a spark streaming consumer for kafka in java. I want to process the data in real-time as well as store the data in hdfs in year/month/day/hour/ format. I am not sure how

spark streaming from kafka real time + batch processing in java

2015-02-05 Thread Mohit Durgapal
I want to write a spark streaming consumer for kafka in java. I want to process the data in real-time as well as store the data in hdfs in year/month/day/hour/ format. I am not sure how to achieve this. Should I write separate kafka consumers, one for writing data to HDFS and one for spark

Re: Integrerate Spark Streaming and Kafka, but get bad symbolic reference error

2015-01-24 Thread mykidong
Maybe you can use the alternative kafka receiver which I wrote: https://github.com/mykidong/spark-kafka-simple-consumer-receiver - Kidong. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Integrerate-Spark-Streaming-and-Kafka-but-get-bad-symbolic-reference

Re: Spark Streaming with Kafka

2015-01-21 Thread Dibyendu Bhattacharya
On Wed, Jan 21, 2015 at 7:46 AM, firemonk9 dhiraj.peech...@gmail.com wrote: Hi, I am having similar issues. Have you found any resolution? Thank you -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Kafka-tp21222p21276.html

Re: Spark Streaming with Kafka

2015-01-20 Thread firemonk9
Hi, I am having similar issues. Have you found any resolution? Thank you -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Kafka-tp21222p21276.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark Streaming with Kafka

2015-01-18 Thread Eduardo Alfaia
I have the same issue. - Original Message - From: Rasika Pohankar rasikapohan...@gmail.com Sent: 18/01/2015 18:48 To: user@spark.apache.org user@spark.apache.org Subject: Spark Streaming with Kafka I am using Spark Streaming to process data received through Kafka. The Spark

Spark Streaming with Kafka

2015-01-18 Thread Rasika Pohankar
if the problem was in that version. But even after upgrading, it is still happening. Is this a known issue? Can someone please help. Thanking you. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Kafka-tp21222.html Sent from the Apache Spark

Re: spark streaming python + kafka

2014-12-22 Thread Davies Liu
There is a WIP pull request [1] working on this; it should be merged into master soon. [1] https://github.com/apache/spark/pull/3715 On Fri, Dec 19, 2014 at 2:15 AM, Oleg Ruchovets oruchov...@gmail.com wrote: Hi, I've just seen that spark streaming supports python from version 1.2.

spark streaming python + kafka

2014-12-19 Thread Oleg Ruchovets
Hi, I've just seen that spark streaming supports python from version 1.2. Question: does spark streaming (python version) support kafka integration? Thanks Oleg.

Spark Streaming with Kafka is failing with Error

2014-11-18 Thread Sourav Chandra
Hi, while running my spark streaming application built on spark 1.1.0, I am getting the below error: 14/11/18 15:35:30 ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.AbstractMethodError at org.apache.spark.Logging$class.log(Logging.scala:52) at

Re: Spark Streaming with Kafka is failing with Error

2014-11-18 Thread Tobias Pfeiffer
Hi, do you have some logging backend (log4j, logback) on your classpath? This seems a bit like there is no particular implementation of the abstract `log()` method available. Tobias
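
A minimal sbt sketch of Tobias's suggestion, putting a concrete slf4j binding on the classpath; the artifact choice and version are illustrative:

    // Binds slf4j to log4j so the abstract logging calls have an implementation.
    libraryDependencies += "org.slf4j" % "slf4j-log4j12" % "1.7.10"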
