Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-07 Thread Yu Wei
Yes. Thanks for your clarification. The problem I encountered is that in yarn cluster mode, there is no output for "DStream.print()" in the yarn logs. In the Spark implementation org/apache/spark/streaming/dstream/DStream.scala, the logs related to "Time" are printed out. However, o

Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-06 Thread Rabin Banerjee
> *From:* Mich Talebzadeh > *Sent:* Wednesday, July 6, 2016 9:46:11 PM > *To:* Yu Wei > *Cc:* Deng Ching-Mallete; user@spark.apache.org > *Subject:* Re: Is that possible to launch spark streaming application on yarn with only one machine? > Deploy-mode clus

Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-06 Thread Yu Wei
Deng Ching-Mallete; user@spark.apache.org Subject: Re: Is that possible to launch spark streaming application on yarn with only one machine? I don't think deploy-mode cluster will work. Try --master yarn --deploy-mode client. FYI * Spark Local - Spark runs on the local host. This is the si
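
In concrete terms, the suggestion amounts to a submit command along these lines (memory and executor figures are taken from the command quoted later in this thread; the jar path is the poster's):

    spark-submit --master yarn --deploy-mode client \
      --driver-memory 4g --executor-memory 2g --num-executors 4 \
      target/CollAna-1.0-SNAPSHOT.jar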

Re: How to spin up Kafka using docker and use for Spark Streaming Integration tests

2016-07-06 Thread swetha kasireddy
exposes, by setting ADVERTISED_HOST to the output of "docker-machine ip" (on Mac) or the address printed by "ip addr show docker0" (Linux). I also suggest setting AUTO_CREATE_TOPICS to true. You can choose to run your Spark Streaming application under

Re: How to spin up Kafka using docker and use for Spark Streaming Integration tests

2016-07-06 Thread swetha kasireddy
setting AUTO_CREATE_TOPICS to true. You can choose to run your Spark Streaming application under test (SUT) and your test harness also in Docker containers, or directly on your host. In the former case, it is easiest to set up a Docker Compose file linking the har

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread Cody Koeninger
message":950002,"uid":"81e2d447-69f2-4ce6-a13d-50a1a8b569a0"} >> >> > >> >> > That is yes, it works but throughput is much less than without >> >> > limitations >> >> > because of this is an absolute upper limit. And time

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
> limitations, because this is an absolute upper limit, and the time of processing is half of what is available. > Regarding Spark 2.0 structured streaming, I will look at it later. Now

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread Cody Koeninger
throughput and latency of this high level API. My aim now is to compare streaming processors. 2016-07-06 17:41 GMT+02:00 Cody Koeninger: > The configuration you set is spark.streaming.receiver.maxR

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
spark.streaming.receiver.maxRate. The direct stream is not a receiver. As I said in my first message in this thread, and as the pages at http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread Cody Koeninger
high level API. My aim now is to compare streaming processors. 2016-07-06 17:41 GMT+02:00 Cody Koeninger: > The configuration you set is spark.streaming.receiver.maxRate. The direct stream is not a receiver. As I said in my first message in this th

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
this thread, and as the pages at http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers and http://spark.apache.org/docs/latest/configuration.html#spark-streaming also say, use maxRatePerPartition for the direct stream.

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread Cody Koeninger
http://spark.apache.org/docs/latest/configuration.html#spark-streaming also say, use maxRatePerPartition for the direct stream. Bottom line: if you have more information than your system can process in X amount of time, then after X amount of time you can either give the wrong answer or take longer to process. Flink
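
For reference, the setting Cody points to is applied on the SparkConf; a minimal sketch (the value 1000 is an arbitrary example, in records per partition per second):

    SparkConf conf = new SparkConf().setAppName("RateLimitedStream");
    // Per-partition, per-second cap for the Kafka direct stream; pick a value
    // the downstream processing can sustain within one batch interval.
    conf.set("spark.streaming.kafka.maxRatePerPartition", "1000");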

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
> a typical case when the number of messages in kafka's queue is more than the Spark app's capacity to process. But I need a strong time limit to prepare

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread Cody Koeninger
SparkConf() .setAppName("Spark") .setMaster("local"); JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Millis

Re: Spark streaming. Strict discretizing by time

2016-07-06 Thread rss rss
;))); Map<String, String> kafkaParams = new HashMap<String, String>() { { put("metadata.broker.list", bootstrapServers); put("auto.offset.reset", "

Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-06 Thread Mich Talebzadeh
For launching via local mode (works): > spark-submit --master local[4] --driver-memory 4g --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar > Any advice? > Thanks, > Jared > *From:*

Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-06 Thread Yu Wei
--master local[4] --driver-memory 4g --executor-memory 2g --num-executors 4 target/CollAna-1.0-SNAPSHOT.jar Any advice? Thanks, Jared From: Yu Wei Sent: Tuesday, July 5, 2016 4:41 PM To: Deng Ching-Mallete Cc: user@spark.apache.org Subject: Re: Is that possible to

Re: Spark streaming. Strict discretizing by time

2016-07-05 Thread Cody Koeninger
put("auto.offset.reset", "smallest"); >> > } >> > }; >> > >> > JavaPairInputDStream messages = >> > KafkaUtils.createDirectStream(jssc, >> > S

Re: Spark streaming. Strict discretizing by time

2016-07-05 Thread rss rss
createDirectStream(jssc, String.class, String.class, StringDecoder.class, StringDecoder.class, kafkaParams,

Re: Spark streaming. Strict discretizing by time

2016-07-05 Thread Cody Koeninger
topicMap); messages.countByWindow(Seconds.apply(10), Milliseconds.apply(5000)) .map(x -> {System.out.println(x); return x;}) .dstream().saveAsTextFiles("/tmp/spark", "spark-streaming"); I need to see a

Spark streaming. Strict discretizing by time

2016-07-05 Thread rss rss
kafkaParams, topicMap); messages.countByWindow(Seconds.apply(10), Milliseconds.apply(5000)) .map(x -> {System.out.println(x); return x;}) .dstream().saveAsTextFiles("/tmp/spark", "spark-streaming");
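
Pieced together from the fragments quoted across this thread, the pipeline under discussion looks roughly like the following sketch (broker address and topic name are placeholders):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;
    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Milliseconds;
    import org.apache.spark.streaming.Seconds;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class WindowedCounts {
      public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("Spark").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Milliseconds.apply(5000));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "localhost:9092"); // placeholder broker
        kafkaParams.put("auto.offset.reset", "smallest");
        Set<String> topics = Collections.singleton("test");        // placeholder topic

        JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
            jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
            kafkaParams, topics);

        // Count the events of the last 10 seconds, sliding once per 5-second batch.
        messages.countByWindow(Seconds.apply(10), Milliseconds.apply(5000))
            .map(x -> { System.out.println(x); return x; })
            .dstream().saveAsTextFiles("/tmp/spark", "spark-streaming");

        jssc.start();
        jssc.awaitTermination();
      }
    }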

Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-05 Thread Yu Wei
Hi Deng, Thanks for the help. Actually I need to pay more attention to memory usage. I found the root cause of my problem. It seemed to be in the spark streaming MQTTUtils module. When I use "localhost" in the brokerURL, it doesn't work. After changing it to "127.0.0.1"

Re: Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-05 Thread Deng Ching-Mallete
YARN. You should be able to see something like "No resources available in cluster.." in the application master logs in YARN if that is the case. HTH, Deng On Tue, Jul 5, 2016 at 4:31 PM, Yu Wei wrote: > Hi guys, > I set up a pseudo hadoop/yarn cluster on my laptop.

Is that possible to launch spark streaming application on yarn with only one machine?

2016-07-04 Thread Yu Wei
Hi guys, I set up a pseudo hadoop/yarn cluster on my laptop. I wrote a simple spark streaming program as below to receive messages with MQTTUtils. conf = new SparkConf().setAppName("Monitor&Control"); jssc = new JavaStreamingContext(conf, Durations.seconds(1)); JavaReceiverInputD
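
A minimal, complete sketch of such a receiver setup (broker URL and topic are placeholders; the loopback address reflects the fix reported in the follow-up above, where "localhost" did not work but "127.0.0.1" did):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.mqtt.MQTTUtils;

    public class MonitorControl {
      public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("Monitor&Control");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
        // "127.0.0.1" rather than "localhost": per the follow-up above, the
        // hostname form did not work with MQTTUtils in this setup.
        JavaReceiverInputDStream<String> messages =
            MQTTUtils.createStream(jssc, "tcp://127.0.0.1:1883", "some/topic"); // placeholders
        messages.print();
        jssc.start();
        jssc.awaitTermination();
      }
    }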

Re: How to spin up Kafka using docker and use for Spark Streaming Integration tests

2016-07-04 Thread Lars Albertsson
or the address printed by "ip addr show docker0" (Linux). I also suggest setting AUTO_CREATE_TOPICS to true. You can choose to run your Spark Streaming application under test (SUT) and your test harness also in Docker containers, or directly on your host. In the former case, it is e
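
As a concrete sketch of that advice, using the spotify/kafka image (the image choice is an assumption; the env var names follow the message above):

    docker run -d --name kafka -p 2181:2181 -p 9092:9092 \
      -e ADVERTISED_HOST=$(docker-machine ip) \
      -e AUTO_CREATE_TOPICS=true \
      spotify/kafka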

RE: AMQP extension for Apache Spark Streaming (messaging/IoT)

2016-07-03 Thread Darren Govoni
This is fantastic news. Sent from my Verizon 4G LTE smartphone Original message From: Paolo Patierno Date: 7/3/16 4:41 AM (GMT-05:00) To: user@spark.apache.org Subject: AMQP extension for Apache Spark Streaming (messaging/IoT) Hi all, I'm working on an

AMQP extension for Apache Spark Streaming (messaging/IoT)

2016-07-03 Thread Paolo Patierno
Hi all, I'm working on an AMQP extension for Apache Spark Streaming, developing a reliable receiver for that. After MQTT support (I see it in the Apache Bahir repository), another messaging/IoT protocol could be very useful for the Apache Spark Streaming ecosystem. Out there a lo

Re: How to spin up Kafka using docker and use for Spark Streaming Integration tests

2016-07-01 Thread Akhil Das
You can use this https://github.com/wurstmeister/kafka-docker to spin up a kafka cluster and then point your Spark Streaming app at it to consume from it. On Fri, Jul 1, 2016 at 1:19 AM, SRK wrote: > Hi, > I need to do integration tests using Spark Streaming. My idea is to spin up

How to spin up Kafka using docker and use for Spark Streaming Integration tests

2016-06-30 Thread SRK
Hi, I need to do integration tests using Spark Streaming. My idea is to spin up kafka using docker locally and use it to feed the stream to my Streaming Job. Any suggestions on how to do this would be of great help. Thanks, Swetha -- View this message in context: http://apache-spark-user

Re: Integration tests for Spark Streaming

2016-06-28 Thread Luciano Resende
This thread might be useful for what you want: https://www.mail-archive.com/user%40spark.apache.org/msg34673.html On Tue, Jun 28, 2016 at 1:25 PM, SRK wrote: > Hi, > > I need to write some integration tests for my Spark Streaming app. Any > example on how to do this would be o

Integration tests for Spark Streaming

2016-06-28 Thread SRK
Hi, I need to write some integration tests for my Spark Streaming app. Any example on how to do this would be of great help. Thanks, Swetha -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Integration-tests-for-Spark-Streaming-tp27246.html Sent from the

Re: Improving performance of a kafka spark streaming app

2016-06-24 Thread Cody Koeninger
Unless I'm misreading the image you posted, it does show event counts for the single batch that is still running, with 1.7 billion events in it. The recent batches do show 0 events, but I'm guessing that's because they're actually empty. When you said you had a kafka topic with 1.7 billion events

Re: Kryo ClassCastException during Serialization/deserialization in Spark Streaming

2016-06-23 Thread swetha kasireddy
> I keep getting the following error in my Spark Streaming job every now and then after the job runs for, say, around 10 hours. I have those 2 classes registered in kryo as shown below. sampleMap is a field in SampleSession as shown below. Any suggestion as to how

Re: Kryo ClassCastException during Serialization/deserialization in Spark Streaming

2016-06-23 Thread Ted Yu
Can you illustrate how sampleMap is populated ? Thanks On Thu, Jun 23, 2016 at 12:34 PM, SRK wrote: > Hi, > > I keep getting the following error in my Spark Streaming every now and then > after the job runs for say around 10 hours. I have those 2 classes > registered in kryo

Kryo ClassCastException during Serialization/deserialization in Spark Streaming

2016-06-23 Thread SRK
Hi, I keep getting the following error in my Spark Streaming job every now and then after the job runs for, say, around 10 hours. I have those 2 classes registered in kryo as shown below. sampleMap is a field in SampleSession as shown below. Any suggestion as to how to avoid this would be of great
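
The registration described would look something like the sketch below (SampleSession is the poster's class; registering the concrete type behind sampleMap as well is an assumption, but a common extra step worth trying, since a mismatch between registered and runtime types is a frequent cause of Kryo ClassCastExceptions):

    SparkConf conf = new SparkConf().setAppName("SessionJob");
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    // Register both the session class and the concrete map type.
    conf.registerKryoClasses(new Class<?>[] {
        SampleSession.class,     // class name taken from the thread
        java.util.HashMap.class  // assumed concrete type of sampleMap
    });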

How does Spark Streaming updateStateByKey or mapWithState scale with state size?

2016-06-23 Thread Martin Eden
Hi all, It is currently difficult to understand, from the Spark docs or the materials online that I came across, how the updateStateByKey and mapWithState operators in Spark Streaming scale with the size of the state, and how to reason about sizing the cluster appropriately. According to this
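
For orientation: the key scaling difference is that updateStateByKey touches every key in the state on every batch, while mapWithState only visits keys that appear in the current batch (plus timed-out ones). A minimal Java running-count sketch (written against the Spark 2.x Java API; in 1.6 the Optional type here is Guava's, and `events` is an assumed JavaPairDStream<String, Integer>):

    import org.apache.spark.api.java.Optional;
    import org.apache.spark.api.java.function.Function3;
    import org.apache.spark.streaming.State;
    import org.apache.spark.streaming.StateSpec;
    import org.apache.spark.streaming.api.java.JavaMapWithStateDStream;

    Function3<String, Optional<Integer>, State<Integer>, String> mapping =
        (key, value, state) -> {
          int sum = value.orElse(0) + (state.exists() ? state.get() : 0);
          state.update(sum); // only keys seen in this batch are visited
          return key + " -> " + sum;
        };
    JavaMapWithStateDStream<String, Integer, Integer, String> counts =
        events.mapWithState(StateSpec.function(mapping).numPartitions(32));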

Re: Improving performance of a kafka spark streaming app

2016-06-22 Thread Colin Kincaid Williams
Streaming UI tab showing empty events and very different metrics than on 1.5.2 On Thu, Jun 23, 2016 at 5:06 AM, Colin Kincaid Williams wrote: > After a bit of effort I moved from a Spark cluster running 1.5.2, to a > Yarn cluster running 1.6.1 jars. I'm still setting the maxRPP. The > completed b

Re: Improving performance of a kafka spark streaming app

2016-06-22 Thread Colin Kincaid Williams
After a bit of effort I moved from a Spark cluster running 1.5.2, to a Yarn cluster running 1.6.1 jars. I'm still setting the maxRPP. The completed batches are no longer showing the number of events processed in the Streaming UI tab . I'm getting around 4k inserts per second in hbase, but I haven't

Recovery techniques for Spark Streaming scheduling delay

2016-06-22 Thread C. Josephson
We have a Spark Streaming application that has basically zero scheduling delay for hours, but then suddenly it jumps up to multiple minutes and spirals out of control (see screenshot of job manager here: http://i.stack.imgur.com/kSftN.png). This happens after a while even if we double the batch

Re: spark streaming questions

2016-06-22 Thread pandees waran
time / data loss (especially when you want to bring down the cluster and create a new one for running spark streaming). Sent from my iPhone > On Jun 22, 2016, at 10:17 AM, Mich Talebzadeh wrote: > Hi Pandees, > can you k

Re: spark streaming questions

2016-06-22 Thread pandees waran
question (2) is about how to manage the clusters without any downtime / data loss (especially when you want to bring down the cluster and create a new one for running spark streaming). Sent from my iPhone > On Jun 22, 2016, at 10:17 AM, Mich Talebzadeh wrote: > Hi Pandees,

Re: spark streaming questions

2016-06-22 Thread Mich Talebzadeh
Hi Pandees, can you kindly explain what you are trying to achieve by incorporating Spark streaming with workflow orchestration? Is this some form of back-to-back seamless integration? I have not used it myself but would be interested in knowing more about your use case. Cheers, Dr Mich

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread Jörn Franke
is lower. A risk evaluation from a business point of view has to be done anyway... > On 22 Jun 2016, at 09:09, sandesh deshmane wrote: > Hi, > I am writing a spark streaming application which reads messages from Kafka.

spark streaming questions

2016-06-22 Thread pandees waran
Hello all, I have a few questions regarding spark streaming: * I am wondering whether anyone uses spark streaming with workflow orchestrators such as Data Pipeline/SWF/any other framework. Are there any advantages/drawbacks to using a workflow orchestrator for spark streaming? * How do you guys manage

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread Cody Koeninger
linked from the page below? That should clarify what your options are. https://github.com/koeninger/kafka-exactly-once On Wed, Jun 22, 2016 at 5:55 AM, Denys Cherepanin wrote: > Hi Sandesh, > As I understand, you are using the "receiver based" approach to integrate kafka with s

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
app. Thanks, Sandesh On Wed, Jun 22, 2016 at 4:25 PM, Denys Cherepanin wrote: > Hi Sandesh, > As I understand, you are using the "receiver based" approach to integrate kafka with spark streaming. > Did you try the "direct" approach <http://

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread Denys Cherepanin
Hi Sandesh, As I understand, you are using the "receiver based" approach to integrate kafka with spark streaming. Did you try the "direct" approach <http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers> ? In this case
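
The direct approach makes end-to-end semantics tractable because each batch's Kafka offsets are available on the RDD. A sketch of the pattern from the repository Cody links above (stream construction omitted; `messages` is an assumed direct stream from KafkaUtils.createDirectStream):

    import org.apache.spark.streaming.kafka.HasOffsetRanges;
    import org.apache.spark.streaming.kafka.OffsetRange;

    messages.foreachRDD(rdd -> {
      // The underlying KafkaRDD carries the exact offset range of this batch.
      OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
      // Either write results and these offsets in one transaction, or make the
      // writes idempotent and commit offsets only after output succeeds.
    });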

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
t needed requires, in any system including spark, more effort, and usually the throughput is lower. A risk evaluation from a business point of view has to be done anyway... > On 22 Jun 2016, at 09:09, sandesh deshmane wrote: > Hi, > I am writ

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread Jörn Franke
throughput is lower. A risk evaluation from a business point of view has to be done anyway... > On 22 Jun 2016, at 09:09, sandesh deshmane wrote: > Hi, > I am writing a spark streaming application which reads messages from Kafka. > I am using checkpointing and write

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread Mich Talebzadeh
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw> http://talebzadehmich.wordpress.com On 22 June 2016 at 09:57, sandesh deshmane wrote: > Here I refer to

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
failure in the spark app. > So when I restart, I see duplicate messages. > To replicate the scenario, I just kill my spark app and then restart. > On Wed, Jun 22, 2016 at 1:10 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread Mich Talebzadeh
restart. > On Wed, Jun 22, 2016 at 1:10 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > As I see it, you are using Spark streaming to read data from the source through Kafka. Your batch interval is 10 sec, so in that interval you have 10*300K = 3Mi

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
Here I refer to failure in the spark app. So when I restart, I see duplicate messages. To replicate the scenario, I just kill my spark app and then restart. On Wed, Jun 22, 2016 at 1:10 PM, Mich Talebzadeh wrote: > As I see it, you are using Spark streaming to read data from the source thro

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread Mich Talebzadeh
As I see it, you are using Spark streaming to read data from the source through Kafka. Your batch interval is 10 sec, so in that interval you have 10*300K = 3 million messages. When you say there is a failure, are you referring to a failure in the source or in the Spark streaming app? HTH Dr Mich Talebzadeh

how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
Hi, I am writing a spark streaming application which reads messages from Kafka. I am using checkpointing and write ahead logs (WAL) to achieve fault tolerance. I have created a batch size of 10 sec for reading messages from kafka. I read messages from kafka and generate the count of messages as
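
A minimal sketch of that setup (the checkpoint path is a placeholder; the WAL flag only applies to receiver-based input). Note that after recovery, replayed batches can re-run output actions, which is exactly the duplicate-message behavior discussed in this thread, so writes to an external sink should be idempotent or transactional:

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    SparkConf conf = new SparkConf().setAppName("KafkaCounter");
    // WAL matters only for receiver-based streams; the direct stream keeps
    // its offsets in the checkpoint instead.
    conf.set("spark.streaming.receiver.writeAheadLog.enable", "true");

    String checkpointDir = "hdfs:///checkpoints/kafka-counter"; // placeholder path
    JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(checkpointDir, () -> {
      JavaStreamingContext c = new JavaStreamingContext(conf, Durations.seconds(10));
      c.checkpoint(checkpointDir);
      // ... build the Kafka stream and the counting logic here ...
      return c;
    });
    jssc.start();
    jssc.awaitTermination();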

Re: Improving performance of a kafka spark streaming app

2016-06-21 Thread Colin Kincaid Williams
Thanks @Cody, I will try that out. In the interim, I tried to validate my Hbase cluster by running a random write test and saw 30-40K writes per second. This suggests there is noticeable room for improvement. On Tue, Jun 21, 2016 at 8:32 PM, Cody Koeninger wrote: > Take HBase out of the equation a

Re: Improving performance of a kafka spark streaming app

2016-06-21 Thread Cody Koeninger
Take HBase out of the equation and just measure what your read performance is by doing something like createDirectStream(...).foreach(_.println) not take() or print() On Tue, Jun 21, 2016 at 3:19 PM, Colin Kincaid Williams wrote: > @Cody I was able to bring my processing time down to a second b

Re: Improving performance of a kafka spark streaming app

2016-06-21 Thread Colin Kincaid Williams
@Cody I was able to bring my processing time down to a second by setting maxRatePerPartition as discussed. My bad that I didn't recognize it as the cause of my scheduling delay. Since then I've tried experimenting with a larger Spark Context duration. I've been trying to get some noticeable improv

Can Spark Streaming checkpoint only metadata ?

2016-06-21 Thread Natu Lauchande
Hi, I wonder if it is possible to checkpoint only metadata and not the data in RDD's and dataframes. Thanks, Natu

Re: Number of consumers in Kafka with Spark Streaming

2016-06-21 Thread Cody Koeninger
same time. The direct stream doesn't use consumer groups in the same way the kafka high level consumer does, but you should be able to pass group id in the kafka parameters. On Tue, Jun 21, 2016 at 9:56 AM, Guillermo Ortiz wrote: > I use Spark Streaming with Kafka and I'd like to
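
Passing a group id through the kafka parameters is a one-liner (broker address and group name are placeholders):

    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", "localhost:9092"); // placeholder broker
    kafkaParams.put("group.id", "my-streaming-app");           // placeholder group id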

Number of consumers in Kafka with Spark Streaming

2016-06-21 Thread Guillermo Ortiz
I use Spark Streaming with Kafka and I'd like to know how many consumers are generated. I guess there are as many as there are partitions in Kafka, but I'm not sure. Is there a way to know the name of the groupId generated in Spark for Kafka?

Re: scala.NotImplementedError: put() should not be called on an EmptyStateMap while doing stateful computation on spark streaming

2016-06-21 Thread Ted Yu
Are you using 1.6.1 ? If not, does the problem persist when you use 1.6.1 ? Thanks > On Jun 20, 2016, at 11:16 PM, umanga wrote: > > I am getting following warning while running stateful computation. The state > consists of BloomFilter (stream-lib) as Value and Integer as key. > > The program

Re: scala.NotImplementedError: put() should not be called on an EmptyStateMap while doing stateful computation on spark streaming

2016-06-21 Thread umanga
Further description: Environment: Spark cluster running in standalone mode with 1 master, 5 slaves, each with 4 vCPUs and 8GB RAM. Data is being streamed from a 3-node kafka cluster (managed by a 3-node zk cluster). Checkpointing is being done on the hadoop cluster, plus we are also saving state in HBase (

scala.NotImplementedError: put() should not be called on an EmptyStateMap while doing stateful computation on spark streaming

2016-06-20 Thread umanga
I am getting the following warning while running a stateful computation. The state consists of a BloomFilter (stream-lib) as value and an Integer as key. The program runs smoothly for a few minutes; after that, I get this warning and the streaming app becomes unstable (processing time increases exponent

Re: Improving performance of a kafka spark streaming app

2016-06-20 Thread Colin Kincaid Williams
I'll try dropping the maxRatePerPartition=400, or maybe even lower. However, even at application startup I have this large scheduling delay. I will report my progress later on. On Mon, Jun 20, 2016 at 2:12 PM, Cody Koeninger wrote: > If your batch time is 1 second and your average processing tim

Re: Improving performance of a kafka spark streaming app

2016-06-20 Thread Cody Koeninger
If your batch time is 1 second and your average processing time is 1.16 seconds, you're always going to be falling behind. That would explain why you've built up an hour of scheduling delay after eight hours of running. On Sat, Jun 18, 2016 at 4:40 PM, Colin Kincaid Williams wrote: > Hi Mich aga

Re: spark streaming - how to purge old data files in data directory

2016-06-18 Thread Akhil Das
Currently, there is no out of the box solution for this. Although, you can use other hdfs utils to remove older files from the directory (say 24hrs old). Another approach is discussed here <http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-tracking-deleting-processed-fi
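
A sketch of the hdfs-side cleanup Akhil suggests (the input directory path and the 24-hour retention window are assumptions):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    FileSystem fs = FileSystem.get(new Configuration());
    long cutoff = System.currentTimeMillis() - 24L * 60 * 60 * 1000; // 24h retention
    for (FileStatus f : fs.listStatus(new Path("/data/streaming-input"))) {
      // Delete already-processed input files older than the retention window.
      if (f.isFile() && f.getModificationTime() < cutoff) {
        fs.delete(f.getPath(), false);
      }
    }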

spark streaming - how to purge old data files in data directory

2016-06-18 Thread Vamsi Krishna
Hi, I'm on an HDP 2.3.2 cluster (Spark 1.4.1). I have a spark streaming app which uses 'textFileStream' to stream simple CSV files and process them. I see that the old data files that are processed are left in the data directory. What is the right way to purge the old data files in the data di

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
Hi Mich again, Regarding batch window, etc. I have provided the sources, but I'm not currently calling the window function. Did you see the program source? It's only 100 lines. https://gist.github.com/drocsid/b0efa4ff6ff4a7c3c8bb56767d0b6877 Then I would expect I'm using defaults, other than wha

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Mich Talebzadeh
Ok. What is the setup for these please? batch window, window length, sliding interval. And also, in each batch window, how much data do you get in (no of messages in the topic, whatever)? Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUr

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Mich Talebzadeh
I believe you have an issue with performance? Have you checked the spark GUI (default 4040) for details including shuffles etc.? HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
I'm attaching a picture from the streaming UI. On Sat, Jun 18, 2016 at 7:59 PM, Colin Kincaid Williams wrote: > There are 25 nodes in the spark cluster. > On Sat, Jun 18, 2016 at 7:53 PM, Mich Talebzadeh wrote: >> how many nodes are in your cluster? >> --num-executors 6 \ >> --driver-mem

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
There are 25 nodes in the spark cluster. On Sat, Jun 18, 2016 at 7:53 PM, Mich Talebzadeh wrote: > how many nodes are in your cluster? > --num-executors 6 \ > --driver-memory 4G \ > --executor-memory 2G \ > --total-executor-cores 12 \ > Dr Mich Talebzadeh > LinkedIn > https://www.l

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Mich Talebzadeh
how many nodes are in your cluster? --num-executors 6 \ --driver-memory 4G \ --executor-memory 2G \ --total-executor-cores 12 \ Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Improving performance of a kafka spark streaming app

2016-06-18 Thread Colin Kincaid Williams
I updated my app to Spark 1.5.2 streaming so that it consumes from Kafka using the direct api and inserts content into an hbase cluster, as described in this thread. I was away from this project for awhile due to events in my family. Currently my scheduling delay is high, but the processing time i

Spark Streaming WAL issue**: File exists and there is no append support!

2016-06-16 Thread tosaigan...@gmail.com
org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35) at org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.org$apache$spark$streaming$util$FileBasedWriteAheadLogWriter$$stream$lzycompute(FileBasedWriteAheadLogWriter.scala:33) at

Re: Recommended way to push data into HBase through Spark streaming

2016-06-16 Thread Mohammad Tariq
Forgot to add, I'm on HBase 1.0.0-cdh5.4.5, so can't use HBaseContext. And the spark version is 1.6.1. Tariq, Mohammad about.me/mti On Thu, Jun 16, 2016 at 10:12 PM, Mohammad Tariq wrote: > Hi group, > > I have a streaming job which reads d

Recommended way to push data into HBase through Spark streaming

2016-06-16 Thread Mohammad Tariq
Hi group, I have a streaming job which reads data from Kafka, performs some computation and pushes the result into HBase. Actually the results are pushed into 3 different HBase tables. So I was wondering what could be the best way to achieve this. Since each executor will open its own HBase conne
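
One common answer is one connection per partition rather than per record. A sketch against the HBase 1.0 client API (table/column names are placeholders, and `lines` is an assumed JavaDStream<String> of computed results):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    lines.foreachRDD(rdd -> rdd.foreachPartition(records -> {
      // One connection per partition, shared by all records in that partition.
      Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
      Table table = conn.getTable(TableName.valueOf("results")); // placeholder table
      try {
        while (records.hasNext()) {
          String value = records.next();
          Put put = new Put(Bytes.toBytes(value)); // row key choice is illustrative
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value));
          table.put(put);
        }
      } finally {
        table.close();
        conn.close();
      }
    }));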

spark streaming application - deployment best practices

2016-06-15 Thread vimal dinakaran
don't have a hadoop or S3 environment. This mode of deployment is inconvenient. I could do spark-submit from one node in client mode, but it doesn't provide high availability. What is the best way to deploy spark streaming applications in production? Thanks Vimal

RE: restarting of spark streaming

2016-06-15 Thread Chen, Yan I
Could anyone answer my question? From: Chen, Yan I Sent: 2016, June, 14 1:34 PM To: 'user@spark.apache.org' Subject: restarting of spark streaming Hi, I notice that in the process of restarting, spark streaming will try to recover/repl

RE: Handle empty kafka in Spark Streaming

2016-06-15 Thread David Newberger
Hi Yogesh, I'm not sure if this is possible or not. I'd be interested in knowing. My gut says it would be an anti-pattern if it's possible to do something like this, and that's why I handle it in either foreachRDD or foreachPartition. The way I look at spark streaming
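
The guard described amounts to the following sketch (`messages` is the input DStream, assumed):

    messages.foreachRDD(rdd -> {
      if (!rdd.isEmpty()) { // cheap check; skips the work for empty batches
        // ... normal per-batch processing / writes ...
      }
    });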

RE: Handle empty kafka in Spark Streaming

2016-06-15 Thread David Newberger
Wednesday, June 15, 2016 6:31 AM To: user Subject: Handle empty kafka in Spark Streaming Hi, Does anyone know how to handle an empty Kafka topic while a Spark Streaming job is running? Regards, Yogesh

Handle empty kafka in Spark Streaming

2016-06-15 Thread Yogesh Vyas
Hi, Does anyone know how to handle an empty Kafka topic while a Spark Streaming job is running? Regards, Yogesh

restarting of spark streaming

2016-06-14 Thread Chen, Yan I
Hi, I notice that in the process of restarting, spark streaming will try to recover/replay all the batches it missed. But in this process, will streams be checkpointed the way they are checkpointed in the normal process? Does anyone know? Sometimes our cluster goes into maintenance, and our

Re: Spark Streaming application failing with Kerberos issue while writing data to HBase

2016-06-14 Thread Kamesh
On Mon, Jun 13, 2016 at 4:44 AM, Kamesh wrote: > Hi All, > We are building a spark streaming application and that application writes data to an HBase table. But writes/reads are failing with the following exception: > 16/06/13 04:35:16 ERROR ipc.Abst

Re: Spark Streaming application failing with Kerberos issue while writing data to HBase

2016-06-13 Thread Ted Yu
Can you show snippet of your code, please ? Please refer to obtainTokenForHBase() in yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala Cheers On Mon, Jun 13, 2016 at 4:44 AM, Kamesh wrote: > Hi All, > We are building a spark streaming application and that appli

Spark Streaming application failing with Kerberos issue while writing data to HBase

2016-06-13 Thread Kamesh
Hi All, We are building a spark streaming application and that application writes data to an HBase table. But writes/reads are failing with the following exception: 16/06/13 04:35:16 ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials

Re: Spark Installation to work on Spark Streaming and MLlib

2016-06-10 Thread Ram Krishna
Thanks for the suggestion. Can you suggest where and how to start from scratch to work on Spark? On Fri, Jun 10, 2016 at 8:18 PM, Holden Karau wrote: > So that's a bit complicated - you might want to start with reading the > code for the existing algorithms and go from there. If yo

Re: Long Running Spark Streaming getting slower

2016-06-10 Thread Mich Talebzadeh
the nature of this spark streaming job, if you can divulge it? HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Long Running Spark Streaming getting slower

2016-06-10 Thread John Simon
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com On 10 June 2016 at 18:21, john.simon wrote: > Hi all, > I'm running Spark

Re: Long Running Spark Streaming getting slower

2016-06-10 Thread Mich Talebzadeh
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com On 10 June 2016 at 18:21, john.simon wrote: > Hi all, > I'm running Spark Streaming with Kafka Dire

Long Running Spark Streaming getting slower

2016-06-10 Thread john.simon
Hi all, I'm running Spark Streaming with the Kafka Direct Stream, but after running for a couple of days, the batch processing time almost doubles. I didn't find any slowdown in the JVM GC logs, but I did find that the Spark broadcast variable reading time increases. Initially it takes less than 10ms,

Re: Spark Installation to work on Spark Streaming and MLlib

2016-06-10 Thread Holden Karau
So that's a bit complicated - you might want to start with reading the code for the existing algorithms and go from there. If your goal is to contribute the algorithm to Spark you should probably take a look at the JIRA as well as the contributing to Spark guide on the wiki. Also we have a separate

Re: Spark Installation to work on Spark Streaming and MLlib

2016-06-10 Thread Ram Krishna
Hi All, How do I add a new ML algo to Spark MLlib? On Fri, Jun 10, 2016 at 12:50 PM, Ram Krishna wrote: > Hi All, > I am new to this field, I want to implement a new ML algo using Spark MLlib. What is the procedure? > Regards, > Ram Krishna KT

Re: Spark Installation to work on Spark Streaming and MLlib

2016-06-10 Thread Holden Karau
Hi Ram, Not super certain what you are looking to do. Are you looking to add a new algorithm to Spark MLlib for streaming or use Spark MLlib on streaming data? Cheers, Holden On Friday, June 10, 2016, Ram Krishna wrote: > Hi All, > > I am new to this this field, I want to implement new ML alg

Spark Installation to work on Spark Streaming and MLlib

2016-06-10 Thread Ram Krishna
Hi All, I am new to this field, I want to implement a new ML algo using Spark MLlib. What is the procedure? -- Regards, Ram Krishna KT

Re: Error while using checkpointing . Spark streaming 1.5.2- DStream checkpointing has been enabled but the DStreams with their functions are not serialisable

2016-06-09 Thread sandesh deshmane
defining the myFunction inside the Function and see if the problem persists. > On Thu, Jun 9, 2016 at 3:57 AM, sandesh deshmane wrote: > Hi, > I am using spark streaming for streaming data from kafka 0.8. > I am using checkpointing in HDFS. I am

Re: Spark Streaming heap space out of memory

2016-06-09 Thread christian.dancu...@rbc.com
Issue was resolved by upgrading Spark to version 1.6 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-heap-space-out-of-memory-tp27050p27131.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Processing Time Spikes (Spark Streaming)

2016-06-09 Thread christian.dancu...@rbc.com
What version of Spark are you running? Do you see the heap space slowly increase over time? Have you set the ttl cleaner? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Processing-Time-Spikes-Spark-Streaming-tp22375p27130.html Sent from the Apache Spark

Re: Error while using checkpointing . Spark streaming 1.5.2- DStream checkpointing has been enabled but the DStreams with their functions are not serialisable

2016-06-09 Thread Tathagata Das
> I am using spark streaming for streaming data from kafka 0.8. > I am using checkpointing in HDFS. I am getting an error like the one below: > java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not seria
