Re: Maelstrom: Kafka integration with Spark

2016-08-24 Thread Jeoffrey Lim
retry strategy in numerous places around Kafka-related routines). Not that I'm complaining or competing; at the end of the day, having a Spark app that continues to work overnight gives a developer a good night's sleep :) On Thu, Aug 25, 2016 at 3:23 AM, Jeoffrey Lim wrote: > Hi Cody, thank
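The retry strategy itself is cut off in the excerpt; purely as an illustration, a minimal Scala retry helper of the kind often wrapped around Kafka calls might look like this (the Retry name, attempt count, and backoff values are assumptions, not Maelstrom's actual code):

import scala.util.{Failure, Success, Try}

object Retry {
  // Retries `op` up to `attempts` times, doubling the delay each time.
  @annotation.tailrec
  def withRetries[T](attempts: Int, delayMs: Long)(op: => T): T = {
    Try(op) match {
      case Success(v) => v
      case Failure(_) if attempts > 1 =>
        Thread.sleep(delayMs)
        withRetries(attempts - 1, delayMs * 2)(op) // exponential backoff
      case Failure(e) => throw e // out of attempts: surface the error
    }
  }
}

// Usage: Retry.withRetries(attempts = 5, delayMs = 200) { consumer.poll(100) }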

Re: Maelstrom: Kafka integration with Spark

2016-08-24 Thread Jeoffrey Lim
't achieve sub-millisecond > end-to-end stream processing, so my guess is you need to be more > specific about your terms there. > > I promise I'm not trying to start a pissing contest :) just wanted to > check if you were aware of the current state of the other co

Re: Maelstrom: Kafka integration with Spark

2016-08-23 Thread Jeoffrey Lim
1.3 and Kafka 0.8.2.1 (and of course with the latest Kafka 0.10 as well) On Wed, Aug 24, 2016 at 9:49 AM, Cody Koeninger wrote: > Were you aware that the Spark 2.0 / Kafka 0.10 integration also reuses > Kafka consumer instances on the executors? > > On Tue, Aug 23, 2016 at 3:19 PM, J
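For context on the consumer reuse mentioned here: the spark-streaming-kafka-0-10 integration caches a Kafka consumer per topic-partition on each executor and reuses it across batches. A minimal sketch of setting up that direct stream (broker address, group id, and topic name are placeholder values):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-010-direct")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092", // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // PreferConsistent distributes partitions evenly across executors;
    // each executor keeps a cached KafkaConsumer per topic-partition,
    // so consumers are reused across batches instead of recreated.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
    )

    stream.map(record => (record.key, record.value)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}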

Maelstrom: Kafka integration with Spark

2016-08-23 Thread Jeoffrey Lim
has been running stably in a production environment and has proven resilient to numerous production issues. Please check out the project's page on GitHub: https://github.com/jeoffreylim/maelstrom Contributors welcome! Cheers! Jeoffrey Lim P.S. I am also looking for a job opport

Re: How to initiate a shutdown of Spark Streaming context?

2014-09-15 Thread Jeoffrey Lim
What we did to gracefully shut down the Spark Streaming context is extend a Spark Web UI Tab and call SparkContext.SparkUI.attachTab(). However, the custom Scala Web UI extensions need to be under the package org.apache.spark.ui to get around the package access restrictions. Would
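A minimal sketch of this approach, assuming Spark 1.x UI internals (WebUITab, WebUIPage, and attachTab are private[spark], hence placing the classes in org.apache.spark.ui; exact signatures vary between Spark versions, and the ShutdownTab/ShutdownPage names are hypothetical):

// Placed in this package to reach the private[spark] Web UI classes.
package org.apache.spark.ui

import javax.servlet.http.HttpServletRequest
import scala.xml.Node

import org.apache.spark.streaming.StreamingContext

// Hypothetical tab exposing a "stop" action for the StreamingContext.
class ShutdownTab(parent: SparkUI, ssc: StreamingContext)
  extends WebUITab(parent, "shutdown") {

  attachPage(new ShutdownPage(this, ssc))
  parent.attachTab(this) // register so the tab shows in the running UI
}

class ShutdownPage(parent: ShutdownTab, ssc: StreamingContext)
  extends WebUIPage("") {

  override def render(request: HttpServletRequest): Seq[Node] = {
    // A GET parameter triggers the shutdown to keep the sketch short;
    // a real implementation would use a POST plus some authentication.
    if (request.getParameter("stop") != null) {
      ssc.stop(stopSparkContext = false, stopGracefully = true)
    }
    <div>
      <a href="?stop=true">Stop streaming context gracefully</a>
    </div>
  }
}

The stopGracefully = true flag makes the context finish processing data already received before shutting down.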

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-12 Thread Jeoffrey Lim
Our issue could be related to the problem described in: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-in-1-hour-batch-duration-RDD-files-gets-lost-td14027.html in which the DStream is processed with a 1-hour batch duration. I have implemented IO throttling in the Receiver
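As an illustration of throttling inside a custom Receiver, here is a minimal sketch using Guava's RateLimiter (the ThrottledReceiver name and the readNextRecord() helper are hypothetical stand-ins for the actual source logic):

import com.google.common.util.concurrent.RateLimiter
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Caps how fast records are pushed into Spark's block manager.
class ThrottledReceiver(maxRecordsPerSec: Double)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {

  override def onStart(): Unit = {
    new Thread("throttled-receiver") {
      override def run(): Unit = {
        val limiter = RateLimiter.create(maxRecordsPerSec)
        while (!isStopped()) {
          limiter.acquire()          // block until a permit is available
          store(readNextRecord())    // hand one record to Spark
        }
      }
    }.start()
  }

  override def onStop(): Unit = { /* the loop exits via isStopped() */ }

  // Placeholder for the actual source read.
  private def readNextRecord(): String = ???
}

Depending on the Spark version, the built-in spark.streaming.receiver.maxRate setting may accomplish the same thing without a custom receiver.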

Spark Streaming in 1 hour batch duration RDD files gets lost

2014-09-11 Thread Jeoffrey Lim
Hi, Our Spark Streaming app is configured to pull data from Kafka with a 1-hour batch duration; it performs aggregation of data by specific keys and stores the related RDDs to HDFS in the transform phase. We have tried a checkpoint of 7 days on the Kafka DStream to ensure that the generated stream
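A skeletal version of that setup might look as follows; the checkpoint path, keying logic, and stand-in source are assumptions, and the HDFS write here happens in foreachRDD rather than inside transform as in the original app:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

object HourlyAggregationSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("hourly-kafka-aggregation")
    // 1-hour batch duration, as described above.
    val ssc = new StreamingContext(conf, Minutes(60))
    ssc.checkpoint("hdfs:///checkpoints/hourly-app") // placeholder path

    // Stand-in for the Kafka DStream; the actual receiver setup is omitted.
    val kafkaStream = ssc.socketTextStream("localhost", 9999)

    // The 7-day checkpoint mentioned in the post; a DStream's checkpoint
    // interval must be a multiple of the batch duration.
    kafkaStream.checkpoint(Minutes(60 * 24 * 7))

    val aggregated = kafkaStream
      .map(line => (line.split(",")(0), 1L)) // key on the first field
      .reduceByKey(_ + _)

    // Persist each hourly batch to HDFS, tagged with the batch time.
    aggregated.foreachRDD { (rdd, time) =>
      rdd.saveAsTextFile(s"hdfs:///output/agg-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}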