Okay, going back to your original question: it wasn't clear what reduce
function you were trying to implement. Going by the second example, which
uses the window() operation followed by a count+filter (using SQL), I am
guessing you are trying to maintain a count of all the active "states" in
the last 15 minutes. One way to do this could be

eventData.filter(_.state == "Active").countByWindow(Minutes(15), Seconds(3))

Under the hood, this does the counting incrementally, using a reduce
function and an inverse reduce function: on each slide it adds the counts
from the new batch and subtracts the counts from the batch that fell out of
the window, instead of recounting the entire 15 minutes every 3 seconds.
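To make that concrete, here is a minimal sketch of the full pipeline. The
Data field names other than `state` are hypothetical (I only know the field
your SQL query uses), and zkQuorum / group / topicpMap are placeholders like
in your snippet. Note that window operations backed by an inverse reduce
function require checkpointing to be enabled:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // field names other than `state` are hypothetical placeholders
    case class Data(f0: String, state: String, f2: String, f3: String, f4: String)

    val conf = new SparkConf().setAppName("ActiveStateCount")
    val ssc = new StreamingContext(conf, Seconds(3))
    ssc.checkpoint("hdfs://...")  // required for incremental window operations

    val zkQuorum = "localhost:2181"     // placeholder
    val group = "my-consumer-group"     // placeholder
    val topicpMap = Map("events" -> 1)  // placeholder: topic -> receiver threads

    val dStream = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap)
    val eventData = dStream.map(_._2).map(_.split(","))
      .map(d => Data(d(0), d(1), d(2), d(3), d(4)))

    // each slide adds the new batch and subtracts the expired batch
    val activeCounts = eventData
      .filter(_.state == "Active")
      .countByWindow(Minutes(15), Seconds(3))

    activeCounts.print()
    ssc.start()
    ssc.awaitTermination()

countByWindow is essentially shorthand for mapping each element to 1L and
calling reduceByWindow(_ + _, _ - _, Minutes(15), Seconds(3)) — that is the
reduce / inverse-reduce pair I mentioned.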

If the numbers are still bad after that, the other possibility is simply
that you need more resources (cores, memory, or receiver parallelism) to
process that much data.
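For instance (illustrative values only, not tuned numbers; the right
settings depend on your data volume and cluster), you can give the
application more memory and cores through SparkConf:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("ActiveStateCount")
      .set("spark.executor.memory", "4g")  // memory per executor; illustrative
      .set("spark.cores.max", "8")         // cap on total cores (standalone mode)

Another common lever with the Kafka receiver is to create several input
streams and union them, so that receiving is parallelized across cores.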

TD

On Wed, Aug 6, 2014 at 7:58 PM, salemi <alireza.sal...@udo.edu> wrote:

> Hi,
>
> The reason I am looking to do it differently is that the latency and
> batch processing times are bad, about 40 seconds. I took the times from
> the Streaming UI.
>
> As you suggested, I tried the window as below, but the times are still
> bad:
>   val dStream = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap)
>   val eventData = dStream.map(_._2).map(_.split(","))
>     .map(data => Data(data(0), data(1), data(2), data(3), data(4)))
>     .window(Minutes(15), Seconds(3))
>
>   val result = eventData.transform((rdd, time) => {
>     rdd.registerAsTable("data")
>     sql("SELECT count(state) FROM data WHERE state='Active'")
>   })
>   result.print()
>
> Any suggestions?
>
> Ali
>