Okay, going back to your original question, it wasn't clear what reduce function you are trying to implement. Going by the second example, which uses the window() operation followed by a count + filter (using SQL), I am guessing you are trying to maintain a count of all the records in the "Active" state over the last 15 minutes. One way to do this could be:

    eventData.filter(_.state == "Active").countByWindow(Minutes(15), Seconds(3))

Underneath, this does the counting incrementally, using a reduce function and an inverse reduce function.
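For reference, here is a minimal sketch of roughly what that one-liner expands to, written against the reduceByWindow(reduceFunc, invReduceFunc, windowDuration, slideDuration) overload this thread is about. It assumes eventData is the un-windowed DStream from your code (countByWindow applies the window itself), and the checkpoint path is a placeholder; the incremental version is stateful, so checkpointing must be enabled:

    import org.apache.spark.streaming.{Minutes, Seconds}

    // The incremental window reduction keeps state across batches, so a
    // checkpoint directory must be set ("checkpoint" is a placeholder path).
    ssc.checkpoint("checkpoint")

    // countByWindow is essentially map-to-1 plus reduceByWindow with an
    // "add" reduce function and a "subtract" inverse function: counts
    // entering the window are added, counts sliding out are subtracted.
    val activeCount = eventData
      .filter(_.state == "Active")
      .map(_ => 1L)
      .reduceByWindow(_ + _, _ - _, Minutes(15), Seconds(3))

    activeCount.print()

This way each 3-second slide only touches the data entering and leaving the window, instead of recounting the full 15 minutes of data.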
Another reason could simply be that you really need more resources to process that much data.

TD

On Wed, Aug 6, 2014 at 7:58 PM, salemi <alireza.sal...@udo.edu> wrote:
> Hi,
>
> The reason I am looking to do it differently is that the latency and
> batch processing times are bad, about 40 sec. I took the times from
> the Streaming UI.
>
> As you suggested, I tried the window as below, and the times are
> still bad:
>
>     val dStream = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap)
>     val eventData = dStream.map(_._2).map(_.split(",")).map(data =>
>         Data(data(0), data(1), data(2), data(3), data(4)))
>       .window(Minutes(15), Seconds(3))
>
>     val result = eventData.transform((rdd, time) => {
>       rdd.registerAsTable("data")
>       sql("SELECT count(state) FROM data WHERE state='Active'")
>     })
>     result.print()
>
> Any suggestions?
>
> Ali