Hi,
The reason I am looking to do this differently is that the latency and
batch processing times are poor, around 40 seconds. I took these timings
from the Streaming UI.
As you suggested, I tried a window as shown below, but the times are still bad.
val dStream = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap)
val eventData = dStream.map(_._2)
  .map(_.split(","))
  .map(data => Data(data(0), data(1), data(2), data(3), data(4)))
  .window(Minutes(15), Seconds(3))
val result = eventData.transform((rdd, time) => {
  rdd.registerAsTable("data")
  sql("SELECT count(state) FROM data WHERE state='Active'")
})
result.print()
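
In case it clarifies what I mean: my understanding is that the plain
window() above recomputes the full 15 minutes of data on every 3-second
slide. An incremental version using reduceByWindow with an inverse
function (the signature in this thread's subject) would presumably look
like the sketch below. The field index for state and the checkpoint
directory are guesses on my part:

// Sketch only: count "Active" events incrementally by adding counts
// from batches entering the window and subtracting counts from batches
// leaving it, instead of rescanning all 15 minutes every 3 seconds.
// Assumptions: state is fields(2), and "checkpoint" is a valid path.
ssc.checkpoint("checkpoint")  // required when an inverse function is used
val activeCounts = dStream
  .map(_._2.split(","))
  .filter(fields => fields(2) == "Active")  // assumed position of state
  .map(_ => 1L)
  .reduceByWindow(_ + _, _ - _, Minutes(15), Seconds(3))
activeCounts.print()

Would something along these lines be the recommended way to bring the
processing time down?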
Any suggestions?
Ali
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-reduceByWindow-reduceFunc-invReduceFunc-windowDuration-slideDuration-tp11591p11612.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.