JavaDStream.foreachRDD (https://spark.apache.org/docs/1.2.1/api/java/org/apache/spark/streaming/api/java/JavaDStreamLike.html#foreachRDD(org.apache.spark.api.java.function.Function)) and Statistics.corr (https://spark.apache.org/docs/1.2.1/api/java/org/apache/spark/mllib/stat/Statistics.html#corr(org.apache.spark.rdd.RDD)) should be good starting points. -Xiangrui
On Mon, Feb 16, 2015 at 6:39 AM, Spico Florin <spicoflo...@gmail.com> wrote:
> Hello!
> I'm a newbie to Spark and I have the following case study:
> 1. A client sends the following data every 100 ms:
>    {uniqueId, timestamp, measure1, measure2}
> 2. Every 30 seconds I would like to correlate the data collected in the
>    window with a predefined double-vector pattern for each given key. The
>    predefined pattern has 300 records. The data should also be sorted by
>    timestamp.
> 3. When the correlation is greater than a predefined threshold (e.g. 0.9),
>    I would like to emit a new message containing {uniqueId,
>    doubleCorrelationValue}.
> 4. For the correlation I would like to use MLlib.
> 5. As a programming language I would like to use Java 7.
>
> Can you please give me some suggestions on how to create the skeleton for
> the above scenario?
>
> Thanks.
> Regards,
> Florin
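For what it's worth, the threshold-and-emit logic (steps 2–3) can be sketched in plain Java without a Spark cluster. The class and method names below (CorrelationSketch, pearson, maybeEmit) are hypothetical, and the hand-rolled Pearson computation merely stands in for what Statistics.corr would compute over the two RDDs inside foreachRDD — this is an illustrative sketch, not the Spark API:

```java
/**
 * Hypothetical sketch of the correlate-and-emit step. In the real
 * streaming job, the 30-second window of measures and the predefined
 * pattern would each be an RDD, and Statistics.corr would replace
 * pearson() below.
 */
public class CorrelationSketch {

    // Threshold from step 3 of the question.
    static final double THRESHOLD = 0.9;

    /** Pearson correlation of two equal-length series. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            cov += dx * dy; vx += dx * dx; vy += dy * dy;
        }
        return cov / Math.sqrt(vx * vy);
    }

    /** Emit {uniqueId, correlation} only when the threshold is exceeded, else null. */
    static String maybeEmit(String uniqueId, double[] window, double[] pattern) {
        double corr = pearson(window, pattern);
        return corr > THRESHOLD ? "{" + uniqueId + ", " + corr + "}" : null;
    }

    public static void main(String[] args) {
        double[] pattern = {1, 2, 3, 4, 5};
        // Perfectly linear with the pattern: correlation 1.0, so a message is emitted.
        System.out.println(maybeEmit("sensor-1", new double[]{2, 4, 6, 8, 10}, pattern));
        // Uncorrelated window: below threshold, nothing emitted.
        System.out.println(maybeEmit("sensor-2", new double[]{5, 1, 4, 2, 3}, pattern));
    }
}
```

In the streaming skeleton this logic would sit inside the function passed to foreachRDD on a 30-second windowed DStream keyed by uniqueId, with the window's measures sorted by timestamp before the correlation is taken.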