I don't have any personal experience with Spark Streaming. Whether you store your data in HDFS or a database or something else probably depends on the nature of your use case.
On Fri, Aug 29, 2014 at 10:38 AM, huylv <huy.le...@insight-centre.org> wrote:

> Hi Daniel,
>
> Your suggestion is definitely an interesting approach. In fact, I already
> have another system to deal with the stream analytical processing part. So
> basically, the Spark job that aggregates data accumulatively computes
> aggregations over historical data together with each new batch, which has
> already been partly summarized by the stream processor. Answering queries
> involves combining pre-calculated historical data with on-stream
> aggregations. This sounds much like what Spark Streaming is intended to do,
> so I'll take a deeper look into Spark Streaming and consider porting the
> stream processing part to it.
>
> Regarding saving pre-calculated data to external storage (disk,
> database...), I'm looking at Cassandra for now, but I don't know how it
> fits my context or how its performance compares to saving files in HDFS.
> Also, is there any way to keep the pre-calculated data both on disk and in
> memory, so that when the batch job terminates, historical data is still
> available in memory for combining with the stream processor, while still
> being able to survive a system failure or upgrade? Not to mention that the
> pre-calculated data might get too big, in which case keeping only the
> newest data in memory would be better. Tachyon looks like a nice option,
> but again, I don't have experience with it and it's still an experimental
> feature of Spark.
>
> Regards,
> Huy
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Where-to-save-intermediate-results-tp13062p13127.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
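For what it's worth, the accumulative-aggregation pattern you describe (fold each partly pre-summarized batch into a running historical state, then merge in the stream processor's in-flight aggregates at query time) can be sketched in plain Python. This is only an illustration of the logic, not Spark code; in Spark Streaming this role would typically be played by a stateful operation such as updateStateByKey. The function names and the (sum, count) aggregate shape here are made up for the example.

```python
# Sketch of accumulative aggregation: each batch arrives partly
# pre-summarized as key -> (sum, count); we fold it into the
# accumulated historical state, and answer queries by combining
# that state with the stream processor's current aggregates.
# Names and data shapes are illustrative, not from any Spark API.

def merge_batch(historical, batch):
    """Fold a pre-summarized batch into the historical aggregates."""
    merged = dict(historical)
    for key, (s, c) in batch.items():
        old_s, old_c = merged.get(key, (0.0, 0))
        merged[key] = (old_s + s, old_c + c)
    return merged

def answer_query(historical, on_stream):
    """Combine pre-calculated historical data with on-stream
    aggregates; here the final answer is a per-key average."""
    combined = merge_batch(historical, on_stream)
    return {k: s / c for k, (s, c) in combined.items()}

# Usage: two micro-batches accumulate, then a query merges live data.
state = {}
state = merge_batch(state, {"sensor-a": (10.0, 2)})
state = merge_batch(state, {"sensor-a": (20.0, 2), "sensor-b": (6.0, 3)})
print(answer_query(state, {"sensor-b": (4.0, 2)}))
# -> {'sensor-a': 7.5, 'sensor-b': 2.0}
```

Because the per-key aggregates are associative (sums and counts just add), the historical state can be rebuilt from whatever you persist to Cassandra or HDFS after a failure, without replaying the raw stream.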
--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning
440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io  W: www.velos.io