I don't have any personal experience with Spark Streaming. Whether you store your data in HDFS or a database or something else probably depends on the nature of your use case.
On Fri, Aug 29, 2014 at 10:38 AM, huylv <huy.le...@insight-centre.org> wrote:

> Hi Daniel,
>
> Your suggestion is definitely an interesting approach. In fact, I already
> have another system to deal with the stream analytical processing part. So
> basically, the Spark job that aggregates data accumulatively computes
> aggregations over historical data together with each new batch, which has
> already been partly summarized by the stream processor. Answering queries
> involves combining pre-calculated historical data with on-stream
> aggregations. This sounds much like what Spark Streaming is intended to do,
> so I'll take a deeper look into Spark Streaming and consider porting the
> stream processing part to it.
>
> Regarding saving pre-calculated data to external storage (disk,
> database...), I'm looking at Cassandra for now, but I don't know how it
> fits my context or how its performance compares to saving files in HDFS.
> Also, is there any way to keep the pre-calculated data both on disk and in
> memory, so that when the batch job terminates, historical data is still
> available in memory for combining with the stream processor, while still
> being able to survive a system failure or upgrade? Not to mention that the
> pre-calculated data might get too big, in which case keeping only the
> newest data in memory would be better. Tachyon looks like a nice option,
> but again, I don't have experience with it and it's still an experimental
> feature of Spark.
>
> Regards,
> Huy
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Where-to-save-intermediate-results-tp13062p13127.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
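For what it's worth, the accumulative-aggregation pattern you describe (fold each partly pre-summarized batch into a running historical state, then merge in the stream processor's in-flight aggregates at query time) can be sketched in plain Python. This is only an illustration of the logic, not Spark code; in Spark Streaming this role would typically be played by a stateful operation such as updateStateByKey. The function names and the (sum, count) aggregate shape here are made up for the example.

```python
# Sketch of accumulative aggregation: each batch arrives partly
# pre-summarized as key -> (sum, count); we fold it into the
# accumulated historical state, and answer queries by combining
# that state with the stream processor's current aggregates.
# Names and data shapes are illustrative, not from any Spark API.

def merge_batch(historical, batch):
    """Fold a pre-summarized batch into the historical aggregates."""
    merged = dict(historical)
    for key, (s, c) in batch.items():
        old_s, old_c = merged.get(key, (0.0, 0))
        merged[key] = (old_s + s, old_c + c)
    return merged

def answer_query(historical, on_stream):
    """Combine pre-calculated historical data with on-stream
    aggregates; here the final answer is a per-key average."""
    combined = merge_batch(historical, on_stream)
    return {k: s / c for k, (s, c) in combined.items()}

# Usage: two micro-batches accumulate, then a query merges live data.
state = {}
state = merge_batch(state, {"sensor-a": (10.0, 2)})
state = merge_batch(state, {"sensor-a": (20.0, 2), "sensor-b": (6.0, 3)})
print(answer_query(state, {"sensor-b": (4.0, 2)}))
# -> {'sensor-a': 7.5, 'sensor-b': 2.0}
```

Because the per-key aggregates are associative (sums and counts just add), the historical state can be rebuilt from whatever you persist to Cassandra or HDFS after a failure, without replaying the raw stream.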
--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning
440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io  W: www.velos.io