Thank you guys, I’ll try Parquet, and if that’s not quick enough I’ll go the usual route with either a read-only or a normal database.
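(For reference, a minimal sketch of that Parquet route, assuming Spark 1.1-era APIs; the paths, object and table names are illustrative, not from the thread:)

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Hypothetical result type for the word counts mentioned in the thread.
    case class WordCount(word: String, count: Long)

    object CountsToParquet {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("counts-to-parquet"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

        // One-off job: count words and persist the result as Parquet.
        val counts = sc.textFile("hdfs:///data/input")
          .flatMap(_.split("\\s+"))
          .map(w => (w, 1L))
          .reduceByKey(_ + _)
          .map { case (w, c) => WordCount(w, c) }

        counts.saveAsParquetFile("hdfs:///data/word_counts.parquet")

        // Later, e.g. from another SparkSQL shell or a dashboard backend:
        // read the Parquet files back and query them with SQL.
        val stored = sqlContext.parquetFile("hdfs:///data/word_counts.parquet")
        stored.registerTempTable("word_counts")
        sqlContext.sql("SELECT word, count FROM word_counts ORDER BY count DESC LIMIT 20")
          .collect()
          .foreach(println)

        sc.stop()
      }
    }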
On 13.09.2014, at 12:45, andy petrella <andy.petre...@gmail.com> wrote:

> However, the cache is not guaranteed to remain: if other jobs are launched in
> the cluster and require more memory than what's left in the overall caching
> memory, previous RDDs will be discarded.
>
> Using an off-heap cache like Tachyon as a dump repo can help.
>
> In general, I'd say that using a persistent sink (like Cassandra, for
> instance) is best.
>
> my .2¢
>
> aℕdy ℙetrella
> about.me/noootsab
>
>
> On Sat, Sep 13, 2014 at 9:20 AM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>
> You can cache data in memory & query it using Spark Job Server.
> Most folks dump data down to a queue/db for retrieval.
> You can batch up data & store it into Parquet partitions as well, & query it
> using another SparkSQL shell; the JDBC driver in SparkSQL is part of 1.1, I believe.
> --
> Regards,
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
>
>
> On Fri, Sep 12, 2014 at 2:54 PM, Marius Soutier <mps....@gmail.com> wrote:
>
> Hi there,
>
> I’m pretty new to Spark, and so far I’ve written my jobs the same way I wrote
> Scalding jobs - as one-off jobs: read data from HDFS, count words, write counts
> back to HDFS.
>
> Now I want to display these counts in a dashboard. Since Spark allows caching
> RDDs in memory and you have to terminate your app explicitly (and there’s even
> a new JDBC server in 1.1), I’m assuming it’s possible to keep an app running
> indefinitely and query an in-memory RDD from the outside (via SparkSQL, for
> example).
>
> Is this how others are using Spark? Or are you just dumping job results into
> message queues or databases?
>
>
> Thanks
> - Marius
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
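(A rough sketch of the long-running, in-memory approach discussed in the thread above: a driver that loads previously written results, pins them in executor memory, and stays alive to serve queries. Spark 1.1-era APIs assumed; paths and names are made up for illustration.)

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object DashboardQueryApp {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("dashboard-backend"))
        val sqlContext = new SQLContext(sc)

        // Load previously written results and pin them in executor memory.
        sqlContext.parquetFile("hdfs:///data/word_counts.parquet")
          .registerTempTable("word_counts")
        sqlContext.cacheTable("word_counts")

        // As Andy notes above, the cached data can be evicted under memory
        // pressure; queries still work then, just against the Parquet files.
        def topWords(n: Int) =
          sqlContext.sql(
            s"SELECT word, count FROM word_counts ORDER BY count DESC LIMIT $n"
          ).collect()

        topWords(20).foreach(println)

        // Keep the driver (and with it the cached table) alive for further
        // queries, e.g. exposed through an HTTP endpoint or Spark Job Server.
        Thread.sleep(Long.MaxValue)
      }
    }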