Thank you guys, I’ll try Parquet, and if that’s not quick enough I’ll go the 
usual route with either a read-only or a normal database.

On 13.09.2014, at 12:45, andy petrella <andy.petre...@gmail.com> wrote:

> however, the cache is not guaranteed to remain: if other jobs launched on the 
> cluster require more memory than what's left in the overall caching memory, 
> previously cached RDDs will be evicted.
> 
> Using an off-heap cache like Tachyon as a dump repository can help.
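> 
> A minimal sketch of those two options (Spark 1.1 assumed; the input path is a 
> placeholder, and OFF_HEAP requires a Tachyon store configured for the cluster):
> 
> import org.apache.spark.storage.StorageLevel
> 
> val counts = sc.textFile("hdfs:///wordcount/input")   // placeholder path
>   .flatMap(_.split("\\s+"))
>   .map(word => (word, 1L))
>   .reduceByKey(_ + _)
> 
> // in-heap cache: blocks can be evicted if other jobs need the memory
> counts.persist(StorageLevel.MEMORY_ONLY)
> 
> // alternative: keep the blocks off-heap in Tachyon (an RDD's storage level
> // can only be set once, so pick one or the other)
> // counts.persist(StorageLevel.OFF_HEAP)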
> 
> In general, I'd say that using a persistent sink (like Cassandra for 
> instance) is best.
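> 
> A rough sketch of that sink approach, assuming the DataStax 
> spark-cassandra-connector is on the classpath, spark.cassandra.connection.host 
> is set, and the keyspace/table (names made up here) already exist; it reuses 
> the counts RDD from the sketch above:
> 
> import com.datastax.spark.connector._
> 
> // write the (word, count) pairs to a pre-created Cassandra table so a
> // dashboard can read them independently of the Spark app
> counts.saveToCassandra("dashboard", "word_counts", SomeColumns("word", "total"))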
> 
> my .2¢
> 
> 
> aℕdy ℙetrella
> about.me/noootsab
> 
> 
> 
> On Sat, Sep 13, 2014 at 9:20 AM, Mayur Rustagi <mayur.rust...@gmail.com> 
> wrote:
> You can cache data in memory and query it using the Spark Job Server. 
> Most folks dump data down to a queue/db for retrieval. 
> You can also batch up data, store it into Parquet partitions, and query it 
> using another Spark SQL shell; the JDBC driver in Spark SQL is part of 1.1, I believe. 
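> 
> A minimal sketch of that Parquet route, assuming Spark 1.1's SQLContext (paths 
> and table name are placeholders):
> 
> import org.apache.spark.sql.SQLContext
> 
> case class WordCount(word: String, total: Long)
> 
> val sqlContext = new SQLContext(sc)
> import sqlContext.createSchemaRDD
> 
> // batch job: compute counts and write one Parquet directory per run
> val counts = sc.textFile("hdfs:///wordcount/input")   // placeholder path
>   .flatMap(_.split("\\s+"))
>   .map(word => (word, 1L))
>   .reduceByKey(_ + _)
>   .map { case (word, total) => WordCount(word, total) }
> 
> counts.saveAsParquetFile("hdfs:///wordcount/parquet/batch-2014-09-13")
> 
> // later, from a separate Spark SQL shell/app, load and query the stored batch
> val stored = sqlContext.parquetFile("hdfs:///wordcount/parquet/batch-2014-09-13")
> stored.registerTempTable("word_counts")
> sqlContext.sql("SELECT word, total FROM word_counts ORDER BY total DESC LIMIT 10").collect()
> 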
> -- 
> Regards,
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi
> 
> 
> On Fri, Sep 12, 2014 at 2:54 PM, Marius Soutier <mps....@gmail.com> wrote:
> 
> Hi there, 
> 
> I’m pretty new to Spark, and so far I’ve written my jobs the same way I wrote 
> Scalding jobs - one-off, read data from HDFS, count words, write counts back 
> to HDFS. 
> 
> Now I want to display these counts in a dashboard. Since Spark allows you to 
> cache RDDs in memory, an app keeps running until you explicitly terminate it, 
> and there’s even a new JDBC server in 1.1, I’m assuming it’s possible to keep an 
> app running indefinitely and query an in-memory RDD from the outside (via 
> Spark SQL, for example). 
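> 
> A rough sketch of that idea (placeholder data; assuming Spark 1.1's SQLContext 
> and that the driver stays alive, e.g. inside something like the Spark Job Server):
> 
> import org.apache.spark.sql.SQLContext
> 
> case class WordCount(word: String, total: Long)
> 
> val sqlContext = new SQLContext(sc)
> import sqlContext.createSchemaRDD
> 
> // stand-in for the real word-count job
> val counts = sc.parallelize(Seq(WordCount("spark", 42L), WordCount("hdfs", 7L)))
> 
> counts.registerTempTable("word_counts")
> sqlContext.cacheTable("word_counts")   // keep the table in memory for repeated queries
> 
> // while the driver keeps running, the dashboard backend can keep issuing queries:
> sqlContext.sql("SELECT word, total FROM word_counts ORDER BY total DESC").collect()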
> 
> Is this how others are using Spark? Or are you just dumping job results into 
> message queues or databases? 
> 
> 
> Thanks 
> - Marius 
> 
> 
