Re: Serving data

2014-09-17 Thread Marius Soutier
No, you’re right, that’s exactly what I’m doing right now. The choice would have been *either* Parquet *or* a database. What’s unfortunate is that apparently this only works with Play Framework 2.2, not 2.3, because of the incompatible Akka versions. On 16.09.2014, at 16:37, Yana Kadiyska wrote: …
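
For context: Spark 1.x shipped against the Akka 2.2.x line, while Play 2.3 moved to Akka 2.3.x, so a project embedding a SparkContext inside Play has to stay on the Play 2.2 line. A hedged build.sbt sketch (exact versions are illustrative, not from the thread):

    // build.sbt -- pin Play to 2.2.x so its Akka 2.2.x matches Spark's
    libraryDependencies ++= Seq(
      "com.typesafe.play" %% "play"       % "2.2.6",
      "org.apache.spark"  %% "spark-core" % "1.1.0",
      "org.apache.spark"  %% "spark-sql"  % "1.1.0"
    )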

Re: Serving data

2014-09-16 Thread Yana Kadiyska
If your dashboard is doing Ajax/pull requests against, say, a REST API, you can always create a Spark context in your REST service and use SparkSQL to query over the Parquet files. The Parquet files are already on disk, so it seems silly to write both to Parquet and to a DB... unless I'm missing something.
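
A minimal sketch of that setup against the Spark 1.1-era API (the paths, table name, and column names are illustrative; the context is built once at service startup, since only one SparkContext may exist per JVM):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object DashboardBackend {
      // One long-lived context, created at service startup and shared
      // across requests ("local[*]" stands in for your cluster master).
      val sc = new SparkContext(
        new SparkConf().setAppName("dashboard-api").setMaster("local[*]"))
      val sqlContext = new SQLContext(sc)

      // Point SparkSQL at the Parquet output of the batch job.
      sqlContext.parquetFile("hdfs:///data/wordcounts.parquet")
        .registerTempTable("wordcounts")

      // A REST handler can then answer each request with an ad-hoc query.
      def topWords(n: Int): Array[(String, Long)] =
        sqlContext.sql(
          s"SELECT word, cnt FROM wordcounts ORDER BY cnt DESC LIMIT $n")
          .collect()
          .map(row => (row.getString(0), row.getLong(1)))
    }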

Re: Serving data

2014-09-16 Thread Marius Soutier
Writing to Parquet and querying the result via SparkSQL works great (except for some strange SQL parser errors). However, the problem remains: how do I get that data back to a dashboard? So I guess I’ll have to use a database after all. On 13.09.2014, Mayur Rustagi wrote: > You can batch up data & store into parquet partitions as well …
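
For reference, the write/read round trip in the 1.x API looks roughly like this (a self-contained sketch; the paths and the WordCount case class are illustrative, and the column is named cnt rather than count, since SQL keywords can trip the 1.x parser):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class WordCount(word: String, cnt: Long)

    object ParquetRoundTrip extends App {
      val sc = new SparkContext(
        new SparkConf().setAppName("parquet-demo").setMaster("local[*]"))
      val sqlContext = new SQLContext(sc)
      import sqlContext.createSchemaRDD  // implicit RDD[case class] -> SchemaRDD

      // Write: the Parquet schema is inferred from the case class fields.
      sc.parallelize(Seq(WordCount("spark", 42L), WordCount("parquet", 7L)))
        .saveAsParquetFile("/tmp/wordcounts.parquet")

      // Read back and query with SparkSQL.
      sqlContext.parquetFile("/tmp/wordcounts.parquet").registerTempTable("wc")
      sqlContext.sql("SELECT word, cnt FROM wc ORDER BY cnt DESC")
        .collect().foreach(println)

      sc.stop()
    }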

Re: Serving data

2014-09-15 Thread Marius Soutier
Nice, I’ll check it out. At first glance, writing Parquet files seems to be a bit complicated. On 15.09.2014, at 13:54, andy petrella wrote: > nope. > It's an efficient storage for genomics data :-D > aℕdy ℙetrella > about.me/noootsab > On Mon, Sep 15, 2014 at 1:52 PM, Marius Soutier wrote: …

Re: Serving data

2014-09-15 Thread andy petrella
nope. It's an efficient storage for genomics data :-D aℕdy ℙetrella about.me/noootsab On Mon, Sep 15, 2014 at 1:52 PM, Marius Soutier wrote: > So you are living the dream of using HDFS as a database? ;) > On 15.09.2014, at 13:50, …

Re: Serving data

2014-09-15 Thread Marius Soutier
So you are living the dream of using HDFS as a database? ;) On 15.09.2014, at 13:50, andy petrella wrote: > I'm using Parquet in ADAM, and I can say that it works pretty fine! > Enjoy ;-) > aℕdy ℙetrella > about.me/noootsab > On Mon, Sep 15, 2014 at 1:41 PM, Marius Soutier wrote: …

Re: Serving data

2014-09-15 Thread andy petrella
I'm using Parquet in ADAM, and I can say that it works pretty fine! Enjoy ;-) aℕdy ℙetrella about.me/noootsab On Mon, Sep 15, 2014 at 1:41 PM, Marius Soutier wrote: > Thank you guys, I’ll try Parquet and if that’s not quick enough I’ll go the usual route …

Re: Serving data

2014-09-15 Thread Marius Soutier
Thank you guys, I’ll try Parquet, and if that’s not quick enough I’ll go the usual route with either a read-only or a regular database. On 13.09.2014, at 12:45, andy petrella wrote: > however, the cache is not guaranteed to remain, if other jobs are launched in > the cluster and require more memory …

Re: Serving data

2014-09-13 Thread andy petrella
However, the cache is not guaranteed to remain: if other jobs are launched in the cluster and require more memory than what's left in the overall caching memory, previously cached RDDs will be discarded. Using an off-heap cache like Tachyon as a dump repo can help. In general, I'd say that using a persist…
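
The eviction behaviour depends on the storage level; a hedged sketch of the 1.x options (Tachyon-backed OFF_HEAP was experimental at the time, and the input path is illustrative):

    import org.apache.spark.storage.StorageLevel

    // e.g. in the spark-shell, where a SparkContext `sc` already exists
    val rdd = sc.textFile("hdfs:///data/events")

    // MEMORY_ONLY: partitions evicted under memory pressure are dropped
    // and recomputed from lineage the next time they are needed.
    rdd.persist(StorageLevel.MEMORY_ONLY)

    // An RDD's storage level can only be set once; the alternatives are:
    //   StorageLevel.MEMORY_AND_DISK -- spill evicted partitions to local disk
    //   StorageLevel.OFF_HEAP        -- experimental in 1.x: store blocks in
    //     Tachyon (requires spark.tachyonStore.url to point at a Tachyon master)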

Re: Serving data

2014-09-13 Thread Mayur Rustagi
You can cache data in memory & query it using Spark Job Server. Most folks dump data down to a queue/db for retrieval. You can batch up data & store it into parquet partitions as well, & query it using another SparkSQL shell; the JDBC driver in SparkSQL is part of 1.1, I believe. -- Regards, Mayur Rustagi
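
One way to read "batch up data & store into parquet partitions" with the 1.x API, which had no automatic partition discovery yet: write each batch to its own date-stamped directory and union the pieces at query time. A sketch assuming a SchemaRDD named counts from the batch job and a shared sqlContext (paths illustrative):

    // Batch side: each run writes its output under its own directory.
    counts.saveAsParquetFile("hdfs:///data/counts/2014-09-13")

    // Query side: union the partitions a dashboard query needs.
    val days = Seq("2014-09-12", "2014-09-13")
    val all = days
      .map(d => sqlContext.parquetFile(s"hdfs:///data/counts/$d"))
      .reduce(_ unionAll _)
    all.registerTempTable("counts")

The JDBC access Mayur mentions shipped in Spark 1.1 as the Thrift JDBC server (sbin/start-thriftserver.sh), which lets external tools run Spark SQL queries over standard JDBC.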

Serving data

2014-09-12 Thread Marius Soutier
Hi there, I’m pretty new to Spark, and so far I’ve written my jobs the same way I wrote Scalding jobs - one-off: read data from HDFS, count words, write counts back to HDFS. Now I want to display these counts in a dashboard. Since Spark allows caching RDDs in memory and you have to explicitly …
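
The one-off batch workflow described here, as a minimal Spark sketch (paths are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountJob extends App {
      val sc = new SparkContext(new SparkConf().setAppName("wordcount"))

      // Read from HDFS, count words, write the counts back to HDFS.
      sc.textFile("hdfs:///input/docs")
        .flatMap(_.split("\\s+"))
        .map(word => (word, 1L))
        .reduceByKey(_ + _)
        .map { case (word, cnt) => s"$word\t$cnt" }
        .saveAsTextFile("hdfs:///output/wordcounts")

      sc.stop()
    }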