Re: Relation between RDDs, DataFrames and Project Tungsten

2015-11-23 Thread Mark Hamstra
> In the near future, I guess GUI interfaces of Spark will be available soon. Spark users (e.g., CEOs) might not need to know what RDDs are at all. They can analyze their data by clicking a few buttons, instead of writing the programs. : )

That's not in the future. :) On Mon, Nov 23,

Relation between RDDs, DataFrames and Project Tungsten

2015-11-23 Thread Jakob Odersky
Hi everyone, I'm doing some reading-up on all the newer features of Spark such as DataFrames, Datasets and Project Tungsten. This got me a bit confused about the relation between all these concepts. When starting to learn Spark, I read a book and the original paper on RDDs, which led me to

Re: Relation between RDDs, DataFrames and Project Tungsten

2015-11-23 Thread Michael Armbrust
Here is how I view the relationship between the various components of Spark:

- *RDDs* - a low-level API for expressing DAGs that will be executed in parallel by Spark workers
- *Catalyst* - an internal library for expressing trees that we use to build relational algebra and expression

Re: Relation between RDDs, DataFrames and Project Tungsten

2015-11-23 Thread Xiao Li
Let me share my understanding. If we view Spark as an analytics OS, the RDD APIs are like OS system calls. These low-level system calls can be invoked from programming languages like C. The DataFrame and Dataset APIs are like higher-level programming languages. They hide the low-level complexity and the

Re: Relation between RDDs, DataFrames and Project Tungsten

2015-11-23 Thread Jakob Odersky
Thanks Michael, that helped me a lot!

On 23 November 2015 at 17:47, Michael Armbrust wrote:
> Here is how I view the relationship between the various components of Spark:
>
> - *RDDs* - a low-level API for expressing DAGs that will be executed in parallel by Spark