>
> In the near future, I guess GUI interfaces for Spark will be available
> soon. Spark users (e.g., CEOs) might not need to know what RDDs are at all.
> They could analyze their data by clicking a few buttons instead of writing
> programs. :)
That's not in the future. :)
On Mon, Nov 23,
Hi everyone,
I'm doing some reading-up on the newer features of Spark, such as
DataFrames, Datasets, and Project Tungsten. This got me a bit confused about
the relationship between all these concepts.
When I started learning Spark, I read a book and the original paper on RDDs,
which led me to
Here is how I view the relationship between the various components of Spark:
- *RDDs* - a low-level API for expressing DAGs that will be executed in
parallel by Spark workers
- *Catalyst* - an internal library for expressing trees that we use to
build relational algebra and expressions
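To illustrate the second bullet: Catalyst represents queries and expressions as trees and rewrites them with rules. The toy Scala sketch below (these are not Catalyst's actual classes, just an illustration of the idea) shows a tiny expression tree and one constant-folding rewrite rule in the spirit of Catalyst's rule-based optimizer:

```scala
// A toy expression tree, in the spirit of the trees Catalyst builds.
sealed trait Expr
case class Lit(value: Int) extends Expr
case class Attr(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

// A simple rewrite rule: fold an addition of two literals into one literal,
// applied bottom-up over the tree.
def constantFold(e: Expr): Expr = e match {
  case Add(l, r) =>
    (constantFold(l), constantFold(r)) match {
      case (Lit(a), Lit(b)) => Lit(a + b)
      case (fl, fr)         => Add(fl, fr)
    }
  case other => other
}

// Example: x + (1 + 2) rewrites to x + 3.
val optimized = constantFold(Add(Attr("x"), Add(Lit(1), Lit(2))))
```

Catalyst's real optimizer is a collection of many such rules applied repeatedly until the plan stops changing, but the tree-plus-rewrite-rules structure is the core idea.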
Let me share my understanding.
If we view Spark as an analytics OS, the RDD APIs are like OS system calls.
These low-level system calls can be invoked from programming languages like C.
The DataFrame and Dataset APIs are like higher-level programming languages:
they hide the low-level complexity and the
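The analogy can be made concrete with plain Scala collections (not Spark itself, just a sketch of the same layering): the low-level "system call" style spells out every step of a word count by hand, while the high-level style declares what you want and lets the library decide how:

```scala
val words = Seq("spark", "rdd", "spark", "dataframe", "spark")

// "System call" style: manage state and iteration explicitly.
var counts = Map.empty[String, Int]
for (w <- words) counts = counts.updated(w, counts.getOrElse(w, 0) + 1)

// "High-level language" style: declare the grouping and aggregation.
val counts2 = words.groupBy(identity).map { case (w, ws) => (w, ws.size) }
```

Both produce the same result; the difference is who controls the execution details. In Spark the gap is wider still, because the DataFrame/Dataset layer also hands the declarative plan to Catalyst for optimization before anything runs.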
Thanks Michael, that helped me a lot!
On 23 November 2015 at 17:47, Michael Armbrust
wrote:
> Here is how I view the relationship between the various components of
> Spark:
>
> - *RDDs* - a low-level API for expressing DAGs that will be executed in
> parallel by Spark