> 1) What does this really mean to an Application developer?

It means there are fewer concepts to learn.

> 2) Why was this unification needed in Spark 2.0?

To simplify the API and reduce the number of concepts that need to be learned. We didn't do it in 1.6 only because we didn't want to break binary compatibility in a minor release.

> 3) What changes can be observed in Spark 2.0 vs Spark 1.6?

There is no DataFrame class; all methods are still available, except those that returned an RDD (you can now call df.rdd.map if that is still what you want).

> 4) Will compile-time safety be there for DataFrames too?

Slide 7.

> 5) Is the Python API supported for Datasets in 2.0?

Slide 10.
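To make point 3 concrete, here is a minimal Scala sketch of what the unified API looks like in 2.0. It assumes a local SparkSession and a hypothetical `people.json` input file; the types and calls (`DataFrame` as an alias for `Dataset[Row]`, `df.rdd`, `df.as[T]`) are the public Spark 2.0 API, but the file name and `Person` case class are illustrative only:

```scala
import org.apache.spark.sql.{SparkSession, DataFrame, Dataset}

object UnificationSketch {
  // Illustrative schema for the hypothetical people.json file.
  case class Person(name: String, age: Long)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sketch")
      .master("local[*]")   // assumption: running locally
      .getOrCreate()
    import spark.implicits._

    // In 2.0, DataFrame is just a type alias for Dataset[Row],
    // so the same Dataset methods apply to both.
    val df: DataFrame = spark.read.json("people.json")

    // Methods that returned an RDD in 1.6 are gone from DataFrame;
    // drop down to the RDD explicitly when that is still what you want.
    val names = df.rdd.map(row => row.getAs[String]("name"))

    // Or convert to a typed Dataset for compile-time safety.
    val people: Dataset[Person] = df.as[Person]

    spark.stop()
  }
}
```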