On Thu, Sep 1, 2016 at 4:56 PM, Mich Talebzadeh
<mich.talebza...@gmail.com> wrote:
> DataFrame is built on top of RDD to create a tabular format that we all love,
> making the original build easily usable (say SQL-like queries, column
> headings etc). The drawback is that it restricts what you can do with a
> DataFrame (now that you have done RDD.toDF)

DataFrame is literally a Dataset[Row] (in the Scala API it is just a type
alias), rather than something built on an RDD.
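
A minimal sketch of that relationship, assuming Spark 2.x or later on the classpath, a local-mode SparkSession, and a hypothetical Person case class that exists only for illustration. Because DataFrame is a type alias, a DataFrame can be assigned to a Dataset[Row] as-is, and toDF()/as[T] move between the untyped and typed views without going through an RDD:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for illustration
case class Person(name: String, age: Int)

object DataFrameIsDataset {
  def run(spark: SparkSession): Array[String] = {
    import spark.implicits._

    // A typed Dataset built directly from local data, no RDD involved
    val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bob", 28)).toDS()

    // toDF() gives the untyped view, whose static type is DataFrame
    val df: DataFrame = people.toDF()

    // DataFrame is an alias for Dataset[Row], so this compiles without conversion
    val rows: Dataset[Row] = df

    // as[Person] switches back to the typed view
    val typed: Dataset[Person] = rows.as[Person]

    // Column-expression filter, then collect the matching names
    typed.filter($"age" > 30).collect().map(_.name)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("dataframe-is-dataset").getOrCreate()
    try println(run(spark).mkString(",")) finally spark.stop()
  }
}
```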

> Dataset is the new RDD with improvements on RDD. As I understand from
> Sean's explanation they add some optimisation on top of the common RDD.

At the moment I don't think there's any particular reason to use RDDs
except to interoperate with code that uses RDDs, which is entirely
valid. Otherwise, I believe new code would generally use only Dataset
and DataFrame. So I don't think there are really 3 elemental concepts
in play as of Spark 2.x.
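
A rough sketch of that interop pattern, assuming Spark 2.x or later and a local SparkSession. legacyWordLengths is a hypothetical stand-in for existing RDD-based code. The new code stays in the Dataset API and drops to an RDD (via .rdd) only at the boundary with the legacy function, then comes straight back with toDS():

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

object RddInterop {
  // Hypothetical stand-in for existing code that only accepts and returns RDDs
  def legacyWordLengths(words: RDD[String]): RDD[Int] = words.map(_.length)

  def run(spark: SparkSession): Array[Int] = {
    import spark.implicits._

    // New code works with a Dataset
    val words: Dataset[String] = Seq("spark", "rdd", "dataset").toDS()

    // Cross into RDD land only where the legacy API requires it
    val lengths: RDD[Int] = legacyWordLengths(words.rdd)

    // ...and convert back to a Dataset right afterwards
    val back: Dataset[Int] = lengths.toDS()
    back.collect().sorted
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rdd-interop").getOrCreate()
    try println(run(spark).mkString(",")) finally spark.stop()
  }
}
```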

To unsubscribe e-mail: user-unsubscr...@spark.apache.org