On Thu, Sep 1, 2016 at 4:56 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Data Frame built on top of RDD to create as tabular format that we all love
> to make the original build easily usable (say SQL like queries, column
> headings etc). The drawback is it restricts you with what you can do with
> Data Frame (now that you have done RDD.toDF)

DataFrame is a Dataset[Row], literally, rather than based on an RDD.

> DataSet is the new RDD with improvements on RDD. As I understand from
> Sean's explanation they add some optimisation on top of the common RDD.

At the moment I don't think there's any particular reason to use RDDs
except to interoperate with code that uses RDDs -- which is entirely
valid. I believe new code would generally touch only Dataset and
DataFrame otherwise. So I don't think there are really 3 elemental
concepts in play as of Spark 2.x.
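
To make the relationship concrete, here is a rough sketch, assuming Spark 2.x
on the classpath and a local master. The Person case class, the object name
and the sample values are made up purely for illustration. It shows that a
DataFrame can be assigned to a Dataset[Row] without any conversion, and that
moving to and from an RDD is only needed at the boundary with RDD-based code.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object DatasetSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust master/appName for a real cluster.
    val spark = SparkSession.builder()
      .appName("dataset-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A typed Dataset built directly from local data, no RDD involved.
    val people: Dataset[Person] = Seq(Person("Ann", 34), Person("Bob", 27)).toDS()

    // In Spark 2.x, DataFrame is a type alias for Dataset[Row],
    // so this assignment compiles without any conversion call.
    val df: DataFrame = people.toDF()
    val rows: Dataset[Row] = df

    // Going back to the typed view is a matter of supplying an encoder.
    val typed: Dataset[Person] = rows.as[Person]

    // Interop with RDD-based code: drop down to an RDD and come back.
    val rdd: RDD[Person] = typed.rdd
    val back: Dataset[Person] = spark.createDataset(rdd)

    // Column-based filter, available on any Dataset.
    back.filter($"age" > 30).show()

    spark.stop()
  }
}
```

The only place an RDD appears is the explicit `.rdd` / `createDataset` round
trip, which is the interoperation case mentioned above.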