DataFrames also have transformations and actions, and DataFrame transformations are lazily evaluated too. The difference is that DataFrame transformations such as filter(), select(), and agg() return a new DataFrame rather than an RDD. Methods such as show() and collect() are actions.
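The split between lazy transformations and eager actions can be illustrated with a small toy model. The `LazyFrame` class below is hypothetical and is not Spark's API. It only mimics the behavior described above: filter() and select() record a step and return a new frame without computing anything, while collect() (and count(), built on it) executes the recorded plan. In real Spark, the analogous point is that `df.filter(...)` returns a DataFrame without launching a job, and calling show() or collect() is what triggers execution.

```python
class LazyFrame:
    """Hypothetical toy stand-in for a DataFrame (not Spark's API).

    Transformations record steps in a plan and return a new LazyFrame.
    Actions run the recorded plan and return concrete results.
    """

    def __init__(self, rows, plan=None, log=None):
        self._rows = rows                              # source data (list of dicts)
        self._plan = plan or []                        # recorded (op, arg) steps
        self.log = log if log is not None else []      # shared record of executed steps

    # --- transformations: build a new frame, compute nothing ---
    def filter(self, pred):
        return LazyFrame(self._rows, self._plan + [("filter", pred)], self.log)

    def select(self, *cols):
        return LazyFrame(self._rows, self._plan + [("select", cols)], self.log)

    # --- actions: execute the whole recorded plan ---
    def collect(self):
        rows = self._rows
        for op, arg in self._plan:
            self.log.append(op)                        # note that this step actually ran
            if op == "filter":
                rows = [r for r in rows if arg(r)]
            elif op == "select":
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows

    def count(self):
        return len(self.collect())


people = LazyFrame([{"name": "a", "age": 30}, {"name": "b", "age": 17}])
adults = people.filter(lambda r: r["age"] >= 18).select("name")
print(adults.log)        # [] : defining transformations executed nothing
print(adults.collect())  # [{'name': 'a'}] : the action runs the plan
print(adults.log)        # ['filter', 'select']
```

Spark's real optimizer does much more than replay steps in order. The point here is only that building a chain of transformations is cheap, and the work happens when an action asks for a result.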

Cheng

On 6/8/15 1:33 PM, kiran lonikar wrote:
Thanks for replying twice :) I think I sent this question by email, then somehow thought I had not sent it, so I created the other one on the web interface. Let's keep this thread, since you have provided more details here.

Great, that confirms my intuition about DataFrame. It's similar to the Shark columnar layout, with the addition of compression. Shark used Java NIO's ByteBuffer to hold the actual data. I will go through the code you pointed to.

I have another question about DataFrame. RDD operations are divided into two groups: *transformations*, which are lazily evaluated and return a new RDD, and *actions*, which evaluate the lineage defined by the transformations, run the computation, and return results. What about DataFrame operations like join, groupBy, agg, unionAll, etc., which would all be transformations on an RDD? Are they lazily evaluated or executed immediately?


