Hi Mohit,
"Seems like the limit on the parent is executed twice and returns different
records each time. Not sure why it is executed twice when I mentioned it only
once"
That is to be expected. Spark follows lazy evaluation, which means that
execution only happens when you call an action, so every action re-evaluates
the plan from scratch. Since limit without an orderBy makes no guarantee
about which rows are kept, two evaluations can return different records.
Dear All,
I would like to know how, in Spark 2.0, I can split a dataframe into two
dataframes when I know the exact counts the two dataframes should have. I
tried using limit but got quite weird results. Also, I am looking for exact
counts in the child dataframes, not an approximate percentage-based split.