Re: what is the optimized way to combine multiple dataframes into one dataframe ?

2016-11-16 Thread Deepak Sharma
Can you try caching the individual dataframes and then union them? It may save you time. Thanks Deepak On Wed, Nov 16, 2016 at 12:35 PM, Devi P.V wrote: > Hi all, > > I have 4 data frames with three columns, > > client_id,product_id,interest > > I want to combine these 4 dataframes into one dat

RE: what is the optimized way to combine multiple dataframes into one dataframe ?

2016-11-15 Thread Shreya Agarwal
If you are reading all these datasets from files in persistent storage, functions like sc.textFile can take folders/patterns as input and read all of the files matching into the same RDD. Then you can convert it to a dataframe. When you say it is time consuming with union, how are you measuring