Can you try caching the individual dataframes and then unioning them?
It may save you time.
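
A minimal sketch of that suggestion, assuming PySpark on Spark 2.0+ and that all four dataframes share the same column order (client_id, product_id, interest). The helper name cache_and_union is hypothetical and not part of Spark; whether caching actually helps depends on how the dataframes are produced and reused.

```python
from functools import reduce


def cache_and_union(dataframes):
    """Cache each input DataFrame, then union them all into one.

    Assumes every element is a pyspark.sql.DataFrame with the same
    column order (client_id, product_id, interest).
    """
    if not dataframes:
        raise ValueError("need at least one DataFrame")
    # cache() is lazy: it only marks the DataFrame for caching and
    # returns it; the data is materialised on the first action.
    cached = [df.cache() for df in dataframes]
    # union() matches columns by position, not by name (Spark 2.0+;
    # on Spark 1.x the equivalent method is unionAll()).
    return reduce(lambda left, right: left.union(right), cached)
```

Usage would look like combined = cache_and_union([df1, df2, df3, df4]), where df1..df4 stand in for the four dataframes from the question.
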
Thanks
Deepak
On Wed, Nov 16, 2016 at 12:35 PM, Devi P.V wrote:
> Hi all,
>
> I have 4 data frames with three columns,
>
> client_id,product_id,interest
>
> I want to combine these 4 dataframes into one dat
If you are reading all these datasets from files in persistent storage,
functions like sc.textFile can take folders/patterns as input and read all of
the matching files into a single RDD. Then you can convert it to a dataframe.
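
A rough sketch of that approach, assuming PySpark on Spark 2.0+ (a SparkSession named spark), comma-delimited text files with no header row, and the three columns from the original question. The function name and the path pattern in the usage note are hypothetical; all columns come back as strings, so cast them afterwards if needed.

```python
def read_folder_as_dataframe(spark, path_pattern, sep=","):
    """Read every file matching path_pattern into one DataFrame.

    spark is assumed to be a pyspark.sql.SparkSession; path_pattern may
    be a directory or a glob, as accepted by SparkContext.textFile.
    """
    # textFile accepts directories and wildcards, so all matching files
    # end up as lines of a single RDD.
    lines = spark.sparkContext.textFile(path_pattern)
    # Assumption: each line holds exactly three sep-delimited fields.
    rows = lines.map(lambda line: tuple(line.split(sep)))
    return spark.createDataFrame(
        rows, ["client_id", "product_id", "interest"]
    )
```

For example, read_folder_as_dataframe(spark, "/data/interest/*.csv") would load all files under that (made-up) folder in one pass, with no union step.
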
When you say it is time consuming with union, how are you measuring