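Maybe something like
// Start from the first DataFrame: emptyDataFrame has no columns,
// so a union with it fails on a column-count mismatch.
var finalDF = dfs.head
for (df <- dfs.tail) {
  finalDF = finalDF.union(df)
}
where dfs is a non-empty Seq of DataFrames.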
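If the slow part is the depth of the plan, another thing to try is combining the frames pairwise, so the nesting grows with log(n) instead of n. Below is a rough sketch of that idea, not something I have measured on Spark. The helper name balancedReduce is made up for illustration, and it is written generically so it runs without a Spark dependency.

```scala
// Hypothetical helper: tree-shaped reduction over a non-empty Seq.
// Adjacent elements are combined in rounds, so the nesting depth of the
// result is about log2(n) instead of n, and the left-to-right order of
// the inputs is preserved.
def balancedReduce[T](items: Seq[T])(combine: (T, T) => T): T = {
  require(items.nonEmpty, "balancedReduce needs at least one element")

  @annotation.tailrec
  def loop(level: Vector[T]): T =
    if (level.size == 1) level.head
    else loop(
      level.grouped(2).map {
        case Seq(a, b) => combine(a, b) // a full pair
        case Seq(a)    => a             // odd element carried to the next round
      }.toVector
    )

  loop(items.toVector)
}
```

With the dfs from above, the call would be balancedReduce(dfs)(_ union _). Whether this actually shortens planning time probably depends on the Spark version and on n, so treat it as something to try.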
From: Cesar <[email protected]>
Date: Thursday, April 5, 2018 at 2:17 PM
To: user <[email protected]>
Subject: Union of multiple data frames
The following code works for small n, but not for large n (>20):
val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
dfUnion.show()
By not working, I mean that Spark takes a lot of time to create the execution
plan.
Is there a more efficient way to perform a union of multiple data frames?
Thanks
--
Cesar Flores