Hi Cesar, I have used Brandon's approach in the past without any problems.
Andy

From: Brandon Geise <[email protected]>
Date: Thursday, April 5, 2018 at 11:23 AM
To: Cesar <[email protected]>, "user @spark" <[email protected]>
Subject: Re: Union of multiple data frames

> Maybe something like
>
> var finalDF = spark.sqlContext.emptyDataFrame
> for (df <- dfs){
>   finalDF = finalDF.union(df)
> }
>
> Where dfs is a Seq of dataframes.
>
> From: Cesar <[email protected]>
> Date: Thursday, April 5, 2018 at 2:17 PM
> To: user <[email protected]>
> Subject: Union of multiple data frames
>
> > The following code works for small n, but not for large n (>20):
> >
> > val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
> > dfUnion.show()
> >
> > By not working, I mean that Spark takes a lot of time to create the execution plan.
> >
> > Is there a more optimal way to perform a union of multiple data frames?
> >
> > thanks
> --
>
> Cesar Flores
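
One more thing that might be worth trying for large n, offered as a rough sketch rather than a tested fix: both the reduce and the loop above build a left-deep chain of n unions, and pairing the data frames up as a balanced tree keeps the nesting depth near log2(n). Whether that shortens planning time on your Spark version is an assumption nobody in this thread has measured. The object BalancedUnion and the helper balancedReduce are hypothetical names of mine, not Spark API. The helper is generic, so it runs without Spark, and it is applied to the data frames directly (no empty seed DataFrame, since union expects both sides to have the same number of columns).

```scala
// Hypothetical helper (not part of Spark, not from the original thread).
object BalancedUnion {

  // Combine items pairwise as a balanced tree, so the nesting depth is
  // about log2(n) instead of the n levels a left-to-right reduce produces.
  def balancedReduce[T](items: Seq[T])(combine: (T, T) => T): T = {
    require(items.nonEmpty, "balancedReduce needs at least one element")
    if (items.length == 1) items.head
    else {
      // Split in half and combine the two recursively reduced halves.
      val (left, right) = items.splitAt(items.length / 2)
      combine(balancedReduce(left)(combine), balancedReduce(right)(combine))
    }
  }
}

// Assumed usage with Spark on the classpath, where dfs is a
// Seq[DataFrame] whose members all share the same schema:
//   val dfUnion = BalancedUnion.balancedReduce(dfs)(_ union _)
//   dfUnion.show()
```

To see the shape without Spark, BalancedUnion.balancedReduce(Seq("a", "b", "c", "d"))((l, r) => s"($l $r)") gives "((a b) (c d))", whereas a plain reduce would nest as "(((a b) c) d)".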
