Thanks for your answers. The suggested method works when the number of Data Frames is small.
However, I am trying to union >30 Data Frames, and creating the plan takes longer than the execution itself, which should not be the case.

Thanks!
--
Cesar

On Thu, Apr 5, 2018 at 1:29 PM, Andy Davidson <[email protected]> wrote:

> Hi Cesar
>
> I have used Brandon's approach in the past without any problem
>
> Andy
>
> From: Brandon Geise <[email protected]>
> Date: Thursday, April 5, 2018 at 11:23 AM
> To: Cesar <[email protected]>, "user @spark" <[email protected]>
> Subject: Re: Union of multiple data frames
>
> Maybe something like
>
>     var finalDF = spark.sqlContext.emptyDataFrame
>     for (df <- dfs){
>       finalDF = finalDF.union(df)
>     }
>
> Where dfs is a Seq of dataframes.
>
> From: Cesar <[email protected]>
> Date: Thursday, April 5, 2018 at 2:17 PM
> To: user <[email protected]>
> Subject: Union of multiple data frames
>
> The following code works for small n, but not for large n (>20):
>
>     val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
>     dfUnion.show()
>
> By not working, I mean that Spark takes a lot of time to create the
> execution plan.
>
> *Is there a more optimal way to perform a union of multiple data frames?*
>
> thanks
> --
> Cesar Flores

--
Cesar Flores
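
Both `reduce(_ union _)` and the loop build a left-deep chain of unions, one level per Data Frame. One thing that may be worth trying is combining the frames pairwise, so the nesting depth grows roughly with log2(n) instead of n. This is a sketch and was not confirmed anywhere in this thread. `balancedReduce` is a hypothetical helper, not a Spark API, and whether it shortens planning time on a given Spark version is an assumption that would need measuring. The example uses plain strings so the nesting is visible without a cluster.

```scala
// Hypothetical helper (not part of Spark): combine a sequence pairwise so the
// resulting expression tree has depth ~log2(n) rather than n.
def balancedReduce[A](items: Seq[A])(combine: (A, A) => A): A = {
  require(items.nonEmpty, "balancedReduce needs at least one element")
  if (items.length == 1) items.head
  else {
    // Split in half, reduce each half recursively, then combine the results.
    val (left, right) = items.splitAt(items.length / 2)
    combine(balancedReduce(left)(combine), balancedReduce(right)(combine))
  }
}

// Plain-Scala illustration of the nesting each strategy produces.
val names = Seq("df1", "df2", "df3", "df4")
val leftDeep = names.reduce((a, b) => s"($a U $b)")
val balanced = balancedReduce(names)((a, b) => s"($a U $b)")
// leftDeep: (((df1 U df2) U df3) U df4)
// balanced: ((df1 U df2) U (df3 U df4))

// With Spark Data Frames that share one schema, the call would look like:
//   val dfUnion = balancedReduce(dfs)(_ union _)
```

Another commonly suggested route is to union at the RDD level, with `spark.sparkContext.union(dfs.map(_.rdd))` followed by `spark.createDataFrame(rdd, dfs.head.schema)`. That keeps the logical plan small, but the inputs are then no longer optimized as part of a single Data Frame plan, so it is worth benchmarking both.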
