[ https://issues.apache.org/jira/browse/SPARK-28630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-28630.
----------------------------------
    Resolution: Invalid

Use {{unionByName}} instead.

> Union fails when column order is different
> ------------------------------------------
>
>                 Key: SPARK-28630
>                 URL: https://issues.apache.org/jira/browse/SPARK-28630
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.3
>            Reporter: nirav patel
>            Priority: Major
>
> I am trying to union two DataFrames that have the same number of columns and
> the same column types, but in a different order. It fails.
>
> val df1 = sparkSession.sparkContext.parallelize(List(
>   (1, 5, true),
>   (2, 3, false),
>   (4, 4, true)
> )).toDF("id", "age", "vaccinated")
>
> val df3 = sparkSession.sparkContext.parallelize(List(
>   (1, true, 6),
>   (2, false, 3),
>   (3, false, 2)
> )).toDF("id", "vaccinated", "age")
>
> df1.union(df3)
>
> Actual output:
> org.apache.spark.sql.AnalysisException: Union can only be performed on tables
> with the same number of columns
>
> Expected output:
>
> It should read the schema (column names and/or types). I can see that
> sometimes you want to ignore column names and merge based on types only. So
> maybe introduce an option to choose whether the merge uses names followed by
> types, or just types.
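For readers landing on this thread, the suggested resolution can be sketched roughly as follows. This is a minimal illustration and not code from the issue. It assumes Spark 2.3.0 or later, where {{Dataset.unionByName}} is available; the report lists 2.2.3 as the affected version, so a name-aligned fallback is included as well. The object name `UnionByNameExample` and the helper `unionAligned` are hypothetical names introduced here, and the local SparkSession setup is an assumption for the sake of a self-contained example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical example object; not part of Spark or of the original report.
object UnionByNameExample {

  // Fallback for Spark versions without unionByName (assumption: both frames
  // carry the same set of column names). Reorders `right` to match the column
  // order of `left`, then performs the ordinary positional union.
  def unionAligned(left: DataFrame, right: DataFrame): DataFrame =
    left.union(right.select(left.columns.map(col): _*))

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-by-name-example")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the report: identical columns, different order.
    val df1 = Seq((1, 5, true), (2, 3, false), (4, 4, true))
      .toDF("id", "age", "vaccinated")
    val df3 = Seq((1, true, 6), (2, false, 3), (3, false, 2))
      .toDF("id", "vaccinated", "age")

    // union resolves columns by position; unionByName resolves them by name.
    val byName = df1.unionByName(df3)
    byName.show()

    // Equivalent result via explicit reordering, usable before Spark 2.3.0.
    val aligned = unionAligned(df1, df3)
    aligned.show()

    spark.stop()
  }
}
```

Both results use the column order of `df1` (id, age, vaccinated). Plain {{union}} matches columns strictly by position, which is why it rejects these two frames even though their names and types agree.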