[ 
https://issues.apache.org/jira/browse/SPARK-28630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28630.
----------------------------------
    Resolution: Invalid

Use {{unionByName}} instead.
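
As a minimal sketch of that suggestion, the snippet below applies {{unionByName}} to the two DataFrames from the report. {{unionByName}} is available on Dataset from Spark 2.3.0 onward. The object name, the helper {{unionByColumnName}}, the application name, and the local master setting are illustrative assumptions, not part of the original issue.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionByNameExample {
  // Hypothetical helper: resolves columns by name instead of by position.
  // The result keeps the column order of `left`.
  def unionByColumnName(left: DataFrame, right: DataFrame): DataFrame =
    left.unionByName(right)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (app name and master are assumptions).
    val spark = SparkSession.builder()
      .appName("union-by-name-example")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._ // brings toDF into scope

    // Same data as in the report: identical column names and types,
    // but "age" and "vaccinated" are swapped in df3.
    val df1 = Seq((1, 5, true), (2, 3, false), (4, 4, true))
      .toDF("id", "age", "vaccinated")
    val df3 = Seq((1, true, 6), (2, false, 3), (3, false, 2))
      .toDF("id", "vaccinated", "age")

    val merged = unionByColumnName(df1, df3)
    merged.printSchema()
    merged.show()

    spark.stop()
  }
}
```

Plain {{union}} matches columns by position, which is why it rejects these two inputs; {{unionByName}} requires both sides to have the same set of column names.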

> Union fails when column order is different
> ------------------------------------------
>
>                 Key: SPARK-28630
>                 URL: https://issues.apache.org/jira/browse/SPARK-28630
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.3
>            Reporter: nirav patel
>            Priority: Major
>
> I am trying to union two DataFrames that have the same number of columns and 
> the same column types, but in a different order. It fails.
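>       // toDF needs the session's implicits in scope
>       import sparkSession.implicits._
>  
>       val df1 = sparkSession.sparkContext.parallelize(List(
>         (1, 5, true),
>         (2, 3, false),
>         (4, 4, true)
>       )).toDF("id", "age", "vaccinated")
>  
>       // Same columns as df1, but "vaccinated" and "age" are swapped
>       val df3 = sparkSession.sparkContext.parallelize(List(
>         (1, true, 6),
>         (2, false, 3),
>         (3, false, 2)
>       )).toDF("id", "vaccinated", "age")
>  
>       df1.union(df3)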
>  
> Actual output:
> org.apache.spark.sql.AnalysisException: Union can only be performed on tables 
> with the same number of columns
> Expected output:
>  
> Union should take the schema into account (column names and/or types). I can 
> see that sometimes you may want to ignore column names and merge based on 
> types alone, so maybe introduce an option that controls whether to match by 
> name and then type, or by type only.
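
For releases that predate {{unionByName}} (it was added in Spark 2.3.0, while this issue is filed against 2.2.3), the name-based merge requested above can be approximated by reordering one side before a positional {{union}}. The following is only a sketch under that assumption; the object name {{UnionReorder}} and the helper {{unionAligned}} are hypothetical and not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object UnionReorder {
  // Hypothetical helper: selects the columns of `right` in the order used by
  // `left`, then applies the ordinary positional union.
  // Assumes simple column names (no dots or other special characters).
  def unionAligned(left: DataFrame, right: DataFrame): DataFrame = {
    require(
      left.columns.toSet == right.columns.toSet,
      "both DataFrames must have the same set of column names"
    )
    val aligned = right.select(left.columns.map(name => col(name)): _*)
    left.union(aligned)
  }
}
```

With the DataFrames from the description, {{UnionReorder.unionAligned(df1, df3)}} would select {{id}}, {{age}}, {{vaccinated}} from {{df3}} before the union, so the positional type check sees matching columns.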



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)