Aravind B created SPARK-12556:
----------------------------------

             Summary: Pyspark dataframe unionAll call accepts incorrect input
                 Key: SPARK-12556
                 URL: https://issues.apache.org/jira/browse/SPARK-12556
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.4.1
            Reporter: Aravind B

I actually encountered this problem with two dataframes that have 8 and 10 columns respectively. Below is a made-up example that reproduces what I observed going wrong.

Consider the two dataframes:

df1:
+-------+----------+
|id     |     count|
+-------+----------+
+-------+----------+

df2:
+-------+---------+----------+
|id     |new_count|     count|
+-------+---------+----------+
|      1|        4|         6|
|      1|        5|         6|
|      3|        6|         6|
|      2|        7|         6|
+-------+---------+----------+

The call:

    df3 = df1.unionAll(df2)

returns successfully, with df3 containing 2 columns. However, the values in some columns are now swapped with those of other columns. Based on my previous experience, I would say that df3's count column will actually hold the values of the new_count column.

I believe that this call should never complete successfully in the first place and should throw an exception instead.
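
Until the call itself validates its input, a guard along the following lines might serve as a stopgap. This is only a minimal sketch: `safe_union_all` is a hypothetical helper, not part of PySpark, and it assumes nothing beyond the existing `DataFrame.columns`, `DataFrame.select` and `DataFrame.unionAll` members. It compares column names only and does not check column types.

```python
def safe_union_all(df1, df2):
    """Union two DataFrames only if they share the same column names.

    Hypothetical helper for illustration; not part of PySpark.
    """
    cols1 = list(df1.columns)
    cols2 = list(df2.columns)

    # Reject inputs whose column names differ, instead of silently
    # producing a result with misaligned values.
    if sorted(cols1) != sorted(cols2):
        raise ValueError(
            "unionAll schema mismatch: %r vs %r" % (cols1, cols2))

    # unionAll pairs columns by position, not by name, so put df2's
    # columns into df1's order before the union.
    return df1.unionAll(df2.select(*cols1))
```

With the example above, `safe_union_all(df1, df2)` would raise a `ValueError` because df2 carries the extra new_count column, which is the behaviour I would have expected from unionAll itself.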