Olexiy Oryeshko created SPARK-31600:
---------------------------------------

             Summary: Error message from DataFrame creation is misleading.
                 Key: SPARK-31600
                 URL: https://issues.apache.org/jira/browse/SPARK-31600
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.5
         Environment: DataBricks 6.4, Spark 2.4.5, Scala 2.11
            Reporter: Olexiy Oryeshko


*Description:*

DataFrame creation from a pandas.DataFrame fails when one of the features contains only NaN values (such a column is valid input in itself).

However, the error message names the wrong feature as the culprit, which makes it hard to find the root cause.

*How to reproduce:*

{code:java}
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'a': np.array([np.nan, np.nan], dtype=np.object_), 'b': [np.nan, 'aaa']})
display(spark.createDataFrame(df2[['b']]))  # Works fine
spark.createDataFrame(df2)                  # Raises TypeError.
{code}

In the code above, column 'a' is bad. However, the `TypeError` raised by the last command names feature 'b' as the culprit:

TypeError: field b: Can not merge type <class 'pyspark.sql.types.DoubleType'> and <class 'pyspark.sql.types.StringType'>
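
Until the message is fixed, one possible way to locate the real culprit on the pandas side is sketched below. It assumes only pandas and NumPy and does not touch Spark; the helper name `all_null_columns` is hypothetical and is not part of pandas or PySpark.

```python
import numpy as np
import pandas as pd


def all_null_columns(pdf):
    """Return the names of columns whose values are all null (NaN or None).

    Hypothetical helper, shown only to illustrate the check.
    """
    return [name for name in pdf.columns if pdf[name].isna().all()]


# Same frame as in the reproduction above.
df2 = pd.DataFrame({
    'a': np.array([np.nan, np.nan], dtype=np.object_),
    'b': [np.nan, 'aaa'],
})

print(all_null_columns(df2))  # ['a']
```

Running this check before `spark.createDataFrame(df2)` points at column 'a', which matches the column identified as bad in the reproduction, rather than at 'b' as the `TypeError` does.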