The corr() and cov() methods of DataFrame require an instance of str for column names:

https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1356

although instances of basestring appear to work for addressing columns:

https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L708

Humble request: could we replace the "isinstance(col1, str)" tests with "isinstance(col1, basestring)"?

Less humble request: why test types at all? Why not just do one of {raise KeyError, coerce to string}?

Cheers,
Sam
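
P.S. To make the two requests concrete, here is a minimal sketch of what I have in mind. It is not the actual pyspark code: check_col_name and coerce_col_name are hypothetical helper names, the ValueError and its message are assumptions, and string_types is just a local alias so the snippet runs on both Python 2 and Python 3.

```python
import sys

# On Python 2, basestring is the common base of str and unicode.
# Python 3 has no basestring, so fall back to str there.
if sys.version_info[0] >= 3:
    string_types = (str,)
else:
    string_types = (basestring,)  # noqa: F821 (only evaluated on Python 2)


def check_col_name(col):
    """Hypothetical helper: accept any string type as a column name."""
    if not isinstance(col, string_types):
        # Assumed error type and message, for illustration only.
        raise ValueError("col should be a string, got %s" % type(col).__name__)
    return col


def coerce_col_name(col):
    """Hypothetical alternative: coerce to a string instead of rejecting."""
    return col if isinstance(col, string_types) else str(col)
```

With the first helper, a unicode column name such as u"price" passes the check on Python 2, which is what the "humble request" amounts to. The second helper corresponds to the "coerce to string" option.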