Hi Sam, I think I have answers to your two questions.
> Humble request: could we replace the "isinstance(col1, str)" tests with
> "isinstance(col1, basestring)"?

IMHO, yes, I believe this should be basestring. Otherwise, some functions
would not accept unicode as arguments for column names in Python 2.7.

> Less humble request: why test types at all? Why not just do one of {raise
> KeyError, coerce to string}?

I believe argument type checking is pretty common in other Python libraries
too, such as numpy. ValueError might be more appropriate than KeyError,
because it is the type of the value that is incorrect. Also, I think
silently coercing to string might confuse users. If the current behaviour is
problematic and incoherent, I guess it should be changed, but I think it is
okay as it is.

Thanks.

2016-11-12 9:36 GMT+09:00 SamPenrose <spenr...@mozilla.com>:

> The corr() and cov() methods of DataFrame require an instance of str for
> column names:
>
> https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1356
>
> although instances of basestring appear to work for addressing columns:
>
> https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L708
>
> Humble request: could we replace the "isinstance(col1, str)" tests with
> "isinstance(col1, basestring)"?
>
> Less humble request: why test types at all? Why not just do one of {raise
> KeyError, coerce to string}?
>
> Cheers,
> Sam
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-accept-unicode-column-names-in-DataFrame-corr-and-cov-tp28065.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
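To make the basestring point concrete, here is a minimal sketch of the kind of check being discussed. The corr() stub is hypothetical and not Spark's actual implementation; it only illustrates why checking against str alone rejects unicode column names on Python 2, while a basestring-style check accepts both.

```python
# Minimal sketch, assuming nothing about Spark's internals.
# On Python 2, `basestring` is the common ancestor of `str` and `unicode`;
# on Python 3 it does not exist, so we alias it to `str`.
try:
    string_types = basestring  # Python 2: matches both str and unicode
except NameError:
    string_types = str  # Python 3: str is always unicode

def corr(col1, col2):
    """Hypothetical stand-in for a DataFrame.corr-style argument check."""
    # An `isinstance(col1, str)` check would reject u"price" on Python 2;
    # checking against `string_types` accepts plain and unicode names alike.
    if not isinstance(col1, string_types):
        raise ValueError("col1 should be a column name (string), got %r" % (col1,))
    if not isinstance(col2, string_types):
        raise ValueError("col2 should be a column name (string), got %r" % (col2,))
    return (col1, col2)  # real code would go on to compute the correlation

# Both plain and unicode column names pass the check:
print(corr("price", u"quantity"))
```

Raising ValueError (rather than KeyError) matches the point above: the argument's type is wrong, not a lookup that failed.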