Re: pyspark: accept unicode column names in DataFrame.corr and cov

2016-11-12 Thread Hyukjin Kwon
Hi Sam, I think I have some answers for two questions. > Humble request: could we replace the "isinstance(col1, str)" tests with "isinstance(col1, basestring)"? IMHO, yes, I believe this should be basestring. Otherwise, some functions would not accept unicode as arguments for columns in Python

pyspark: accept unicode column names in DataFrame.corr and cov

2016-11-11 Thread SamPenrose
The corr() and cov() methods of DataFrame require an instance of str for column names: . https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1356 although instances of basestring appear to work for addressing columns: .