Hi Sam,
I think I have answers to two of your questions.
> Humble request: could we replace the "isinstance(col1, str)" tests with
"isinstance(col1, basestring)"?
IMHO, yes, this should be basestring. Otherwise, some functions
would not accept unicode strings as column arguments in Python 2.
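
To make the difference concrete, here is a minimal sketch of the kind of check I have in mind. The names string_types and is_column_name are hypothetical and used only for illustration; they are not taken from the pyspark source. In Python 2, u"age" is an instance of basestring but not of str, so a str-only test rejects it. In Python 3, basestring does not exist and str already covers all text, so the check has to pick the type per version:

```python
import sys

# Hypothetical helper names, for illustration only (not pyspark code).
if sys.version_info[0] >= 3:
    # Python 3: str is the only text type; basestring is not defined.
    string_types = (str,)
else:
    # Python 2: basestring is the common parent of str and unicode.
    string_types = (basestring,)  # noqa: F821


def is_column_name(col):
    """Return True if col is any kind of string usable as a column name."""
    return isinstance(col, string_types)
```

With a check like this, both "age" and u"age" pass on either Python version, while non-string values such as integers are still rejected.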
The corr() and cov() methods of DataFrame require an instance of str for
column names:
https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1356
although instances of basestring appear to work for addressing columns:
https://github.com/apache/spark/blob/master/pytho