Hi Sam, I think I have answers to your two questions.
> Humble request: could we replace the "isinstance(col1, str)" tests with
> "isinstance(col1, basestring)"?

IMHO, yes, I believe this should be basestring. Otherwise, some functions
would not accept unicode as arguments for column names in Python 2.7.

> Less humble request: why test types at all? Why not just do one of {raise
> KeyError, coerce to string}?

I believe argument type checking is pretty common in other Python libraries
too, such as numpy. ValueError might be more appropriate than KeyError,
because it is the type of the value that is incorrect. Also, I think
silently coercing to string might confuse users. If the current behaviour is
problematic and incoherent, I guess it should be changed, but I think it is
okay as it is.

Thanks.

2016-11-12 9:36 GMT+09:00 SamPenrose <spenr...@mozilla.com>:

> The corr() and cov() methods of DataFrame require an instance of str for
> column names:
>
> https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1356
>
> although instances of basestring appear to work for addressing columns:
>
> https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L708
>
> Humble request: could we replace the "isinstance(col1, str)" tests with
> "isinstance(col1, basestring)"?
>
> Less humble request: why test types at all? Why not just do one of {raise
> KeyError, coerce to string}?
>
> Cheers,
> Sam
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-accept-unicode-column-names-in-DataFrame-corr-and-cov-tp28065.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
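To make the basestring point concrete, here is a minimal sketch of the kind of check being discussed. The corr() stub is hypothetical and not Spark's actual implementation; it only illustrates why checking against str alone rejects unicode column names on Python 2, while a basestring-style check accepts both.

```python
# Minimal sketch, assuming nothing about Spark's internals.
# On Python 2, `basestring` is the common ancestor of `str` and `unicode`;
# on Python 3 it does not exist, so we alias it to `str`.
try:
    string_types = basestring  # Python 2: matches both str and unicode
except NameError:
    string_types = str  # Python 3: str is always unicode

def corr(col1, col2):
    """Hypothetical stand-in for a DataFrame.corr-style argument check."""
    # An `isinstance(col1, str)` check would reject u"price" on Python 2;
    # checking against `string_types` accepts plain and unicode names alike.
    if not isinstance(col1, string_types):
        raise ValueError("col1 should be a column name (string), got %r" % (col1,))
    if not isinstance(col2, string_types):
        raise ValueError("col2 should be a column name (string), got %r" % (col2,))
    return (col1, col2)  # real code would go on to compute the correlation

# Both plain and unicode column names pass the check:
print(corr("price", u"quantity"))
```

Raising ValueError (rather than KeyError) matches the point above: the argument's type is wrong, not a lookup that failed.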