This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 361b605  [SPARK-29240][PYTHON] Pass Py4J column instance to support PySpark column in element_at function

361b605 is described below

commit 361b605eeb614e14977f81682d54ba94327280d3
Author: HyukjinKwon <gurwls...@apache.org>
AuthorDate: Fri Sep 27 11:04:55 2019 -0700

    [SPARK-29240][PYTHON] Pass Py4J column instance to support PySpark column in element_at function

    ### What changes were proposed in this pull request?

    This PR makes `element_at` in PySpark able to take PySpark `Column` instances.

    ### Why are the changes needed?

    To match the Scala side. It seems this was intended but, due to a bug, was not working correctly.

    ### Does this PR introduce any user-facing change?

    Yes. See below:

    ```python
    from pyspark.sql import functions as F
    x = spark.createDataFrame([([1, 2, 3], 1), ([4, 5, 6], 2), ([7, 8, 9], 3)], ['list', 'num'])
    x.withColumn('aa', F.element_at('list', x.num.cast('int'))).show()
    ```

    Before:

    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../spark/python/pyspark/sql/functions.py", line 2059, in element_at
        return Column(sc._jvm.functions.element_at(_to_java_column(col), extraction))
      File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1277, in __call__
      File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1241, in _build_args
      File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", line 1228, in _get_args
      File "/.../forked/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py", line 500, in convert
      File "/.../spark/python/pyspark/sql/column.py", line 344, in __iter__
        raise TypeError("Column is not iterable")
    TypeError: Column is not iterable
    ```

    After:

    ```
    +---------+---+---+
    |     list|num| aa|
    +---------+---+---+
    |[1, 2, 3]|  1|  1|
    |[4, 5, 6]|  2|  5|
    |[7, 8, 9]|  3|  9|
    +---------+---+---+
    ```

    ### How was this patch tested?

    Manually tested against a literal, Python native types, and a PySpark column.
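The fix below routes the `extraction` argument through `lit()`, which leaves `Column` instances untouched and wraps plain Python values as literal columns, so both call styles reach the JVM in the same shape. The following is a minimal, Spark-free sketch of that coercion pattern; the `Column` class and `lit`/`element_at` functions here are hypothetical stand-ins, not the real PySpark implementations.

```python
class Column:
    """Stand-in for pyspark.sql.Column, holding a backing expression."""
    def __init__(self, expr):
        self.expr = expr


def lit(value):
    """Return `value` unchanged if it is already a Column, else wrap it
    as a literal expression (mirroring how functions.lit behaves)."""
    if isinstance(value, Column):
        return value
    return Column(("literal", value))


def element_at(col, extraction):
    # Coercing through lit() is what lets both element_at('list', 1)
    # and element_at('list', some_column) take the same code path,
    # instead of Py4J trying (and failing) to iterate a Column.
    extraction = lit(extraction)
    return ("element_at", col, extraction.expr)


# Both a plain int and a Column-valued extraction now work:
print(element_at("list", 1))              # literal index
print(element_at("list", Column("num")))  # column-valued index
```

The key property is that `lit` is a no-op on inputs that are already columns, so adding the wrapper cannot break existing callers that pass literals.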
Closes #25950 from HyukjinKwon/SPARK-29240.

Authored-by: HyukjinKwon <gurwls...@apache.org>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
(cherry picked from commit fda0e6e48d00a1ba8e9d41d7670b3ad3c6951492)
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 python/pyspark/sql/functions.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 3833746..069354e 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1990,11 +1990,12 @@ def element_at(col, extraction):
     [Row(element_at(data, 1)=u'a'), Row(element_at(data, 1)=None)]

     >>> df = spark.createDataFrame([({"a": 1.0, "b": 2.0},), ({},)], ['data'])
-    >>> df.select(element_at(df.data, "a")).collect()
+    >>> df.select(element_at(df.data, lit("a"))).collect()
     [Row(element_at(data, a)=1.0), Row(element_at(data, a)=None)]
     """
     sc = SparkContext._active_spark_context
-    return Column(sc._jvm.functions.element_at(_to_java_column(col), extraction))
+    return Column(sc._jvm.functions.element_at(
+        _to_java_column(col), lit(extraction)._jc))  # noqa: F821 'lit' is dynamically defined.


 @since(2.4)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org