This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 361b605  [SPARK-29240][PYTHON] Pass Py4J column instance to support PySpark column in element_at function
361b605 is described below

commit 361b605eeb614e14977f81682d54ba94327280d3
Author: HyukjinKwon <gurwls...@apache.org>
AuthorDate: Fri Sep 27 11:04:55 2019 -0700

    [SPARK-29240][PYTHON] Pass Py4J column instance to support PySpark column in element_at function
    
    ### What changes were proposed in this pull request?
    
    This PR makes `element_at` in PySpark able to take PySpark `Column` instances.
    
    ### Why are the changes needed?
    
    To match the Scala side. This appears to have been intended but did not work correctly, i.e., it was a bug.
    
    ### Does this PR introduce any user-facing change?
    
    Yes. See below:
    
    ```python
    from pyspark.sql import functions as F
    x = spark.createDataFrame([([1,2,3],1),([4,5,6],2),([7,8,9],3)],['list','num'])
    x.withColumn('aa',F.element_at('list',x.num.cast('int'))).show()
    ```
    
    Before:
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../spark/python/pyspark/sql/functions.py", line 2059, in 
element_at
        return Column(sc._jvm.functions.element_at(_to_java_column(col), 
extraction))
      File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", 
line 1277, in __call__
      File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", 
line 1241, in _build_args
      File "/.../spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_gateway.py", 
line 1228, in _get_args
      File 
"/.../forked/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/java_collections.py", 
line 500, in convert
      File "/.../spark/python/pyspark/sql/column.py", line 344, in __iter__
        raise TypeError("Column is not iterable")
    TypeError: Column is not iterable
    ```
    
    After:
    
    ```
    +---------+---+---+
    |     list|num| aa|
    +---------+---+---+
    |[1, 2, 3]|  1|  1|
    |[4, 5, 6]|  2|  5|
    |[7, 8, 9]|  3|  9|
    +---------+---+---+
    ```
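
    Note that `element_at` uses 1-based indexing for arrays, so the row `[4, 5, 6]` with `num = 2` yields `5`.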
    
    ### How was this patch tested?
    
    Manually tested against literals, Python native types, and PySpark columns.
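
    For reference, a minimal sketch of those three call styles, assuming an active `spark` session and the same DataFrame as in the repro above (not part of the patch):

    ```python
    from pyspark.sql import functions as F

    x = spark.createDataFrame(
        [([1, 2, 3], 1), ([4, 5, 6], 2), ([7, 8, 9], 3)], ['list', 'num'])

    x.withColumn('aa', F.element_at('list', F.lit(1))).show()           # literal Column
    x.withColumn('aa', F.element_at('list', 1)).show()                  # Python native int
    x.withColumn('aa', F.element_at('list', x.num.cast('int'))).show()  # PySpark column
    ```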
    
    Closes #25950 from HyukjinKwon/SPARK-29240.
    
    Authored-by: HyukjinKwon <gurwls...@apache.org>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
    (cherry picked from commit fda0e6e48d00a1ba8e9d41d7670b3ad3c6951492)
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 python/pyspark/sql/functions.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 3833746..069354e 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -1990,11 +1990,12 @@ def element_at(col, extraction):
     [Row(element_at(data, 1)=u'a'), Row(element_at(data, 1)=None)]
 
     >>> df = spark.createDataFrame([({"a": 1.0, "b": 2.0},), ({},)], ['data'])
-    >>> df.select(element_at(df.data, "a")).collect()
+    >>> df.select(element_at(df.data, lit("a"))).collect()
     [Row(element_at(data, a)=1.0), Row(element_at(data, a)=None)]
     """
     sc = SparkContext._active_spark_context
-    return Column(sc._jvm.functions.element_at(_to_java_column(col), extraction))
+    return Column(sc._jvm.functions.element_at(
+        _to_java_column(col), lit(extraction)._jc))  # noqa: F821 'lit' is dynamically defined.
 
 
 @since(2.4)
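
For context on the fix itself: on the JVM side, `functions.lit` returns a `Column` unchanged when given one and wraps any other value as a literal, so routing `extraction` through `lit(...)._jc` handles literals, Python native types, and `Column` instances through a single code path. A minimal sketch of that dispatch (the helper name is hypothetical, not part of the patch):

```python
from pyspark.sql.functions import lit

def _extraction_to_jcolumn(extraction):
    # Hypothetical helper mirroring the patch's approach: lit() acts as an
    # identity for Column inputs (the JVM side returns the Column as-is)
    # and wraps Python literals such as int or str into a literal Column.
    # Either way, ._jc is the Py4J column instance the JVM API expects.
    return lit(extraction)._jc
```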


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
