spark git commit: [SPARK-22347][PYSPARK][DOC] Add document to notice users for using udfs with conditional expressions

wenchen Wed, 01 Nov 2017 05:09:52 -0700

Repository: spark
Updated Branches:
  refs/heads/master 96798d14f -> 07f390a27



[SPARK-22347][PYSPARK][DOC] Add document to notice users for using udfs with 
conditional expressions

## What changes were proposed in this pull request?

Under the current execution mode of Python UDFs, Python UDFs are not well 
supported as branch values or as the else value in a CaseWhen expression.

Because fixing this might require a sizable change (e.g., #19592) and the issue 
has a simpler workaround, we should just notify users about it in the documentation.
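
A minimal sketch of the workaround the added note describes (moving the condition into the function), for illustration only and not part of this patch. The names `reciprocal`, `safe_reciprocal`, `add_reciprocal_column`, and the column names `x` and `inv_x` are hypothetical; the DataFrame passed in is assumed to have a numeric column `x`.

```python
def reciprocal(x):
    """Raises ZeroDivisionError when x == 0, so it is unsafe on unfiltered rows."""
    return 1.0 / x


def safe_reciprocal(x):
    """Same computation, with the guard moved inside the function itself."""
    if x is None or x == 0:
        return None
    return 1.0 / x


def add_reciprocal_column(df):
    """Hypothetical helper; `df` is assumed to be a Spark DataFrame with a column `x`."""
    # Imported lazily so the plain-Python functions above work without Spark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import DoubleType

    # Pattern the note warns about (the UDF may still run on rows where x == 0):
    #   when(col("x") != 0, udf(reciprocal, DoubleType())(col("x")))
    # Workaround: the UDF handles the condition itself, so no `when` is needed.
    safe_udf = udf(safe_reciprocal, DoubleType())
    return df.withColumn("inv_x", safe_udf(col("x")))
```

Because the guard lives inside `safe_reciprocal`, the result should not depend on whether Spark evaluates the UDF on every row.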

## How was this patch tested?

Documentation change only.

Author: Liang-Chi Hsieh <vii...@gmail.com>

Closes #19617 from viirya/SPARK-22347-3.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/07f390a2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/07f390a2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/07f390a2

Branch: refs/heads/master
Commit: 07f390a27d7b793291c352a643d4bbd5f47294a6
Parents: 96798d1
Author: Liang-Chi Hsieh <vii...@gmail.com>
Authored: Wed Nov 1 13:09:35 2017 +0100
Committer: Wenchen Fan <wenc...@databricks.com>
Committed: Wed Nov 1 13:09:35 2017 +0100

----------------------------------------------------------------------
 python/pyspark/sql/functions.py | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/07f390a2/python/pyspark/sql/functions.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 0d40368..3981549 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2185,6 +2185,13 @@ def udf(f=None, returnType=StringType()):
         duplicate invocations may be eliminated or the function may even be invoked more times than
         it is present in the query.
 
+    .. note:: The user-defined functions do not support conditional execution by using them with
+        SQL conditional expressions such as `when` or `if`. The functions still apply on all rows no
+        matter the conditions are met or not. So the output is correct if the functions can be
+        correctly run on all rows without failure. If the functions can cause runtime failure on the
+        rows that do not satisfy the conditions, the suggested workaround is to incorporate the
+        condition logic into the functions.
+
     :param f: python function if used as a standalone function
     :param returnType: a :class:`pyspark.sql.types.DataType` object
 
@@ -2278,6 +2285,13 @@ def pandas_udf(f=None, returnType=StringType()):
        .. seealso:: :meth:`pyspark.sql.GroupedData.apply`
 
     .. note:: The user-defined function must be deterministic.
+
+    .. note:: The user-defined functions do not support conditional execution by using them with
+        SQL conditional expressions such as `when` or `if`. The functions still apply on all rows no
+        matter the conditions are met or not. So the output is correct if the functions can be
+        correctly run on all rows without failure. If the functions can cause runtime failure on the
+        rows that do not satisfy the conditions, the suggested workaround is to incorporate the
+        condition logic into the functions.
     """
     return _create_udf(f, returnType=returnType, pythonUdfType=PythonUdfType.PANDAS_UDF)
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
