Repository: spark
Updated Branches:
  refs/heads/master 96798d14f -> 07f390a27
[SPARK-22347][PYSPARK][DOC] Add document to notice users for using udfs with conditional expressions

## What changes were proposed in this pull request?

Under the current execution mode of Python UDFs, Python UDFs are not well supported as branch values or as the else value in a CaseWhen expression. Fixing this would likely require a non-trivial change (e.g., #19592), and the issue has a simpler workaround, so we just notify users about it in the documentation.

## How was this patch tested?

Documentation change only.

Author: Liang-Chi Hsieh <vii...@gmail.com>

Closes #19617 from viirya/SPARK-22347-3.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/07f390a2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/07f390a2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/07f390a2

Branch: refs/heads/master
Commit: 07f390a27d7b793291c352a643d4bbd5f47294a6
Parents: 96798d1
Author: Liang-Chi Hsieh <vii...@gmail.com>
Authored: Wed Nov 1 13:09:35 2017 +0100
Committer: Wenchen Fan <wenc...@databricks.com>
Committed: Wed Nov 1 13:09:35 2017 +0100

----------------------------------------------------------------------
 python/pyspark/sql/functions.py | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/07f390a2/python/pyspark/sql/functions.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 0d40368..3981549 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2185,6 +2185,13 @@ def udf(f=None, returnType=StringType()):
 duplicate invocations may be eliminated or the function may even be invoked more times than it is present in the query.
+ .. note:: The user-defined functions do not support conditional execution by using them with
+ SQL conditional expressions such as `when` or `if`. The functions still apply on all rows no
+ matter the conditions are met or not. So the output is correct if the functions can be
+ correctly run on all rows without failure. If the functions can cause runtime failure on the
+ rows that do not satisfy the conditions, the suggested workaround is to incorporate the
+ condition logic into the functions.
+
 :param f: python function if used as a standalone function
 :param returnType: a :class:`pyspark.sql.types.DataType` object
@@ -2278,6 +2285,13 @@ def pandas_udf(f=None, returnType=StringType()):
 .. seealso:: :meth:`pyspark.sql.GroupedData.apply`

 .. note:: The user-defined function must be deterministic.
+
+ .. note:: The user-defined functions do not support conditional execution by using them with
+ SQL conditional expressions such as `when` or `if`. The functions still apply on all rows no
+ matter the conditions are met or not. So the output is correct if the functions can be
+ correctly run on all rows without failure. If the functions can cause runtime failure on the
+ rows that do not satisfy the conditions, the suggested workaround is to incorporate the
+ condition logic into the functions.
 """
 return _create_udf(f, returnType=returnType, pythonUdfType=PythonUdfType.PANDAS_UDF)
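The workaround suggested by the note added in this commit (move the condition logic into the function) can be sketched in plain Python. This sketch does not run Spark: `case_when_model` is a hypothetical helper that only imitates the behavior the note documents (the UDF is applied to every row, and the `when` condition picks results afterwards), and `inverse`, `safe_inverse` and `not_zero` are hypothetical example functions. In PySpark, the safe version would presumably be wrapped with something like `udf(safe_inverse, DoubleType())`; that wiring is an assumption and is not exercised here.

```python
from typing import Callable, List, Optional

# A single column value; None stands in for SQL NULL.
Value = Optional[float]


def inverse(x: Value) -> float:
    """Hypothetical naive function one might wrap as a UDF.

    Raises ZeroDivisionError when x == 0 (and TypeError when x is None).
    """
    return 1.0 / x


def safe_inverse(x: Value) -> Value:
    """Hypothetical workaround: the condition lives inside the function.

    Returns None for the rows that when(col("x") != 0, ...) was meant to
    exclude, so the function is safe to run on every row.
    """
    if x is None or x == 0:
        return None
    return 1.0 / x


def case_when_model(
    rows: List[Value],
    cond: Callable[[Value], bool],
    branch: Callable[[Value], Value],
) -> List[Value]:
    """Plain-Python model (not Spark code) of the behavior the note describes.

    The UDF branch is evaluated for every row first; the condition only
    decides afterwards which result is kept. Unmatched rows yield None,
    like a `when` without `otherwise`.
    """
    branch_values = [branch(r) for r in rows]  # runs on all rows, no short-circuit
    return [v if cond(r) else None for r, v in zip(rows, branch_values)]


def not_zero(x: Value) -> bool:
    """The condition a caller would otherwise put in `when`."""
    return x is not None and x != 0


if __name__ == "__main__":
    data: List[Value] = [2.0, 0.0, 4.0]

    # Relying on the condition alone fails: inverse still sees x == 0.
    try:
        case_when_model(data, not_zero, inverse)
    except ZeroDivisionError:
        print("inverse failed on a row the condition excluded")

    # Folding the condition into the function makes it safe on all rows.
    print(case_when_model(data, not_zero, safe_inverse))  # [0.5, None, 0.25]
```

Returning None for the excluded rows keeps the result equivalent to what the conditional expression was meant to produce, since `when` without `otherwise` yields NULL for unmatched rows; the function itself no longer depends on the condition being evaluated first.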