This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new f6e4a466705 [SPARK-46063][PYTHON][CONNECT] Improve error messages related to argument types in cube, rollup, groupBy, and pivot
f6e4a466705 is described below

commit f6e4a4667057e226a06b4d1b063a62b698ffb25f
Author: Hyukjin Kwon <gurwls...@apache.org>
AuthorDate: Thu Nov 23 15:33:15 2023 +0800

    [SPARK-46063][PYTHON][CONNECT] Improve error messages related to argument types in cube, rollup, groupBy, and pivot
    
    ### What changes were proposed in this pull request?
    
    This PR improves error messages related to argument types in `cube`, `rollup`, `groupBy`, and `pivot`.
    
    ```bash
    ./bin/pyspark --remote local
    ```
    
    ```python
    >>> help(spark.range(1).cube)
    Help on method cube in module pyspark.sql.connect.dataframe:
    
    cube(*cols: 'ColumnOrName') -> 'GroupedData' method of pyspark.sql.connect.dataframe.DataFrame instance
        Create a multi-dimensional cube for the current :class:`DataFrame` using
        the specified columns, allowing aggregations to be performed on them.
    ...
    ```
    
    **Before:**
    
    ```python
    >>> spark.range(1).cube(1.2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
        raise PySparkTypeError(
    pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `cube` should be a Column or str, got float.
    ```
    
    **After:**
    
    ```python
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../python/pyspark/sql/connect/dataframe.py", line 544, in cube
        raise PySparkTypeError(
    pyspark.errors.exceptions.base.PySparkTypeError: [NOT_COLUMN_OR_STR] Argument `cols` should be a Column or str, got float.
    ```
    
    ### Why are the changes needed?
    
    For better error messages to end users.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, it fixes the user-facing error message.
    
    ### How was this patch tested?
    
    Manually tested.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #43968 from HyukjinKwon/SPARK-46063.
    
    Authored-by: Hyukjin Kwon <gurwls...@apache.org>
    Signed-off-by: Ruifeng Zheng <ruife...@apache.org>
---
 python/pyspark/sql/connect/dataframe.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/python/pyspark/sql/connect/dataframe.py b/python/pyspark/sql/connect/dataframe.py
index c713bb85c1e..c7b51205363 100644
--- a/python/pyspark/sql/connect/dataframe.py
+++ b/python/pyspark/sql/connect/dataframe.py
@@ -495,7 +495,7 @@ class DataFrame:
             else:
                 raise PySparkTypeError(
                     error_class="NOT_COLUMN_OR_STR",
-                    message_parameters={"arg_name": "groupBy", "arg_type": type(c).__name__},
+                    message_parameters={"arg_name": "cols", "arg_type": type(c).__name__},
                 )
 
         return GroupedData(df=self, group_type="groupby", grouping_cols=_cols)
@@ -520,7 +520,7 @@ class DataFrame:
             else:
                 raise PySparkTypeError(
                     error_class="NOT_COLUMN_OR_STR",
-                    message_parameters={"arg_name": "rollup", "arg_type": type(c).__name__},
+                    message_parameters={"arg_name": "cols", "arg_type": type(c).__name__},
                 )
 
         return GroupedData(df=self, group_type="rollup", grouping_cols=_cols)
@@ -543,7 +543,7 @@ class DataFrame:
             else:
                 raise PySparkTypeError(
                     error_class="NOT_COLUMN_OR_STR",
-                    message_parameters={"arg_name": "cube", "arg_type": type(c).__name__},
+                    message_parameters={"arg_name": "cols", "arg_type": type(c).__name__},
                 )
 
         return GroupedData(df=self, group_type="cube", grouping_cols=_cols)

