Rafal Wojdyla created SPARK-40363:
-------------------------------------

             Summary: Add SQL misc function to assert/check column value
                 Key: SPARK-40363
                 URL: https://issues.apache.org/jira/browse/SPARK-40363
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: Rafal Wojdyla


A SQL function that asserts a condition on a column and:
* fails when the condition is not met
* returns the original value otherwise

Related: SPARK-32793

But {{assert_true}} and {{raise_error}} do not really cut it. With 
{{assert_true}} you have to actually collect the resulting column, which is NULL 
on every row where the check passes, and the check might not happen at all if 
you drop that column, which you likely will since it carries no data. A 
function that returns a value as part of the check (in most cases the checked 
column itself) would be handy.
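To make the {{assert_true}} limitation concrete, here is a minimal sketch, assuming a local PySpark 3.1+ session; the sample DataFrame and the column name {{x_positive}} are illustrative only. The check column is NULL on every passing row, so it carries no information, yet the assertion is tied to that column being kept and evaluated.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Local session for illustration only.
spark = SparkSession.builder.master("local[1]").appName("assert-true-demo").getOrCreate()
df = spark.createDataFrame([(1,), (2,), (3,)], ["x"])

# assert_true returns NULL when the condition holds and raises otherwise,
# so the resulting column is all NULLs whenever the check passes.
checked = df.withColumn("x_positive", F.assert_true(F.col("x") > 0, "x must be positive"))
rows = checked.collect()
print(rows)

# Dropping the all-NULL column before an action removes the only reference to
# the assertion; the optimizer may then prune it, so the check may never run.
unchecked = checked.drop("x_positive")
remaining_columns = unchecked.columns

spark.stop()
{code}

Keeping {{x_positive}} around just to force the check is exactly the awkwardness described above.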

I'm working with PySpark, so here's a Python implementation:

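{code:python}
from typing import Callable, Optional, Union, overload

from pyspark.sql import Column
from pyspark.sql import functions as F


@overload
def assert_col_condition(
    col: Union[str, Column],
    cond: Callable[[Column], Column],
    error_msg: Optional[str] = None,
) -> Column:
    """Asserts condition on a column, IFF it holds returns the original value under `col`"""
    ...


@overload
def assert_col_condition(
    col: Union[str, Column], cond: Column, error_msg: Optional[str] = None
) -> Column:
    """Asserts condition on a column, IFF it holds returns the original value under `col`"""
    ...


def assert_col_condition(
    col: Union[str, Column],
    cond: Union[Column, Callable[[Column], Column]],
    error_msg: Optional[str] = None,
) -> Column:
    """Asserts `cond` on `col`; returns `col` if it holds, fails the query otherwise.

    `cond` is either a boolean Column or a function mapping `col` to one.
    """
    # Accept a column name as well as a Column (inlined in place of a str_to_col helper).
    col = F.col(col) if isinstance(col, str) else col
    # Build the condition from `col` if a callable was passed.
    if not isinstance(cond, Column):
        cond = cond(col)
    # Rows where `cond` is false hit raise_error; all other rows return `col`.
    # Note: if `cond` evaluates to NULL, `~cond` is NULL too, so such rows pass through.
    return F.when(
        ~cond, F.raise_error(error_msg or f"Assertion failed: {cond}")
    ).otherwise(col)
{code}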
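A usage sketch for the function above, assuming a local PySpark 3.1+ session. To stay self-contained it repeats a condensed copy of {{assert_col_condition}} without the overloads; the sample data and error text are made up. The passing check returns the checked column unchanged, while the failing check raises when the query executes. The exact exception type differs between Spark versions, so the sketch simply catches {{Exception}}.

{code:python}
from typing import Callable, Optional, Union

from pyspark.sql import Column, SparkSession
from pyspark.sql import functions as F


def assert_col_condition(
    col: Union[str, Column],
    cond: Union[Column, Callable[[Column], Column]],
    error_msg: Optional[str] = None,
) -> Column:
    # Condensed copy of the proposal: return `col` unless `cond` is false.
    col = F.col(col) if isinstance(col, str) else col
    if not isinstance(cond, Column):
        cond = cond(col)
    return F.when(
        ~cond, F.raise_error(error_msg or f"Assertion failed: {cond}")
    ).otherwise(col)


spark = SparkSession.builder.master("local[1]").appName("assert-col-demo").getOrCreate()
df = spark.createDataFrame([(1,), (2,), (3,)], ["x"])

# Passing check (callable form): the output column is the checked column itself,
# so there is no extra column to drop and the check cannot be pruned away.
ok_rows = df.select(assert_col_condition("x", lambda c: c > 0).alias("x")).collect()
print(ok_rows)

# Failing check (Column form): x = 1 violates x > 1, so the query errors out.
failed = False
try:
    df.select(
        assert_col_condition("x", F.col("x") > 1, "x must be greater than 1").alias("x")
    ).collect()
except Exception:  # Py4JJavaError or a PySpark wrapper, depending on the version
    failed = True

spark.stop()
{code}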



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
