Emil Ejbyfeldt created SPARK-47927:
--------------------------------------

             Summary: Nullability after join not respected in UDF
                 Key: SPARK-47927
                 URL: https://issues.apache.org/jira/browse/SPARK-47927
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.3, 3.5.1, 4.0.0
            Reporter: Emil Ejbyfeldt


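{code:scala}
// Assumes a spark-shell session, where `spark` (the SparkSession) is in scope.
import org.apache.spark.sql.functions.{struct, udf}
import spark.implicits._

val ds1 = Seq(1).toDS()
val ds2 = Seq[Int]().toDS()
// Identity UDF over a (Int, Option[Int]) struct; a null second field should come back as None.
val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)

// Struct passed through the UDF after a full outer join.
ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(f(struct(ds1("value"), ds2("value")))).show()

// Same struct without the UDF, for comparison.
ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(struct(ds1("value"), ds2("value"))).show()
{code}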
This outputs:
{code:java}
+---------------------------------------+
|UDF(struct(value, value, value, value))|
+---------------------------------------+
|                                 {1, 0}|
+---------------------------------------+

+--------------------+
|struct(value, value)|
+--------------------+
|           {1, NULL}|
+--------------------+
{code}

So when the struct is passed to the UDF, the nullability introduced by the outer join is not
respected, and the UDF incorrectly receives 0 for the missing right-hand value instead of null/None.
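To illustrate how a 0 (rather than null) can show up, here is a simplified, hypothetical model in plain Scala. It assumes the UDF's input decoding still treats the right-hand {{value}} column as non-nullable (its nullability before the outer join) and therefore skips the null check. In Scala, unboxing a null to a primitive {{Int}} yields 0. {{NullabilityDemo}} and {{readOptionInt}} are illustrative names, not Spark APIs, and this is not Spark's actual generated code.

{code:scala}
// Hypothetical, simplified model of a decoder that trusts a column's
// nullability flag. NOT Spark's generated code; names are illustrative only.
object NullabilityDemo {

  // Decode slot `i` of a row into Option[Int].
  // If `nullable` is false, the null check is skipped and the raw value is
  // unboxed directly; unboxing null to Int yields 0 in Scala.
  def readOptionInt(row: Array[Any], i: Int, nullable: Boolean): Option[Int] =
    if (nullable) Option(row(i)).map(_.asInstanceOf[Int])
    else Some(row(i).asInstanceOf[Int])

  def main(args: Array[String]): Unit = {
    // Joined row from the report: left value 1, right side missing (null).
    val row: Array[Any] = Array(1, null)

    // Stale nullability (right column still marked non-nullable).
    println(readOptionInt(row, 1, nullable = false)) // Some(0)

    // Nullability updated to reflect the outer join.
    println(readOptionInt(row, 1, nullable = true)) // None
  }
}
{code}

Under this model, treating the right-hand column as nullable, as the outer join requires, gives None, matching the NULL seen in the query without the UDF.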



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
