Emil Ejbyfeldt created SPARK-47927:
--------------------------------------

             Summary: Nullability after join not respected in UDF
                 Key: SPARK-47927
                 URL: https://issues.apache.org/jira/browse/SPARK-47927
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.3, 3.5.1, 4.0.0
            Reporter: Emil Ejbyfeldt

{code:java}
val ds1 = Seq(1).toDS()
val ds2 = Seq[Int]().toDS()
val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
// Outer join with an empty ds2, so ds2("value") is null; struct passed through the UDF
ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(f(struct(ds1("value"), ds2("value")))).show()
// Same join and struct, without the UDF
ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(struct(ds1("value"), ds2("value"))).show()
{code}
outputs
{code:java}
+---------------------------------------+
|UDF(struct(value, value, value, value))|
+---------------------------------------+
|                                 {1, 0}|
+---------------------------------------+

+--------------------+
|struct(value, value)|
+--------------------+
|           {1, NULL}|
+--------------------+
{code}
So when the result is passed to a UDF, the nullability after the join is not respected, and we incorrectly end up with a 0 value instead of a null/None value.