I have upgraded my spark job from spark 3.3.1 to spark 3.5.0, I am querying to Mysql Database and applying
`*UPPER(col) = UPPER(value)*` in the subsequent sql query. It is working as expected in spark 3.3.1 , but not working with 3.5.0. Where Condition :: `*UPPER(vn) = 'ERICSSON' AND (upper(st) = 'OPEN' OR upper(st) = 'REOPEN' OR upper(st) = 'CLOSED')*` The *st *column is ENUM in the database and it is causing the issue. Below is the Physical Plan of *FILTER* phase : For 3.3.1 : +- Filter ((upper(vn#11) = ERICSSON) AND (((upper(st#42) = OPEN) OR (upper(st#42) = REOPEN)) OR (upper(st#42) = CLOSED))) For 3.5.0 : +- Filter ((upper(vn#11) = ERICSSON) AND (((upper(staticinvoke(class org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils, StringType, readSidePadding, st#42, 13, true, false, true)) = OPEN) OR (upper(staticinvoke(class org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils, StringType, readSidePadding, st#42, 13, true, false, true)) = REOPEN)) OR (upper(staticinvoke(class org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils, StringType, readSidePadding, st#42, 13, true, false, true)) = CLOSED))) ----------------------------------------------------------------- I have debug it and found that Spark added a property in version 3.4.0 , i.e. **spark.sql.readSideCharPadding** which has default value **true**. Link to the JIRA : https://issues.apache.org/jira/browse/SPARK-40697 Added a new method in Class **CharVarcharCodegenUtils** public static UTF8String readSidePadding(UTF8String inputStr, int limit) { int numChars = inputStr.numChars(); if (numChars == limit) { return inputStr; } else if (numChars < limit) { return inputStr.rpad(limit, SPACE); } else { return inputStr; } } **This method is appending some whitespace padding to the ENUM values while reading and causing the Issue.** ----------------------------------------------------------------------- When I am removing the UPPER function from the where condition the **FILTER** Phase looks like this : +- Filter (((staticinvoke(class org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils, StringType, readSidePadding, st#42, 13, true, false, true) = OPEN ) OR (staticinvoke(class org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils, StringType, readSidePadding, st#42, 13, true, false, true) = REOPEN )) OR (staticinvoke(class org.apache.spark.sql.catalyst.util.CharVarcharCodegenUtils, StringType, readSidePadding, st#42, 13, true, false, true) = CLOSED )) **You can see it has added some white space after the value and the query runs fine giving the correct result.** But with the UPPER function I am not getting the data. ------------------------------------------------------------------ I have also tried to disable this Property *spark.sql.readSideCharPadding = false* with following cases : 1. With Upper function in where clause : It is not pushing the filters to Database and the *query works fine*. +- Filter (((upper(st#42) = OPEN) OR (upper(st#42) = REOPEN)) OR (upper(st#42) = CLOSED)) 2. But when I am removing the upper function *It is pushing the filter to Mysql with the white spaces and I am not getting the data. (THIS IS A CAUSING VERY BIG ISSUE)* PushedFilters: [*IsNotNull(vn), *EqualTo(vn,ERICSSON), *Or(Or(EqualTo(st,OPEN ),EqualTo(st,REOPEN )),EqualTo(st,CLOSED ))] I cannot move this filter to JDBC read query , also I can't remove this UPPER function in the where clause. -------------------------------------------------------------------------------------------------------- Also I found same data getting written to CASSANDRA with *PADDING .*