Hi all,

The default behaviour of Spark is to return null for casts that fail,
unless ANSI mode is enabled (SPARK-30292
<https://issues.apache.org/jira/browse/SPARK-30292>).
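
To make that concrete, here is a quick PySpark sketch (the DataFrame is
made up for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("42",), ("not a number",)], ["age"])

    # Default behaviour: the value that fails to cast silently becomes null
    df.select(col("age").cast("int")).show()
    # +----+
    # | age|
    # +----+
    # |  42|
    # |null|
    # +----+

    # With ANSI mode enabled, the same cast throws at runtime instead
    spark.conf.set("spark.sql.ansi.enabled", "true")
    df.select(col("age").cast("int")).collect()  # raises an exception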

Whilst I understand that this is a subset of ANSI-compliant behaviour, I
don't understand why the two are so tightly coupled. Enabling ANSI also
comes with consequences that fall well outside casting behaviour, and not
all Spark operations go through the SQL interface (i.e. spark.sql("")).
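
For instance, flipping the flag also changes arithmetic semantics, which
is a lot to take on just to get strict casts:

    # ANSI mode reaches well beyond casting, e.g. integer overflow
    spark.conf.set("spark.sql.ansi.enabled", "true")
    spark.sql("SELECT 2147483647 + 1").collect()
    # throws instead of wrapping around to -2147483648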

I imagine it would be pretty useful to have something like an extra
argument that raises an exception when a cast fails (e.g. *df.age.cast("int",
strict=True)* ) without having to enable ANSI as a whole.
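
In the meantime, the closest workaround I can think of is a helper along
these lines (strict_cast is a made-up name, and it costs an extra job per
column, whereas a built-in flag could fail within the pass the cast
already does):

    from pyspark.sql import DataFrame
    from pyspark.sql.functions import col

    def strict_cast(df: DataFrame, name: str, dtype: str) -> DataFrame:
        # A cast failed on a row iff the original value is non-null
        # but the casted value comes back null.
        failed = df.filter(
            col(name).isNotNull() & col(name).cast(dtype).isNull()
        ).count()
        if failed:
            raise ValueError(
                f"{failed} value(s) in '{name}' could not be cast to {dtype}"
            )
        return df.withColumn(name, col(name).cast(dtype))

    df = strict_cast(df, "age", "int")  # raises instead of silently nulling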

Does anyone know why this approach was chosen, or have I missed something?
Would others find a feature like this useful?

Thanks,
Yeachan
