I don't think there is a definitive right or wrong approach here. The SLS
feature would not have been added to Spark if there were no real need for it,
and AFAIK it required quite a bit of refactoring of Spark internals. So I'm
sure this discussion already took place in the developer community :)
May I ask why the ETL job and the DL task (assuming you mean deep learning
here) cannot be run as two separate Spark jobs?
IMHO it is better practice to split the entire pipeline into logical steps
and orchestrate them. That way you can pick the resource profile you need
for each of these two very different types of workload.
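
For readers who haven't used it, here is a minimal sketch of what
stage-level scheduling allows within a single job, assuming Spark 3.1+ on
YARN or Kubernetes with dynamic allocation enabled; the paths, resource
amounts, and GPU discovery script below are placeholders, not a
recommendation:

import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}
import org.apache.spark.sql.SparkSession

object SlsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sls-sketch").getOrCreate()
    val sc = spark.sparkContext

    // ETL stages run under the default (CPU-oriented) resource profile.
    val features = sc.textFile("/data/input").map(_.length)

    // Hypothetical GPU profile for the DL stages.
    val execReqs = new ExecutorResourceRequests()
      .cores(4)
      .memory("16g")
      .resource("gpu", 1, discoveryScript = "/opt/spark/getGpus.sh")
    val taskReqs = new TaskResourceRequests().cpus(1).resource("gpu", 1)
    val dlProfile = new ResourceProfileBuilder()
      .require(execReqs)
      .require(taskReqs)
      .build()

    // Only the stages computing this RDD request GPU executors;
    // everything else keeps the default profile.
    val scored = features.withResources(dlProfile).map(x => x * 2) // stand-in for inference
    scored.saveAsTextFile("/data/output")
    spark.stop()
  }
}

The two-job alternative is simpler: two spark-submit runs with different
executor configs, wired together by whatever orchestrator you already use.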
Hi all,
I am trying to read data (using Spark SQL) via a Hive metastore table which
has a column of type bigint. The underlying Parquet data has int as the
datatype for the same column. I am getting the following error while trying
to read the data using Spark SQL:
java.lang.ClassCastException: org.apache.
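
In case it helps anyone reproduce, here is a minimal sketch of the setup as
described above, assuming Hive support is enabled; the table name, column
name, and location are hypothetical:

import org.apache.spark.sql.SparkSession

object BigintIntMismatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bigint-int-mismatch")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Physical Parquet type for the column is int.
    Seq(1, 2, 3).toDF("id").write.mode("overwrite").parquet("/tmp/t_parquet")

    // The metastore declares the same column as bigint.
    spark.sql(
      "CREATE EXTERNAL TABLE t (id bigint) STORED AS PARQUET LOCATION '/tmp/t_parquet'")

    // Reading through the metastore schema is where the error appears.
    spark.sql("SELECT id FROM t").show()
  }
}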