I would not leave the language choice to the data scientists unless they will also maintain the ETL code.

In the cases I've seen, the key decision factor was usually the cost and
availability of people, with ETL operating costs taken into account.

Often the ETL cloud cost is small, so you will not save much there.
Then it comes down to the cost and availability of skills.
Python skills cost less, you can pick people who bring other useful
skills, and it is easier to train the people you already have
internally.
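
As a rough illustration of that trade-off, here is a minimal Python sketch.
Everything in it is hypothetical: the class, the function, the team size and
all the cost figures are made-up assumptions, not numbers from a real project.
It only shows how a small saving on cloud cost can be outweighed by a
difference in people cost.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One language choice for the ETL team (all figures are hypothetical)."""
    name: str
    engineers: int
    yearly_cost_per_engineer: int  # assumed salary plus overhead
    yearly_cloud_cost: int         # assumed ETL operating cost in the cloud

    def total_yearly_cost(self) -> int:
        # People cost plus ETL operating cost.
        return self.engineers * self.yearly_cost_per_engineer + self.yearly_cloud_cost


def cheaper_option(a: Option, b: Option) -> Option:
    """Return the option with the lower total yearly cost."""
    return min((a, b), key=lambda o: o.total_yearly_cost())


# Hypothetical inputs: Scala saves a little on cloud cost,
# but Scala engineers are assumed to cost more per head.
python_option = Option("Python on Spark", engineers=3,
                       yearly_cost_per_engineer=90_000, yearly_cloud_cost=25_000)
scala_option = Option("Scala on Spark", engineers=3,
                      yearly_cost_per_engineer=110_000, yearly_cloud_cost=20_000)

for option in (python_option, scala_option):
    print(option.name, option.total_yearly_cost())

print("Cheaper overall:", cheaper_option(python_option, scala_option).name)
```

With your own numbers the result may of course go the other way, for example
when the ETL cloud bill is large compared to the team cost.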

Often you already have some simple ETL scripts before moving to Spark,
and these scripts are usually written in Python.

Best Regards,

Jacek


On Sat, 10 Oct 2020 at 12:32, Jörn Franke <jornfra...@gmail.com> wrote:
>
> It really depends on which language your data scientists speak. I don’t think
> it makes sense to impose a language on them for ad hoc data science work; let
> them choose.
> For more complex AI engineering work, though, you can apply different
> standards and criteria. And then it really depends on architectural aspects
> etc.
>
> On 09.10.2020 at 22:57, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> 
> I have come across occasions when the teams use Python with Spark for ETL, 
> for example processing data from S3 buckets into Snowflake with Spark.
>
> I think the only reason they choose Python over Scala is that they are more
> familiar with Python. The fact that Spark itself is written in Scala is one
> reason I think Scala has an edge.
>
> I have not done a one-to-one comparison of Spark with Scala vs Spark with
> Python. I understand that for data science purposes most libraries like
> TensorFlow etc. are written in Python, but I am at a loss to understand the
> validity of using Python with Spark for ETL purposes.
>
> This is my understanding rather than established fact, so I would like to get
> some informed views on this if I can.
>
> Many thanks,
>
> Mich
>
