I have one observation: is the claim that "Python UDFs are slow due to the
deserialization penalty" still relevant, now that Arrow is used for in-memory
data management and the Spark dev community has invested so heavily in making
pandas a first-class citizen, including for UDFs?

As I work with multiple clients, my experience is that org culture and the
available people are the most important drivers of this choice, regardless of
the use case. The use case is relevant only when there is a lack of feature
parity.

On Sun, 11 Oct 2020 at 7:39 am, Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Not quite sure how meaningful this discussion is, but in case someone is
> really faced with this query, the question still is 'what is the use case'?
> I am just a bit confused by the one-size-fits-all deterministic approach
> here; I thought those days were over almost 10 years ago.
> Regards
> Gourav
>
> On Sat, 10 Oct 2020, 21:24 Stephen Boesch, <java...@gmail.com> wrote:
>
>> I agree with Wim's assessment of data engineering / ETL vs data science.
>> I wrote pipelines/frameworks for large companies, and Scala was a much
>> better choice. But for ad-hoc work interfacing directly with data science
>> experiments, PySpark presents less friction.
>>
>> On Sat, 10 Oct 2020 at 13:03, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Many thanks everyone for their valuable contribution.
>>>
>>> We all started with Spark a few years ago, when Scala was the talk
>>> of the town. I agree with the note that as long as Spark stayed niche and
>>> elite, someone with Scala knowledge attracted a premium. In fairness,
>>> in 2014-2015 there was not much talk of data science input (I may
>>> be wrong). But the world has moved on, so to speak. Python itself has been
>>> around a long time (long being relative here). Most people knew UNIX
>>> shell, C, Python or Perl, or a combination of these. I recall we had a
>>> director a few years ago who asked our Hadoop admin for the root password
>>> to log in to the edge node. Later he became head of machine learning
>>> somewhere else, and he loved C and Python. So Python was a blessing in
>>> disguise. I think Python appeals to those who are very familiar with the
>>> CLI and shell programming (not GUI fans). As some members alluded to, there
>>> are more people around with Python knowledge. Most managers choose Python
>>> as the unifying development tool because they feel comfortable with it.
>>> Frankly, I have not seen a manager who feels at home with Scala. So, in
>>> summary, it is a bit disappointing to abandon Scala and switch to Python
>>> just for the sake of it.
>>>
>>> Disclaimer: These are opinions and not facts so to speak :)
>>>
>>> Cheers,
>>>
>>>
>>> Mich
>>>
>>> On Fri, 9 Oct 2020 at 21:56, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> I have come across occasions when teams use Python with Spark for
>>>> ETL, for example processing data from S3 buckets into Snowflake with Spark.
>>>>
>>>> The only reason I think they are choosing Python as opposed to Scala is
>>>> that they are more familiar with Python. The fact that Spark itself is
>>>> written in Scala is an indication of why I think Scala has an edge.
>>>>
>>>> I have not done a one-to-one comparison of Spark with Scala vs Spark with
>>>> Python. I understand that for data science purposes most libraries like
>>>> TensorFlow etc. are written in Python, but I am at a loss to understand the
>>>> validity of using Python with Spark for ETL purposes.
>>>>
>>>> This is my understanding, not established fact, so I would like to
>>>> get some informed views on this if I can.
>>>>
>>>> Many thanks,
>>>>
>>>> Mich
>>> --
Best Regards,
Ayan Guha
