> However, what really worries me is not having the Dataset API at all in
> Python. I think that's a deal breaker.

What is the functionality you are missing? In Spark 2.0 a DataFrame is just
an alias for Dataset[Row] ("type DataFrame = Dataset[Row]" in
core/.../o/a/s/sql/package.scala).
Since Python is dynamically typed, you wouldn't really gain anything by
using Datasets anyway.
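
To illustrate the point, here is a minimal Scala sketch (assuming Spark 2.x on the classpath; the Person case class and the input path are hypothetical) of what the typed Dataset API adds over a DataFrame, and why a dynamically typed language cannot exploit it:

```scala
import org.apache.spark.sql.{Dataset, Row, SparkSession}

object DatasetVsDataFrame {
  // Hypothetical record type, for illustration only
  case class Person(name: String, age: Long)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
    import spark.implicits._

    // A DataFrame is literally a Dataset[Row]: untyped, columns checked at runtime
    val df: Dataset[Row] = spark.read.json("people.json")

    // What the typed API buys you is compile-time checking of fields and lambdas...
    val ds: Dataset[Person] = df.as[Person]
    val adults = ds.filter(_.age >= 18) // verified by the Scala compiler

    // ...which a dynamically typed language has no way to take advantage of
    adults.show()
    spark.stop()
  }
}
```

The extra safety lives entirely in the Scala compiler; the physical plan Spark executes is the same either way.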

On Thu, Sep 1, 2016 at 2:20 PM, ayan guha <guha.a...@gmail.com> wrote:

> Thanks All for your replies.
>
> Feature Parity:
>
> MLlib, RDD and DataFrame features are totally comparable, and Streaming is
> now at par in functionality too, I believe. However, what really worries me
> is not having the Dataset API at all in Python. I think that's a deal breaker.
>
> Performance:
> I do get this point when RDDs are involved, but not when DataFrames are the
> only construct I am operating on. DataFrames are supposed to be
> language-agnostic in terms of performance. So why do people think Python is
> slower? Is it because of UDFs? Any other reason?
>
> *Is there any kind of benchmarking/stats around a Python UDF vs Scala UDF
> comparison, like the ones out there for RDDs?*
>
> @Kant: I am not comparing ANY applications, I am comparing SPARK
> applications only. I would be glad to hear your opinion on why PySpark
> applications would not work, and if you have any benchmarks please share
> them if possible.
>
>
>
>
>
> On Fri, Sep 2, 2016 at 12:57 AM, kant kodali <kanth...@gmail.com> wrote:
>
>> C'mon man, this is a no-brainer. Dynamically typed languages for large code
>> bases or large-scale distributed systems make absolutely no sense. I could
>> write a 10-page essay on why that wouldn't work so great. You might be
>> wondering why Spark would have it then? Well, probably because of its ease
>> of use for ML (that would be my best guess).
>>
>>
>>
>> On Wed, Aug 31, 2016 11:45 PM, AssafMendelson assaf.mendel...@rsa.com
>> wrote:
>>
>>> I believe this would greatly depend on your use case and your
>>> familiarity with the languages.
>>>
>>>
>>>
>>> In general, Scala will have much better performance than Python, and
>>> not all interfaces are available in Python.
>>>
>>> That said, if you are planning to use DataFrames without any UDFs, then
>>> the performance hit is practically nonexistent.
>>>
>>> Even if you need UDFs, it is possible to write them in Scala and wrap
>>> them for Python, and still get away without the performance hit.
>>>
>>> Python does not have interfaces for UDAFs.
>>>
>>>
>>>
>>> I believe that if you have large structured data and do not generally
>>> need UDFs/UDAFs, you can certainly work in Python without losing too much.
>>>
>>>
>>>
>>>
>>>
>>> *From:* ayan guha [mailto:[hidden email]]
>>> *Sent:* Thursday, September 01, 2016 5:03 AM
>>> *To:* user
>>> *Subject:* Scala Vs Python
>>>
>>>
>>>
>>> Hi Users
>>>
>>>
>>>
>>> Thought to ask (again and again) the question: while building a
>>> production application, should I use Scala or Python?
>>>
>>>
>>>
>>> I have read many if not most articles, but all seem pre-Spark 2. Has
>>> anything changed with Spark 2, either in a pro-Scala or pro-Python way?
>>>
>>>
>>>
>>> I am thinking of performance, feature parity and future direction, not so
>>> much in terms of skill set or ease of use.
>>>
>>>
>>>
>>> Or, if you think it is a moot point, please say so as well.
>>>
>>>
>>>
>>> Any real-life examples, production experience, anecdotes, personal taste,
>>> profanity: all are welcome :)
>>>
>>>
>>>
>>> --
>>>
>>> Best Regards,
>>> Ayan Guha
>>>
>>> ------------------------------
>>> View this message in context: RE: Scala Vs Python
>>> <http://apache-spark-user-list.1001560.n3.nabble.com/RE-Scala-Vs-Python-tp27637.html>
>>> Sent from the Apache Spark User List mailing list archive
>>> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>
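
As a footnote to the wrap-a-Scala-UDF-for-Python point raised earlier in the thread, one way to sketch it is via Spark's Java UDF interface. The class and function names below are hypothetical, and this assumes Spark 2.x:

```scala
import org.apache.spark.sql.api.java.UDF1

// A UDF implemented in Scala executes inside the JVM, avoiding the
// per-row serialization to a separate Python worker process that is
// the main reason Python UDFs are slower.
class SquareUdf extends UDF1[Long, Long] {
  override def call(x: Long): Long = x * x
}
```

After packaging this into a jar on the driver's classpath, PySpark can reach it by name, either by registering it from the Scala/SQL side or via PySpark's Java-UDF registration helper where available, and then call it like any built-in function, e.g. `df.selectExpr("square(id)")`.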
