Regarding features, the general workflow in the Spark community when adding
new features is to implement them in Scala first (since Spark is written in
Scala). Once that is done, a Jira ticket is created requesting that the
feature be added to the Python API (for example, SPARK-9773
<https://issues.apache.org/jira/browse/SPARK-9773>). Some of these Python
API tickets get done very quickly; some don't. As such, the Scala API will
always be the more feature-rich of the two, while the Python API can lag
behind in some cases. In general, the intent is for the PySpark API to
contain all the features of the Scala API, since Python is considered a
first-class citizen in the Spark community; the difference is that if you
need the latest and greatest, and need it right away, Scala is the best
choice.

Regarding performance, others have said it very eloquently:


https://www.linkedin.com/pulse/why-i-choose-scala-apache-spark-project-lan-jiang

http://stackoverflow.com/questions/17236936/api-compatibility-between-scala-and-python

http://apache-spark-developers-list.1001551.n3.nabble.com/A-Comparison-of-Platforms-for-Implementing-and-Running-Very-Large-Scale-Machine-Learning-Algorithms-td7823.html#a7824


--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Does-feature-parity-exist-between-Spark-and-PySpark-tp24963p24971.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.