Regarding features, the Spark community's general workflow when adding new features is to implement them in Scala first (since Spark is written in Scala). Once that is done, a Jira ticket is created requesting that the feature be added to the Python API (for example, SPARK-9773 <https://issues.apache.org/jira/browse/SPARK-9773>). Some of these Python API tickets get done very quickly; some don't. As such, the Scala API will always be more feature-rich from a Spark perspective, while the Python API can lag behind in some cases. In general, the intent is for the PySpark API to contain all the features of the Scala API, since Python is considered a first-class citizen in the Spark community; the difference is that if you need the latest and greatest, and need it right away, Scala is the best choice.
Regarding performance, others have said it very eloquently:

https://www.linkedin.com/pulse/why-i-choose-scala-apache-spark-project-lan-jiang
http://stackoverflow.com/questions/17236936/api-compatibility-between-scala-and-python
http://apache-spark-developers-list.1001551.n3.nabble.com/A-Comparison-of-Platforms-for-Implementing-and-Running-Very-Large-Scale-Machine-Learning-Algorithms-td7823.html#a7824