OK, I did some uber-basic testing of the Python ALS example and the Scala ALS example (I wouldn't call this real benchmarking, given the casual nature of the test and the configuration).
CPU: i5-2500K. Memory allotted to an example with -Djava.executor.memory=2g. I've got one master and one slave running.

I'm listing the results in <API Language> <list of params: movies users features iterations slices> : <time taken> format:

<Scala>  500 2000 100 5 2 : 1m21s
<Scala>  500 2000 100 5 4 : 0m50s
<Scala>  700 2000 100 5 2 : 1m41s
<Scala>  700 2000 100 5 4 : 1m14s
<Python> 500 2000 100 5 4 : 8m18s

(Sorry, no more for Python, I'm pressed for time at the moment.)

I noticed that average CPU utilization on the quad-core was always 99%+ during the Scala runs (except for drops to ~90% between iterations). During the Python run, however, it hovered around 55-67%, and the rest was WAIT, so evidently a huge amount of time was being wasted (on I/O? slow loops?).

Stranger still, the RMSE over the 5 iterations for Scala started at 0.82 and ended at 0.73, while the Python version started at 1294.1236 and ended at 210.2984. That's a pretty huge gap. Can someone verify all this, at least on a single node? I haven't modified any code, so the Scala version is using the usual Colt.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-API-Performance-tp1048p1109.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
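(A scale difference that large usually means the two examples aren't reporting the same error metric rather than one model being wildly worse. For reference, here is a minimal sketch of the conventional RMSE over a ratings matrix — plain NumPy, with illustrative variable names; this is not code from either example:)

```python
import numpy as np

def rmse(ratings, predictions):
    """Root-mean-square error over all entries of the ratings matrix."""
    err = np.asarray(ratings, dtype=float) - np.asarray(predictions, dtype=float)
    return np.sqrt(np.mean(err ** 2))

# Toy 2x2 example, just to show the scale: ratings in [1, 5]
# give an RMSE well under 1 for near-correct predictions.
ratings = np.array([[4.0, 3.0], [5.0, 1.0]])
preds   = np.array([[3.5, 3.5], [4.0, 1.5]])
print(rmse(ratings, preds))

# Note: if an implementation instead took the square root of the
# *summed* squared error (no mean), the reported value would grow
# with sqrt(movies * users) -- easily reaching the hundreds or
# thousands on a 500 x 2000 matrix even for a decent model.
```

So it would be worth checking whether the Python example normalizes by the matrix size before taking the square root the same way the Scala one does.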