On Thu, Jan 30, 2014 at 7:51 PM, Evan R. Sparks <evan.spa...@gmail.com> wrote:
> If you just need basic matrix operations - Spark is dependent on JBlas
> (http://mikiobraun.github.io/jblas/) to have access to quick linear
> algebra routines inside of MLlib and graphx. Jblas does a nice job of
> avoiding boxing/unboxing issues when calling out to blas, so it might be
> what you're looking for. The programming patterns you'll be able to support
> with jblas (matrix ops on local partitions) are very similar to what you'd
> get with numpy, etc.

jblas is not the top java matrix library when it comes to performance:
https://code.google.com/p/java-matrix-benchmark/wiki/RuntimeCorei7v2600_2013_10

> I agree that the python libraries are more complete/feature rich, but if
> you really crave high performance then I'd recommend staying pure scala and
> giving jblas a try.
>
> On Thu, Jan 30, 2014 at 8:30 AM, nileshc <nil...@nileshc.com> wrote:
>
>> Hi there,
>>
>> *Background:*
>> I need to do some matrix multiplication stuff inside the mappers, and
>> trying to choose between Python and Scala for writing the Spark MR jobs.
>> I'm equally fluent with Python and Java, and find Scala pretty easy too
>> for what it's worth. Going with Python would let me use numpy + scipy,
>> which is blazing fast when compared to Java libraries like Colt etc.
>> Configuring Java with BLAS seems to be a pain when compared to scipy
>> (direct apt-get installs, or pip).
>>
>> *Question:*
>> I posted a couple of comments on this answer at StackOverflow:
>> http://stackoverflow.com/questions/17236936/api-compatibility-between-scala-and-python
>> Basically it states that as of Spark 0.7.2, the Python API would be slower
>> than Scala. What's the performance scenario now? The fork issue seems to
>> be fixed. How about serialization? Can it match Java/Scala Writable-like
>> serialization (having knowledge of object type beforehand, reducing I/O)
>> performance? Also, a probably silly question - loops seem to be slow in
>> Python in general, do you think this can turn out to be an issue?
>>
>> Bottomline, should I choose Python for computation-intensive algorithms
>> like PageRank? Scipy gives me an edge, but does the framework kill it?
>>
>> Any help, insights, benchmarks will be much appreciated. :)
>>
>> Cheers,
>> Nilesh
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Python-API-Performance-tp1048.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
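
For what it's worth, the "matrix ops on local partitions" pattern discussed above can be sketched roughly as follows. This is plain Python only, not Spark or jblas code: the names `partitions`, `multiply_block`, and `matvec_by_partition` are made up for illustration, and the pure-Python inner loop is a stand-in for the native routine that numpy or jblas would provide.

```python
# Sketch only: plain Python stands in for both Spark and a BLAS-backed library.
# All names below are hypothetical and chosen for illustration.

def multiply_block(rows, vector):
    """Multiply one partition's block of matrix rows by a vector."""
    # With numpy or jblas this would be a single native matrix-vector call.
    return [sum(a * b for a, b in zip(row, vector)) for row in rows]


def matvec_by_partition(partitions, vector):
    """Apply multiply_block to each partition and concatenate the results in order."""
    result = []
    for rows in partitions:  # in Spark, this loop would roughly be a mapPartitions call
        result.extend(multiply_block(rows, vector))
    return result


# A 4x2 matrix split row-wise into two partitions of two rows each.
partitions = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
vector = [1.0, 1.0]

product = matvec_by_partition(partitions, vector)
print(product)
```

The point of the pattern is that the framework only hands each worker a block of rows; how fast the block is multiplied depends on the local library, which is where the numpy versus jblas choice comes in.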