Hi Evan,

Thanks! I didn't know that Spark has a dependency on JBLAS. That's good to
know. Does this mean I can use JBLAS directly from my Spark MR code without
the painstaking setup needed to get Java to recognize the native BLAS
libraries on my system? Does Spark take care of that?

But then again, my particular use case deals with large sparse matrices, in
which case my only option on the Java/Scala side seems to be Colt (which is
pretty slow compared to both JBLAS and scipy/numpy). MTJ is another option,
but I'm not sure how much BLAS/ATLAS setup that'll need. That's what's
confusing me: I can't figure out how this will balance out until I take
some time off to code some benchmarks myself. :(
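
A rough sketch of the kind of baseline I have in mind for those benchmarks: a
pure-Python sparse matrix-vector product over CSR arrays, plus a PageRank
power iteration built on it. The names (csr_matvec, pagerank), the three-page
graph and the 0.85 damping factor are illustrative assumptions, not anything
taken from Spark or from a library.

```python
import time


def csr_matvec(indptr, indices, data, x):
    """Return y = A @ x for a sparse matrix A given as CSR arrays."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Non-zeros of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def pagerank(indptr, indices, data, damping=0.85, iters=50):
    """Power iteration; assumes a column-stochastic matrix (no dangling pages)."""
    n = len(indptr) - 1
    rank = [1.0 / n] * n
    for _ in range(iters):
        spread = csr_matvec(indptr, indices, data, rank)
        rank = [(1.0 - damping) / n + damping * s for s in spread]
    return rank


# Hypothetical 3-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
# Entry (i, j) is 1 / outdegree(j) when page j links to page i.
indptr = [0, 1, 2, 4]
indices = [2, 0, 0, 1]
data = [1.0, 0.5, 0.5, 1.0]

start = time.perf_counter()
ranks = pagerank(indptr, indices, data)
elapsed = time.perf_counter() - start
print("ranks:", ranks)
print("seconds:", elapsed)
```

Keeping the same loop and swapping csr_matvec for a scipy.sparse product (or
for a Colt/MTJ equivalent on the JVM side) should give a roughly like-for-like
timing on real data.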

Nilesh


On Fri, Jan 31, 2014 at 3:04 AM, Evan R. Sparks [via Apache Spark User
List] <ml-node+s1001560n1068...@n3.nabble.com> wrote:

> If you just need basic matrix operations - Spark depends on jblas
> (http://mikiobraun.github.io/jblas/) for fast linear algebra routines
> inside MLlib and GraphX. jblas does a nice job of avoiding boxing/unboxing
> issues when calling out to BLAS, so it might be what you're looking for.
> The programming patterns you'll be able to support with jblas (matrix ops
> on local partitions) are very similar to what you'd get with numpy, etc.
>
> I agree that the Python libraries are more complete/feature rich, but if
> you really crave high performance then I'd recommend staying pure Scala and
> giving jblas a try.
>
>
> On Thu, Jan 30, 2014 at 8:30 AM, nileshc <[hidden email]> wrote:
>
>> Hi there,
>>
>> *Background:*
>> I need to do some matrix multiplication inside the mappers, and I'm trying
>> to choose between Python and Scala for writing the Spark MR jobs. I'm
>> equally fluent in Python and Java, and find Scala pretty easy too, for what
>> it's worth. Going with Python would let me use numpy + scipy, which is
>> blazing fast compared to Java libraries like Colt. Configuring Java with
>> BLAS seems to be a pain compared to scipy (direct apt-get installs, or
>> pip).
>>
>> *Question:*
>> I posted a couple of comments on this answer at StackOverflow:
>>
>> http://stackoverflow.com/questions/17236936/api-compatibility-between-scala-and-python
>>
>> Basically it states that as of Spark 0.7.2, the Python API would be slower
>> than Scala. What's the performance scenario now? The fork issue seems to be
>> fixed. How about serialization? Can it match the performance of Java/Scala
>> Writable-like serialization (knowing the object type beforehand, reducing
>> I/O)? Also, a probably silly question - loops seem to be slow in Python in
>> general; do you think this can turn out to be an issue?
>>
>> Bottom line, should I choose Python for computation-intensive algorithms
>> like PageRank? Scipy gives me an edge, but does the framework kill it?
>>
>> Any help, insights, benchmarks will be much appreciated. :)
>>
>> Cheers,
>> Nilesh
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Python-API-Performance-tp1048.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>



-- 
A quest eternal, a life so small! So don't just play the guitar, build one.
You can also email me at cont...@nileshc.com or visit my website:
http://www.nileshc.com/



