Re: Contribute code to MLlib

2015-05-21 Thread Trevor Grant
Thank you Ram and Joseph. I am also hoping to contribute to MLlib once my Scala gets up to snuff; this is the guidance I needed for how to proceed when ready. Best wishes, Trevor

Re: Contribute code to MLlib

2015-05-20 Thread Trevor Grant
Hey Ram, I'm not speaking to Tarek's package specifically but to the spirit of MLlib. There are a number of methods/algorithms for PCA, and I'm not sure by what criterion the current one is considered 'standard'. It is rare to find ANY machine learning algo that is 'clearly better' than any other.

Re: Contribute code to MLlib

2015-05-20 Thread Ram Sriharsha
Hi Trevor, good point. I didn't mean that some algorithm has to be clearly better than another in every scenario to be included in MLlib. However, even if someone is willing to be the maintainer of a piece of code, it does not make sense to accept every possible algorithm into the core library.

Re: Contribute code to MLlib

2015-05-20 Thread Ram Sriharsha
Hi Trevor, I'm attaching the MLlib contribution guidelines here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-MLlib-specificContributionGuidelines It speaks to widely known and accepted algorithms but not to whether an algorithm has to be better than

Re: Contribute code to MLlib

2015-05-20 Thread Joseph Bradley
Hi Trevor, I may be repeating what Ram said, but to second it, a few points: We do want MLlib to become an extensive and rich ML library; as you said, scikit-learn is a great example. To make that happen, we of course need to include important algorithms. "Important" is hazy, but roughly means

Re: Contribute code to MLlib

2015-05-19 Thread Trevor Grant
There are most likely advantages and disadvantages to Tarek's algorithm against the current implementation, and different scenarios where each is more appropriate. Would we not offer multiple PCA algorithms and let the user choose? Trevor

Re: Contribute code to MLlib

2015-05-18 Thread Joseph Bradley
Hi Tarek, Thanks for your interest and for checking the guidelines first! On 2 points: Algorithm: PCA is of course a critical algorithm. The main question is how your algorithm/implementation differs from the current PCA. If it's different and potentially better, I'd recommend opening up a JIRA

Contribute code to MLlib

2015-05-18 Thread Tarek Elgamal
Hi, I would like to contribute an algorithm to the MLlib project. I have implemented a scalable PCA algorithm on Spark. It is scalable for both tall and fat matrices, and the paper describing it has been accepted for publication at the SIGMOD 2015 conference. I looked at the guidelines in the following link: