Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-21 Thread chalitha udara Perera
Hi everyone, I have submitted the proposal [1]. Thanks a lot, everyone, for the valuable insights. I would greatly appreciate it if you could take a few minutes to review it. [1] https://www.google-melange.com/gsoc/proposal/review/student/google/gsoc2014/chalitha_perera/5629499534213120 Thanks. Chalitha

Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-19 Thread Sebastian Schelter
It's not about directly porting algorithms to Spark; it's about porting them to a DSL that executes on top of Spark. This page has information about it: https://mahout.apache.org/users/sparkbindings/home.html --sebastian On 03/19/2014 08:43 AM, chalitha udara Perera wrote: Thanks a lot everyo

Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-19 Thread chalitha udara Perera
Thanks a lot, everyone, for the valuable insights. Since the main focus is now on porting to Spark, I would be really happy to get involved with it. Can you give me more information on the current progress of the porting, especially regarding the clustering component? Regards, Chalitha On Wed, Mar 19, 2014 at 12

Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-19 Thread Suneel Marthi
On Wednesday, March 19, 2014 3:09 AM, Dmitriy Lyubimov wrote: On Tue, Mar 18, 2014 at 11:56 PM, chalitha udara Perera < chalithaud...@gmail.com> wrote: > Hi Dmitriy, > > I agree with you that I need to be more specific on this matter. Here I was > referring to some suggestions given by Sun

Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-19 Thread Dmitriy Lyubimov
On Tue, Mar 18, 2014 at 11:56 PM, chalitha udara Perera < chalithaud...@gmail.com> wrote: > Hi Dmitriy, > > I agree with you that I need to be more specific on this matter. Here I was > referring to some suggestions given by Suneel on Mahout 1.0 goals [1], b and > c. > He mainly speaks of test cove

Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-19 Thread Sebastian Schelter
I think it would be great to port our kMeans implementation to Spark. It should be done using Dmitriy's DSL, similar to what I'm trying in https://issues.apache.org/jira/browse/MAHOUT-1464 On 03/19/2014 07:56 AM, chalitha udara Perera wrote: Hi Dmitriy, I agree with you that I need to be m

Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-18 Thread chalitha udara Perera
Hi Dmitriy, I agree with you that I need to be more specific on this matter. Here I was referring to some suggestions given by Suneel on Mahout 1.0 goals [1], b and c. For example, this is one thing I have experienced while using Mahout clustering. I have used both simple kmeans and spectral kmeans

Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-18 Thread Dmitriy Lyubimov
I think you need to be a little bit more specific as to what exactly you are proposing. I think "uniform clustering api" needs a bit of elaboration. I generally cannot say that I have experienced any pain calling clustering algorithms, say in R, as a well-documented function. In Mahout just doing t

Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-18 Thread chalitha udara Perera
Hi everyone, I greatly appreciate your interest in this issue. I have gone through the document ScalaSparkBindings [1]. In this project my initial idea was to provide a high-level API for end-user programmers so that they have the flexibility of plugging in different types of algorithms without concern

Re: [GSOC 2014] Uniform API for Mahout Clustering

2014-03-17 Thread Dmitriy Lyubimov
Yes, there's interest. Note that we are trying to unify linear algebra primitives and optimization on Spark as well. All new linear algebra and interaction with the Spark context should probably go through this layer. This is an ongoing thing, but some stuff is working [1] [1] MAHOUT-1346 https://issues.apac

[GSOC 2014] Uniform API for Mahout Clustering

2014-03-17 Thread chalitha udara Perera
Hi All, Going through the mail thread Mahout 1.0 goals, I found that the main focus of Mahout is now on code refactoring and integration with Spark rather than implementing new algorithms. Recently I used Mahout to implement a document clustering module for a Content Management System.