Hi Aditya, I noticed the KNN poster http://dsr.cise.ufl.edu/wp-content/uploads/2016/05/MADlib_Combined.pptx.pdf and was wondering if you have plans to make a pull request?
Frank

On Mon, Mar 28, 2016 at 9:37 PM, Roman Shaposhnik <r...@apache.org> wrote:

> Awesome!
>
> On Mon, Mar 28, 2016 at 9:18 PM, Frank McQuillan <fmcquil...@pivotal.io> wrote:
> > Thanks Roman. I was able to do it just now.
> >
> > Frank
> >
> > On Mon, Mar 28, 2016 at 9:12 PM, Roman Shaposhnik <r...@apache.org> wrote:
> >>
> >> I can help with that -- stay tuned.
> >>
> >> On Mon, Mar 28, 2016 at 8:29 PM, Frank McQuillan <fmcquil...@pivotal.io> wrote:
> >> > Let me figure out how to do this and add Aditya as the owner of that JIRA.
> >> > My initial attempts in ASF infra-land were not quite successful.
> >> >
> >> > Frank
> >> >
> >> > On Mon, Mar 28, 2016 at 4:54 PM, Rahul Iyer <ri...@pivotal.io> wrote:
> >> >>
> >> >> @Frank, Roman: I believe Aditya needs to be added as a developer to the
> >> >> MADlib project to assign a JIRA to him? Is this only available to the
> >> >> lead/owner?
> >> >>
> >> >> On Mon, Mar 28, 2016 at 3:49 PM, Aditya Nain <adityana...@gmail.com> wrote:
> >> >>>
> >> >>> Hi Rahul,
> >> >>>
> >> >>> I didn't have an id, so I created one now.
> >> >>> My id is : Aditya Nain
> >> >>>
> >> >>> Thanks,
> >> >>> Aditya
> >> >>>
> >> >>> On Mon, Mar 28, 2016 at 6:40 PM, Rahul Iyer <ri...@pivotal.io> wrote:
> >> >>>
> >> >>> > I can assign this to you, but you need to have an account in
> >> >>> > https://issues.apache.org.
> >> >>> > If you already have an account, then please send your id - I wasn't
> >> >>> > able to find you just using your name.
> >> >>> >
> >> >>> > On Mon, Mar 28, 2016 at 3:31 PM, Aditya Nain <adityana...@gmail.com> wrote:
> >> >>> >
> >> >>> > > Hi Rahul,
> >> >>> > >
> >> >>> > > Thanks for the reply!
> >> >>> > >
> >> >>> > > I am working on implementing Gaussian Mixture Model assuming that the
> >> >>> > > co-variance matrix is same for all the Gaussians.
> >> >>> > > The JIRA which deals with GMM is MADLIB-410:
> >> >>> > >
> >> >>> > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
> >> >>> > >
> >> >>> > > Can this be assigned to me, or how do I get it assigned to me?
> >> >>> > >
> >> >>> > > Thanks,
> >> >>> > > Aditya
> >> >>> > >
> >> >>> > > On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer <ri...@pivotal.io> wrote:
> >> >>> > >
> >> >>> > > > Hi Aditya,
> >> >>> > > >
> >> >>> > > > Welcome to the MADlib community!
> >> >>> > > >
> >> >>> > > > Gaussian mixture models are extremely useful and we would heartily
> >> >>> > > > welcome a contribution for it. The SQLEM paper might be oversimplifying
> >> >>> > > > the capabilities of the database (e.g. assuming there is no array type
> >> >>> > > > is unnecessary for Postgresql). You could speed things up (both dev
> >> >>> > > > time and execution time) by writing some of the functions in C++.
> >> >>> > > > K-means is an example of how clustering is implemented.
> >> >>> > > > IMO, assuming the same covariance matrix is reasonable. We could
> >> >>> > > > extend the capabilities after the initial implementation is complete.
> >> >>> > > >
> >> >>> > > > There was some work started a long time ago that built perceptrons
> >> >>> > > > using the convex framework (link
> >> >>> > > > <https://github.com/iyerr3/madlib/tree/mlp>).
> >> >>> > > > There are still some bugs in that code since the trained network
> >> >>> > > > isn't converging. You could start there or build a new module - either
> >> >>> > > > way, an MLP module is frequently demanded by the data science
> >> >>> > > > community.
> >> >>> > > >
> >> >>> > > > I would suggest starting with Gaussian mixtures and then moving to
> >> >>> > > > perceptrons if GMM work is completed.
> >> >>> > > >
> >> >>> > > > Feel free to ask questions on this forum. Looking forward to
> >> >>> > > > collaborating with you.
> >> >>> > > >
> >> >>> > > > Best,
> >> >>> > > > Rahul
> >> >>> > > >
> >> >>> > > > On Thu, Mar 17, 2016 at 2:08 PM, Aditya Nain <adityana...@gmail.com> wrote:
> >> >>> > > >
> >> >>> > > > > Hi,
> >> >>> > > > >
> >> >>> > > > > My name is Aditya Nain, and I am a graduate student at University of
> >> >>> > > > > Florida.
> >> >>> > > > > I have been learning MADLib for a while and want to contribute to
> >> >>> > > > > MADLib.
> >> >>> > > > > I went through some of the open stories in JIRA and started working
> >> >>> > > > > on MADLIB-410 :
> >> >>> > > > >
> >> >>> > > > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
> >> >>> > > > >
> >> >>> > > > > which is about implementing Gaussian Mixture Model using Expectation
> >> >>> > > > > Maximization (EM) algorithm.
> >> >>> > > > >
> >> >>> > > > > I came across the following paper while searching for distributed EM
> >> >>> > > > > algorithm which can be implemented in MADLib.
> >> >>> > > > >
> >> >>> > > > > Carlos Ordonez, Paul Cereghini "SQLEM: fast clustering in SQL using
> >> >>> > > > > the EM algorithm" ACM SIGMOD Record, Volume 29 Issue 2, June 2000
> >> >>> > > > > Pages 559-570.
> >> >>> > > > >
> >> >>> > > > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7564
> >> >>> > > > >
> >> >>> > > > > I thought of implementing the approach discussed in the paper, but
> >> >>> > > > > the paper makes an assumption that the covariance matrix is the same
> >> >>> > > > > for all the clusters (i.e., the covariance matrix is the same for all
> >> >>> > > > > the Gaussian distributions). So, I wanted to know the opinion of the
> >> >>> > > > > community if it's fine to go with the assumption made in the paper
> >> >>> > > > > and implement it in MADLib.
> >> >>> > > > >
> >> >>> > > > > Also, currently MADLib doesn't have an implementation of a
> >> >>> > > > > perceptron, nor did I find any open story related to it in JIRA. I
> >> >>> > > > > came across the following paper, which talks about a distributed
> >> >>> > > > > algorithm for perceptron :
> >> >>> > > > >
> >> >>> > > > > Ryan McDonald, Keith Hall, Gideon Mann "Distributed training
> >> >>> > > > > strategies for the structured perceptron"
> >> >>> > > > > http://dl.acm.org/citation.cfm?id=1858068
> >> >>> > > > >
> >> >>> > > > > Would it be useful to have a distributed implementation of perceptron
> >> >>> > > > > in MADlib?
> >> >>> > > > >
> >> >>> > > > > Thanks,
> >> >>> > > > > Aditya