Remove unused recommenders?

2012-12-06 Thread Sebastian Schelter
Hi there, I'm currently thinking about whether we should do a little cleanup in the non-distributed recommenders package and throw out recommenders that have not been used/asked about on the mailing list or that have been replaced by a superior implementation. If anyone reads this and sees a

Re: Remove unused recommenders?

2012-12-06 Thread Sean Owen
The tree-based ones are very old and not fast, and were more of an experiment. I recall a few questions about them but it seemed like people were really just trying to do clustering, and this is a bad way to do clustering. knn is old too, and in a sense spiritually quite similar to ALS. I don't

Re: Remove unused recommenders?

2012-12-06 Thread Koobas
As a n00b, I am still revolving in the kNN space. Could you please point me to some details on ALS. Thanks! On Thu, Dec 6, 2012 at 10:14 AM, Sean Owen sro...@gmail.com wrote: The tree-based ones are very old and not fast, and were more of an experiment. I recall a few questions about them but

Re: Remove unused recommenders?

2012-12-06 Thread Sean Owen
Are you speaking specifically about the implementation in the .knn package, which is a fairly particular thing, or just k nearest neighbor approaches in general? The latter aren't going away. On Thu, Dec 6, 2012 at 3:18 PM, Koobas koo...@gmail.com wrote: As a n00b, I am still revolving in the

Re: Remove unused recommenders?

2012-12-06 Thread Sebastian Schelter
FunkSVD is a suboptimal duplicate of RatingSGDFactorizer, ImplicitLinearRegressionFactorizer is a duplicate of ALSWR so I think we should only keep one of each. The other three recommenders seem to be used almost never, so I'd like to remove them, however I wouldn't have a problem with keeping

Re: Remove unused recommenders?

2012-12-06 Thread Koobas
On Thu, Dec 6, 2012 at 10:20 AM, Sean Owen sro...@gmail.com wrote: Are you speaking specifically about the implementation in the .knn package, which is a fairly particular thing, or just k nearest neighbor approaches in general? The latter aren't going away. kNN in general. Glad to hear it

Re: Clustering points in a unit hypercube

2012-12-06 Thread Dan Filimon
I took the plunge and rendered a few plots in R with how the parameters of streaming-k-means evolve. Here's the link [1]. [1] https://github.com/dfilimon/knn/wiki/skm-visualization On Thu, Dec 6, 2012 at 2:01 AM, Ted Dunning ted.dunn...@gmail.com wrote: Still not that odd if several clusters

Re: Clustering points in a unit hypercube

2012-12-06 Thread Ted Dunning
Yeah... very useful. Clearly the adaptive limit on the number of surrogate points is much too restrictive. On Fri, Dec 7, 2012 at 1:21 AM, Dan Filimon dangeorge.fili...@gmail.com wrote: I took the plunge and rendered a few plots in R with how the parameters of streaming-k-means evolve. Here's

Re: Remove unused recommenders?

2012-12-06 Thread Ted Dunning
Deprecating is a nice first step to let people know where things are headed. On Thu, Dec 6, 2012 at 4:21 PM, Sebastian Schelter s...@apache.org wrote: The other three recommenders seem to be used almost never, so I'd like to remove them, however I wouldn't have a problem with keeping them for

Re: Decision Forest - Partial implementation

2012-12-06 Thread Marty Kube
Yes, I'm on a project in which we classify a large data set. We do use MapReduce to do the classification, as the data set is much larger than the working memory. We have a non-Mahout implementation... So we put the decision forest in memory via a distributed cache and partition the data set

Re: Remove unused recommenders?

2012-12-06 Thread Marty Kube
One nice way to do this is to mark the classes in question deprecated for a few releases, and then remove them on an announced schedule. That lets any end users know what is coming and gives them time to respond. On 12/06/2012 10:21 AM, Sebastian Schelter wrote: FunkSVD is a suboptimal

Re: seq2sparse dictionary question

2012-12-06 Thread Abramov Pavel
Maybe I don't understand your question. We apply this formula http://search.apache.org/~doronc/api/org/apache/lucene/search/DefaultSimilarity.html#idf%28int,%20int%29 to frequency.file-0 (seq2sparse output with term DocFreq). Just checked TF vectors and TFIDF vectors; this formula gives me
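
For readers following the link in the message above: the linked method is Lucene's DefaultSimilarity idf(docFreq, numDocs), which that API version documents as log(numDocs / (docFreq + 1)) + 1. The Python sketch below only reproduces that arithmetic so the numbers can be checked by hand. The function name lucene_idf is hypothetical and is not part of Lucene or Mahout, and the sketch makes no claim about any further weighting or normalization seq2sparse may apply.

```python
import math


def lucene_idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency as documented for Lucene's
    DefaultSimilarity.idf(int docFreq, int numDocs):
        log(numDocs / (docFreq + 1)) + 1
    Hypothetical helper for illustration only; not a Mahout or Lucene API.
    """
    if doc_freq < 0 or num_docs <= 0:
        raise ValueError("doc_freq must be >= 0 and num_docs must be > 0")
    # Natural logarithm, as in java.lang.Math.log.
    return math.log(num_docs / (doc_freq + 1)) + 1.0


if __name__ == "__main__":
    # A term appearing in 9 of 100 documents.
    print(lucene_idf(9, 100))
    # A term appearing in every one of 100 documents scores slightly below 1.
    print(lucene_idf(100, 100))
```

Multiplying a term's TF weight by this value gives a number to compare against the corresponding TFIDF vector entry, under the assumption that no other scaling is applied.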