Re: Why some encoders called setProbs(2)

2012-04-09 Thread Federico Castanedo
setProbes() indicates the number of locations updated by the FeatureVectorEncoder, which is 1 by default. In this example (see line 110 of NewsGroupHelper.java) the encoder uses 2 values. 2012/4/9 冯超 : > Hello everyone, >    I am a freshman in mahout, today I read the example code of the mahout > and
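As a rough illustration of what "number of locations updated" means, here is a minimal Python sketch of hashed feature encoding with multiple probes. It is not Mahout's implementation: the function names `hashed_locations` and `encode` are hypothetical, and MD5 is only a stand-in for whatever hash function the real encoder uses.

```python
import hashlib


def hashed_locations(name, value, num_features, probes=1):
    """Return `probes` indices in [0, num_features) for one (name, value) pair.

    Hypothetical helper: each probe hashes the same feature with a different
    suffix, so one feature maps to several vector positions.
    """
    locations = []
    for probe in range(probes):
        key = f"{name}:{value}:{probe}".encode("utf-8")
        digest = hashlib.md5(key).digest()  # stand-in hash, an assumption
        locations.append(int.from_bytes(digest[:8], "big") % num_features)
    return locations


def encode(name, value, vector, probes=1, weight=1.0):
    """Add `weight` to every probed location of `vector` (a list of floats)."""
    for index in hashed_locations(name, value, len(vector), probes):
        vector[index] += weight


vector = [0.0] * 1000
encode("body", "mahout", vector, probes=2)  # updates 2 locations instead of 1
```

With more than one probe, a collision at one location is less likely to wipe out a feature entirely, at the cost of writing more entries per feature.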

Re: some new clustering code

2012-04-06 Thread Federico Castanedo
Hello Ted, The only difference I notice between the Kmeans and StreamingKmeans classes is the dynamic increment of maxClusters and the distanceCutoff test. So I executed the KMeans class against a subset of the BigCross dataset and it worked fine. What is the rationale behind choosing f = estimateCuto

Re: streaming kmeans

2012-01-16 Thread Federico Castanedo
streamkm++ each win sometimes and lose > sometimes relative to each other. I don't see a clear win on either side. > > On Sun, Jan 15, 2012 at 9:59 PM, Federico Castanedo < > castanedof...@gmail.com > > wrote: > > > Hi Ted, > > > > Oops, i jus

Re: streaming kmeans

2012-01-15 Thread Federico Castanedo
avoid map-reduce iterations? > > On Sun, Jan 15, 2012 at 9:23 PM, Federico Castanedo < > castanedof...@gmail.com > > wrote: > > > Hi all, > > > > These days I've been looking at this paper: > > "Fast and Accurate k-means for Large Datasets

streaming kmeans

2012-01-15 Thread Federico Castanedo
Hi all, These days I've been looking at this paper: "Fast and Accurate k-means for Large Datasets", recently presented at NIPS 2011. http://web.engr.oregonstate.edu/~shindler/papers/StreamingKMeans_soda11.pdf It seems an outstanding state-of-the-art approach to implement streaming kmeans for

Re: Average distance between two points in unit hypercube?

2011-10-22 Thread Federico Castanedo
It also lacks a proof which is kind of important among some circles. > The > > > clever proof is actually not that hard to grok. > > > > > > On Fri, Oct 21, 2011 at 5:07 AM, Federico Castanedo < > > > castanedof...@gmail.com > > > > wro

Re: Average distance between two points in unit hypercube?

2011-10-21 Thread Federico Castanedo
I think that's a good explanation of the Johnson-Lindenstrauss lemma, which is the basis of manifold learning theory using random projections. 2011/10/21 Ted Dunning > Sort of. > > I may be misunderstanding the question. > > If you take a random orthogonal projection, then distances will be

Re: Average distance between two points in unit hypercube?

2011-10-19 Thread Federico Castanedo
What about this: http://www.wisdom.weizmann.ac.il/~oded/p_aver-metric.html HTW 2011/10/19 Sean Owen > (And when I do the simulation correctly, I get a better answer: sqrt(n/6) ) > > On Wed, Oct 19, 2011 at 5:21 PM, Sean Owen wrote: > > Hmm. Not knowing the analytics answer I just wrote a simu
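The sqrt(n/6) figure quoted above can be checked with a small Monte Carlo run. The sketch below is not the simulation from the thread (which is not shown); the function name `mean_distance` and its parameters are assumptions. The reasoning behind the estimate: for two independent uniform coordinates x and y on [0, 1], E[(x - y)^2] = 1/6, so the mean squared distance in n dimensions is n/6, and the mean distance approaches sqrt(n/6) from below as n grows.

```python
import math
import random


def mean_distance(n, trials=20000, seed=0):
    """Estimate the mean Euclidean distance between two uniform random
    points in the unit hypercube [0, 1]^n by simulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        squared = 0.0
        for _ in range(n):
            diff = rng.random() - rng.random()
            squared += diff * diff
        total += math.sqrt(squared)
    return total / trials


if __name__ == "__main__":
    for n in (10, 100):
        # Compare the simulated mean with the sqrt(n/6) approximation.
        print(n, mean_distance(n), math.sqrt(n / 6))
```

For small n the simulated mean falls noticeably below sqrt(n/6); the gap shrinks as n increases.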

Re: Meetup in the Bay Area in Sept?

2010-09-03 Thread Federico Castanedo
I'll keep my mouth closed :D 2010/9/2 Dmitriy Lyubimov : > Cool by me. > :) > > On Thu, Sep 2, 2010 at 1:30 PM, Jake Mannix wrote: > >> You can come, but only if you're sworn to secrecy and won't tell anyone the >> secret handshake. >> >> On Thu, Sep 2, 2010 at 1:02 PM, Dmitriy Lyubimov >> wrote

Re: Meetup in the Bay Area in Sept?

2010-09-03 Thread Federico Castanedo
Hi all, I would also like to attend the meeting (I'm not a committer) but I'm interested in Mahout and the algorithms you are developing. Regards, Federico 2010/9/2 Dmitriy Lyubimov : > I would love to attend, if non-committers are allowed. > -Dmitriy > > On Thu, Sep 2, 2010 at 12:56 PM, Ted Dunn