Hot damn! Well spotted.
On Thu, Mar 28, 2013 at 12:08 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> Ted, remember we talked about this last week?
>
> The problem was (I think it's fixed now) that when I was asking for 20
> clusters, every mapper would give me 20 clusters (rather than k log n
> ~ 200) and the points clumped together, resulting in one cluster with
> the vast majority of the points, ~17K out of the ~19K.
>
> Now that I fixed that and added more tests that seem to be confirming all
> StreamingKMeans implementations get about the same results (whether
> they're local or MapReduce) and the multiple restarts of BallKMeans,
> I'm expecting it to be a lot better.
>
> Actual data tests coming soon (please check that new cluster thread). :)