Ted, remember we talked about this last week?

The problem (which I think is fixed now) was that when I asked for 20
clusters, every mapper gave me exactly 20 clusters (rather than k log n
~ 200), and the points clumped together, resulting in one cluster
holding the vast majority of the points: ~17K out of the ~19K.
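
For reference, here's roughly where the ~200 comes from. This is just a
back-of-the-envelope sketch, assuming natural log and n ~ 19K points as
above; the function name is made up for illustration and is not part of
Mahout's API.

```python
import math

def estimated_sketch_clusters(k, n):
    # Hypothetical helper: rough k * log(n) estimate of how many
    # intermediate clusters each streaming pass should produce.
    # Natural log is an assumption here.
    return k * math.log(n)

# k = 20 requested clusters, n ~ 19K points
print(round(estimated_sketch_clusters(20, 19_000)))
```

So each mapper should be handing back on the order of 200 centroids, not 20.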

I've now fixed that and added more tests, which seem to confirm that all
StreamingKMeans implementations get about the same results (whether
they're local or MapReduce). With that and the multiple restarts of
BallKMeans, I'm expecting it to be a lot better.

Actual data tests coming soon (please check that new cluster thread). :)
