Have you looked at the streaming k-means work?  The basic idea is that you
make one pass over the data to build a sketch (a much smaller set of
weighted centroids), which you can then cluster in memory.  Because the
sketch is small, you can afford sophisticated centroid generation algorithms
that need far too much processing to run on the full data set.
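
To make that concrete, here is a rough Python sketch of the two stages.
This is only an illustration of the idea, not Mahout's actual code: the
function names, the starting threshold, the growth factor, and the plain
Lloyd iteration in the second stage are all my own assumptions.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def add_to_sketch(sketch, point, weight, threshold, rng):
    """Merge a weighted point into its nearest centroid, or open a new one."""
    if not sketch:
        sketch.append((list(point), weight))
        return
    i = min(range(len(sketch)), key=lambda j: dist(sketch[j][0], point))
    d = dist(sketch[i][0], point)
    # The farther the point is from every centroid, the more likely it
    # becomes a new centroid; nearby points are folded into the nearest one.
    if rng.random() < d / threshold:
        sketch.append((list(point), weight))
    else:
        c, w = sketch[i]
        total = w + weight
        merged = [(ci * w + pi * weight) / total for ci, pi in zip(c, point)]
        sketch[i] = (merged, total)


def build_sketch(points, max_centroids, threshold=1e-3, growth=1.5, seed=0):
    """One pass over the data; returns a list of (centroid, weight) pairs."""
    if max_centroids < 1:
        raise ValueError("max_centroids must be at least 1")
    rng = random.Random(seed)
    sketch = []
    for p in points:
        add_to_sketch(sketch, p, 1.0, threshold, rng)
        # Too many centroids: loosen the threshold and re-cluster the
        # sketch itself, which keeps memory bounded by max_centroids.
        while len(sketch) > max_centroids:
            threshold *= growth
            old, sketch = sketch, []
            for c, w in old:
                add_to_sketch(sketch, c, w, threshold, rng)
    return sketch


def weighted_kmeans(sketch, k, iterations=20, seed=0):
    """In-memory stage: plain Lloyd iterations over the weighted sketch."""
    rng = random.Random(seed)
    k = min(k, len(sketch))
    centers = [list(c) for c, _ in rng.sample(sketch, k)]
    for _ in range(iterations):
        sums = [[0.0] * len(centers[0]) for _ in range(k)]
        weights = [0.0] * k
        for c, w in sketch:
            i = min(range(k), key=lambda j: dist(centers[j], c))
            weights[i] += w
            for dim, x in enumerate(c):
                sums[i][dim] += w * x
        for i in range(k):
            if weights[i] > 0:  # leave an empty cluster's center unchanged
                centers[i] = [s / weights[i] for s in sums[i]]
    return centers


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (10, 10)] for _ in range(500)]
    sketch = build_sketch(data, max_centroids=50)
    print(len(sketch), weighted_kmeans(sketch, k=2))
```

Memory use is bounded by max_centroids rather than by the number of
documents, which is the main contrast with canopy.  The second stage only
ever sees the sketch, so you can swap the Lloyd loop for something much
more expensive without touching the streaming pass.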

On Tue, Nov 26, 2013 at 6:29 AM, Chih-Hsien Wu <chjaso...@gmail.com> wrote:

> Hi all, I'm trying to cluster text documents using a top-down approach. I
> have tried both random seed and canopy generation and have seen
> their pros and cons. I realize that canopy works well when the exact number
> of clusters is not known; nevertheless, its memory requirements are high. I
> was hoping to find something similar to canopy generation and was wondering
> whether anyone has other recommendations.
>
