Just out of curiosity: is there a threshold limitation for the canopy
algorithm? Is it simply defined by the user's preference, based on the
inter-cluster distances? Or is it limited by how much memory the job is
allowed to use?

The threshold is based on the user's preference for inter-cluster distances.
If you are running out of memory, I suggest increasing the JVM memory settings.
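
To make the threshold/memory trade-off concrete, here is a minimal sketch of the textbook canopy pass in plain Python. It is not Mahout's implementation; the function name canopy_centers, the parameters t1 and t2, and the 10x10 grid of sample points are all assumptions made up for illustration. The point is only that a smaller T2 removes fewer points per canopy, so more canopies have to be held in memory.

```python
import math

def canopy_centers(points, t1, t2):
    """Simplified canopy pass (illustrative, not Mahout's code).

    Points closer than t1 to a center join that canopy; points closer
    than t2 are also removed from the pool of candidate centers.
    """
    assert t1 >= t2, "the loose threshold t1 must not be smaller than t2"
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)          # next unclaimed point becomes a center
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)       # Euclidean distance (Python 3.8+)
            if d < t1:
                members.append(p)          # loosely belongs to this canopy
            if d >= t2:
                still_remaining.append(p)  # still eligible to seed a canopy
        remaining = still_remaining
        canopies.append((center, members))
    return canopies

# Hypothetical data: a 10x10 grid of 2-D points.
points = [(float(x), float(y)) for x in range(10) for y in range(10)]

# Count canopies for progressively tighter thresholds (t1 fixed at 2 * t2).
counts = {}
for t2 in (5.0, 2.0, 1.0):
    counts[t2] = len(canopy_centers(points, t1=2 * t2, t2=t2))
    print(f"T2={t2}: {counts[t2]} canopies")
```

On this toy grid the canopy count grows as T2 shrinks, which is the same effect that pushes up heap usage on real data when the thresholds are lowered.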
I'm not sure what you are trying to accomplish, but if you are looking to get
a first cut at clustering, I suggest you look at the new Streaming kmeans th

Hey Suneel, thanks for the reply. I'm trying to create hierarchical
clusters via a top-down approach. I'm caught in the trade-off between a
lower canopy threshold and running out of heap memory. Streaming kmeans
sounds ideal for the top-level clustering. What are the major differences between
Streaming kmeans