Canopy threshold limitation

2013-11-22 Thread Chih-Hsien Wu
Just out of curiosity: is there a threshold limitation for the canopy algorithm? Is it simply defined by the user's preference, based on the inter-cluster distances? Or is it limited by how much memory is available to run the job?
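For context, canopy clustering is usually described with two user-chosen distance thresholds, a loose T1 and a tight T2 (T1 > T2). Each new canopy center pulls in every remaining point within T1 as a (possibly overlapping) member. Points within T2 are removed from the pool of future centers. The minimal single-machine sketch below assumes Euclidean distance and an in-memory point list. It is not the distributed implementation discussed in this thread. It shows why the threshold is a user choice with a memory cost: the smaller T2 is, the fewer points each canopy removes, so more canopy centers are created and held in memory.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumption; any distance measure could be used)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points: List[Point], t1: float, t2: float) -> List[Tuple[Point, List[Point]]]:
    """Single-machine canopy clustering sketch.

    t1: loose threshold; points closer than t1 join the canopy.
    t2: tight threshold; points closer than t2 can no longer seed a new canopy.
    Returns a list of (center, members) pairs.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # The first remaining point becomes the next canopy center.
        center = remaining.pop(0)
        members = [center]
        survivors = []
        for p in remaining:
            d = euclidean(center, p)
            if d < t1:
                members.append(p)  # soft membership: canopies may overlap
            if d >= t2:
                survivors.append(p)  # only points outside T2 stay candidates
        remaining = survivors
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    pts = [(float(i),) for i in range(10)]  # ten points spaced 1.0 apart
    print(len(canopy(pts, t1=3.0, t2=2.5)))  # 4 canopies
    print(len(canopy(pts, t1=3.0, t2=0.5)))  # 10 canopies: every point is a center
```

With points spaced 1.0 apart, lowering T2 from 2.5 to 0.5 raises the number of canopies from 4 to 10. On a large data set, this growth in the number of centers is what drives up heap usage.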

Re: Canopy threshold limitation

2013-11-22 Thread Suneel Marthi
The threshold is based on the user's preference for inter-cluster distances. If you are running out of memory, I suggest increasing the JVM memory settings. I'm not sure what you are trying to accomplish, but if you are looking to get a first cut at clustering, I suggest you look at the new Streaming kmeans th

Re: Canopy threshold limitation

2013-11-25 Thread Chih-Hsien Wu
Hey Suneel, thanks for the reply. I'm trying to create hierarchical clusters using a top-down approach. I'm caught in a trade-off between a lower canopy threshold and running out of heap memory. Streaming k-means sounds ideal for the top-level clustering. What are the major differences between Streaming kmeans