Is there any rationale behind what you are proposing? It's better to go with Streaming KMeans than the combination of Canopy and KMeans clustering.
Moreover, Canopy clustering is more likely to fail on large datasets because the canopy generation phase uses a single reducer; several users have reported this behavior on these forums.

On Wednesday, March 12, 2014 4:17 PM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:

Hi,

Finding the right T1 and T2 for canopy clustering is a time-consuming task that requires manual intervention. I am planning to automate the calculation. The idea is to increment T1 and T2 as x times 3.1 and x times 2.1 respectively, and collect the approximate T1 and T2 for each cluster count K. I am not sure if this is a good idea. Please suggest!

--
Thanks & Regards
Bikash Gupta
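
For reference, here is a minimal sketch of the threshold sweep described in the quoted message. It is not Mahout code: it is a small in-memory canopy pass in plain Python, and it assumes that "x times 3.1 and x times 2.1" means T1 = 3.1 * x and T2 = 2.1 * x for x = 1, 2, 3, and so on. The function names `canopies` and `sweep_thresholds`, the Euclidean distance, and the choice of the first remaining point as the next center are all assumptions made for illustration.

```python
import math


def canopies(points, t1, t2):
    """Single-pass canopy clustering on an in-memory list of points.

    Returns a list of (center, members) pairs. Requires t1 > t2.
    Distance is Euclidean (an assumption; any metric could be swapped in).
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    result = []
    while remaining:
        # Take the first remaining point as the next canopy center.
        center = remaining.pop(0)
        # Loose threshold t1 decides canopy membership.
        members = [center] + [p for p in remaining if math.dist(center, p) < t1]
        # Tight threshold t2 removes points from further consideration.
        remaining = [p for p in remaining if math.dist(center, p) >= t2]
        result.append((center, members))
    return result


def sweep_thresholds(points, steps, t1_unit=3.1, t2_unit=2.1):
    """For x = 1..steps, run canopies with T1 = x * t1_unit, T2 = x * t2_unit.

    Returns a list of (x, t1, t2, k) where k is the number of canopies found.
    """
    rows = []
    for x in range(1, steps + 1):
        t1, t2 = x * t1_unit, x * t2_unit
        k = len(canopies(points, t1, t2))
        rows.append((x, t1, t2, k))
    return rows


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of three points.
    data = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    for x, t1, t2, k in sweep_thresholds(data, steps=8):
        print(f"x={x} T1={t1:.1f} T2={t2:.1f} canopies={k}")
```

Reading the sweep output gives a rough map from (T1, T2) pairs to the number of canopies K, which is the lookup the quoted message is after. Note that such a sweep means rerunning canopy generation once per step, so on large datasets the single-reducer concern raised above applies to every iteration.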