Is there any rationale behind what you are proposing?

It's better to go with Streaming KMeans than with the combination of Canopy
and KMeans clustering.

Moreover, Canopy clustering is more likely to fail on large datasets because
its canopy generation phase runs with a single reducer. Several users have
reported this behavior in these forums.







On Wednesday, March 12, 2014 4:17 PM, Bikash Gupta <bikash.gupt...@gmail.com> 
wrote:
 
Hi,

Finding the right T1 and T2 for canopy clustering is a time-consuming task
that requires manual intervention. I am planning to automate the calculation.

The idea is to increment T1 and T2 in steps of x times 3.1 and x times 2.1,
respectively, and to collect the approximate T1 and T2 for each K clusters.
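
The sweep described above could be sketched roughly as follows. This is a
small in-memory illustration in plain Python, not Mahout's MapReduce
implementation. The function names (canopy_clusters, sweep_thresholds) and the
sample points are hypothetical, and only the 3.1 and 2.1 multipliers come from
the message. It assumes Euclidean distance and the usual T1 > T2 canopy rule.

```python
import math


def canopy_clusters(points, t1, t2):
    """Group points into canopies using a loose threshold t1 and a tight threshold t2."""
    if t1 <= t2:
        raise ValueError("T1 must be greater than T2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next unassigned point as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                # Within the loose threshold: the point joins this canopy.
                members.append(p)
            if d >= t2:
                # Outside the tight threshold: the point may still seed or join others.
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


def sweep_thresholds(points, steps, t1_unit=3.1, t2_unit=2.1):
    """For x = 1..steps, try T1 = x * t1_unit and T2 = x * t2_unit; record the canopy count K."""
    results = []
    for x in range(1, steps + 1):
        t1, t2 = x * t1_unit, x * t2_unit
        k = len(canopy_clusters(points, t1, t2))
        results.append((t1, t2, k))
    return results


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [(0, 0), (1, 0), (10, 0), (11, 0), (30, 0)]
    for t1, t2, k in sweep_thresholds(sample, 15):
        print(f"T1={t1:.1f} T2={t2:.1f} K={k}")
```

Each recorded (T1, T2, K) triple can then be inspected to pick the thresholds
that yield the desired K. Note that the result depends on the order in which
points are visited, so the K values are approximate at best.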

I am not sure whether this is a good idea. Please advise.

-- 
Thanks & Regards
Bikash Gupta
