Categorical Features for K-Means Clustering

Wen Phan Fri, 11 Jul 2014 07:08:14 -0700

Hi Folks,

Does anyone have experience or recommendations on incorporating categorical 
features (attributes) into k-means clustering in Spark?  In other words, I want 
to cluster on a set of attributes that include categorical variables.


I know I could probably implement some custom code to parse and calculate my 
own similarity function, but I wanted to reach out before I did so.  I’d also 
like to take advantage of the k-means|| initialization feature of the 
model in MLlib, so an MLlib-based implementation would be preferred.

Thanks in advance.

Best,

-Wen
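
For the question above, one commonly used route (not something confirmed in this thread) is to one-hot encode each categorical attribute, so that every record becomes a purely numeric vector that a Euclidean k-means such as MLlib's can consume without a custom similarity function. The sketch below is plain Python with no Spark dependency; the helper names `build_vocab` and `encode` and the sample records are made up for illustration.

```python
# Hypothetical sketch: one-hot encode categorical columns so each record
# becomes a numeric vector suitable for a Euclidean k-means.

def build_vocab(rows, cat_cols):
    """Map each categorical column index to a {value: slot position} dict."""
    vocab = {}
    for c in cat_cols:
        values = sorted({row[c] for row in rows})
        vocab[c] = {v: i for i, v in enumerate(values)}
    return vocab

def encode(row, vocab):
    """Pass numeric columns through; expand categorical ones into 0/1 slots."""
    out = []
    for c, value in enumerate(row):
        if c in vocab:
            slots = [0.0] * len(vocab[c])
            slots[vocab[c][value]] = 1.0
            out.extend(slots)
        else:
            out.append(float(value))
    return out

# Made-up example records: (age, colour, plan)
rows = [
    (23, "red", "basic"),
    (35, "blue", "premium"),
    (41, "red", "premium"),
]

vocab = build_vocab(rows, cat_cols=[1, 2])
vectors = [encode(r, vocab) for r in rows]

# In a Spark job, the same encode step would presumably run inside a map over
# the RDD, and the resulting vectors would then be handed to MLlib's KMeans
# with its k-means|| initialization mode (assumption, not shown here).
```

Two caveats with this approach: numeric columns on a large scale (such as age here) can dominate the 0/1 slots unless the features are rescaled first, and attributes with many distinct values produce wide, sparse vectors.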

