Hi folks,

Does anyone have experience with, or recommendations on, incorporating categorical features (attributes) into k-means clustering in Spark? In other words, I want to cluster on a set of attributes that includes categorical variables.
I know I could probably write some custom code to parse the data and compute my own similarity function, but I wanted to ask here before doing so. I’d also like to take advantage of the k-means|| (parallel) initialization feature of the model in MLlib, so an MLlib-based implementation would be preferred.

Thanks in advance.

Best,
-Wen