On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:

> Let's say I am clustering users, and I am providing their profile data to
> discover similarity between two users.
>
> So my input would be [UserId, Location, Age, Gender, Time Created]
>
> Now if my UserId length is a minimum of 10 characters, which is a
> comparatively very large number next to the other categorical data.
>

User ID is not a good field for clustering. Location is fine if you want
geographical clustering. Location + age + gender is fine for
geo-demographic clustering. Adding time created might give a tiny bit of
insight. But these fields are not going to lead to great insights.
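
To make the field selection concrete, here is a minimal sketch of how the profile fields might be turned into feature vectors with the user ID left out. Everything in it is an assumption for illustration: the record layout, the field names (`lat`, `lon`, `age`, `gender`), the sample values, and the choice of min-max scaling with one-hot gender. It is plain Python rather than any particular clustering library's API.

```python
from typing import Dict, List

# Hypothetical profile records; field names and values are made up for illustration.
profiles = [
    {"user_id": "u000000001", "lat": 40.7, "lon": -74.0, "age": 25, "gender": "F"},
    {"user_id": "u000000002", "lat": 40.8, "lon": -73.9, "age": 31, "gender": "M"},
    {"user_id": "u000000003", "lat": 34.1, "lon": -118.2, "age": 58, "gender": "F"},
]


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def to_feature_vectors(records: List[Dict]) -> List[List[float]]:
    """Build [lat, lon, age, is_female, is_male] vectors.

    The user ID is deliberately left out: it only labels the row.
    """
    lat = min_max([r["lat"] for r in records])
    lon = min_max([r["lon"] for r in records])
    age = min_max([r["age"] for r in records])
    vectors = []
    for i, r in enumerate(records):
        # One-hot encode gender so no artificial ordering is implied.
        gender = [1.0 if r["gender"] == g else 0.0 for g in ("F", "M")]
        vectors.append([lat[i], lon[i], age[i]] + gender)
    return vectors


vectors = to_feature_vectors(profiles)
```

Scaling every numeric field to the same range keeps any single field from dominating a distance measure simply because its raw values are larger. The user ID can still be carried alongside each vector as a label so that cluster members can be looked up afterwards.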