On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:

> Let's say I am clustering users, and I am providing their profile data to
> discover the similarity between two users.
>
> So my input would be [UserId, Location, Age, Gender, Time Created].
>
> Now, my UserId is at least 10 characters long, which makes it very large
> compared with the other categorical data.
>

User ID is not a good field for clustering. It is unique to each user, so it
says nothing about how similar two users are.
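To see why, here is a minimal sketch (the IDs below are made up for
illustration). If the ID is treated as a categorical value, any two distinct
users simply mismatch, so every pair of users ends up exactly the same
distance apart on that field:

```python
from itertools import combinations


def id_mismatch(a: str, b: str) -> int:
    """Categorical mismatch distance: 0 if the IDs are equal, 1 otherwise."""
    return 0 if a == b else 1


# Hypothetical user IDs; each one is unique, as user IDs normally are.
user_ids = ["u000000001", "u000000002", "u000000042", "u123456789"]

pair_distances = {
    (a, b): id_mismatch(a, b) for a, b in combinations(user_ids, 2)
}

# Every pair of distinct users is the same distance apart, so the ID
# contributes nothing about which users are similar to each other.
print(set(pair_distances.values()))  # {1}
```

Treating the ID as a number or a string and measuring how close two IDs are is
no better, since numerically or lexically nearby IDs say nothing about nearby
users.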

Location is fine if you want geographical clustering.

Location + age + gender is fine for geo-demographic clustering.

Adding the creation time might give a tiny bit of extra insight.
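A minimal sketch of such a geo-demographic feature vector, assuming location is
available as latitude/longitude and creation time as days since some epoch. The
`User` class, its field names, the `GENDERS` list and the scaling constants are
all hypothetical choices for illustration. The user ID is carried along but left
out of the features, and each field is scaled to a comparable range so that no
single field dominates the distance:

```python
import math
from dataclasses import dataclass
from typing import List


# Hypothetical record type; field names and units are assumptions.
@dataclass
class User:
    user_id: str         # kept for bookkeeping only, NOT used as a feature
    lat: float           # location as latitude in degrees (assumed)
    lon: float           # location as longitude in degrees (assumed)
    age: int             # age in years
    gender: str          # assumed to be one of GENDERS
    created_days: float  # creation time in days since some epoch (assumed)


GENDERS = ["F", "M", "other"]  # hypothetical category list


def features(u: User, include_created: bool = False) -> List[float]:
    """Build a numeric feature vector, deliberately excluding user_id."""
    # Scale numeric fields to roughly [-1, 1] or [0, 1] so none dominates.
    vec = [u.lat / 90.0, u.lon / 180.0, u.age / 100.0]
    # One-hot encode gender so the categories carry no artificial ordering.
    vec += [1.0 if u.gender == g else 0.0 for g in GENDERS]
    if include_created:
        # Optional: creation time, scaled by an assumed ~10-year span.
        vec.append(u.created_days / 3650.0)
    return vec


def distance(a: User, b: User, include_created: bool = False) -> float:
    """Euclidean distance between two users' feature vectors."""
    return math.dist(features(a, include_created), features(b, include_created))


# Made-up example users.
alice = User("u000000001", 40.7, -74.0, 34, "F", 100.0)
bob = User("u000000002", 40.8, -73.9, 36, "F", 5000.0)
carol = User("u000000003", 48.9, 2.35, 35, "M", 120.0)

# Alice and Bob are close in place, age and gender; Carol is not.
print(distance(alice, bob) < distance(alice, carol))  # True
```

Per-field scaling (or standardizing each column) is what addresses the worry
about one field being much larger than the others. Dividing raw latitude and
longitude is a simplification that ignores wrap-around at +/-180 degrees; a
proper geographic distance would handle that.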

But these fields are not going to lead to great insights.
