Basically, I am trying to achieve customer segmentation. To measure customer similarity within a cluster, I need to know which two customers are similar.
Assumption: to identify these customers uniquely, I need to provide their CustomerId.

Is my assumption correct?
If yes, will CustomerId affect the clustering output?
If no, how can I identify customers uniquely?

On Tue, Feb 18, 2014 at 2:55 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> That really depends on what you want to do.
>
> What is it that you want?
>
>
> On Mon, Feb 17, 2014 at 12:25 PM, Bikash Gupta
> <bikash.gupt...@gmail.com> wrote:
>
>> OK... so UserId is not a good field for this combination, but if I want
>> user clustering, what should the combination be (just for
>> understanding)?
>>
>> On Tue, Feb 18, 2014 at 1:44 AM, Ted Dunning <ted.dunn...@gmail.com>
>> wrote:
>> > On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta <bikash.gupt...@gmail.com>
>> > wrote:
>> >
>> >> Let's say I am clustering users, and I am providing their profile data
>> >> to discover the similarity between two users.
>> >>
>> >> So my input would be [UserId, Location, Age, Gender, Time Created].
>> >>
>> >> Now, my UserId is at least 10 characters long, which is a very large
>> >> value compared with the other categorical data.
>> >>
>> >
>> > User id is not a good field for clustering.
>> >
>> > Location is fine if you want geographical clustering.
>> >
>> > Location + age + gender is fine for geo-demographic clustering.
>> >
>> > Adding time created might give a tiny bit of insight.
>> >
>> > But these fields are not going to lead to great insights.
>>
>>
>>
>> --
>> Thanks & Regards
>> Bikash Kumar Gupta
>>

--
Thanks & Regards
Bikash Kumar Gupta
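On the CustomerId question: a common pattern is to keep the ID out of the feature vector entirely and carry it alongside the vector as a label. That way it cannot influence cluster assignment, yet you can still report which customers landed together and which pair within a cluster is closest. (In Mahout, `NamedVector` wraps a vector with a name for this purpose.) Below is a minimal standard-library sketch of the idea, assuming the Location/Age/Gender profile fields from the quoted thread. The sample records, the `MAX_AGE` scaling constant, the one-hot encoding, Euclidean distance, and the tiny k-means seeded with the first k vectors are all hypothetical illustrative choices, not Mahout's implementation.

```python
import math
from itertools import combinations

# Hypothetical sample profiles. The CustomerId is only the dict key;
# it is never turned into a feature.
customers = {
    "C0000000001": {"location": "Pune", "age": 34, "gender": "M"},
    "C0000000002": {"location": "Pune", "age": 36, "gender": "M"},
    "C0000000003": {"location": "Delhi", "age": 22, "gender": "F"},
    "C0000000004": {"location": "Delhi", "age": 25, "gender": "F"},
    "C0000000005": {"location": "Pune", "age": 50, "gender": "M"},
}

LOCATIONS = sorted({p["location"] for p in customers.values()})
GENDERS = sorted({p["gender"] for p in customers.values()})
MAX_AGE = 100.0  # assumed scale so age stays comparable to the 0/1 fields


def encode(profile):
    """One-hot location and gender, plus scaled age. No ID anywhere."""
    vec = [1.0 if profile["location"] == loc else 0.0 for loc in LOCATIONS]
    vec += [1.0 if profile["gender"] == g else 0.0 for g in GENDERS]
    vec.append(profile["age"] / MAX_AGE)
    return vec


def kmeans(vectors, k, iterations=10):
    """Minimal Lloyd's k-means, seeded with the first k vectors for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def most_similar_pair(ids, vectors_by_id):
    """Return the two IDs whose feature vectors are closest (Euclidean)."""
    return min(combinations(ids, 2),
               key=lambda pair: math.dist(vectors_by_id[pair[0]],
                                          vectors_by_id[pair[1]]))


# IDs and vectors are kept in parallel; only the vectors are clustered.
ids = list(customers)
vectors = [encode(customers[cid]) for cid in ids]
by_id = dict(zip(ids, vectors))
labels = kmeans(vectors, k=2)

# Map cluster labels back to customer IDs.
clusters = {}
for cid, lab in zip(ids, labels):
    clusters.setdefault(lab, []).append(cid)

for lab, members in clusters.items():
    pair = most_similar_pair(members, by_id) if len(members) > 1 else None
    print(lab, members, pair)
```

With this toy data the Pune and Delhi customers end up in separate clusters, and within the Pune cluster the closest pair is the two customers aged 34 and 36. Changing an ID string changes nothing about the clustering, which is the point: the ID is only used to look results up.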