Basically, I am trying to achieve customer segmentation.

To measure customer similarity within a cluster, I need to
understand which two customers are similar.

Assumption: to identify each customer uniquely, I need to provide
their CustomerId.

Is my assumption correct? If yes, will CustomerId affect the
clustering output?

If no, how can I identify each customer uniquely?
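
To make the question concrete, here is a minimal sketch of one common way to handle it: keep the ID as a key stored next to each feature vector, so it identifies the customer but never enters the distance computation. Everything in it is an assumption for illustration: the sample records, the `to_features` encoding, and the toy `kmeans` function are hypothetical and are not any particular library's API.

```python
import math

# Hypothetical customer records: (customer_id, age, gender).
customers = [
    ("CUST000001", 22, "F"),
    ("CUST000002", 61, "M"),
    ("CUST000003", 25, "F"),
    ("CUST000004", 58, "M"),
]

def to_features(age, gender):
    # Assumed encoding: scale age to roughly [0, 1], one-hot encode gender.
    return [age / 100.0,
            1.0 if gender == "F" else 0.0,
            1.0 if gender == "M" else 0.0]

# The ID is kept alongside the vector as a key, never inside it.
ids = [c[0] for c in customers]
vectors = [to_features(c[1], c[2]) for c in customers]

def kmeans(points, k, iterations=10):
    # Toy, deterministic k-means: the first k points seed the centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(range(k),
                                key=lambda j: math.dist(p, centroids[j]))
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignment

labels = kmeans(vectors, k=2)

# Map cluster labels back to customer IDs after clustering.
clusters = {}
for cid, label in zip(ids, labels):
    clusters.setdefault(label, []).append(cid)

print(clusters)
```

Because the IDs only travel in the parallel `ids` list, changing them would not change the cluster assignments, yet each cluster can still be listed by customer.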

On Tue, Feb 18, 2014 at 2:55 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> That really depends on what you want to do.
>
> What is it that you want?
>
>
> On Mon, Feb 17, 2014 at 12:25 PM, Bikash Gupta
> <bikash.gupt...@gmail.com> wrote:
>
>> OK, so UserId is not a good field for this combination. But if I want
>> user clustering, what should the combination be (just for my
>> understanding)?
>>
>> On Tue, Feb 18, 2014 at 1:44 AM, Ted Dunning <ted.dunn...@gmail.com>
>> wrote:
>> > On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta <bikash.gupt...@gmail.com>
>> > wrote:
>> >
>> >> Let's say I am clustering users and providing their profile data to
>> >> discover the similarity between two users.
>> >>
>> >> So my input would be [UserId, Location, Age, Gender, Time Created].
>> >>
>> >> Now, my UserId is at least 10 characters long, which makes it a
>> >> very large number compared with the other categorical data.
>> >>
>> >
>> > User id is not a good field for clustering.
>> >
>> > Location is fine if you want geographical clustering.
>> >
>> > Location + age + gender is fine for geo-demographic clustering.
>> >
>> > Adding time created might give a tiny bit of insight.
>> >
>> > But these fields are not going to lead to great insights.
>>
>>
>>
>> --
>> Thanks & Regards
>> Bikash Kumar Gupta
>>



-- 
Thanks & Regards
Bikash Kumar Gupta
