Re: [Edit] Approach for Clustering Data

Bikash Gupta Mon, 17 Feb 2014 09:01:20 -0800

Hi Ted,

Thanks, this helped align my thinking. However, a new question came to mind.


Let's say I am clustering users, and I am providing their profile data to
discover the similarity between two users.

So my input would be [UserId, Location, Age, Gender, Time Created]

Now, my UserId is at least 10 characters long, which makes it a
comparatively very large number next to the other categorical data.

Question) Is Z-score normalization of this data the right approach to
nullify the participation of UserId and increase its weight? Please
suggest.
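
For reference, a minimal sketch of the setup being asked about, written in plain Python rather than Mahout. Z-score normalization rescales a column to zero mean and unit variance, so a normalized UserId would still take part in the distance with the same weight as every other attribute; it would not be nullified. The sketch therefore assumes the identifier is set aside and only the numeric attributes are standardized. The records, the `zscore_columns` helper and the `days_since_created` field are hypothetical, and categorical fields such as Location and Gender would need a separate encoding that is not shown.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column of a list of equal-length numeric rows."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return [
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical records: (user_id, age, days_since_created)
records = [
    ("U000000001", 25, 100),
    ("U000000002", 35, 400),
    ("U000000003", 45, 700),
]

user_ids = [r[0] for r in records]         # kept aside, never fed to the clusterer
features = [list(r[1:]) for r in records]  # only the numeric attributes
scaled = zscore_columns(features)          # same row order as user_ids
```

Because `scaled` keeps the row order of `records`, `user_ids[i]` still identifies the user behind `scaled[i]` after clustering.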

On Mon, Feb 17, 2014 at 8:54 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Think about the question in terms of whether this will define a reasonable
> kind of distance between items or users.
>
> Can you first define what you want to do?  Are you clustering users?  Are
> you clustering items?
>
> If users, how could the data you provide give any kind of idea about which
> users are similar?
>
> If items, where is information about the item?
>
>
>
>
> On Mon, Feb 17, 2014 at 2:25 AM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:
>
>> Hi,
>>
>> Just to clarify my question below, I am citing another example.
>>
>> Let's say I will be clustering on each user's monthly summarized data:
>>
>> UserID, Transaction, Quantity, Discount
>>
>> Question 1) If I input UserID, Transaction, Quantity, Discount into
>> K-means, will the output be accurate, given that ideally UserId shouldn't
>> participate?
>>
>> Question 2) If I input Transaction, Quantity, Discount into K-means, how
>> will I map UserId to the clustered output data?
>>
>>
>> I request you all to help me with this basic problem that I am facing in
>> data mining.
>>
>> Regards
>> Bikash
>>
>> On Fri, Feb 14, 2014 at 11:25 PM, Bikash Gupta <bikash.gupt...@gmail.com>
>> wrote:
>> > I am a newbie to Mahout and working on a data mining clustering use case
>> > using K-Means. I need help understanding how to map the original
>> > data to the clustered output to gain more insight. Let's say:
>> >
>> > After data preparation, we have a summarized data set with the
>> > following attributes:
>> >
>> > Key1,Key2,Dimension1,Dimension2,Measure1,Measure2,Measure3
>> >
>> > Now I have executed the clustering algorithm on the following attributes:
>> >
>> > Measure1,Measure2,Measure3
>> >
>> > The output of the clustering would be the cluster ID with its
>> > data (Measure1,Measure2,Measure3).
>> >
>> > Question: How can I perform clustering on specific attributes of the
>> > dataset, while the clustered output still contains all attributes?
>> >
>> > Please help me with the right approach.
>> >
>> > --
>> > Regards
>> > Bikash Gupta
>>
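
On the quoted question of mapping UserId back to the clustered output, one possible approach is to cluster on the measures only and carry the key along by row position (Mahout's NamedVector can hold such a key next to the vector). The sketch below is plain Python, not Mahout; the rows, the centroids and the `nearest_centroid` helper are hypothetical, and the centroids merely stand in for the result of a k-means run.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


# Hypothetical monthly summaries: (UserID, Transaction, Quantity, Discount)
rows = [
    ("u1", 2.0, 3.0, 0.1),
    ("u2", 2.5, 2.0, 0.0),
    ("u3", 40.0, 55.0, 5.0),
    ("u4", 42.0, 50.0, 4.5),
]

# Hypothetical centroids, standing in for the output of a k-means run
centroids = [(2.0, 2.5, 0.05), (41.0, 52.0, 4.8)]

# Cluster on the measures only; the key is never part of the distance
assignments = {
    user_id: nearest_centroid(measures, centroids)
    for user_id, *measures in rows
}
```

Joining `assignments` back to the full records by UserID then yields output rows that contain every original attribute plus the cluster ID.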



-- 
Thanks & Regards
Bikash Kumar Gupta
