Hi Ted,

Thanks, this helped me align my thinking. However, a new question came to mind.
Let's say I am clustering users and providing their profile data to discover the similarity between two users. So my input would be [UserId, Location, Age, Gender, Time Created]. Now, my UserId is at least 10 characters long, which is a comparatively very large number next to the other categorical data.

Question) Is Z-score normalization on this data the right approach to nullify the participation of UserId and increase the weight of the other attributes?

Please suggest.

On Mon, Feb 17, 2014 at 8:54 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Think about the question in terms of whether this will define a reasonable
> kind of distance between items or users.
>
> Can you first define what you want to do? Are you clustering users? Are
> you clustering items?
>
> If users, how could the data you provide give any kind of idea about which
> users are similar?
>
> If items, where is information about the item?
>
>
> On Mon, Feb 17, 2014 at 2:25 AM, Bikash Gupta <bikash.gupt...@gmail.com> wrote:
>
>> Hi,
>>
>> To clarify my question below, I am citing another example.
>>
>> Let's say I will be clustering on any user's monthly summarized data:
>>
>> UserID, Transaction, Quantity, Discount
>>
>> Question 1) If I input UserID, Transaction, Quantity, Discount into
>> K-Means, will the output be accurate, given that ideally UserId
>> shouldn't participate?
>>
>> Question 2) If I input Transaction, Quantity, Discount into K-Means, how
>> will I map UserId to the clustered output data?
>>
>>
>> I request you all to help me with this basic problem that I am facing in
>> data mining.
>>
>> Regards
>> Bikash
>>
>> On Fri, Feb 14, 2014 at 11:25 PM, Bikash Gupta <bikash.gupt...@gmail.com>
>> wrote:
>> > I am a newbie to Mahout and working on a data mining clustering use case
>> > using K-Means. I need help understanding how to map the original
>> > data to the clustered output to gain more insight.
>> >
>> > Let's say:
>> >
>> > After performing data preparation, we have a summarized data set with
>> > the following attributes:
>> >
>> > Key1,Key2,Dimension1,Dimension2,Measure1,Measure2,Measure3
>> >
>> > Now I have executed the clustering algorithm on the following attributes:
>> >
>> > Measure1,Measure2,Measure3
>> >
>> > The output of the clustering would be the Cluster Id with its
>> > data (Measure1,Measure2,Measure3).
>> >
>> > Question: How can I perform clustering on specific attributes in the
>> > dataset, where the clustered output must contain all attributes?
>> >
>> > Please help me with the right approach.
>> >
>> > --
>> > Regards
>> > Bikash Gupta
>> --

Thanks & Regards
Bikash Kumar Gupta
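
P.S. To make the question concrete, below is a minimal sketch of the setup as I understand it, written in plain Python rather than Mahout. It assumes UserId is kept out of the distance computation altogether (instead of being normalized), that only the measures are Z-score normalized, and that UserId is joined back to the cluster labels by row order. The sample records and the helper names `zscore_columns` and `kmeans` are made up for illustration and are not part of any Mahout API.

```python
import math
import random

def zscore_columns(rows):
    """Standardize each column to mean 0 and population standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has standard deviation 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means. Returns one cluster index per point, in input order."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its current members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical monthly summaries: (UserID, Transaction, Quantity, Discount).
records = [
    ("U000000001", 2, 10, 0.05),
    ("U000000002", 3, 12, 0.06),
    ("U000000003", 40, 300, 0.30),
    ("U000000004", 42, 310, 0.32),
]

# UserID is an identifier, not a measurement, so it never enters the feature vectors.
user_ids = [r[0] for r in records]
features = [list(r[1:]) for r in records]

# Labels come back in input order, so row i of the output still belongs to user_ids[i].
labels = kmeans(zscore_columns(features), k=2)
user_to_cluster = dict(zip(user_ids, labels))

for record, label in zip(records, labels):
    print(record, "-> cluster", label)
```

Because the labels keep the input order, the full original record (keys, dimensions, and measures) can be printed next to its cluster even though only the measures were used for the distance.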