Subject: [Edit] Approach for Clustering Data

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Bikash Gupta

Ted/Peter, thanks for the response. This is exactly what I am trying to achieve. Maybe I was not able to put my questions clearly. I am clustering on a few variables of Customer/User (excluding their customer_id/user_id) and storing the customer_id/user_id list in a separate place. Question) What is

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Suneel Marthi

On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Ted/Peter, thanks for the response. This is exactly what I am trying to achieve. Maybe I was not able to put my questions clearly. I am clustering on a few variables of Customer/User (except their

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Bikash Gupta

Suneel, thanks for the information. I am using 0.7, packaged with CDH. On Tue, Feb 18, 2014 at 2:14 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Ted/Peter, thanks for the response. This is

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Ted Dunning

Bikash, Don't use that version. Use a more recent release. We can't help that Cloudera has an old version. On Tue, Feb 18, 2014 at 1:26 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Suneel, thanks for the information. I am using 0.7, packaged with CDH. On Tue, Feb 18, 2014 at 2:14

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Bikash Gupta

Yeah Ted, it seems there is a major change in 0.9. In 0.9 I found that clusteredPoint data is written as key/vector pairs rather than only as vectors. It's good. Thanks to everyone for correctly answering a poorly framed question :) On Tue, Feb 18, 2014 at 7:36 PM, Ted Dunning ted.dunn...@gmail.com

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Sean Owen

FYI, CDH5 includes version 0.8 + patches. But 0.9 should work fine with CDH4. You do have to build with the Hadoop 2.x profile, as usual. On Tue, Feb 18, 2014 at 2:06 PM, Ted Dunning ted.dunn...@gmail.com wrote: Bikash, Don't use that version. Use a more recent release. We can't help that

Re: [Edit] Approach for Clustering Data

2014-02-18 Thread Bikash Gupta

Thanks Sean. I will check how to support 0.9 with CDH4. However, 0.9 has solved my problem. On Tue, Feb 18, 2014 at 7:45 PM, Sean Owen sro...@gmail.com wrote: FYI, CDH5 includes version 0.8 + patches. But 0.9 should work fine with CDH4. You do have to build with the Hadoop 2.x profile, as

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Bikash Gupta

Hi, to clarify my question below, here is another example. Let's say I am clustering each user's monthly summarized data: UserID, Transaction, Quantity, Discount. Question 1) If I input UserID, Transaction, Quantity, Discount into Kmeans, will the output be accurate, as ideally

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Ted Dunning

Think about the question in terms of whether this will define a reasonable kind of distance between items or users. Can you first define what you want to do? Are you clustering users? Are you clustering items? If users, how could the data you provide give any kind of idea about which users are
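To make the distance point concrete, here is a minimal Python sketch (not from the thread; the users, IDs and values are hypothetical) of what happens when a raw numeric ID is one of the coordinates in a Euclidean distance, the distance k-means is usually run with:

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical users as (user_id, age, gender_flag); values are made up.
alice = (1000000001, 34, 0)
bob = (1000000002, 71, 1)    # very different profile, adjacent ID
carol = (1000500000, 35, 0)  # nearly identical profile, distant ID

# With the ID included, alice appears far closer to bob than to carol,
# purely because of ID arithmetic.
print(euclidean(alice, bob), euclidean(alice, carol))

# Dropping the ID (slice off the first coordinate) lets the profile drive
# the distance: now carol is the near neighbour.
print(euclidean(alice[1:], bob[1:]), euclidean(alice[1:], carol[1:]))
```

The ID carries no notion of similarity, so any distance built on it rewards users for having nearby ID numbers rather than similar behaviour.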

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Bikash Gupta

Hi Ted, thanks, this helped me think along the right lines. However, a new question came to mind. Let's say I am clustering users and providing their profile data to discover the similarity between two users. So my input would be [UserId, Location, Age, Gender, Time Created]. Now if my UserId length is of minimum

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Ted Dunning

On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Let's say I am clustering users and providing their profile data to discover the similarity between two users. So my input would be [UserId, Location, Age, Gender, Time Created]. Now if my UserId length is of minimum 10

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Bikash Gupta

OK... so UserId is not a good field for this combination, but if I want user clustering, what should the combination be (just for understanding)? On Tue, Feb 18, 2014 at 1:44 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta bikash.gupt...@gmail.com wrote:

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Ted Dunning

That really depends on what you want to do. What is it that you want? On Mon, Feb 17, 2014 at 12:25 PM, Bikash Gupta bikash.gupt...@gmail.com wrote: OK... so UserId is not a good field for this combination, but if I want user clustering, what should the combination be (just for understanding)?

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Bikash Gupta

Basically, I am trying to achieve customer segmentation. To measure customer similarity within a cluster, I need to understand which two customers are similar. Assumption: to identify these customers uniquely, I need to provide their CustomerId. Is my assumption correct? If yes, then will

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Peter Jaumann

Bikash, as Ted already pointed out, you can cluster on all variables except your customer_id. That's your identifier. Customers within a cluster are 'similar'; how similar depends on the fidelity of your clustering. The clustering algorithm uses (you'll choose) some kind of distance, or
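A minimal in-memory sketch of that idea in Python, not Mahout: the customer IDs (hypothetical, like the feature values) are kept in a parallel list, only the feature vectors go into a plain Lloyd-style k-means, and the IDs are re-attached to the cluster labels by position afterwards. It illustrates the "identifier stays out of the distance" point only; Mahout's distributed k-means works differently internally.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=20, seed=0):
    """Minimal Lloyd's k-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        assignment = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                      for v in vectors]
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical customers: (customer_id, features). The ID never enters k-means.
customers = [
    ("c-001", [0.10, 0.20]),
    ("c-002", [0.12, 0.18]),
    ("c-003", [0.90, 0.85]),
    ("c-004", [0.88, 0.92]),
]
ids = [cid for cid, _ in customers]
features = [vec for _, vec in customers]

labels = kmeans(features, k=2)

# Re-attach IDs by position: the clustering only ever saw the features.
clusters = {}
for cid, label in zip(ids, labels):
    clusters.setdefault(label, []).append(cid)
print(clusters)
```

Keeping IDs and vectors aligned by position (or carrying them as key/vector pairs) is what lets you answer "which customers ended up together" without letting the ID distort the distance.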

Re: [Edit] Approach for Clustering Data

2014-02-17 Thread Ted Dunning

Bikash, Peter is just right. Yes, you can cluster on these few variables that you have. Probably you should translate location to x,y,z coordinates so that you don't have strange geometry problems, but location, gender and age are quite reasonable characteristics. You will get a fairly weak
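One way to read the x,y,z suggestion, sketched in Python under the assumption that location is available as latitude/longitude in degrees: project each point onto the unit sphere, so Euclidean distance between the resulting vectors follows the chord between the two places instead of breaking at the ±180° longitude wrap-around or stretching near the poles. The coordinates below are hypothetical.

```python
import math

def to_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical points on the equator, either side of the 180th meridian,
# only 2 degrees of longitude apart on the globe.
west = (0.0, 179.0)
east = (0.0, -179.0)

# Raw (lat, lon): the points look 358 "degrees" apart.
print(euclidean(west, east))

# Unit-sphere xyz: a small chord, matching their true closeness.
print(euclidean(to_xyz(*west), to_xyz(*east)))
```

Age and gender would then be appended to these three coordinates; how to weight them against location is a judgment call the thread does not settle.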
