Ted/Peter,
Thanks for the response.
This is exactly what I am trying to achieve. Maybe I was not able to
put my questions clearly.
I am clustering on a few variables of Customer/User (except their
customer_id/user_id) and storing the customer_id/user_id list in a
separate place.
Question) What is
On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta bikash.gupt...@gmail.com
wrote:
Ted/Peter,
Thanks for the response.
This is exactly what I am trying to achieve. Maybe I was not able to
put my questions clearly.
I am clustering on a few variables of Customer/User (except their
Suneel,
Thanks for the information.
I am using 0.7 packaged with CDH.
On Tue, Feb 18, 2014 at 2:14 PM, Suneel Marthi suneel_mar...@yahoo.com wrote:
On Tuesday, February 18, 2014 3:37 AM, Bikash Gupta
bikash.gupt...@gmail.com wrote:
Ted/Peter,
Thanks for the response.
This is
Bikash,
Don't use that version. Use a more recent release. We can't help that
Cloudera has an old version.
On Tue, Feb 18, 2014 at 1:26 AM, Bikash Gupta bikash.gupt...@gmail.com wrote:
Suneel,
Thanks for the information.
I am using 0.7 packaged with CDH.
On Tue, Feb 18, 2014 at 2:14
Yeah Ted, it seems there is a major change in 0.9.
In 0.9 I found that clusteredPoint data are written as
PairKey,Vector rather than only Vector. It's good.
Thanks to everyone for answering an unframed question correctly :)
On Tue, Feb 18, 2014 at 7:36 PM, Ted Dunning ted.dunn...@gmail.com
FYI, CDH5 includes version 0.8 + patches. But 0.9 should work fine
with CDH4. You do have to build with the Hadoop 2.x profile, as usual.
On Tue, Feb 18, 2014 at 2:06 PM, Ted Dunning ted.dunn...@gmail.com wrote:
Bikash,
Don't use that version. Use a more recent release. We can't help that
Thanks Sean.
I will check how to support 0.9 with CDH4.
However, 0.9 has solved my problem.
On Tue, Feb 18, 2014 at 7:45 PM, Sean Owen sro...@gmail.com wrote:
FYI, CDH5 includes version 0.8 + patches. But 0.9 should work fine
with CDH4. You do have to build with the Hadoop 2.x profile, as
Hi,
Just to clarify my question below, I am citing another example.
Let's say I will be clustering on a user's monthly summarized data:
UserID, Transaction, Quantity, Discount
Question 1) If I input UserID, Transaction, Quantity, Discount into
Kmeans, will the output be accurate, as ideally
Think about the question in terms of whether this will define a reasonable
kind of distance between items or users.
Can you first define what you want to do? Are you clustering users? Are
you clustering items?
If users, how could the data you provide give any kind of idea about which
users are
Hi Ted,
Thanks, this helped align my thinking. However, a new question came to mind.
Let's say I am clustering users and providing their profile data to
discover similarity between two users.
So my input would be [UserId, Location, Age, Gender, Time Created]
Now if my UserId length is of minimum
On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta bikash.gupt...@gmail.com wrote:
Let's say I am clustering users and providing their profile data to
discover similarity between two users.
So my input would be [UserId, Location, Age, Gender, Time Created]
Now if my UserId length is of minimum 10
Ok...so UserId is not a good field for this combination, but if I want
user clustering, what should the combination be (just for my
understanding)?
On Tue, Feb 18, 2014 at 1:44 AM, Ted Dunning ted.dunn...@gmail.com wrote:
On Mon, Feb 17, 2014 at 9:00 AM, Bikash Gupta bikash.gupt...@gmail.com wrote:
That really depends on what you want to do.
What is it that you want?
On Mon, Feb 17, 2014 at 12:25 PM, Bikash Gupta bikash.gupt...@gmail.com wrote:
Ok...so UserId is not a good field for this combination, but if I want
user clustering, what should the combination be (just for my
understanding)?
Basically I am trying to achieve customer segmentation.
Now, to measure customer similarity within a cluster, I need to
understand which two customers are similar.
Assumption: To identify these customers uniquely I need to provide
their CustomerId.
Is my assumption correct? If yes, then will
Bikash,
As Ted already pointed out:
You can cluster on all variables except your customer_id. That's your
identifier.
Customers within a cluster are 'similar'; how similar depends on the
fidelity of your clustering.
The clustering algorithm uses (you'll choose) some kind of distance, or
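
A minimal sketch of the point above, in plain Python rather than Mahout: the identifier is set aside before any distance is computed and is only used to label the result. The records, the field layout, and the fixed centroids are all hypothetical, and this shows only a nearest-centroid assignment step with Euclidean distance, not a full k-means run.

```python
import math

# Hypothetical records: (customer_id, age, gender_flag, monthly_spend).
# The customer_id is an identifier only and never enters the distance.
records = [
    ("C001", 25.0, 0.0, 120.0),
    ("C002", 27.0, 0.0, 110.0),
    ("C003", 61.0, 1.0, 480.0),
]


def split_id_and_features(record):
    """Return the identifier and the numeric feature vector separately."""
    return record[0], list(record[1:])


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_nearest(features, centroids):
    """Index of the centroid closest to the feature vector."""
    distances = [euclidean(features, c) for c in centroids]
    return distances.index(min(distances))


# Centroids are assumed fixed here purely for illustration.
centroids = [[26.0, 0.0, 115.0], [60.0, 1.0, 500.0]]

# Map each customer_id to its cluster index; the id is used only as a label.
assignments = {}
for record in records:
    customer_id, features = split_id_and_features(record)
    assignments[customer_id] = assign_to_nearest(features, centroids)
```

In practice the features would usually be scaled to comparable ranges first, since an unscaled column with large values would otherwise dominate the distance.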
Bikash,
Peter is just right.
Yes, you can cluster on these few variables that you have. Probably you
should translate location to x,y,z coordinates so that you don't have
strange geometry problems, but location, gender and age are quite
reasonable characteristics. You will get a fairly weak
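
One way to read the x,y,z suggestion above, assuming location is available as latitude and longitude in degrees (the thread does not say how it is stored): map each point onto the unit sphere, so that places near each other stay near each other even across the +/-180 degree longitude seam. The function names below are illustrative, not part of any library.

```python
import math


def latlon_to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return (x, y, z)


def chord_distance(p, q):
    """Straight-line (Euclidean) distance between two x,y,z points."""
    return math.dist(p, q)


# Two points on the equator on either side of the date line.
# Their raw longitudes differ by 359.8, but in x,y,z they are close.
east = latlon_to_xyz(0.0, 179.9)
west = latlon_to_xyz(0.0, -179.9)
seam_gap = chord_distance(east, west)
```

The three coordinates can then be used as ordinary numeric features alongside age and gender.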