Re: Clustering user profiles

2012-01-15 Thread Raviv Pavel
Ted, thanks again for your detailed response. For the sake of this discussion, let's assume that there is a finite set of interests (1..n) and *interested_in*(i) is either 0 or 1. A user is either interested in a subject or not, which is why I thought of placing a one or zero in the dimensions
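
The 0/1 encoding described above can be sketched in a few lines of Python. The interest names and the helper `encode_interests` are hypothetical, chosen only for illustration; the thread does not specify them.

```python
def encode_interests(user_interests, all_interests):
    """Return a 0/1 vector with one dimension per known interest."""
    return [1 if interest in user_interests else 0 for interest in all_interests]


# Hypothetical finite set of interests (1..n); the order fixes the dimensions.
ALL_INTERESTS = ["music", "sports", "travel", "cooking"]

vector = encode_interests({"sports", "cooking"}, ALL_INTERESTS)
print(vector)  # [0, 1, 0, 1]
```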

Re: Clustering user profiles

2012-01-15 Thread Ted Dunning
On Sun, Jan 15, 2012 at 2:13 PM, Raviv Pavel ra...@gigya-inc.com wrote: If I understand correctly, in normalization option #2 you mean that each interest is encoded to a value so that the sum of all interests is 1? Yes. Also, what do you mean by normalize the interests to have unit vector
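
A minimal sketch of the two normalizations being discussed, assuming a plain Python list as the interest vector. The first scales the entries so they sum to 1, as confirmed in the reply. The second reads "unit vector" as unit Euclidean length, which is an assumption because the message is cut off here. Both function names are hypothetical.

```python
import math


def normalize_sum(vec):
    """Scale the entries so they sum to 1 (an all-zero vector is returned unchanged)."""
    total = sum(vec)
    return [v / total for v in vec] if total else list(vec)


def normalize_unit(vec):
    """Scale the entries so the Euclidean length is 1 (assumed meaning of 'unit vector')."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


interests = [0, 1, 0, 1]
print(normalize_sum(interests))   # [0.0, 0.5, 0.0, 0.5]
print(normalize_unit(interests))  # each non-zero entry becomes 1/sqrt(2)
```

With sum normalization, a user with many interests contributes less weight per interest than a user with few; with unit-length normalization the same holds, but the per-interest weight shrinks with the square root of the count.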

Re: Clustering user profiles

2012-01-13 Thread Jeff Eastman
Just remember that longitude is a spherical coordinate, so +179 is closer to -179 than their numeric difference suggests. Latitude is spherical too, but +89 is indeed quite far from -89. On 1/13/12 4:36 AM, StreetCat wrote: The raw data had location expressed as strings such as "Paris, France" and I
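
The wraparound point can be illustrated with a short Python sketch. The helper `longitude_difference` is a hypothetical name; it only shows why raw subtraction of longitudes misleads and is not taken from any library.

```python
def longitude_difference(lon_a, lon_b):
    """Smallest angular separation between two longitudes, in degrees (0..180)."""
    diff = abs(lon_a - lon_b) % 360.0
    return min(diff, 360.0 - diff)


print(abs(179 - (-179)))                # 358: the naive numeric difference
print(longitude_difference(179, -179))  # 2.0: the actual separation
print(abs(89 - (-89)))                  # 178: latitude does not wrap, so this is correct
```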

Re: Clustering user profiles

2012-01-13 Thread Raviv Pavel
True. That's why I think I need a different distance measure for each attribute of the user. The distance between coordinates on earth is different from the distance between ages, which in turn is different from the distance between two sets of values. I think the only solution would be to develop a

Re: Clustering user profiles

2012-01-13 Thread Sean Owen
Certainly not the only solution. As I've been saying: what would it mean to have n distance measures -- how would you combine them? If you can answer that, you can likely just as easily transform the input so that the result is meaningful when all dimensions are combined by one metric. This is
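
One way to read the point above, sketched in Python under a stated assumption: if the intended combination is a weighted sum of per-attribute squared Euclidean distances, then scaling each attribute block by the square root of its weight and concatenating the blocks gives the same result under a single Euclidean metric. The weights, block values, and function names are hypothetical; the thread does not prescribe this particular scheme.

```python
import math


def combine_blocks(blocks, weights):
    """Concatenate attribute blocks, scaling each block by sqrt(weight)."""
    combined = []
    for block, weight in zip(blocks, weights):
        scale = math.sqrt(weight)
        combined.extend(scale * v for v in block)
    return combined


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two hypothetical users, each with an interest block and an (already scaled) age block.
user_a = [[1.0, 0.0], [0.5]]
user_b = [[0.0, 1.0], [0.25]]
weights = [2.0, 4.0]  # hypothetical importance of each attribute block

single_metric = squared_euclidean(combine_blocks(user_a, weights),
                                  combine_blocks(user_b, weights))
per_block = sum(w * squared_euclidean(a, b)
                for w, a, b in zip(weights, user_a, user_b))
print(single_metric, per_block)  # the two values agree up to floating-point rounding
```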

Re: Clustering user profiles

2012-01-13 Thread Ted Dunning
I usually prefer to represent location as an xyz triple on a unit sphere. That allows Euclidean distance to be useful. For the 1-of-n encoded values, Euclidean works as well. For gender, it also works fine. The only issue is how to combine these with reasonable weightings. An easy way to do
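
The xyz representation mentioned above can be sketched as follows, assuming latitude and longitude in degrees and a perfectly spherical earth of radius 1. The function names are hypothetical.

```python
import math


def to_unit_sphere(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to an (x, y, z) point on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def euclidean(a, b):
    """Straight-line (chord) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Longitudes +179 and -179 on the equator end up close together.
near = euclidean(to_unit_sphere(0, 179), to_unit_sphere(0, -179))
# Latitudes +89 and -89 end up almost a full diameter apart.
far = euclidean(to_unit_sphere(89, 0), to_unit_sphere(-89, 0))
print(near, far)
```

The distance here is the chord through the sphere rather than the great-circle arc, but it increases monotonically with the arc length, so nearest-neighbor relationships are preserved.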