Ted,
Thanks again for your detailed response.
For the sake of this discussion, let's assume that there is a finite set of
interests (1..n) and *interested_in*(i) is either 0 or 1.
A user is either interested in a subject or not, which is why I thought of
placing a one or zero in the dimensions.
On Sun, Jan 15, 2012 at 2:13 PM, Raviv Pavel ra...@gigya-inc.com wrote:
If I understand correctly, in normalization option #2 you mean that each
interest is encoded to value so that the sum of all interests is 1?
Yes.
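
To make that concrete, here is a minimal sketch of the "sum of all interests is 1" normalization, assuming each user is a list of 0/1 flags as described above. The function name and the handling of a user with no interests are my own assumptions, not something settled in this thread.

```python
def normalize_to_unit_sum(interests):
    """Scale a 0/1 interest vector so that its components sum to 1."""
    total = sum(interests)
    if total == 0:
        # Assumption: a user with no interests stays an all-zero vector.
        return [0.0] * len(interests)
    return [value / total for value in interests]


# A user interested in 3 of 4 subjects: each active interest gets weight 1/3.
user = normalize_to_unit_sum([1, 0, 1, 1])
```

With this scaling, a user with many interests contributes less per interest than a user with only one or two.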
Also, what do you mean by normalize the interests to have unit vector
Just remember that Longitude is a spherical coordinate and +179 is
closer to -179 than their numeric difference. Latitude is spherical too
but +89 is indeed quite far from -89.
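
The wrap-around can be sketched in a few lines, assuming coordinates are in degrees. The helper names are hypothetical, and this only compares one coordinate at a time; it is not a full great-circle distance.

```python
def longitude_difference(lon_a, lon_b):
    """Smallest angular difference between two longitudes, in degrees."""
    diff = abs(lon_a - lon_b) % 360.0
    # Going the other way around the globe may be shorter.
    return min(diff, 360.0 - diff)


def latitude_difference(lat_a, lat_b):
    """Latitude does not wrap, so the plain difference is the right one."""
    return abs(lat_a - lat_b)


near = longitude_difference(179.0, -179.0)  # 2 degrees apart, not 358
far = latitude_difference(89.0, -89.0)      # 178 degrees apart
```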
On 1/13/12 4:36 AM, StreetCat wrote:
The raw data had location expressed as strings such as Paris, France and
I
True.
That's why I think I need a different distance measure for each attribute of
the user.
The distance between coordinates on earth is different from distance
between ages which in turn is different from the distance between two sets
of values
I think the only solution would be to develop a
Certainly not the only solution. As I've been saying: what would it
mean to have n distance measures -- how would you combine them?
If you can answer that, you can likely just as easily transform the
input so that the result is meaningful when all dimensions are
combined by one metric.
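
One way to see this, as a sketch only: if the combination you have in mind is a weighted sum of per-attribute squared distances, then scaling each attribute group by the square root of its weight and concatenating gives a single vector whose ordinary squared Euclidean distance is exactly that weighted sum. The function names, the example values, and the weights below are all hypothetical.

```python
import math


def combine(groups, weights):
    """Concatenate attribute groups, scaling each by sqrt(weight)."""
    combined = []
    for values, weight in zip(groups, weights):
        scale = math.sqrt(weight)
        combined.extend(scale * value for value in values)
    return combined


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two attribute groups per user: a 1-of-n pair and a scaled age (made-up values).
weights = [1.0, 4.0]
user_a = combine([[0.0, 1.0], [0.3]], weights)
user_b = combine([[1.0, 0.0], [0.5]], weights)

# Squared distance equals 1.0 * (1 + 1) + 4.0 * (0.2 ** 2).
squared = euclidean(user_a, user_b) ** 2
```

Choosing the weights is still a judgment call; the sketch only shows that one metric can carry them.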
This is
I usually prefer to represent location as an xyz triple on a unit sphere.
That allows Euclidean distance to be useful.
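
A minimal sketch of that conversion, assuming latitude and longitude are given in degrees (the function names are mine). The Euclidean distance between two such points is the chord length through the sphere, which grows monotonically with the great-circle distance and handles the longitude wrap-around for free.

```python
import math


def to_unit_sphere(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to an (x, y, z) point on a unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def euclidean(a, b):
    """Plain Euclidean (chord) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Longitudes +179 and -179 on the equator end up close together ...
close = euclidean(to_unit_sphere(0.0, 179.0), to_unit_sphere(0.0, -179.0))
# ... while latitudes +89 and -89 end up nearly a full diameter apart.
distant = euclidean(to_unit_sphere(89.0, 0.0), to_unit_sphere(-89.0, 0.0))
```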
On the 1-of-n encoded values, Euclidean works as well. For gender, it also
works fine.
The only issue is how to combine these with reasonable weightings. An easy
way to do