Clustering user profiles

2012-01-12 Thread Raviv Pavel
Hi, I'm new to Mahout (and machine learning) but did quite a lot of reading, especially "Mahout in Action". I'm trying to cluster users based on their profiles. By profile I mean attributes such as age, gender, location and a set of interests. All the examples I saw so far were about vectors havin

Re: Clustering user profiles

2012-01-12 Thread Raviv Pavel
Looking at the problem from a developer's perspective, the question is this: can I develop a custom vector where each dimension has a different data type (where the type can be complex, e.g. a Set) and use a different distance measure class for each dimension?

Re: Clustering user profiles

2012-01-13 Thread Raviv Pavel
My initial plan was to do exactly that: use 0 & 1 for gender, age as is, lat & lon in two dimensions, and one dimension holding 0 or 1 per possible interest (each value is mapped to an offset in the dimension). For simplicity, let's assume I have 3 types of interests, so a vector of a person would lo
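
A minimal sketch of the encoding described in this message, in plain Java using a `double[]` rather than a Mahout vector class. The three-interest vocabulary (`sports`, `music`, `travel`), the class name `ProfileEncoder` and the field order are hypothetical choices for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the encoding: gender as 0/1, age as is, latitude and longitude in two
// dimensions, then one 0/1 dimension per possible interest.
// The interest vocabulary and field order are illustrative assumptions.
class ProfileEncoder {
    // Hypothetical fixed vocabulary of 3 interests; each interest maps to an offset.
    static final List<String> INTERESTS = Arrays.asList("sports", "music", "travel");

    static double[] encode(boolean male, double age, double lat, double lon, List<String> interests) {
        double[] v = new double[4 + INTERESTS.size()];
        v[0] = male ? 1.0 : 0.0; // gender
        v[1] = age;              // age as is
        v[2] = lat;              // latitude
        v[3] = lon;              // longitude
        for (String interest : interests) {
            int offset = INTERESTS.indexOf(interest);
            if (offset >= 0) {
                v[4 + offset] = 1.0; // mark this interest as present; unknown ones are ignored
            }
        }
        return v;
    }

    public static void main(String[] args) {
        double[] v = encode(true, 30, 32.08, 34.78, Arrays.asList("music", "travel"));
        System.out.println(Arrays.toString(v)); // prints [1.0, 30.0, 32.08, 34.78, 0.0, 1.0, 1.0]
    }
}
```

The resulting array could be wrapped in a Mahout `DenseVector` for clustering. Age and raw coordinates span much larger ranges than the 0/1 dimensions, so without scaling they would dominate a Euclidean distance.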

Re: Clustering user profiles

2012-01-13 Thread Raviv Pavel
True. That's why I think I need a different distance measure for each attribute of the user. The distance between coordinates on Earth is different from the distance between ages, which in turn is different from the distance between two sets of values. I think the only solution would be to develop a custo
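
One way to sketch the per-attribute idea, assuming profiles are kept as objects rather than flattened into one numeric vector: each attribute gets its own measure (match/mismatch for gender, scaled absolute difference for age, haversine great-circle distance for location, Jaccard distance for interest sets), each is scaled to roughly [0, 1], and a weighted sum combines them. The `Profile` class, the weights and the scaling constants (a 100-year age range, half of Earth's circumference as the largest surface distance) are assumptions. Plugging this into a Mahout `DistanceMeasure` would still require decoding the attributes from the vector, which is not shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Per-attribute ("composite") distance: one measure per attribute, each scaled to [0, 1],
// combined by a weighted sum. Profile, weights and scaling constants are hypothetical.
class CompositeDistance {
    static final class Profile {
        final boolean male;
        final double age;
        final double lat, lon; // degrees
        final Set<String> interests;

        Profile(boolean male, double age, double lat, double lon, Set<String> interests) {
            this.male = male; this.age = age; this.lat = lat; this.lon = lon; this.interests = interests;
        }
    }

    static final double EARTH_RADIUS_KM = 6371.0;
    static final double MAX_AGE_GAP = 100.0;                        // assumed age range
    static final double MAX_SURFACE_KM = Math.PI * EARTH_RADIUS_KM; // half the circumference

    // Great-circle distance in km (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Jaccard distance: 1 - |intersection| / |union|; two empty sets count as identical.
    static double jaccardDistance(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }

    // Weighted sum of the per-attribute distances.
    static double distance(Profile p, Profile q, double wGender, double wAge, double wLoc, double wInterests) {
        double dGender = p.male == q.male ? 0.0 : 1.0;
        double dAge = Math.min(1.0, Math.abs(p.age - q.age) / MAX_AGE_GAP);
        double dLoc = haversineKm(p.lat, p.lon, q.lat, q.lon) / MAX_SURFACE_KM;
        double dInterests = jaccardDistance(p.interests, q.interests);
        return wGender * dGender + wAge * dAge + wLoc * dLoc + wInterests * dInterests;
    }

    public static void main(String[] args) {
        Profile a = new Profile(true, 30, 32.08, 34.78, new HashSet<>(Arrays.asList("music", "travel")));
        Profile b = new Profile(false, 45, 40.71, -74.01, new HashSet<>(Arrays.asList("music")));
        System.out.println(distance(a, b, 1.0, 1.0, 1.0, 1.0));
    }
}
```

The weights are where per-attribute importance would be tuned, e.g. giving interests more influence than gender.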

Re: Clustering user profiles

2012-01-15 Thread Raviv Pavel
sions representing the specific interest. If I understand correctly, in normalization option #2 you mean that each interest is encoded as a value such that the sum of all interests is 1? Also, what do you mean by "normalize the interests to have unit vector magnitude"?
On Sat, Jan 14, 2012 at
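
To make the two terms concrete, a sketch contrasting them on the interest dimensions alone: dividing by the sum makes the values add up to 1 (L1 normalization), while dividing by the Euclidean length gives the interest sub-vector unit magnitude (L2 normalization). The class and method names are hypothetical, and this is one reading of the terms, not necessarily what the earlier reply meant by option #2.

```java
import java.util.Arrays;

// Two ways to normalize the 0/1 interest dimensions of a profile vector.
class InterestNormalization {
    // Scale so the absolute interest values sum to 1 (L1 normalization).
    static double[] sumToOne(double[] interests) {
        double sum = 0.0;
        for (double x : interests) sum += Math.abs(x);
        return scale(interests, sum);
    }

    // Scale so the interest sub-vector has Euclidean length 1 (L2, "unit vector magnitude").
    static double[] unitLength(double[] interests) {
        double sumSq = 0.0;
        for (double x : interests) sumSq += x * x;
        return scale(interests, Math.sqrt(sumSq));
    }

    // Returns a scaled copy; the input array is not modified.
    private static double[] scale(double[] v, double norm) {
        double[] out = v.clone();
        if (norm == 0.0) return out; // no interests: leave the zero vector as is
        for (int i = 0; i < out.length; i++) out[i] /= norm;
        return out;
    }

    public static void main(String[] args) {
        double[] interests = {1, 1, 0, 0}; // two of four interests selected
        System.out.println(Arrays.toString(sumToOne(interests)));   // prints [0.5, 0.5, 0.0, 0.0]
        System.out.println(Arrays.toString(unitLength(interests))); // each selected entry becomes 1/sqrt(2)
    }
}
```

With k selected interests, each becomes 1/k under the first scheme and 1/sqrt(k) under the second, so L1 shrinks individual interests faster as a user lists more of them.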

Re: Twitter Classification

2012-01-17 Thread Raviv Pavel
Where are you taking the list of topics from?
--Raviv
On Tue, Jan 17, 2012 at 12:05 AM, tdguest wrote:
> Very interesting discussion! Are there any tweet classification systems existing right now? Would love to check them out!
>
> I have a site called http://www.tweetdynami

Running K-Means in memory

2012-01-22 Thread Raviv Pavel
Hi, I'm running K-Means in memory (testing different distance measures, normalization and weights). After the clusterer is done, how do I know which vector belongs to which cluster? Thanks, Raviv.
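
Whatever class produced the final centroids, the K-Means assignment rule is that a vector belongs to the cluster with the nearest centroid. A minimal sketch of recomputing that assignment on plain arrays, assuming squared Euclidean distance; if a different distance measure was used during clustering, the same measure should be used here. `NearestCentroid` is a hypothetical helper, not a Mahout class.

```java
import java.util.Arrays;

// Assign each vector to the index of its nearest centroid (the K-Means assignment step).
class NearestCentroid {
    static double squaredEuclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Returns, for each vector, the index of the closest centroid (ties go to the lower index).
    static int[] assign(double[][] vectors, double[][] centroids) {
        int[] labels = new int[vectors.length];
        for (int v = 0; v < vectors.length; v++) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double d = squaredEuclidean(vectors[v], centroids[c]);
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            labels[v] = best;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0, 0}, {10, 10}};
        double[][] vectors = {{1, 0}, {9, 11}, {0, 2}};
        System.out.println(Arrays.toString(assign(vectors, centroids))); // prints [0, 1, 0]
    }
}
```

Squared Euclidean distance picks the same nearest centroid as Euclidean distance, since the square root is monotonic, so the root can be skipped.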

Re: Running K-Means in memory

2012-01-23 Thread Raviv Pavel
> use ClusterDumper, else use ClusterOutputPostProcessor.
>
> From: Raviv Pavel [street...@gmail.com]
> Sent: Sunday, January 22, 2012 11:17 PM
> To: mahout-u...@lucene.apache.org
> Subject: Running K-Means in memory
>
> Hi,

Re: Running K-Means in memory

2012-01-23 Thread Raviv Pavel
On Mon, Jan 23, 2012 at 1:34 PM, Paritosh Ranjan wrote:
> Check out ClusterOutputPostProcessorTest; it's doing it in memory (for Canopy), and the same code will work for K-Means also.
>
> Paritosh
>
> From: Raviv Pavel [ra...@gigya-inc.com

Re: Running K-Means in memory

2012-01-23 Thread Raviv Pavel
> In cases 1 and 2, the ClusterDumper or ClusterOutputPostProcessor would work as desired.
>
> Paritosh
>
> From: Raviv Pavel [ra...@gigya-inc.com]
> Sent: Monday, January 23, 2012 2:02 PM
> To: user@mahout.apache.org
> Cc: mahout-u...@l

Finding max/min distance

2012-01-25 Thread Raviv Pavel
Given a list of vectors and a distance measure, what's the fastest way to find the maximum and minimum distances in the list? Thanks. --Raviv

Re: Finding max/min distance

2012-01-25 Thread Raviv Pavel
> pair of vectors which are "farthest" and "nearest"?
> If yes, I think you will have to compare all pairs using
>
> double distance(Vector v1, Vector v2);
>
> of DistanceMeasure.
>
> I don't think Mahout has anything built in for this.
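
The all-pairs comparison suggested in the reply can be sketched as follows. `Distance` is a hypothetical stand-in for `DistanceMeasure.distance(Vector v1, Vector v2)` over plain `double[]` arrays, so the example runs without Mahout on the classpath.

```java
// Brute-force search over all n*(n-1)/2 pairs for the minimum and maximum distance.
class MinMaxDistance {
    // Hypothetical stand-in for DistanceMeasure.distance(Vector, Vector), over plain arrays.
    interface Distance {
        double distance(double[] v1, double[] v2);
    }

    // Returns {minDistance, maxDistance}; requires at least two vectors.
    static double[] minMax(double[][] vectors, Distance measure) {
        if (vectors.length < 2) {
            throw new IllegalArgumentException("need at least two vectors");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < vectors.length; i++) {
            for (int j = i + 1; j < vectors.length; j++) { // each unordered pair once
                double d = measure.distance(vectors[i], vectors[j]);
                if (d < min) min = d;
                if (d > max) max = d;
            }
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        Distance euclidean = (a, b) -> {
            double s = 0.0;
            for (int k = 0; k < a.length; k++) s += (a[k] - b[k]) * (a[k] - b[k]);
            return Math.sqrt(s);
        };
        double[][] vectors = {{0, 0}, {3, 4}, {0, 1}};
        double[] result = minMax(vectors, euclidean);
        System.out.println("min=" + result[0] + " max=" + result[1]); // prints min=1.0 max=5.0
    }
}
```

This makes n(n-1)/2 distance calls, which is fine for thousands of vectors but grows quadratically with the list size.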

Re: Finding max/min distance

2012-01-25 Thread Raviv Pavel
> Then compare the vectors of the farthest and nearest "clusters" to find the farthest and nearest "vector pairs".
>
> From: Raviv Pavel [ra...@gigya-inc.com]
> Sent: Wednesday, January 25, 2012 4:44 PM
> To: user@mah
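
The cluster-based shortcut in this reply can be sketched for the farthest pair: find the two clusters whose centroids are farthest apart, then compare only the vectors in those two clusters. This is a heuristic rather than an exact search, since the true farthest pair can lie in a different pair of clusters. The class name, the use of Euclidean distance and the input layout (centroids plus per-cluster member lists from an earlier clustering run) are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Approximate farthest pair: pick the two clusters whose centroids are farthest apart,
// then compare only the member vectors of those two clusters.
class ClusterFarthestPair {
    static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // members.get(c) holds the vectors assigned to centroids[c]. Requires at least two
    // clusters; if either chosen cluster is empty, the result is negative infinity.
    static double approxMaxDistance(double[][] centroids, List<List<double[]>> members) {
        int bestA = -1, bestB = -1;
        double bestCentroidDist = -1.0;
        for (int a = 0; a < centroids.length; a++) {
            for (int b = a + 1; b < centroids.length; b++) {
                double d = euclidean(centroids[a], centroids[b]);
                if (d > bestCentroidDist) {
                    bestCentroidDist = d;
                    bestA = a;
                    bestB = b;
                }
            }
        }
        if (bestA < 0) throw new IllegalArgumentException("need at least two clusters");
        double max = Double.NEGATIVE_INFINITY;
        for (double[] u : members.get(bestA)) {
            for (double[] v : members.get(bestB)) {
                max = Math.max(max, euclidean(u, v)); // only cross-cluster pairs are checked
            }
        }
        return max;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0, 0}, {10, 0}, {5, 1}};
        List<List<double[]>> members = Arrays.asList(
            Arrays.asList(new double[] {0, 0}, new double[] {-1, 0}),
            Arrays.asList(new double[] {10, 0}, new double[] {11, 0}),
            Arrays.asList(new double[] {5, 1}));
        System.out.println(approxMaxDistance(centroids, members)); // prints 12.0
    }
}
```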