Re: RecommenderJob and NaN

2011-10-14 Thread Sebastian Schelter
Only got the raw data, how did you convert it to our standard recommender input? --sebastian On 14.10.2011 01:17, Grant Ingersoll wrote: Were you able to get the data, Sebastian? On Oct 13, 2011, at 4:01 AM, Sebastian Schelter wrote: Grant, Can you share a little more details about the

Re: RecommenderJob and NaN

2011-10-14 Thread Lance Norskog
cd mahout/examples/bin ./build-asf-email.sh content/ out/ over/ select 1 for recommender where content/ is content/coccoon.apache.org content/commons.apache.org and out/ and over/ are output directories. Run the shell script with -x as you will probably have to tweak it. Lance On Thu, Oct 13,

Point-to-line / point-to-vector?

2011-10-14 Thread Lance Norskog
Is there an n-dimensional point-to-line or point-to-vector or point-to-ray distance method somewhere? Is this the easiest to do? http://mathforum.org/kb/message.jspa?messageID=1072518tstart=0 For the truly obsessed: N-dimensional CGI artifacts such as normals, bounding boxes, etc.

Re: Point-to-line / point-to-vector?

2011-10-14 Thread Sean Owen
I always remember the point-to-line formula in vector-land by thinking of a point A and line through points B and C. The norm of AB x AC is twice the area of the triangle ABC. Twice the area of the triangle is also base times height; base is the norm of BC, and height is what you want. So the
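A minimal Python sketch of the triangle-area argument above, not taken from the thread or from Mahout. The function names are illustrative assumptions. The cross product only exists in 3 dimensions, so the second function swaps in a different but equivalent technique for the n-dimensional case: project A onto the line through B and C and measure the distance to the foot of the perpendicular.

```python
import math


def distance_via_cross_product(a, b, c):
    """3-D only: |AB x AC| is twice the triangle area; divide by the base |BC|."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ac = [ci - ai for ai, ci in zip(a, c)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    base = math.dist(b, c)
    if base == 0.0:
        raise ValueError("b and c must be distinct points")
    return math.sqrt(sum(x * x for x in cross)) / base


def distance_via_projection(a, b, c):
    """Any dimension: distance from a to the infinite line through b and c."""
    bc = [ci - bi for bi, ci in zip(b, c)]
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc_sq = sum(x * x for x in bc)
    if bc_sq == 0.0:
        raise ValueError("b and c must be distinct points")
    # Parameter of the foot of the perpendicular along the line b + t * bc.
    t = sum(x * y for x, y in zip(ba, bc)) / bc_sq
    foot = [bi + t * x for bi, x in zip(b, bc)]
    return math.dist(a, foot)
```

In 3 dimensions both functions should agree up to rounding; for a ray or a segment rather than an infinite line, the parameter t would additionally need clamping.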

Re: text classification using mahout and lucene index

2011-10-14 Thread David Rahman
Ok, thanks. Just to make it clear to me: I take the data with the lucene vectors and run a training algorithm on them, and this should result in a model. I don't need any preprocessing steps or anything else? Another question: your book MiA gives a good explanation and overview about mahout.

Re: text classification using mahout and lucene index

2011-10-14 Thread David Rahman
Ok, I discovered that I have to check whether my data contains TermFreq vectors. That has to wait until next week, I think... Do I have to convert the lucene index files into lucene vector files in order to use the data for training? Regards, David 2011/10/14 David Rahman

About frequent pattern mining

2011-10-14 Thread 戴清灏
Hi all, I am reading mahout's fp-growth code now, and I am a little confused about its implementation, specifically the grouping part. Why grouping? And during parallel fp-growth, why does it ignore some items? Example: a transaction: A,B,C,D If A and B belong to the same group,

Re: RecommenderJob and NaN

2011-10-14 Thread Grant Ingersoll
FYI, I think I see the problem. Working on a fix. On Oct 14, 2011, at 2:28 AM, Lance Norskog wrote: cd mahout/examples/bin ./build-asf-email.sh content/ out/ over/ select 1 for recommender where content/ is content/coccoon.apache.org content/commons.apache.org and out/ and over/ are

Re: RecommenderJob and NaN

2011-10-14 Thread Grant Ingersoll
OK, I believe I checked in a fix. The issue came down to me generalizing the SeqFilesFromMailArchives in terms of the metadata extraction (from, to, references, etc.) and the fact that the code I use to extract preferences (MailToRecMapper) depended on things being in a specific order. On Oct

com/google/common/base/CharMatcher change?

2011-10-14 Thread Steven Bourke
Hi Guys, I just downloaded the latest SVN and updated my Java project to point to the newly compiled JARs. I'm getting the following error when I run my code; can anyone point me in the right direction as to what may have changed? I've included the various dependencies etc. as per usual.

Re: com/google/common/base/CharMatcher change?

2011-10-14 Thread Sean Owen
It sounds like there's a Guava version conflict. I think we use the fairly recent r09. That update happened... maybe 5-6 months ago? Not recent, but there was an update at some point. On Fri, Oct 14, 2011 at 4:13 PM, Steven Bourke sbou...@gmail.com wrote: Hi Guys, I just downloaded the latest

Re: com/google/common/base/CharMatcher change?

2011-10-14 Thread manjunaths
Correct. To resolve the issue in my setup, I had to remove the older guava jar library file. On Oct 14, 2011, at 11:25 AM, Sean Owen sro...@gmail.com wrote: It sounds like there's a Guava version conflict. I think we use the fairly recent r09. That update happened... maybe 5-6 months ago?
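One way to spot the kind of duplicate-jar conflict described above is to list every jar on the classpath that ships the class in question. The sketch below is an assumption-laden illustration, not something from the thread: the function name and the directory layout are hypothetical, and it relies only on the fact that a jar is a zip archive.

```python
import zipfile
from pathlib import Path


def jars_providing(class_entry, lib_dir):
    """Return every .jar under lib_dir that contains the given class entry."""
    hits = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # Skip files that are named .jar but are not valid archives.
            continue
    return hits


if __name__ == "__main__":
    # "lib" is a hypothetical dependency directory; adjust to your project.
    for jar in jars_providing("com/google/common/base/CharMatcher.class", "lib"):
        print(jar)
```

If more than one jar is printed, the older one is a candidate for removal, which matches the fix reported in this message.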

Re: com/google/common/base/CharMatcher change?

2011-10-14 Thread Manju
Steven, Here is the link that talks briefly on the topic. http://code.google.com/p/google-collections/ I saw the jar references in the build path and stored in trunk examples target dependency folder hope this helps. From: manjuna...@yahoo.com

Re: Point-to-line / point-to-vector?

2011-10-14 Thread Ted Dunning
This form is equivalent to a dot product: n \cdot x = c where n is the normalized vector n = (A, B, ...) / | (A, B, ...) |, x is the vector form of the point and c = Z / | n | The vector n is unit length and orthogonal to the line and c is the shortest distance to the origin. The

Re: Point-to-line / point-to-vector?

2011-10-14 Thread Ted Dunning
Argh... Make that c - n \cdot p It always helps to check that points on a line are zero distance from the line. On Fri, Oct 14, 2011 at 9:57 AM, Ted Dunning ted.dunn...@gmail.com wrote: This form is equivalent to a dot product: n \cdot x = c where n is the normalized vector n
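A small Python sketch of the corrected expression c - n \cdot p, under the assumption (not stated in the visible part of the thread) that the line is given as A x + B y + ... = Z. The function name is illustrative. Note that in more than 2 dimensions a single equation of this form describes a hyperplane rather than a line, so this measures point-to-hyperplane distance there.

```python
import math


def signed_distance(coeffs, z, p):
    """Signed distance c - n.p from point p to the set {x : coeffs.x = z}."""
    length = math.sqrt(sum(a * a for a in coeffs))
    if length == 0.0:
        raise ValueError("coefficient vector must be non-zero")
    n = [a / length for a in coeffs]  # unit normal
    c = z / length                    # distance of the set from the origin
    return c - sum(ni * pi for ni, pi in zip(n, p))
```

The sign tells which side of the line the point lies on; take the absolute value for a plain distance. As the message suggests, a point on the line should give zero.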

input data parameter with RecommenderJob in 0.5

2011-10-14 Thread Josh Patterson
I was having trouble with the -i parameter for the RecommenderJob in Mahout 0.5, i.e.: mahout recommenditembased -i /hdfs/dir kept telling me that I had not given it an HDFS directory. When I used the full: mahout recommenditembased --input /hdfs/dir it ran the job. Anyone else seen

Re: input data parameter with RecommenderJob in 0.5

2011-10-14 Thread Sean Owen
I see the problem. Actually, this job is accidentally mixing up two options by giving them the same short name, -i. You are correctly using -i as short for -input (which works, good), but it's also using -i for -itemsFile which is something else. Sebastian, how about I just remove their short

Re: input data parameter with RecommenderJob in 0.5

2011-10-14 Thread Sebastian Schelter
Sebastian, how about I just remove their short forms, for -itemsFile and -usersFile? Good idea. --sebastian
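The clash discussed in this thread (two options sharing the short name -i) is the kind of thing a tiny check can catch. The Python sketch below is purely illustrative: it does not use Mahout's option-parsing code, the function name is hypothetical, and the option table only mirrors what the thread mentions plus one made-up entry.

```python
from collections import defaultdict


def find_short_name_clashes(options):
    """Map each short name used by more than one option to its long names."""
    by_short = defaultdict(list)
    for long_name, short_name in options:
        if short_name is not None:
            by_short[short_name].append(long_name)
    return {s: names for s, names in by_short.items() if len(names) > 1}


# Illustrative table: "input" and "itemsFile" both claim "i", as in the thread.
# "output"/"o" is an assumed entry for contrast; None means no short form.
OPTIONS = [
    ("input", "i"),
    ("itemsFile", "i"),
    ("usersFile", None),
    ("output", "o"),
]

if __name__ == "__main__":
    print(find_short_name_clashes(OPTIONS))
```

Dropping the short form for the less common option, as proposed above, corresponds to setting its short name to None in this table.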

Re: RecommenderJob and NaN

2011-10-14 Thread Lance Norskog
Bingo, I'm getting recs now. On Fri, Oct 14, 2011 at 8:10 AM, Grant Ingersoll gsing...@apache.org wrote: OK, I believe I checked in a fix. The issue came down to me generalizing the SeqFilesFromMailArchives in terms of the metadata extraction (from, to, references, etc.) and the fact that the

Re: Point-to-line / point-to-vector?

2011-10-14 Thread Lance Norskog
Where did Z come from? On Fri, Oct 14, 2011 at 9:58 AM, Ted Dunning ted.dunn...@gmail.com wrote: Argh... Make that c - n \cdot p It always helps to check that points on a line are zero distance from the line. On Fri, Oct 14, 2011 at 9:57 AM, Ted Dunning ted.dunn...@gmail.com