Re: Point-to-line / point-to-vector?

2011-10-14 Thread Sean Owen
I always remember the point-to-line formula in vector-land by thinking of a point A and line through points B and C. The norm of AB x AC is twice the area of the triangle ABC. Twice the area of the triangle is also base times height; base is the norm of BC, and height is what you want. So the dista
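The area argument above can be sketched in a few lines of Python. This is only an illustration, assuming 3-D points given as tuples; the helper names (`sub`, `cross`, `norm`, `point_to_line_distance`) are made up for this example and are not part of Mahout. The height of triangle ABC over the base BC is the norm of AB x AC divided by the norm of BC.

```python
import math


def sub(p, q):
    """Component-wise difference p - q."""
    return tuple(pi - qi for pi, qi in zip(p, q))


def cross(u, v):
    """Cross product of two 3-D vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def point_to_line_distance(a, b, c):
    """Distance from point a to the line through b and c.

    |AB x AC| is twice the area of triangle ABC, and twice the area is
    also base * height with base |BC|, so height = |AB x AC| / |BC|.
    """
    base = norm(sub(c, b))
    if base == 0.0:
        raise ValueError("b and c must be distinct points")
    return norm(cross(sub(b, a), sub(c, a))) / base
```

For 2-D points, one option is to pad each point with a zero third coordinate before calling the function.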

Re: text classification using mahout and lucene index

2011-10-14 Thread David Rahman
Ok, thanks. Just to make it clear to me: I take the data with the Lucene vectors and run a training algorithm on them, and this should result in a model. I don't need any preprocessing steps or anything else? Another question: your book MiA gives a good explanation and overview of Mahout. Can

Re: text classification using mahout and lucene index

2011-10-14 Thread David Rahman
Ok, I discovered that I have to check whether my data contains TermFreq vectors. That has to wait until next week, I think... Do I have to convert the Lucene index files into Lucene vector files in order to use the data for training? Regards, David 2011/10/14 David Rahman > Ok, thanks. > Just to

About frequent pattern mining

2011-10-14 Thread 戴清灏
Hi all, I am reading Mahout's fp-growth code now, and I am a little confused about its implementation, specifically the grouping part. Why grouping? And during parallel fp-growth, why does it ignore some items? Example: a transaction: A,B,C,D If A and B belong to the same group, the

Re: RecommenderJob and NaN

2011-10-14 Thread Grant Ingersoll
FYI, I think I see the problem. Working on a fix. On Oct 14, 2011, at 2:28 AM, Lance Norskog wrote: > cd mahout/examples/bin > ./build-asf-email.sh content/ out/ over/ > select 1 for recommender > > where content/ is > content/coccoon.apache.org > content/commons.apache.org > > and out/ and ov

Re: RecommenderJob and NaN

2011-10-14 Thread Grant Ingersoll
OK, I believe I checked in a fix. The issue came down to me generalizing the SeqFilesFromMailArchives in terms of the metadata extraction (from, to, references, etc.) and the fact that the code I use to extract preferences (MailToRecMapper) depended on things being in a specific order. On Oct

com/google/common/base/CharMatcher change?

2011-10-14 Thread Steven Bourke
Hi Guys, I just downloaded the latest SVN and updated my Java project to point to the newly compiled JARs. I'm getting the following error when I run my code; can anyone point me in the right direction as to what may have changed? I've included the various dependencies etc. as per usual. Ex

Re: com/google/common/base/CharMatcher change?

2011-10-14 Thread Sean Owen
It sounds like there's a Guava version conflict. I think we use the fairly recent r09. That update happened... maybe 5-6 months ago? Not recent, but there was an update at some point. On Fri, Oct 14, 2011 at 4:13 PM, Steven Bourke wrote: > Hi Guys, > > I just downloaded the latest SVN and updated

Re: com/google/common/base/CharMatcher change?

2011-10-14 Thread manjunaths
Correct. To resolve the issue in my setup, I had to remove the older guava jar library file. On Oct 14, 2011, at 11:25 AM, Sean Owen wrote: > It sounds like there's a Guava version conflict. I think we use the > fairly recent r09. That update happened... maybe 5-6 months ago? Not > recent,
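When a conflict like this is suspected, it can help to see which jars on disk actually provide the class in question. The following is a rough sketch, not a Mahout tool: the function name `jars_containing` is invented for this example, and it simply walks a directory tree and opens each `.jar` as a zip archive.

```python
import zipfile
from pathlib import Path


def jars_containing(class_entry, root):
    """Return the jars under root that contain the given class entry.

    class_entry is the path inside the jar, for example
    "com/google/common/base/CharMatcher.class".
    """
    hits = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar)
        except zipfile.BadZipFile:
            # Skip files that are named *.jar but are not valid archives.
            continue
    return hits
```

If more than one jar is reported for the same class, the older one is a candidate for removal, as described above. Note that this only shows which jars contain the class; it does not tell you which version of the class each jar carries.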

Re: com/google/common/base/CharMatcher change?

2011-10-14 Thread Manju
Steven, Here is a link that briefly covers the topic. http://code.google.com/p/google-collections/ I saw the jar references in the build path, stored in the << trunk > examples > target > dependency >> folder. Hope this helps. From: "manjuna...@yahoo.com"

Re: Point-to-line / point-to-vector?

2011-10-14 Thread Ted Dunning
This form is equivalent to a dot product: n \cdot x = c where n is the normalized vector n = (A, B, ...) / | (A, B, ...) |, x is the vector form of the point and c = Z / | n | The vector n is unit length and orthogonal to the line and c is the shortest distance to the origin. The distance

Re: Point-to-line / point-to-vector?

2011-10-14 Thread Ted Dunning
Argh... Make that c - n \cdot p It always helps to check that points on a line are zero distance from the line. On Fri, Oct 14, 2011 at 9:57 AM, Ted Dunning wrote: > This form is equivalent to a dot product: > > n \cdot x = c > > where n is the normalized vector n = (A, B, ...) /
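A small sketch of the corrected expression, including the sanity check mentioned above. It assumes the line (or hyperplane) is written as A*x + B*y + ... = Z, with Z as the right-hand side, and that both the coefficients and Z are divided by the length of (A, B, ...); that reading of Z is an assumption, since the thread does not spell it out. The function name `signed_distance` is made up for this example.

```python
import math


def signed_distance(coeffs, rhs, p):
    """Signed distance c - n . p from point p to the set {x : coeffs . x = rhs}.

    n is coeffs scaled to unit length and c is rhs scaled by the same
    factor, so n . x = c describes the same line or hyperplane.
    """
    length = math.sqrt(sum(a * a for a in coeffs))
    if length == 0.0:
        raise ValueError("coefficient vector must be non-zero")
    n = [a / length for a in coeffs]
    c = rhs / length
    return c - sum(ni * pi for ni, pi in zip(n, p))
```

With the line 3x + 4y = 10, the point (2, 1) lies on the line and should come out at distance zero, while the origin should come out at distance c = 2. The absolute value of the result is the unsigned distance.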

input data parameter with RecommenderJob in 0.5

2011-10-14 Thread Josh Patterson
I was having trouble with the "-i" parameter for the RecommenderJob in Mahout 0.5, i.e.: mahout recommenditembased -i /hdfs/dir kept telling me that I had not given it an HDFS directory. When I used the full: mahout recommenditembased --input /hdfs/dir it ran the job. Anyone else seen th

Re: input data parameter with RecommenderJob in 0.5

2011-10-14 Thread Sean Owen
I see the problem. Actually, this job is accidentally mixing up two options by giving them the same short name, -i. You are correctly using -i as short for -input (which works, good), but it's also using -i for -itemsFile which is something else. Sebastian, how about I just remove their short form
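The kind of clash described here is easy to check for mechanically. The sketch below is not Mahout's option-parsing code; it is a hypothetical helper (`short_name_collisions`) that takes (long name, short name) pairs and reports any short name claimed by more than one option.

```python
from collections import defaultdict


def short_name_collisions(options):
    """Map each duplicated short name to the long names that share it.

    options is an iterable of (long_name, short_name) pairs; a short
    name of None means the option has no short form.
    """
    by_short = defaultdict(list)
    for long_name, short_name in options:
        if short_name is not None:
            by_short[short_name].append(long_name)
    return {s: names for s, names in by_short.items() if len(names) > 1}


# Illustrative option table; only the -i clash comes from the thread,
# the other short names are assumptions.
options = [
    ("input", "i"),
    ("itemsFile", "i"),
    ("usersFile", "u"),
    ("output", "o"),
]
collisions = short_name_collisions(options)
```

Dropping the short form from one of the clashing options (passing None as its short name) makes the reported set empty.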

Re: input data parameter with RecommenderJob in 0.5

2011-10-14 Thread Sebastian Schelter
> Sebastian, how about I just remove their short forms, for -itemsFile > and -usersFile? Good idea. --sebastian

Re: text classification using mahout and lucene index

2011-10-14 Thread Lance Norskog
If you are using the trunk, look at examples/bin/build-asf-email.sh. This does the "three C's": classification, clustering, and collaborative filtering, all on an archive of apache.org mailing lists. The 'classification' path at the end goes through the high-level jobs. It should show you how to get t

Re: text classification using mahout and lucene index

2011-10-14 Thread Lance Norskog
The 'standard' bayes classifier does not work (for me); it assigns all mail to one newsgroup. The complementary algorithm does better. On Fri, Oct 14, 2011 at 7:57 PM, Lance Norskog wrote: > If you are using the trunk, look at examples/bin/build-asf-email.sh. This > does the "three C's": classif

Re: com/google/common/base/CharMatcher change?

2011-10-14 Thread Lance Norskog
It helps to remove your .m2 maven repository every so often and download all of the current stuff. On Fri, Oct 14, 2011 at 9:24 AM, Manju wrote: > Steven, > > Here is the link that talks briefly on the topic. > http://code.google.com/p/google-collections/ > > I saw the jar references in the bui

Re: RecommenderJob and NaN

2011-10-14 Thread Lance Norskog
Bingo, I'm getting recs now. On Fri, Oct 14, 2011 at 8:10 AM, Grant Ingersoll wrote: > OK, I believe I checked in a fix. The issue came down to me generalizing > the SeqFilesFromMailArchives in terms of the metadata extraction (from, to, > references, etc.) and the fact that the code I use to ex

Re: Point-to-line / point-to-vector?

2011-10-14 Thread Lance Norskog
Where did Z come from? On Fri, Oct 14, 2011 at 9:58 AM, Ted Dunning wrote: > Argh... > > Make that > > c - n \cdot p > > It always helps to check that points on a line are zero distance from the > line. > > On Fri, Oct 14, 2011 at 9:57 AM, Ted Dunning > wrote: > > > This form is equivalent