I always remember the point-to-line formula in vector-land by thinking
of a point A and a line through points B and C. The norm of AB x AC is
twice the area of the triangle ABC. Twice the area of the triangle is
also base times height; the base is the norm of BC, and the height is what you
want. So the distance is |AB x AC| / |BC|.
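For illustration, a minimal Python sketch of that triangle-area argument: |AB x AC| is twice the area of ABC, and dividing by the base |BC| leaves the height, which is the distance from A to the line. The function names are my own, and 2-D points are padded with z = 0 so the 3-D cross product applies.

```python
import math


def _pad3(p):
    """Pad a 2-D point with z = 0 so it can be used as a 3-D vector."""
    return tuple(p) + (0.0,) * (3 - len(p))


def sub(p, q):
    """Vector from q to p, i.e. p - q, in 3-D."""
    return tuple(a - b for a, b in zip(_pad3(p), _pad3(q)))


def cross(u, v):
    """3-D cross product u x v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def point_line_distance(a, b, c):
    """Distance from point a to the line through b and c.

    |AB x AC| = twice the triangle area = base * height, with base |BC|,
    so height = |AB x AC| / |BC|.
    """
    base = norm(sub(c, b))
    if base == 0:
        raise ValueError("B and C must be distinct points")
    return norm(cross(sub(b, a), sub(c, a))) / base
```

For example, `point_line_distance((3, 4), (0, 0), (1, 0))` returns `4.0`, the height of (3, 4) above the x-axis.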
Ok, thanks.
Just to make it clear to me: I take the data with the Lucene vectors and
run a training algorithm on them, and this should result in a model. I
don't need any preprocessing steps or anything else?
Another question: your book MiA (Mahout in Action) gives a good explanation and overview of
Mahout. Can
Ok, I discovered that I have to check whether my data contains TermFreq vectors.
That has to wait until next week, I think...
Do I have to convert the Lucene index files into Lucene vector files in
order to use the data for training?
Regards,
David
2011/10/14 David Rahman
Hi, all,
I am reading Mahout's FP-growth code now, and I am a little confused
about its implementation.
In particular, the grouping part: why group at all? And during parallel FP-growth,
why does it ignore some items?
Example: a transaction A, B, C, D.
If A and B belong to the same group, the
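To illustrate the grouping question: as far as I know, Mahout's parallel FP-growth follows the PFP scheme of Li et al. (2008). Items are split into groups so that each group's data can be mined independently on one reducer. Each transaction, sorted by descending item frequency, is sent to a group only as the prefix that ends at that group's last item in the transaction. A group only mines patterns whose least frequent item belongs to it, and such a pattern can only contain items that come earlier in the sorted transaction, so the items after the cut-off are never needed there. The sketch below is my own illustration of that step, not Mahout's code; the function name and the `group_of` mapping are hypothetical.

```python
def group_dependent_transactions(transaction, group_of):
    """Split one frequency-sorted transaction into per-group prefixes (PFP style).

    transaction: items sorted by descending global frequency (F-list order).
    group_of:    hypothetical mapping from item to group id.
    Scanning from the least frequent item backwards, the first item seen
    from group g fixes g's prefix; later (less frequent) items are dropped
    for g because no pattern that g mines can contain them.
    """
    shards = {}
    for j in range(len(transaction) - 1, -1, -1):
        g = group_of[transaction[j]]
        if g not in shards:
            shards[g] = transaction[: j + 1]
    return shards
```

With A and B in group 0 and C and D in group 1, the transaction A, B, C, D is sent whole to group 1, while group 0 receives only A, B; C and D are the "ignored" items for group 0.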
FYI, I think I see the problem. Working on a fix.
On Oct 14, 2011, at 2:28 AM, Lance Norskog wrote:
> cd mahout/examples/bin
> ./build-asf-email.sh content/ out/ over/
> select 1 for recommender
>
> where content/ is
> content/coccoon.apache.org
> content/commons.apache.org
>
> and out/ and ov
OK, I believe I checked in a fix. The issue came down to me generalizing the
SeqFilesFromMailArchives in terms of the metadata extraction (from, to,
references, etc.) and the fact that the code I use to extract preferences
(MailToRecMapper) depended on things being in a specific order.
On Oct
Hi Guys,
I just downloaded the latest SVN and updated my Java project to point
to the newly compiled JARs. I'm getting the following error when I run
my code; can anyone point me in the right direction as to what may have
changed? I've included the various dependencies etc. as usual.
Ex
It sounds like there's a Guava version conflict. I think we use the
fairly recent r09. That update happened... maybe 5-6 months ago? Not
recent, but there was an update at some point.
On Fri, Oct 14, 2011 at 4:13 PM, Steven Bourke wrote:
Correct.
To resolve the issue in my setup, I had to remove the older Guava JAR
file.
On Oct 14, 2011, at 11:25 AM, Sean Owen wrote:
Steven,
Here is a link that covers the topic briefly:
http://code.google.com/p/google-collections/
I saw the JAR references in the build path; the JARs are stored in the
trunk/examples/target/dependency folder.
Hope this helps.
From: "manjuna...@yahoo.com"
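If you want to check for a duplicate quickly, here is a small Python sketch that lists Guava or Google Collections jars in a directory such as the examples/target/dependency folder mentioned above. The jar naming patterns are assumptions on my part; both libraries ship `com.google.common` classes, so more than one hit usually means an older copy is shadowing the newer one on the classpath.

```python
import re
from pathlib import Path

# Assumed naming patterns (not taken from the thread): Guava jars usually look
# like guava-r09.jar; the older Google Collections jars like
# google-collect-1.0.jar or google-collections-1.0.jar.
JAR_PATTERN = re.compile(r"^(guava|google-collect(ions)?)-.*\.jar$", re.IGNORECASE)


def common_jar_names(names):
    """Filter file names down to Guava / Google Collections jars, sorted."""
    return sorted(n for n in names if JAR_PATTERN.match(n))


def find_common_jars(directory):
    """Recursively scan `directory` and return the matching jar file names."""
    return common_jar_names(p.name for p in Path(directory).rglob("*.jar"))
```

Calling `find_common_jars("trunk/examples/target/dependency")` from the checkout root and getting more than one name back suggests removing the older jar, as Steven did.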
This form is equivalent to a dot product:
n \cdot x = c
where n is the normalized vector n = (A, B, ...) / | (A, B, ...) |, x is the
vector form of the point and c = Z / | (A, B, ...) |
The vector n has unit length and is orthogonal to the line, and |c| is the shortest
distance from the line to the origin.
The distance
Argh...
Make that
c - n \cdot p
It always helps to check that points on the line are at zero distance from the
line.
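A minimal Python sketch of the normal form and the corrected signed distance c - n \cdot p, assuming the line (or hyperplane) is written as A x + B y + ... = Z. Where Z comes from is not stated in the thread, so that starting form is an assumption, and the function names are mine.

```python
import math


def normal_form(coeffs, z):
    """Rewrite A*x + B*y + ... = Z as n . x = c with n of unit length.

    Dividing both sides by |(A, B, ...)| gives n = (A, B, ...) / |(A, B, ...)|
    and c = Z / |(A, B, ...)|.
    """
    length = math.sqrt(sum(a * a for a in coeffs))
    if length == 0:
        raise ValueError("coefficients must not all be zero")
    return [a / length for a in coeffs], z / length


def signed_distance(n, c, p):
    """Signed distance c - n . p from point p to the set n . x = c."""
    return c - sum(ni * pi for ni, pi in zip(n, p))


n, c = normal_form((3, 4), 10)        # the line 3x + 4y = 10
print(signed_distance(n, c, (0, 0)))  # 2.0, the distance to the origin
print(signed_distance(n, c, (2, 1)))  # (2, 1) is on the line, so ~0
```

The second call is the sanity check suggested above: a point on the line should come out at (numerically) zero distance. The sign tells you which side of the line the point is on.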
On Fri, Oct 14, 2011 at 9:57 AM, Ted Dunning wrote:
I was having trouble with the "-i" parameter for the RecommenderJob in
Mahout 0.5, i.e.:
mahout recommenditembased -i /hdfs/dir
kept telling me that I had not given it an HDFS directory. When I used the full option:
mahout recommenditembased --input /hdfs/dir
it ran the job. Anyone else seen th
I see the problem. This job is accidentally mixing up two
options by giving them the same short name, -i. You are correctly
using -i as short for --input (which works), but it's also using
-i for --itemsFile, which is something else.
Sebastian, how about I just remove their short form
> Sebastian, how about I just remove their short forms, for -itemsFile
> and -usersFile?
Good idea.
--sebastian
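This kind of clash is easy to catch mechanically. A minimal Python sketch that reports short option names shared by more than one long option; the option tables are hypothetical, modeled on this thread (only the -i clash between --input and --itemsFile comes from it), and are not Mahout's actual option definitions.

```python
from collections import defaultdict


def short_option_collisions(options):
    """Map each short name used by more than one option to those long names.

    options: list of (long_name, short_name_or_None) pairs.
    """
    seen = defaultdict(list)
    for long_name, short_name in options:
        if short_name is not None:
            seen[short_name].append(long_name)
    return {s: longs for s, longs in seen.items() if len(longs) > 1}


# Hypothetical tables: before and after dropping the short forms of
# --itemsFile and --usersFile, as proposed above.
before = [("input", "i"), ("itemsFile", "i")]
after = [("input", "i"), ("itemsFile", None), ("usersFile", None)]
print(short_option_collisions(before))  # {'i': ['input', 'itemsFile']}
print(short_option_collisions(after))   # {}
```

Removing the short forms, rather than picking new letters, avoids any chance of clashing again with another option later.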
If you are using the trunk, look at examples/bin/build-asf-email.sh. This
does the "three C's": classification, clustering, and collaborative
filtering, all on an archive of apache.org mailing lists.
The 'classification' path at the end goes through the high-level jobs. It
should show you how to get t
The 'standard' Bayes classifier does not work (for me); it assigns all mail
to one newsgroup. The complementary Bayes algorithm does better.
On Fri, Oct 14, 2011 at 7:57 PM, Lance Norskog wrote:
It helps to remove your .m2 Maven repository every so often and download
all of the current stuff.
On Fri, Oct 14, 2011 at 9:24 AM, Manju wrote:
Bingo, I'm getting recs now.
On Fri, Oct 14, 2011 at 8:10 AM, Grant Ingersoll wrote:
Where did Z come from?
On Fri, Oct 14, 2011 at 9:58 AM, Ted Dunning wrote: