Hello all,
I checked out the latest mahout 0.8 code this morning but get an error when
I run seq2sparse.
$ mahout seq2sparse -i in -o out --namedVector --weight tf
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running
locally
27-Feb-2013 17:08:58 org.slf4j.impl.JCLLoggerAdapter
Yes, we need the snapshot because of the streaming k-means mapper and
reducer tests.
Specifically, we need to add more than one input to the mappers (we
need the entire set of points). Only the mrunit SNAPSHOT has this
feature.
On Wed, Feb 27, 2013 at 12:04 AM, Ted Dunning wrote:
> The problem h
Sorry for the confusion, I meant the same thing. I'm also looking at the
content of my clusteredPoints/part-m-0 file.
I'm having trouble filtering outliers from my clusters too. Depending on
the clusterClassificationThreshold value, either all or none of my points
are classified. I think it's
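[Editorial note: the all-or-nothing behavior described above is consistent with how a classification threshold works: a point is emitted only if its membership score for the winning cluster clears the threshold. Below is a minimal Python sketch of that idea; it is not Mahout's actual implementation, and the inverse-distance scoring function is a made-up placeholder.]

```python
import math

def classify(points, centroids, threshold):
    """Keep a point only if its normalized membership in the nearest
    cluster exceeds the threshold; otherwise treat it as an outlier."""
    kept = []
    for p in points:
        # placeholder inverse-distance scores against every centroid
        scores = [1.0 / (1.0 + math.dist(p, c)) for c in centroids]
        best = max(scores) / sum(scores)  # normalized winning membership
        if best >= threshold:
            kept.append(p)
    return kept

points = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.0), (2.5, 2.6)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
# (2.5, 2.6) sits between the centroids, so its membership is near 0.5
# and it is dropped once the threshold rises above that
kept = classify(points, centroids, 0.6)
```

With a threshold of 0.0 every point is kept; with a high threshold only points close to one centroid survive, which matches the all-or-nothing symptom when the data is either very tight or very diffuse.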
A common measure of cluster coherence is the mean distance or mean squared
distance between the members and the cluster centroid. It sounds like
this is the kind of thing you're measuring with these all-pairs distances.
That could be a measure too; I've usually seen that done by taking the
maximum
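[Editorial note: to make the two measures concrete, here is a small illustrative Python sketch, not Mahout code, computing mean distance to the centroid versus the maximum pairwise distance (the cluster diameter).]

```python
import math
from itertools import combinations

def mean_centroid_distance(cluster):
    """Average Euclidean distance from each member to the centroid."""
    dim = len(cluster[0])
    centroid = [sum(p[i] for p in cluster) / len(cluster) for i in range(dim)]
    return sum(math.dist(p, centroid) for p in cluster) / len(cluster)

def diameter(cluster):
    """Maximum distance over all pairs of members."""
    return max(math.dist(a, b) for a, b in combinations(cluster, 2))

tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
loose = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
# the tighter cluster scores lower on both measures
```

The centroid-based measure is O(n) per cluster while the all-pairs diameter is O(n²), which is worth keeping in mind for large clusters.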
Clustering worked for me (sorry if I didn't make that part clear); it's the
empty clusteredPoints/part-m-0 file that is the problem I'm having.
Any value greater than 0.025 and the clusteredPoints/part-m-0 file is empty,
and I use that file to map each document to the cluster it ended up in.
If I c
Thanks for the details!
I don't believe it's a memory issue because our dataset is smaller than 1 GB.
Anyhow, I will go ahead and try to execute it on a much smaller dataset, just
to be sure.
As for my second question... how could I extract the 2 small matrices (U*K &
I*K) into CSVs using Mahout
Hmmm, you may have to dumb things down for me here. I don't have much of a
background in ML, and I'm just piecing things together and learning
as I go.
So I don't really understand what you mean by "Coherence against an external
standard? Or internal consistency/homogeneity?" or
It's true, although many of the algorithms will by nature not emphasize
popular items.
There is an old and semi-deprecated class in the project
called InverseUserFrequency, which you can use to manually de-emphasize
popular items internally. I wouldn't really recommend it.
You can always use IDRescorer to penalize or skip items.
On Mon, Feb 4, 2013 at 6:54 PM, Zia mel wrote:
> Hi , is there a current way to remove the popular items in the
> recommendations? Something like STOP words.
> Thanks !
>
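[Editorial note: the IDRescorer suggestion above can be sketched generically. Mahout's actual interface is the Java IDRescorer, with isFiltered(long) and rescore(long, double); the Python analogue below only illustrates the idea, and the popularity counts are made up.]

```python
def make_popularity_rescorer(item_counts, max_count):
    """Filter items seen by more than max_count users, like a STOP list."""
    def is_filtered(item_id):
        return item_counts.get(item_id, 0) > max_count
    def rescore(item_id, original_score):
        # filtered items get an impossible score; others pass through
        return float("-inf") if is_filtered(item_id) else original_score
    return is_filtered, rescore

counts = {"itemA": 9000, "itemB": 12}   # hypothetical interaction counts
is_filtered, rescore = make_popularity_rescorer(counts, 1000)

recs = [("itemA", 4.5), ("itemB", 3.9)]
kept = [(i, s) for i, s in recs if not is_filtered(i)]
```

In the real Taste API the recommender calls the rescorer for every candidate item, so a threshold on raw popularity counts drops over-popular items exactly like a stop-word list drops over-frequent terms.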
Hi,
I am looking at how to use mahout for web page categorization.
Idea is to have various categories like
Adult
Arts
Business
Computers
Games
Health
Home
Kids
News
Recreation
Reference
Science
Shopping
Society
Sports
and classify given web page into specific category.
After going through some
The difference is that in 0.5 the job used a reduce-side join to join feature
vectors and ratings, which is scalable but very slow.
We changed this to a broadcast join in later versions, which can be
executed using a single map-only job. However, each of the feature
matrices has to fit into the map
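[Editorial note: the trade-off above can be sketched abstractly. A broadcast join loads the small side fully into each mapper's memory and joins locally, avoiding the shuffle a reduce-side join requires. The sketch below is illustrative Python, not the Hadoop implementation, and the data is made up.]

```python
def broadcast_join(ratings, feature_matrix):
    """Join each (item_id, rating) record against an in-memory
    item -> feature-vector map, as a map-only task would."""
    return [(item, rating, feature_matrix[item])
            for item, rating in ratings
            if item in feature_matrix]

features = {"i1": [0.2, 0.8], "i2": [0.5, 0.5]}  # must fit in memory
ratings = [("i1", 4.0), ("i3", 2.0)]              # unmatched ids are dropped
joined = broadcast_join(ratings, features)
```

This is why the newer versions are much faster but impose the memory constraint mentioned above: every mapper holds a full copy of the feature matrix.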
Yes I'm sure.
We used some code of ours that executes the specific ParallelALSFactorizationJob.
The same execution worked for Mahout 0.5 but not for 0.6 / 0.7.
Is there anything different in the way this job is invoked?
-Original Message-
From: Sebastian Schelter [mailto:s...@apache.org]
Sent
Hello Razon,
this is a strange bug that should not happen. It seems that some of the
vectors supplied to the solver are null. Are you sure that there are no
exceptions prior to this one?
Best,
Sebastian
On 27.02.2013 09:53, Razon, Oren wrote:
> Hi there,
> I'm using Hadoop-core 0.20.3 and I want to u
Hi there,
I'm using Hadoop-core 0.20.3 and I want to use mahout ALS algorithm.
My purpose is to run the ALS model and extract the decomposed matrices for
further usage in my application (I want to create 2 different csv files:
[UserId, latentFeatureId, Value] and [ItemId, latentFeatureId, Value])
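[Editorial note: assuming the user-feature matrix has already been read back into memory (Mahout's ALS job writes the factor matrices as SequenceFiles of row vectors, which you would first dump with a SequenceFile reader), flattening a factor matrix into the [UserId, latentFeatureId, Value] long format is straightforward. A Python sketch with made-up data follows.]

```python
import csv
import io

def factors_to_rows(factors):
    """Flatten {row_id: [f0, f1, ...]} into (id, latentFeatureId, value) triples."""
    return [(rid, k, v)
            for rid, vec in sorted(factors.items())
            for k, v in enumerate(vec)]

user_factors = {0: [0.5, 1.2], 1: [0.9, 0.1]}  # hypothetical U-matrix rows

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["UserId", "latentFeatureId", "Value"])
writer.writerows(factors_to_rows(user_factors))
```

The same helper works unchanged for the item matrix by writing an `ItemId` header instead; each row of the factor matrix simply expands into one CSV line per latent feature.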