The size should not matter; you should get output. What exactly do you
mean by "it has null"?
--sebastian
On 02.08.2013 03:44, hahn jiang wrote:
> The version of Mahout I used is 0.7-cdh4.3.1 and I am sure that no
> errors occur. I checked the output but it contains nulls.
> I think the problem is
I would also be fine with keeping if there is demand. I just proposed to
deprecate it and nobody voted against that at that point in time.
--sebastian
On 02.08.2013 03:12, Dmitriy Lyubimov wrote:
> There's a part of Nathan Halko's dissertation referenced on algorithm page
> running comparison.
The version of Mahout I used is 0.7-cdh4.3.1 and I am sure that no
errors occur. I checked the output but it contains nulls.
I think the problem is my data set.
Is my item set too small, with only 200 elements?
On Thu, Aug 1, 2013 at 9:57 PM, Sebastian Schelter wrote:
> Which version of Mahout are you using?
There's a part of Nathan Halko's dissertation referenced on algorithm page
running comparison. In particular, he was not able to compute more than 40
eigenvectors with Lanczos on the Wikipedia dataset. You may refer to that
study.
On the accuracy part, it was not observed that it was a problem, assum
Yes, storing the similar_items in a field, cross_action_similar_items in
another field, all on the same doc identified by item ID. Agree that there may
be other fields.
Storing the rows of [B'B] is ok because it's symmetric. However we did talk
about the [B'A] case and I thought we agreed to store the
I am wondering about row/column confusion as well - fleshing out the
doc/design with more specifics (which Pat is kind of doing, basically)
should make things obvious eventually, imo.
The way Pat had phrased it got me wondering what rationale you use to
rank the results when you are querying th
On Thu, Aug 1, 2013 at 11:58 AM, Pat Ferrel wrote:
> Sorry to be dense but I think there is some miscommunication. The most
> important question is: am I writing the item-item similarity matrix DRM out
> to Solr, one row = one Solr doc?
Each row = one *field* in a Solr doc. Different DRM's pro
I have talked to one user who had ~60,000 classes and they were able to use
OLR with success.
The way they did this was to arrange the output classes into a
multi-level tree, and then train classifiers at each level of the tree.
At any level, if there was a dominating result, then only th
Thanks for pointing that out. I corrected the Wiki page.
From: Marco
To: "user@mahout.apache.org"
Sent: Thursday, August 1, 2013 3:08 PM
Subject: Re: k-means issues
thanks a lot. will try your suggestions asap.
i was sort of following this http://goo.gl/u8VFZN
thanks a lot. will try your suggestions asap.
i was sort of following this http://goo.gl/u8VFZN
- Original Message -
From: Jeff Eastman
To: user@mahout.apache.org
Cc:
Sent: Thursday, 1 August 2013 21:02
Subject: Re: k-means issues
The clustering arguments are usually directories, not
The clustering arguments are usually directories, not files. Try:
mahout clusterdump -d mahout/vectors/dictionary.file-0 -dt sequencefile -i
mahout/kmeans-clusters/clusters-1-final -n 20 -b 100 -o cdump.txt -p
mahout/kmeans-clusters/clusteredPoints
On 8/1/13 2:51 PM, Marco wrote:
mahout
You also need to specify the distance measure '-dm' to clusterdump. This is the
distance measure that was used for clustering.
(Again, look at the example in /examples/bin/cluster-reuters.sh - it has all
the steps you are trying to accomplish.)
From: Marco
To: "user@mahout.apache.org"
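Putting the two suggestions together, a clusterdump invocation with the distance measure specified might look like the following sketch; the paths are the ones from the earlier command, and SquaredEuclidean is only an assumed example (pass whatever measure the clustering actually ran with):

```shell
# Sketch only: paths match the earlier command; the distance measure below is
# an assumed example - use the same measure that was used for clustering.
mahout clusterdump \
  -d mahout/vectors/dictionary.file-0 -dt sequencefile \
  -i mahout/kmeans-clusters/clusters-1-final \
  -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure \
  -n 20 -b 100 -o cdump.txt \
  -p mahout/kmeans-clusters/clusteredPoints
```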
Sorry to be dense but I think there is some miscommunication. The most
important question is: am I writing the item-item similarity matrix DRM out to
Solr, one row = one Solr doc? For the mapreduce Mahout Item-based recommender
this is in "tmp/similarityMatrix". If not then please stop me. If I'
mahout clusterdump -d mahout/vectors/dictionary.file-0 -dt sequencefile -i
mahout/kmeans-clusters/clusters-1-final/part-r-0 -n 20 -b 100 -o cdump.txt
-p mahout/kmeans-clusters/clusteredPoints
- Original Message -
From: Suneel Marthi
To: "user@mahout.apache.org" ; Marco
Cc:
Sent
Say that I am trying to determine which customers buy particular candy
bars. So I want to classify training data consisting of candy bar
attributes (an N dimensional vector of variables) into customer attributes
(an M dimensional vector of customer attributes).
Is there a preferred method when N a
On Thu, Aug 1, 2013 at 7:08 AM, Sebastian Schelter wrote:
> IIRC the main reasons for deprecating Lanczos was that in contrast to
> SSVD, it does not use a constant number of MapReduce jobs and that our
> implementation has the constraint that all the resulting vectors have to
> fit into the memory of the driver machine.
On Thu, Aug 1, 2013 at 8:46 AM, Pat Ferrel wrote:
>
> For item similarities there is no need to do more than fetch one doc that
> contains the similarities, right? I've successfully used this method with
> the Mahout recommender but please correct me if something above is wrong.
No.
First, you
Setting it to the maximum number should be enough. It would be great if you
could share your dataset and tests.
2013/8/1 Rafal Lukawiecki
> Should I have set that parameter to a value much much larger than the
> maximum number of actually expressed preferences by a user?
>
> I'm working on an anonymi
Should I have set that parameter to a value much much larger than the maximum
number of actually expressed preferences by a user?
I'm working on an anonymised data set. If it works as an error test case, I'd
be happy to share it for your re-test. I am still hoping it is my error, not
Mahout's.
Ok, please file a bug report detailing what you've tested and what results
you got.
Just to clarify, setting maxPrefsPerUser to a high number still does not
help? That surprises me.
2013/8/1 Rafal Lukawiecki
> Hi Sebastian,
>
> I've rechecked the results, and, I'm afraid that the issue has not
Hi Sebastian,
I've rechecked the results and I'm afraid the issue has not gone away,
contrary to my enthusiastic response yesterday. Using 0.8 I have retested
with and without --maxPrefsPerUser 9000 parameter (no user has more than 5000
prefs). I have also supplied the prefs file, with
Not following so…
Here is what I've done, in probably too much detail:
1) ingest raw log files and split them up by action
2) turn these into Mahout preference files using Mahout type IDs, keeping a map
of IDs
3) run the Mahout Item-based recommender using LLR for similarity
4) created a Mahou
Galit, yes, this does sound related, and as Matt said, you can test
this by setting the max split size on the CLI. I didn't personally find this
to be a reliable and efficient method, so I wrote the -m parameter to my job to
set it right every time. It seems that this would be usef
Could you post the command line you are using for clusterdump?
From: Marco
To: "user@mahout.apache.org" ; Suneel Marthi
Sent: Thursday, August 1, 2013 10:29 AM
Subject: Re: k-means issues
ok i did put -cl and got clusteredPoints, but then I do clusterdump and always
get "Wrote 0 clusters"
The original motivation of spectral clustering talks about graphs.
But the idea of clustering the reduced dimension form of a matrix simply
depends on the fact[1] that the metric is approximately preserved by the
reduced form and is thus applicable to any matrix.
[1] Johnson-Lindenstrauss yet again
On Thu, Aug 1, 2013 at 5:49 AM, Stuti Awasthi wrote:
> I think there is a problem because of NamedVector, as after some searching I
> found this Jira: https://issues.apache.org/jira/browse/MAHOUT-1067
>
Note also that this bug is fixed in 0.8
Oops, I'm sorry. I had one too many zeros there; it should be
'-Dmapred.max.split.size=10'
Just (input size)/(desired number of mappers)
One trick to getting more mappers on a job when running from the command
line is to pass a '-Dmapred.max.split.size=' argument. The value is a
size in bytes. So if you have some hypothetical 10MB input set, but you
want to force ~100 mappers, use '-Dmapred.max.split.size=100'
On Wed, Jul 3
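The arithmetic above can be sketched as a tiny shell computation; the 10 MB figure and the mapper count are just the hypothetical numbers from the message:

```shell
# Hypothetical numbers from the message: a 10 MB input, ~100 desired mappers.
INPUT_BYTES=$((10 * 1024 * 1024))   # in practice: hadoop fs -du -s <input>
DESIRED_MAPPERS=100

# Split size in bytes = (input size) / (desired number of mappers).
SPLIT_SIZE=$((INPUT_BYTES / DESIRED_MAPPERS))
echo "mapred.max.split.size=$SPLIT_SIZE"

# Then pass it to any Mahout MapReduce job, e.g.:
# mahout kmeans -Dmapred.max.split.size=$SPLIT_SIZE ...
```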
ok i did put -cl and got clusteredPoints, but then I do clusterdump and always
get "Wrote 0 clusters"
- Original Message -
From: Suneel Marthi
To: "user@mahout.apache.org" ; Marco
Cc:
Sent: Thursday, 1 August 2013 16:04
Subject: Re: k-means issues
Check examples/bin/cluster_reuters.sh
IIRC the main reasons for deprecating Lanczos was that in contrast to
SSVD, it does not use a constant number of MapReduce jobs and that our
implementation has the constraint that all the resulting vectors have to
fit into the memory of the driver machine.
Best,
Sebastian
On 01.08.2013 12:15, Fer
Check examples/bin/cluster_reuters.sh for kmeans (it exists in Mahout 0.7 too
:))
You need to specify the clustering option -cl in your kmeans command.
From: Marco
To: "user@mahout.apache.org"
Sent: Thursday, August 1, 2013 9:55 AM
Subject: k-means issues
Which version of Mahout are you using? Did you check the output, are you
sure that no errors occur?
Best,
Sebastian
On 01.08.2013 09:59, hahn jiang wrote:
> Hi all,
>
>
> I have a question when I use RecommenderJob for item-based recommendation.
>
> My input data format is "userid,itemid,1", so I set the booleanData option to
> true.
So I've got 13000 text files representing topics in certain newspaper articles.
Each file is just a tab-separated list of topics (something like "china
japan senkaku dispute" or "italy lampedusa immigration").
I want to run k-means clustering on them.
Here's what I do (i'm ac
CALL FOR PARTICIPATION: CHEMDNER task: Chemical compound and drug name
recognition task (see
http://www.biocreative.org/tasks/biocreative-iv/chemdner)
(1) The CHEMDNER task (part of The BioCreative IV competition) is a
community challenge on named entity recognition of chemical compounds. The
goal
Maybe someone can clarify this issue but the spectral clustering
implementation assumes an affinity graph, am I correct? Are there direct
ways of going from a list of feature vectors to an affinity matrix in order
to then implement spectral clustering?
On Thu, Aug 1, 2013 at 8:49 AM, Stuti Awast
Thanks Ted, Dmitriy.
I'll check the Spectral Clustering as well as the PCA option, but first I want
to execute it once with the normal approach.
Here is what I am doing with Mahout 0.7:
1. seqdirectory :
~/mahout-distribution-0.7/bin/mahout seqdirectory -i
/stuti/SSVD/ClusteringInput -o /stuti/SSVD/data-seq
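For reference, the follow-on steps usually mirror examples/bin/cluster-reuters.sh; the sketch below uses the seqdirectory output path from the message, but the remaining paths and the parameters (k, iteration count, distance measure) are assumptions, not the poster's actual commands:

```shell
# Sketch of the usual next steps; output paths and parameters are assumptions.
# 2. seq2sparse: sequence files -> sparse TF-IDF vectors
~/mahout-distribution-0.7/bin/mahout seq2sparse \
  -i /stuti/SSVD/data-seq -o /stuti/SSVD/data-vectors -wt tfidf

# 3. kmeans: cluster the vectors; -cl also writes clusteredPoints,
#    which clusterdump needs via its -p option
~/mahout-distribution-0.7/bin/mahout kmeans \
  -i /stuti/SSVD/data-vectors/tfidf-vectors \
  -c /stuti/SSVD/initial-clusters -o /stuti/SSVD/kmeans-output \
  -k 20 -x 10 \
  -dm org.apache.mahout.common.distance.CosineDistanceMeasure -cl
```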
Simon, my apologies for my dumb question. I found the website for PredictionIO.
I did not realise it was a separate project, and I was looking for info in
the existing Mahout documentation. I will research it now for our use case.
--
Rafal Lukawiecki
Strategic Consultant and Director
Project Bo
Hi everyone,
Sorry if I duplicate the question but I've been looking for an answer and I
haven't found an explanation other than it's not being used (together with
some other algorithms). If it's been discussed in depth before maybe you
can point me to some link with the discussion.
I have succes
Simon, is there any documentation available, or more info on PredictionIO?
--
Rafal Lukawiecki
Pardon brevity, mobile device.
On 1 Aug 2013, at 09:13, "Simon Chan" wrote:
> We are building PredictionIO, which helps handle a number of business-logic
> requirements. Recommending only items that the user has
We are building PredictionIO, which helps handle a number of business-logic
requirements. Recommending only items for which the user has never expressed a
preference is supported.
It is a layer on top of Mahout. Hope it is helpful.
Simon
On Wed, Jul 31, 2013 at 4:57 PM, Ted Dunning wrote:
> Go with 0.8
Hi all,
I have a question when I use RecommenderJob for item-based recommendation.
My input data format is "userid,itemid,1", so I set the booleanData option to
true.
The number of users is 9,000,000 but the number of items is only 200.
When I run the RecommenderJob, the result is null. I try many time
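A minimal RecommenderJob invocation for the boolean-preference case described above might look like this sketch; the paths are placeholders and the log-likelihood similarity is an assumed choice, not necessarily what the poster ran:

```shell
# Sketch only: placeholder paths; LLR similarity is an assumed choice.
mahout recommenditembased \
  --input /tmp/prefs.csv \
  --output /tmp/recommendations \
  --booleanData true \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD
```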