The correct option is -Dmapred.dir=output (single dash), not --Dmapred.dir=output.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Error-Running-mahout-core-0-5-job-jar-tp3846385p3847789.html
Sent from the Mahout User List mailing list archive at Nabble.com.
Hi,
As a data mining developer who needs to build a recommender engine POC (Proof of
Concept) to support several future use cases, I've found the Mahout framework an
appealing place to start. But as I'm new to Mahout and Hadoop in general,
I have a couple of questions...
1. In Mahout in
1. These are the JDBC-related classes. For example see
MySQLJDBCDiffStorage or MySQLJDBCDataModel in integration/
2. The distributed and non-distributed code are quite separate. At
this scale I don't think you can use the non-distributed code to a
meaningful degree. For example you could
Hi Sean,
Thanks for your fast response. I really appreciate the quality of your book
(Mahout in Action) and the support you give in such forums.
Just to clarify my second question...
I want to build a recommender framework that will support different use cases.
So my intention is to have both
Hi,
I think I can answer this question...
Yes, you can run a clustering algorithm on your local machine without using
Hadoop. Just include the Mahout JAR files in your classpath and start using
it as just another Java library. I am currently experimenting with
TreeClusteringRecommender but you
Most of the Mahout clustering algorithms have an -xm sequential CLI
option that runs locally in-memory from/to Hadoop-style sequence files.
And, as below, you can also call the Java driver methods directly from
your program.
On 3/22/12 9:22 AM, Ahmed Abdeen Hamed wrote:
Hi,
I think I can
A distributed and non-distributed recommender are really quite
separate. They perform the same task in quite different ways. I don't
think you would mix them per se.
Depends on what you mean by a model-based recommender... I would call
the matrix-factorization-based and clustering-based
Thanks so much, tianwild, for pointing out the typo. Now it's running, but I got
a different error message:
Exception in thread "main"
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
temp/itemIDIndex already exists
Any idea how to resolve this issue?
Many thanks.
That pretty much means what it says: delete temp.
On Thu, Mar 22, 2012 at 6:06 PM, jeanbabyxu jessica...@aexp.com wrote:
Thanks so much tianwild for pointing out the typo. Now it's running but I got
a different error msg:
Exception in thread main
Yes. This prevents accidental overwrite, and mimics how Hadoop/HDFS
generally act.
On Thu, Mar 22, 2012 at 6:58 PM, jeanbabyxu jessica...@aexp.com wrote:
I was able to manually clear out the output directory by using
bin/hadoop dfs -rmr output.
But do we have to remove all content in the
You can also use the HadoopUtil.delete(conf, paths) API, or the -ow
(overwrite) flag, if it is available for that job.
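A minimal sketch of the same cleanup on a local filesystem, for runs outside HDFS. This is not the HadoopUtil API: the class OutputCleaner and its method deleteIfExists are hypothetical names introduced here, and the code uses only java.nio from the JDK. It removes a leftover directory such as temp so that a rerun starts from a clean output location.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Mahout or Hadoop): clears a local
// output directory before a job is rerun.
final class OutputCleaner {

    private OutputCleaner() {}

    /**
     * Recursively deletes dir if it exists.
     * Returns true if something was deleted, false if dir was absent.
     */
    static boolean deleteIfExists(Path dir) {
        if (!Files.exists(dir)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                   .collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling OutputCleaner.deleteIfExists(Paths.get("temp")) before launching a job is roughly the local counterpart of running bin/hadoop dfs -rmr on the directory in HDFS.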
On 23-03-2012 00:28, jeanbabyxu wrote:
I was able to manually clear out the output directory by using
bin/hadoop dfs -rmr output.
But do we have to remove all content in the
It is wherever you compiled your own classes -- it's up to you.
SIMILARITY_EUCLEDEAN_DISTANCE is not a class.
You should use 0.6 anyway. While you may find you have to make minor
modifications if following the book, it's 99% compatible.
On Thu, Mar 22, 2012 at 8:07 PM, jeanbabyxu
What do you mean that you have a user-item association from a
log-likelihood metric?
Combining two values is easy in the sense that you can average them or
something, but only if they are in the same units. Log likelihood
may be viewed as a probability. The distance function you derive from
it --
You're implementing userSimilarity(), but appear to be computing
item-item similarity. Halfway through, you use the item IDs as user
IDs. As a result, I can't see what this is intended to do.
On Thu, Mar 22, 2012 at 9:33 PM, Ahmed Abdeen Hamed
ahmed.elma...@gmail.com wrote:
Hello Sean,
I am
You are correct. In a previous post, I inquired about the use of
TreeClusteringRecommender, which is based upon a UserSimilarity metric. My
question was whether I can use it for ItemSimilarity, and your answer was
yes, just feed the itemID as a userID and vice versa, and that's what I am
doing in it
Yes, but you can't use it as both things at once. I meant that you
swap them at the broadest level -- at your original input. So all
items are really users and vice versa. At the least you need two
separate implementations, encapsulating two different notions of
similarity.
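The swap at the input level can be sketched as follows, assuming the preference data is a comma-separated text file with one userID,itemID[,value] record per line (an assumption; the thread does not state the input format). PreferenceTransposer and swapUsersAndItems are hypothetical names, not Mahout classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Mahout class): rewrites preference records
// so that items take the place of users and vice versa.
final class PreferenceTransposer {

    private PreferenceTransposer() {}

    /**
     * Swaps the first two comma-separated fields of each line.
     * Assumed record layout: userID,itemID[,value]. Blank lines are skipped.
     */
    static List<String> swapUsersAndItems(List<String> lines) {
        List<String> swapped = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Limit of 3 keeps everything after the second comma intact.
            String[] fields = trimmed.split(",", 3);
            if (fields.length < 2) {
                throw new IllegalArgumentException(
                    "Expected userID,itemID[,value] but got: " + line);
            }
            StringBuilder record = new StringBuilder();
            record.append(fields[1]).append(',').append(fields[0]);
            if (fields.length == 3) {
                record.append(',').append(fields[2]);
            }
            swapped.add(record.toString());
        }
        return swapped;
    }
}
```

With the whole input rewritten once this way, a user-based component sees items as its "users" throughout, instead of mixing the two kinds of ID inside a single similarity implementation.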
Similarity is
Most of the clustering methods have non-map-reduce versions. Check out
the Display series of programs: DisplayKMeans etc. in the
mahout/example source code.
On Thu, Mar 22, 2012 at 8:41 AM, Jeff Eastman
j...@windwardsolutions.com wrote:
Most of the Mahout clustering algorithms have an -xm