Re: Error Running mahout-core-0.5-job.jar

2012-03-22 Thread tianwild
The correct option is -Dmapred.dir=output (one leading hyphen), not --Dmapred.dir=output. -- View this message in context: http://lucene.472066.n3.nabble.com/Error-Running-mahout-core-0-5-job-jar-tp3846385p3847789.html Sent from the Mahout User List mailing list archive at Nabble.com.

Mahout beginner questions...

2012-03-22 Thread Razon, Oren
Hi, As a data mining developer who needs to build a recommender engine POC (proof of concept) to support several future use cases, I've found the Mahout framework an appealing place to start. But as I'm new to Mahout and Hadoop in general, I have a couple of questions... 1. In Mahout in

Re: Mahout beginner questions...

2012-03-22 Thread Sean Owen
1. These are the JDBC-related classes. For example see MySQLJDBCDiffStorage or MySQLJDBCDataModel in integration/ 2. The distributed and non-distributed code are quite separate. At this scale I don't think you can use the non-distributed code to a meaningful degree. For example you could

RE: Mahout beginner questions...

2012-03-22 Thread Razon, Oren
Hi Sean, Thanks for your fast response. I really appreciate the quality of your book (Mahout in Action) and the support you give in such forums. Just to clarify my second question... I want to build a recommender framework that will support different use cases. So my intention is to have both

Re: is hadoop necessary for clustering in mahout?

2012-03-22 Thread Ahmed Abdeen Hamed
Hi, I think I can answer this question... Yes, you can run a clustering algorithm on your local machine without using Hadoop. Just include the Mahout JAR files in your classpath and use them like any other Java library. I am currently experimenting with TreeClusteringRecommender but you

Re: is hadoop necessary for clustering in mahout?

2012-03-22 Thread Jeff Eastman
Most of the Mahout clustering algorithms have an -xm sequential CLI option that runs locally in-memory from/to Hadoop-style sequence files. And, as below, you can also call the Java driver methods directly from your program. On 3/22/12 9:22 AM, Ahmed Abdeen Hamed wrote: Hi, I think I can

Re: Mahout beginner questions...

2012-03-22 Thread Sean Owen
A distributed and non-distributed recommender are really quite separate. They perform the same task in quite different ways. I don't think you would mix them per se. Depends on what you mean by a model-based recommender... I would call the matrix-factorization-based and clustering-based

Re: Error Running mahout-core-0.5-job.jar

2012-03-22 Thread jeanbabyxu
Thanks so much, tianwild, for pointing out the typo. Now it's running, but I got a different error message: Exception in thread main org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory temp/itemIDIndex already exists Any idea how to resolve this issue? Many thanks.

Re: Error Running mahout-core-0.5-job.jar

2012-03-22 Thread Sean Owen
That pretty much means what it says: delete temp. On Thu, Mar 22, 2012 at 6:06 PM, jeanbabyxu jessica...@aexp.com wrote: Thanks so much tianwild for pointing out the typo. Now it's running but I got a different error msg: Exception in thread main

Re: Error Running mahout-core-0.5-job.jar

2012-03-22 Thread Sean Owen
Yes. This prevents accidental overwrite, and mimics how Hadoop/HDFS generally act. On Thu, Mar 22, 2012 at 6:58 PM, jeanbabyxu jessica...@aexp.com wrote: I was able to manually clear out the output directory by using bin/hadoop dfs -rmr output. But do we have to remove all content in the

Re: Error Running mahout-core-0.5-job.jar

2012-03-22 Thread Paritosh Ranjan
You can also use the HadoopUtil.delete(conf, paths) API or the -ow (overwrite) flag, if available for that job. On 23-03-2012 00:28, jeanbabyxu wrote: I was able to manually clear out the output directory by using bin/hadoop dfs -rmr output. But do we have to remove all content in the
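
The behaviour discussed in this thread (refuse to run when the output directory exists, unless the caller explicitly asks to overwrite) can be sketched roughly as follows. This is only an illustration on the local filesystem using java.io.File; it is not Mahout's or Hadoop's code, and the names OutputDirGuard and prepareOutput are hypothetical. In a real job you would rely on HadoopUtil.delete(conf, paths) or the -ow flag mentioned above.

```java
import java.io.File;

// Hypothetical helper (not part of Mahout or Hadoop): mimics the
// "fail if output exists unless overwrite is requested" convention
// on the local filesystem.
final class OutputDirGuard {

    // Does nothing if the directory is absent. If it exists, either
    // deletes it (overwrite == true) or refuses to continue.
    static void prepareOutput(File output, boolean overwrite) {
        if (!output.exists()) {
            return;
        }
        if (!overwrite) {
            throw new IllegalStateException(
                "Output directory " + output + " already exists");
        }
        deleteRecursively(output);
    }

    // Deletes children first, then the entry itself.
    private static void deleteRecursively(File f) {
        File[] children = f.listFiles(); // null when f is a plain file
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        if (!f.delete()) {
            throw new IllegalStateException("Could not delete " + f);
        }
    }
}
```

Failing by default keeps an earlier run's results from being overwritten by accident, which is the rationale Sean gives above; deletion only happens when the caller opts in.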

Re: How to add classes into mahout-score-0.5-job.jar?

2012-03-22 Thread Sean Owen
It is wherever you compiled your own classes -- it's up to you. SIMILARITY_EUCLEDEAN_DISTANCE is not a class. You should use 0.6 anyway. While you may find you have to make minor modifications if following the book, it's 99% compatible. On Thu, Mar 22, 2012 at 8:07 PM, jeanbabyxu

Re: Merging similarities from two different approaches

2012-03-22 Thread Sean Owen
What do you mean that you have a user-item association from a log-likelihood metric? Combining two values is easy in the sense that you can average them or something, but only if they are in the same units. Log likelihood may be viewed as a probability. The distance function you derive from it --
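
To illustrate the "same units" point, here is a rough sketch that first maps two differently scaled scores onto [0, 1] and only then averages them. The specific mappings are assumptions made for this example: a non-negative log-likelihood ratio is squashed with 1 - 1/(1 + llr), and a correlation-style score in [-1, 1] is rescaled linearly. The class and method names are hypothetical, and whether such a blend is meaningful depends on the data.

```java
// Hypothetical helper: brings two differently scaled scores onto
// [0, 1] before combining them. The mappings are illustrative
// assumptions, not a recommendation from the thread.
final class SimilarityBlend {

    // Squashes a non-negative log-likelihood ratio into [0, 1).
    static double fromLogLikelihoodRatio(double llr) {
        if (llr < 0.0) {
            throw new IllegalArgumentException("llr must be >= 0");
        }
        return 1.0 - 1.0 / (1.0 + llr);
    }

    // Rescales a correlation-style score from [-1, 1] to [0, 1].
    static double fromCorrelation(double r) {
        if (r < -1.0 || r > 1.0) {
            throw new IllegalArgumentException("r must be in [-1, 1]");
        }
        return (r + 1.0) / 2.0;
    }

    // Weighted average of two scores that are already on [0, 1].
    static double blend(double a, double b, double weightA) {
        if (weightA < 0.0 || weightA > 1.0) {
            throw new IllegalArgumentException("weightA must be in [0, 1]");
        }
        return weightA * a + (1.0 - weightA) * b;
    }
}
```

For example, blend(fromLogLikelihoodRatio(llr), fromCorrelation(r), 0.5) gives an equal-weight average of the two rescaled values.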

Re: Merging similarities from two different approaches

2012-03-22 Thread Sean Owen
You're implementing userSimilarity(), but appear to be computing item-item similarity. Halfway through, you use the item IDs as user IDs. As a result, I can't see what this is intended to do. On Thu, Mar 22, 2012 at 9:33 PM, Ahmed Abdeen Hamed ahmed.elma...@gmail.com wrote: Hello Sean, I am

Re: Merging similarities from two different approaches

2012-03-22 Thread Ahmed Abdeen Hamed
You are correct. In a previous post, I inquired about the use of TreeClusteringRecommender, which is based upon a UserSimilarity metric. My question was whether I can use it for ItemSimilarity, and your answer was yes: just feed the itemID as a userID and vice versa, and that's what I am doing in it

Re: Merging similarities from two different approaches

2012-03-22 Thread Sean Owen
Yes, but you can't use it as both things at once. I meant that you swap them at the broadest level -- at your original input. So all items are really users and vice versa. At the least you need two separate implementations, encapsulating two different notions of similarity. Similarity is
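
Swapping "at the original input" could look something like the sketch below, assuming the preference data is a comma-separated text file with lines of the form userID,itemID[,preference]. That file layout and the class name SwapUsersAndItems are assumptions made for illustration. The idea is to rewrite the input once, so that everything downstream consistently treats items as users.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical preprocessing step: exchanges the first two fields
// of each "userID,itemID[,preference]" line so that items play the
// role of users (and vice versa) in all later processing.
final class SwapUsersAndItems {

    // Swaps the first two comma-separated fields; other fields are kept.
    static String swapLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException(
                "Expected at least userID,itemID but got: " + line);
        }
        String tmp = fields[0];
        fields[0] = fields[1];
        fields[1] = tmp;
        return String.join(",", fields);
    }

    // Applies swapLine to every line, preserving order.
    static List<String> swapAll(List<String> lines) {
        List<String> swapped = new ArrayList<>(lines.size());
        for (String line : lines) {
            swapped.add(swapLine(line));
        }
        return swapped;
    }
}
```

The swapped file would then be loaded as the data model for the "item as user" case, while the unswapped file serves the ordinary case, which keeps the two notions of similarity in separate implementations as suggested above.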

Re: is hadoop necessary for clustering in mahout?

2012-03-22 Thread Lance Norskog
Most of the clustering methods have non-map-reduce versions. Check out the Display series of programs: DisplayKMeans etc. in the mahout/example source code. On Thu, Mar 22, 2012 at 8:41 AM, Jeff Eastman j...@windwardsolutions.com wrote: Most of the Mahout clustering algorithms have an -xm