RE: Mahout's K-means returns error when processing output/clusters-2

2012-04-02 Thread WangRamon
Check this line of the log: 12/04/01 17:23:32 WARN mapreduce.Job: Error reading task output. Server returned HTTP response code: 400 for URL: http://2668G1U:8080/tasklog?plaintext=true&attemptid=attempt_1333286828058_0009_r_00_0&filter=stdout How did you submit the job? It seems this node 2668G1U is

RE: Cancel running distributed RecommenderJob

2012-04-02 Thread WangRamon
I think you also need to stop/kill the process which submitted the RecommenderJob to Hadoop. Regards, Ramon. Date: Mon, 2 Apr 2012 19:05:27 +0100 Subject: Re: Cancel running distributed RecommenderJob From: sro...@gmail.com To: user@mahout.apache.org You can use the Hadoop interface itself
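
To do the Hadoop side of this programmatically, here is a minimal sketch against the pre-YARN org.apache.hadoop.mapred API (the job ID below is hypothetical; take the real one from the JobTracker UI or "hadoop job -list", and "hadoop job -kill <job-id>" is the CLI equivalent):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class KillRecommenderJob {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        // Hypothetical job ID; substitute the one shown by the JobTracker
        RunningJob job = client.getJob(JobID.forName("job_201204021905_0001"));
        if (job != null) {
          job.killJob(); // stops all running tasks of this job
        }
      }
    }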

RE: Why are there Infinity values in the vector of a K-Means cluster center point?

2012-03-16 Thread WangRamon
of the center elements to be infinity. We check for that case so it is unlikely. Can you narrow it down a bit more? How are you getting the kmeans prior? By sampling input vectors (-k) or using Canopy? Are there any infinity values in clusters-0? On 3/15/12 10:11 PM, WangRamon wrote
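
One quick check for the question above is to scan the prior centers for infinite or NaN entries; a minimal sketch, assuming the Mahout 0.5-era Vector API (iterateNonZero existed then, though the method name changed in later releases):

    import java.util.Iterator;

    import org.apache.mahout.math.Vector;

    public final class CenterSanityCheck {
      // Returns true if any stored element of the center is infinite or NaN
      public static boolean hasBadValue(Vector center) {
        for (Iterator<Vector.Element> it = center.iterateNonZero(); it.hasNext(); ) {
          double v = it.next().get();
          if (Double.isInfinite(v) || Double.isNaN(v)) {
            return true;
          }
        }
        return false;
      }
    }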

Canopy Job failed processing, Error: Java heap space

2012-03-14 Thread WangRamon
Hi All, I'm using the Canopy driver to find the cluster center points; the mapred.child.java.opts parameter for Hadoop is set to 1024M. I'm processing 11000 records, and I was surprised to get the Java heap space error during clustering. Did I miss something? Thanks. BTW, I did succeed for some

RE: Canopy Job failed processing, Error: Java heap space

2012-03-14 Thread WangRamon
Here is the detailed stack trace: 2012-03-15 09:51:40,817 INFO org.apache.hadoop.mapred.ReduceTask: Merged 9 segments, 136745366 bytes to disk to satisfy reduce memory limit 2012-03-15 09:51:40,818 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1 files, 136745354 bytes from disk 2012-03-15

RE: Canopy Job failed processing, Error: Java heap space

2012-03-14 Thread WangRamon
of: a) The vector dimension is really large. b) Too many clusters, i.e. the cluster size is very small. On 15-03-2012 07:39, WangRamon wrote: Here is the detailed stack trace: 2012-03-15 09:51:40,817 INFO org.apache.hadoop.mapred.ReduceTask: Merged 9 segments, 136745366 bytes to disk
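
If the dimensionality cannot be reduced, the other knob is the child JVM heap. A minimal sketch setting it programmatically before submitting the job (the 2048m value is only an example, not a recommendation):

    import org.apache.hadoop.conf.Configuration;

    public class HeapConfig {
      public static Configuration withBiggerHeap() {
        Configuration conf = new Configuration();
        // Give each map/reduce child JVM more heap than the 1024M that overflowed
        conf.set("mapred.child.java.opts", "-Xmx2048m");
        return conf;
      }
    }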

What would be better values for T1 and T2 with a CosineDistanceMeasure

2012-03-14 Thread WangRamon
Hi All, I'm tuning the cluster count of some news input with CosineDistanceMeasure; the input data is about 11000 rows, so I tried different settings for t1 and t2. Here is a list: 1) with t1: 0.6, t2: 0.9, I got Reduce output records=60 2) with t1: 0.6, t2: 0.8, I got Reduce output
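
When picking t1/t2 it helps to keep the scale in mind: Mahout's CosineDistanceMeasure returns 1 minus the cosine similarity, so distances fall in [0, 2] and 0 means identical direction. A minimal sketch (the two vectors are made-up examples):

    import org.apache.mahout.common.distance.CosineDistanceMeasure;
    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Vector;

    public class CosineDemo {
      public static void main(String[] args) {
        Vector a = new DenseVector(new double[] {1.0, 0.0, 1.0});
        Vector b = new DenseVector(new double[] {1.0, 1.0, 0.0});
        // distance = 1 - cosine similarity; prints 0.5 for these two vectors
        System.out.println(new CosineDistanceMeasure().distance(a, b));
      }
    }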

RE: Not all Mapper/Reducer slots are taken when running K-Means clustering

2012-03-11 Thread WangRamon
, CompositeInputFormat was being used as input, which used to fix the block size to 64 MB, and hence only a few reducers were activated. So, trying different block sizes might give some clue. On 11-03-2012 11:04, WangRamon wrote: Here is the configuration: property

Not all Mapper/Reducer slots are taken when running K-Means clustering

2012-03-10 Thread WangRamon
Hi All, I submitted 5 K-Means jobs simultaneously; my Hadoop cluster has 42 map and 42 reduce slots configured, and I set the default reduce tasks per job to 73 (42 * 1.75, rounded down from 73.5). I find that only about 12 of the reduce tasks are running at any time, although there are 73 reduce tasks created for
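
For reference, the 1.75 factor is the standard Hadoop rule of thumb (reduce tasks of roughly 1.75 * total reduce slots, so faster nodes pick up a second wave of reduces). Setting the count per job, as a minimal sketch with the old mapred API:

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceTaskCount {
      public static JobConf configure(JobConf conf) {
        // 42 reduce slots * 1.75 = 73.5, rounded down to 73
        conf.setNumReduceTasks(73);
        return conf;
      }
    }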

RE: Not all Mapper/Reducer slots are taken when running K-Means clustering

2012-03-10 Thread WangRamon
: Can you run the K-means jobs again (all with the same block size) and give the same statistics for: a) only 1 job running b) 2 jobs running simultaneously c) 5 jobs running simultaneously On 10-03-2012 21:08, WangRamon wrote: Hi All, I submitted 5 K-Means jobs simultaneously; my Hadoop

RE: Not all Mapper/Reducer slots are taken when running K-Means clustering

2012-03-10 Thread WangRamon
clustering What's your Hadoop config in terms of the maximum number of reducers? It's a function of your available RAM on each node and the number of nodes. On 3/10/12 8:55 PM, WangRamon wrote: Hi Paritosh, I did the tests with 1 job and 5 jobs; they all have the same problem. The job I'm running

User-based CF

2011-11-09 Thread WangRamon
Hi All, Does Mahout provide a user-based CF implementation on Hadoop? Currently I only see an item-based Hadoop implementation. Thanks. Cheers, Ramon

RE: User-based CF

2011-11-09 Thread WangRamon
approach is usually both faster and more accurate. --sebastian On 10.11.2011 08:34, WangRamon wrote: Hi All, Does Mahout provide a user-based CF implementation on Hadoop? Currently I only see an item-based Hadoop implementation. Thanks. Cheers, Ramon

Method observe in AbstractCluster

2011-11-05 Thread WangRamon
Hi Mahout users, I'm evaluating the clustering feature and reading the AbstractCluster class. Can you tell me where I can find more documentation/explanation about the observe methods in this class and the s0, s1, s2 parameters? Thanks in advance. Cheers Ramon
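
For what it's worth, a reading of the 0.5-era code (not official documentation) suggests s0, s1 and s2 are the running moments of the points passed to observe: s0 is the point count, s1 the vector sum, and s2 the elementwise sum of squares, from which the cluster's center and radius follow:

    s_0 = \sum_i 1, \qquad s_1 = \sum_i x_i, \qquad s_2 = \sum_i x_i^2
    \text{center} = s_1 / s_0, \qquad \text{radius} = \sqrt{ s_2 / s_0 - (s_1 / s_0)^2 }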

RE: How to find which point belongs to which cluster after running KMeansClusterer

2011-11-04 Thread WangRamon
to which cluster after running KMeansClusterer Transform your vector into a NamedVector. On 04-11-2011 08:02, WangRamon wrote: OK, me again. I checked the KMeansDriver code for the output points information; the following is the code: Map<Text, Text> props = new HashMap<Text, Text>
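
A minimal sketch of that suggestion (the values and the point name are made up): wrap each input vector in a NamedVector before clustering, so the name survives into the clustered output:

    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.NamedVector;
    import org.apache.mahout.math.Vector;

    public class NamedPoints {
      public static Vector name(double[] values, String id) {
        // The name travels with the vector, so clustered points stay identifiable
        return new NamedVector(new DenseVector(values), id);
      }
    }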

RE: How to find which point belongs to which cluster after running KMeansClusterer

2011-11-04 Thread WangRamon
Subject: Re: How to find which point belongs to which cluster after running KMeansClusterer From: gsing...@apache.org Date: Fri, 4 Nov 2011 06:49:49 -0400 To: user@mahout.apache.org On Nov 4, 2011, at 3:28 AM, WangRamon wrote: Thanks, that's what I need. I have another question

Can anybody explain the distance method in SquaredEuclideanDistanceMeasure?

2011-11-04 Thread WangRamon
Hi All, I'm reading the code of SquaredEuclideanDistanceMeasure; the distance(double centroidLengthSquare, Vector centroid, Vector v) method confuses me a lot. I don't know why we choose the expression centroidLengthSquare - 2 * v.dot(centroid) + v.getLengthSquared() to calculate the
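
The expression is just the binomial expansion of the squared Euclidean distance; writing it this way lets the caller pass in a precomputed centroid norm instead of materializing the difference vector:

    \|v - c\|^2 = (v - c) \cdot (v - c) = \|c\|^2 - 2\, v \cdot c + \|v\|^2

which maps term by term onto centroidLengthSquare - 2 * v.dot(centroid) + v.getLengthSquared().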

RE: Can anybody explain the distance method in SquaredEuclideanDistanceMeasure?

2011-11-04 Thread WangRamon
On 04.11.2011 15:58, WangRamon wrote: Hi All, I'm reading the code of SquaredEuclideanDistanceMeasure; the distance(double centroidLengthSquare, Vector centroid, Vector v) method confuses me a lot. I don't know why we choose the expression centroidLengthSquare - 2 * v.dot(centroid

How to find which point belongs to which cluster after running KMeansClusterer

2011-11-03 Thread WangRamon
Hi All, I'm using KMeansClusterer; I will use KMeansDriver in a Hadoop environment later, but I think it's easier to understand by using KMeansClusterer first. OK, so the question is: I cannot find a way to determine the cluster a point belongs to after running KMeansClusterer. I expect I

RE: How to find which point belongs to which cluster after running KMeansClusterer

2011-11-03 Thread WangRamon
to find which point belongs to which cluster after running KMeansClusterer On 03.11.2011 10:53, WangRamon wrote: Hi All, I'm using KMeansClusterer; I will use KMeansDriver in a Hadoop environment later, but I think it's easier to understand by using KMeansClusterer first. OK, so

RE: How to find which point belongs to which cluster after running KMeansClusterer

2011-11-03 Thread WangRamon
was what I needed. Hope this helps. Regards, Paritosh On 03-11-2011 14:23, WangRamon wrote: Hi All, I'm using KMeansClusterer; I will use KMeansDriver in a Hadoop environment later, but I think it's easier to understand by using KMeansClusterer first. OK, so the question is: I

RE: How to find which point belongs to which cluster after running KMeansClusterer

2011-11-03 Thread WangRamon
]: Point:\n\t"); for (Iterator<WeightedVectorWritable> iterator = points.iterator(); iterator.hasNext(); ) { WeightedVectorWritable point = iterator.next(); writer.write(String.valueOf(point.getWeight())); On Nov 3, 2011, at 5:48 AM, WangRamon wrote: Yes, Paritosh

RE: Recommend result contains items for which the user has already given a preference, is that correct?

2011-10-30 Thread WangRamon
, WangRamon wrote: Hi Sebastian, I made the following change to resolve the issue in my local copy; it's in Mahout 0.5. Maybe I'm wrong, but the test result is correct: 1) I added an int itemIdIndex property with getter/setter methods in the class PrefAndSimilarityColumnWritable; it will hold

Get the most similar users with ItemSimilarityJob.

2011-10-30 Thread WangRamon
Hi All, I have a use case which tries to find the most similar users. The input data is still in the format [USER_ID, ITEM_ID, PREF]. I think I can use ItemSimilarityJob in Mahout 0.6; although the name says Item Similarity, by reading the code I find that if I change the input to be
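
A minimal sketch of that trick (the file names are hypothetical): swap the first two CSV columns so users take the item role, and ItemSimilarityJob then effectively computes user-user similarities:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.PrintWriter;

    public class TransposePrefs {
      public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader("prefs.csv"));
        PrintWriter out = new PrintWriter(new FileWriter("prefs-transposed.csv"));
        String line;
        while ((line = in.readLine()) != null) {
          String[] f = line.split(",");
          // [USER_ID, ITEM_ID, PREF] -> [ITEM_ID, USER_ID, PREF]
          out.println(f[1] + "," + f[0] + "," + f[2]);
        }
        out.close();
        in.close();
      }
    }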

RE: Recommend result contains items for which the user has already given a preference, is that correct?

2011-10-21 Thread WangRamon
a preference, is that correct? As I already said multiple times, please use Mahout 0.6. It contains bug fixes and performance improvements for this particular job. --sebastian On 21.10.2011 09:04, WangRamon wrote: Hi Sebastian, I made the following change to resolve the issue in my local copy

Recommend result contains items for which the user has already given a preference, is that correct?

2011-10-20 Thread WangRamon
Hi Guys, I finally finished running the RecommenderJob today on the two-node cluster. But what surprised me is that the final recommendation output of the RecommenderJob contains items for which the user has already given a preference; I'm not sure whether that is correct. If it is wrong, how can I resolve this
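
Until the job itself excludes them, one reader-side workaround (a sketch, not the RecommenderJob's own mechanism; the types are simplified to plain longs) is to drop recommended items the user has already rated:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    public class FilterSeenItems {
      // Keep only recommended item IDs absent from the user's known preferences
      public static List<Long> filter(List<Long> recommended, Set<Long> alreadyRated) {
        List<Long> result = new ArrayList<Long>();
        for (Long itemId : recommended) {
          if (!alreadyRated.contains(itemId)) {
            result.add(itemId);
          }
        }
        return result;
      }
    }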

RE: Recommend result contains items for which the user has already given a preference, is that correct?

2011-10-20 Thread WangRamon
guys. Cheers, Ramon. Date: Thu, 20 Oct 2011 13:59:28 +0100 Subject: Re: Recommend result contains items for which the user has already given a preference, is that correct? From: sro...@gmail.com To: user@mahout.apache.org Ah OK, figured as much. WangRamon, does that answer your question and/or can you

RE: Recommend result contains items for which the user has already given a preference, is that correct?

2011-10-20 Thread WangRamon
, am I right? Thanks, Ramon. Date: Thu, 20 Oct 2011 17:04:20 +0200 From: s...@apache.org To: user@mahout.apache.org Subject: Re: Recommend result contains items for which the user has already given a preference, is that correct? On 20.10.2011 16:57, WangRamon wrote: Hi Sebastian and Sean, Thanks

Exception while running the RowSimilarityJob-Mapper-EntriesToVectorsReducer job

2011-10-19 Thread WangRamon
Hi Guys, I'm continuing to run the test case with a 1GB data file which contains 60 users and 200 items; all the jobs are running in a two-node cluster, each node with 32GB RAM and an 8-core CPU. The RecommenderJob runs until it reaches

RE: Exception while running the RowSimilarityJob-Mapper-EntriesToVectorsReducer job

2011-10-19 Thread WangRamon
to your input data. You should also try to get access to a cluster with more than 2 machines. --sebastian On 19.10.2011 10:16, WangRamon wrote: Hi Guys, I'm continuing to run the test case with a 1GB data file which contains 60 users and 200 items; all the jobs

RE: Exception while running the RowSimilarityJob-Mapper-EntriesToVectorsReducer job

2011-10-19 Thread WangRamon
and believe me that the best thing is to move to 0.6 immediately. --sebastian On 19.10.2011 10:27, WangRamon wrote: Yes, I'm still using version 0.5; the plan is to verify it can work on 0.5 and get some benchmarks first, then move forward to 0.6. Sebastian, do you think it's a problem

RE: Exception while running the RowSimilarityJob-Mapper-EntriesToVectorsReducer job

2011-10-19 Thread WangRamon
should save yourself some time and believe me that the best thing is to move to 0.6 immediately. --sebastian On 19.10.2011 10:27, WangRamon wrote: Yes, I'm still using version 0.5; the plan is to verify it can work on 0.5 and get some benchmarks first, then move forward to 0.6

RE: Exception while running the RowSimilarityJob-Mapper-EntriesToVectorsReducer job

2011-10-19 Thread WangRamon
. You may be seeing such a bug. Sent from my iPhone On Oct 19, 2011, at 2:27, WangRamon ramon_w...@hotmail.com wrote: Yes, I'm still using version 0.5; the plan is to verify it can work on 0.5 and get some benchmarks first, then move forward to 0.6. Sebastian, do you think it's

Has anyone tried Spark with Mahout?

2011-10-19 Thread WangRamon
Hi All, I was told today that Spark is a much better platform for cluster computing, better than Hadoop at least for recommendation computation. I'm still very new to this area; if anyone has done some investigation of Spark, can you please share your thoughts here? Thank you very much. Thanks

Any general performance tips for the RowSimilarityJob-CooccurrencesMapper-SimilarityReducer job?

2011-10-18 Thread WangRamon
Hi All, I'm running a recommendation job in a Hadoop environment with about 60 users and 200 items; the total number of user-pref records is about 6626, and the data file is 1GB in size. I found the RowSimilarityJob-CooccurrencesMapper-SimilarityReducer job is very slow, and I get a lot of logs like

RE: Any general performance tips for the RowSimilarityJob-CooccurrencesMapper-SimilarityReducer job?

2011-10-18 Thread WangRamon
, WangRamon wrote: Hi All, I'm running a recommendation job in a Hadoop environment with about 60 users and 200 items; the total number of user-pref records is about 6626, and the data file is 1GB in size. I found the RowSimilarityJob-CooccurrencesMapper-SimilarityReducer job is very

RE: Any general performance tips for the RowSimilarityJob-CooccurrencesMapper-SimilarityReducer job?

2011-10-18 Thread WangRamon
conf.setInt("io.sort.mb", Math.min(assumedHeapSize / 2, 1024)); // For some reason the Merger doesn't report status for a long time; increase // timeout when running these jobs conf.setInt("mapred.task.timeout", 60 * 60 * 1000); } 2011/10/18 WangRamon ramon_w...@hotmail.com

RE: Any general performance tips for the RowSimilarityJob-CooccurrencesMapper-SimilarityReducer job?

2011-10-18 Thread WangRamon
. 2011/10/18 WangRamon ramon_w...@hotmail.com: Hi Sebastian, Thanks for your quick reply. As far as I know the latest Mahout release is Mahout 0.5 and Mahout 0.6 is still under development; please correct me if I'm wrong. So I'm not sure whether I can use Mahout 0.6 in a product

RE: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

2011-10-17 Thread WangRamon
data, and then compute the value of that withheld data and compare with the original. 2011/10/17 WangRamon ramon_w...@hotmail.com: Hi Guys, We're going to evaluate how good a distributed (on Hadoop) recommender is. I found Mahout provides some standalone implementations

RE: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

2011-10-17 Thread WangRamon
as a starting point for implementing evaluation of RecommenderJob. --sebastian On 17.10.2011 09:39, WangRamon wrote: Hi Sean, Do you mean that I should take the concept from the standalone one, hold back some real data, let's say 20% of all the data, do the recommendation computation on the other 80
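
A minimal sketch of the comparison step in that hold-out scheme (all names hypothetical; it assumes the predicted and withheld preferences have already been joined by user/item pair): compute the RMSE between predicted and actual values:

    public class HoldoutRmse {
      // predicted[i] and actual[i] refer to the same (user, item) pair
      public static double rmse(double[] predicted, double[] actual) {
        double sumSq = 0.0;
        for (int i = 0; i < predicted.length; i++) {
          double diff = predicted[i] - actual[i];
          sumSq += diff * diff;
        }
        return Math.sqrt(sumSq / predicted.length);
      }
    }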

RE: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

2011-10-17 Thread WangRamon
for top items. You may not see an estimate for the items in your test set. Yes, also look at Sebastian's suggestion. 2011/10/17 WangRamon ramon_w...@hotmail.com: Hi Sean, Thanks for the quick reply. I'm running org.apache.mahout.cf.taste.hadoop.item.RecommenderJob on Hadoop; actually

RE: Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

2011-10-17 Thread WangRamon
Recommender running on Hadoop? From: sro...@gmail.com To: user@mahout.apache.org Yes, that's actually probably an easy and quick way to get what you want. 2011/10/17 WangRamon ramon_w...@hotmail.com: Hi Sean, It seems that in order to get the estimated pref values for comparison in a distributed

Does Mahout provide a way to evaluate a distributed Recommender running on Hadoop?

2011-10-16 Thread WangRamon
Hi Guys, We're going to evaluate how good a distributed (on Hadoop) recommender is. I found Mahout provides some standalone implementations to evaluate a recommender, so is there a distributed implementation we can use in a Hadoop environment? Thanks a lot. BTW, if there is not such an