Check this log line:
12/04/01 17:23:32 WARN mapreduce.Job: Error reading task output: Server returned
HTTP response code: 400 for
URL: http://2668G1U:8080/tasklog?plaintext=true&attemptid=attempt_1333286828058_0009_r_00_0&filter=stdout
How did you submit the job? It seems this node 2668G1U is
I think you also need to stop/kill the process which submitted the RecommenderJob
to Hadoop, too.
Regards, Ramon
Date: Mon, 2 Apr 2012 19:05:27 +0100
Subject: Re: Cancel running distributed RecommenderJob
From: sro...@gmail.com
To: user@mahout.apache.org
You can use the Hadoop interface itself
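For reference, the Hadoop interface here can be the command line (hadoop job -kill <job_id>) or the old mapred API. Below is a minimal programmatic sketch, assuming the Hadoop 1.x-era classes; the class name KillJob and the sample job id are illustrative only:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

// Illustrative helper: kill the MapReduce job whose id is given as args[0],
// e.g. "job_1333286828058_0009" as shown in the JobTracker web UI.
public class KillJob {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf()); // picks up the cluster config from the classpath
    RunningJob job = client.getJob(JobID.forName(args[0]));
    if (job != null) {
      job.killJob(); // asks the JobTracker to kill the job's remaining tasks
    }
  }
}

Note that RecommenderJob chains several MapReduce jobs, which is why the driver process that submitted it must also be stopped, as noted above.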
of the center elements to be infinity. We
check for that case so it is unlikely.
Can you narrow it down a bit more? How are you getting the kmeans prior?
By sampling input vectors (-k) or using Canopy? Are there any infinity
values in clusters-0?
On 3/15/12 10:11 PM, WangRamon wrote:
Hi All, I'm using the Canopy driver to find the cluster center points; the
mapred.child.java.opts parameter for Hadoop is set to 1024M. I'm processing
11000 records, and I was surprised to get a Java heap space error during
clustering. Did I miss something? Thanks. BTW, I did succeed for some
Here is the detailed stack trace: 2012-03-15 09:51:40,817 INFO
org.apache.hadoop.mapred.ReduceTask: Merged 9 segments, 136745366 bytes to disk
to satisfy reduce memory limit
2012-03-15 09:51:40,818 INFO org.apache.hadoop.mapred.ReduceTask: Merging 1
files, 136745354 bytes from disk
2012-03-15
of:
a) The vector dimension is really large.
b) Too many clusters, i.e., the cluster size is very small.
On 15-03-2012 07:39, WangRamon wrote:
Here is the detailed stack trace: 2012-03-15 09:51:40,817 INFO
org.apache.hadoop.mapred.ReduceTask: Merged 9 segments, 136745366 bytes to
disk
Hi All, I'm tuning the cluster number of some news input with
CosineDistanceMeasure. The input data is about 11000 rows, so I tried different
settings for t1 and t2; here is a list: 1) with t1: 0.6, t2: 0.9, I got Reduce
output records=60; 2) with t1: 0.6, t2: 0.8, I got Reduce output
, CompositeInputFormat was
being used as input, which fixed the block size at 64 MB, and
hence only a few reducers were activated. So, trying different block
sizes might give some clue.
On 11-03-2012 11:04, WangRamon wrote:
Here is the configuration: <property>
Hi All, I submitted 5 K-Means jobs simultaneously. My Hadoop cluster has 42 map
and 42 reduce slots configured, and I set the default number of reduce tasks per
job to 73 (42 * 1.75). I find there are always only about 12 reduce tasks running
at any time, although there are 73 reduce tasks created for
:
Can you run the K-means jobs again (all with the same block size) and give
the same statistics for:
a) only 1 job running
b) 2 jobs running simultaneously
c) 5 jobs running simultaneously
On 10-03-2012 21:08, WangRamon wrote:
Hi All, I submitted 5 K-Means jobs simultaneously. My Hadoop
cluster
What's your Hadoop config in terms of the maximum number of reducers?
It's a function of your available RAM on each node and the number of nodes.
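In Hadoop 1.x the slot counts are per-TaskTracker settings in mapred-site.xml. A sketch of the relevant property, with the value purely as an example (7 reduce slots on each of 6 nodes would give the 42 reduce slots mentioned above):

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>7</value>
</property>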
On 3/10/12 8:55 PM, WangRamon wrote:
Hi Paritosh, I did the tests with 1 job and 5 jobs; they all have the
same problem. The job I'm running
Hi All, does Mahout provide a user-based CF implementation on Hadoop? Currently
I only see an item-based Hadoop implementation. Thanks. Cheers, Ramon
approach is usually both faster and more accurate.
--sebastian
On 10.11.2011 08:34, WangRamon wrote:
Hi All, does Mahout provide a user-based CF implementation on Hadoop?
Currently I only see an item-based Hadoop implementation. Thanks.
Cheers, Ramon
Hi Mahout users, I'm evaluating the clustering feature and reading the
AbstractCluster class. Can you tell me where I can find more
documentation/explanation about the observe methods in this class and the
s0, s1, s2 parameters? Thanks in advance. Cheers Ramon
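For context on s0, s1 and s2: they are the running sufficient statistics of the observed points. s0 counts the points, s1 accumulates their componentwise sum, and s2 the componentwise sum of squares; the cluster center is then s1/s0, and the per-component standard deviation follows from s2. A minimal plain-Java sketch of that bookkeeping, assuming this reading (not the actual Mahout code):

// Running sufficient statistics in the style of AbstractCluster.observe().
class RunningStats {
  final int dim;
  double s0;               // s0: number of observed points
  final double[] s1, s2;   // s1: componentwise sum; s2: componentwise sum of squares

  RunningStats(int dim) {
    this.dim = dim;
    this.s1 = new double[dim];
    this.s2 = new double[dim];
  }

  void observe(double[] x) {
    s0++;
    for (int i = 0; i < dim; i++) {
      s1[i] += x[i];
      s2[i] += x[i] * x[i];
    }
  }

  double[] center() {      // mean = s1 / s0
    double[] c = new double[dim];
    for (int i = 0; i < dim; i++) {
      c[i] = s1[i] / s0;
    }
    return c;
  }

  double stdDev(int i) {   // sqrt(E[x_i^2] - E[x_i]^2)
    double mean = s1[i] / s0;
    return Math.sqrt(s2[i] / s0 - mean * mean);
  }
}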
which cluster after running
KMeansClusterer
Transform your vector into a NamedVector.
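A minimal sketch of that suggestion; the vector values and the name "point-42" are only examples. The name travels with the vector through the clustering jobs, so the output can be mapped back to the original point:

import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.Vector;

// Wrap the raw vector so it carries an identifier through clustering.
Vector raw = new DenseVector(new double[] {1.0, 2.0, 3.0});
NamedVector named = new NamedVector(raw, "point-42");
String id = named.getName(); // "point-42"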
On 04-11-2011 08:02, WangRamon wrote:
OK, me again. I checked the KMeansDriver code for outputting points
information; the following is the code: Map<Text, Text> props = new
HashMap<Text, Text>
Subject: Re: How to find which point belongs to which cluster after running
KMeansClusterer
From: gsing...@apache.org
Date: Fri, 4 Nov 2011 06:49:49 -0400
To: user@mahout.apache.org
On Nov 4, 2011, at 3:28 AM, WangRamon wrote:
Thanks, that's what I need. I have another question
Hi All, I'm reading the code of SquaredEuclideanDistanceMeasure; the
distance(double centroidLengthSquare, Vector centroid, Vector v) method
confused me a lot. I don't know why we chose the expression
centroidLengthSquare - 2 * v.dot(centroid) + v.getLengthSquared() to
calculate the
On 04.11.2011 15:58, WangRamon wrote:
Hi All, I'm reading the code of SquaredEuclideanDistanceMeasure; the
distance(double centroidLengthSquare, Vector centroid, Vector v) method
confused me a lot. I don't know why we chose the expression
centroidLengthSquare - 2 * v.dot(centroid
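For the record, that expression is just the algebraic expansion of the squared Euclidean distance, which lets the centroid's squared length be computed once and reused against every vector:

$\|c - v\|^2 = (c - v) \cdot (c - v) = \|c\|^2 - 2\,c \cdot v + \|v\|^2$

i.e. centroidLengthSquare - 2 * v.dot(centroid) + v.getLengthSquared().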
Hi All, I'm using KMeansClusterer; I will use KMeansDriver in a Hadoop
environment later, but I think it will be easier to understand by using
KMeansClusterer first. OK, so the question is that I cannot find a way to
determine the cluster a point should belong to after running KMeansClusterer; I expect I
to find which point belongs to which cluster after running
KMeansClusterer
On 03.11.2011 10:53, WangRamon wrote:
Hi All, I'm using KMeansClusterer; I will use KMeansDriver in a Hadoop
environment later, but I think it will be easier to understand by using
KMeansClusterer first. OK, so
was what I needed.
Hope this helps.
Regards,
Paritosh
On 03-11-2011 14:23, WangRamon wrote:
Hi All, I'm using KMeansClusterer; I will use KMeansDriver in a Hadoop
environment later, but I think it will be easier to understand by using
KMeansClusterer first. OK, so the question is that I
]: Point:\n\t);
for (Iterator<WeightedVectorWritable> iterator = points.iterator();
iterator.hasNext(); ) {
WeightedVectorWritable point = iterator.next();
writer.write(String.valueOf(point.getWeight()));
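A minimal sketch of reading the clusteredPoints output to list which point landed in which cluster, assuming the Mahout 0.5-era layout (sequence file key = cluster id, value = WeightedVectorWritable); the path is only an example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.clustering.WeightedVectorWritable;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path path = new Path("output/clusteredPoints/part-m-00000"); // example path
SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
IntWritable clusterId = new IntWritable();
WeightedVectorWritable point = new WeightedVectorWritable();
while (reader.next(clusterId, point)) {
  // point.getVector() is the clustered vector (a NamedVector if you wrapped it)
  System.out.println("cluster " + clusterId.get() + ": " + point.getVector());
}
reader.close();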
On Nov 3, 2011, at 5:48 AM, WangRamon wrote:
Yes, Paritosh
, WangRamon wrote:
Hi Sebastian, I made the following change to resolve the issue locally;
it's in Mahout 0.5. Maybe I'm wrong, but the test result is correct: 1)
I added an int itemIdIndex property with getter/setter methods in the class
PrefAndSimilarityColumnWritable; it will hold
Hi All, I have a use case which is trying to find the most similar users. So
the input data is still in the format [USER_ID, ITEM_ID, PREF]. I think I can use
ItemSimilarityJob in Mahout 0.6; although the name is Item Similarity, by
reading the code I find that if I change the input to be
preference, is that correct?
As I already said multiple times, please use Mahout 0.6. It contains bug
fixes and performance improvements for this particular job.
--sebastian
On 21.10.2011 09:04, WangRamon wrote:
Hi Sebastian, I made the following change to resolve the issue locally
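The trick being discussed: ItemSimilarityJob computes similarities between the entities in the item column, so swapping the user and item columns of the [USER_ID, ITEM_ID, PREF] input makes it emit user-user similarities instead. A minimal sketch of that preprocessing step (file names are examples):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;

// Swap the first two CSV columns so users occupy the item position.
BufferedReader in = new BufferedReader(new FileReader("prefs.csv"));
PrintWriter out = new PrintWriter(new FileWriter("prefs-swapped.csv"));
String line;
while ((line = in.readLine()) != null) {
  String[] f = line.split(",");
  out.println(f[1] + "," + f[0] + "," + f[2]); // ITEM_ID,USER_ID,PREF
}
out.close();
in.close();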
Hi Guys, I finally finished running the RecommenderJob today on the two-node
cluster. But what surprised me is that the final recommendation output of the
RecommenderJob contains items for which the user has already given a preference;
I'm not sure whether that is correct. If it is wrong, how can I resolve this
guys.
Cheers Ramon
Date: Thu, 20 Oct 2011 13:59:28 +0100
Subject: Re: Recommend result contains item which user has already given
preference, is that correct?
From: sro...@gmail.com
To: user@mahout.apache.org
Ah OK, figured as much. WangRamon, does that answer your question
and/or can you
, am I right?
Thanks
Ramon
Date: Thu, 20 Oct 2011 17:04:20 +0200
From: s...@apache.org
To: user@mahout.apache.org
Subject: Re: Recommend result contains item which user has already given
preference, is that correct?
On 20.10.2011 16:57, WangRamon wrote:
Hi Sebastian and Sean
Thanks
Hi Guys, I'm continuing to run the test case with a 1GB data file which
contains 60 users and 200 items. All the jobs are running in a two-node
cluster; each node has 32GB RAM and an 8-core CPU. The RecommenderJob
keeps running until it reaches
to your input data. You
should also try to get access to a cluster with more than 2 machines.
--sebastian
On 19.10.2011 10:16, WangRamon wrote:
Hi Guys, I'm continuing to run the test case with a 1GB data file which
contains 60 users and 200 items; all the jobs
and believe me that the best thing is to move to 0.6 immediately.
--sebastian
On 19.10.2011 10:27, WangRamon wrote:
Yes, I'm still using version 0.5; the plan is to verify it can work on 0.5
and get some benchmarks first, then move forward to 0.6. Sebastian, do you
think it's a problem
should save yourself some time
and believe me that the best thing is to move to 0.6 immediately.
--sebastian
On 19.10.2011 10:27, WangRamon wrote:
Yes, I'm still using version 0.5; the plan is to verify it can work on 0.5
and get some benchmarks first, then move forward to 0.6
. You may be seeing such a bug.
Sent from my iPhone
On Oct 19, 2011, at 2:27, WangRamon ramon_w...@hotmail.com wrote:
Yes, I'm still using version 0.5; the plan is to verify it can work on 0.5
and get some benchmarks first, then move forward to 0.6. Sebastian, do you
think it's
Hi All, I was told today that Spark is a much better platform for cluster
computing than Hadoop, at least for recommendation computation. I'm
still very new to this area, so if anyone has done some investigation on Spark,
can you please share your ideas here? Thank you very much. Thanks
Hi All, I'm running a recommendation job in a Hadoop environment with about 60
users and 200 items; the total number of user-pref records is about 6626, and the
data file is 1GB in size. I found the
RowSimilarityJob-CooccurrencesMapper-SimilarityReducer job is very slow, and
I get a lot of logs like
, WangRamon wrote:
Hi All, I'm running a recommendation job in a Hadoop environment with about
60 users and 200 items; the total number of user-pref records is about
6626, and the data file is 1GB in size. I found the
RowSimilarityJob-CooccurrencesMapper-SimilarityReducer job is very
conf.setInt("io.sort.mb", Math.min(assumedHeapSize / 2, 1024));
// For some reason the Merger doesn't report status for a long time; increase
// timeout when running these jobs
conf.setInt("mapred.task.timeout", 60 * 60 * 1000);
}
2011/10/18 WangRamon ramon_w...@hotmail.com
.
2011/10/18 WangRamon ramon_w...@hotmail.com:
Hi Sebastian
Thanks for your quick reply.
As far as I know, the latest Mahout release is Mahout 0.5 and Mahout 0.6 is still
under development; please correct me if I'm wrong. So I'm not sure whether
I can use Mahout 0.6 in a product
data, and
then compute the value of that withheld data and compare with the
original.
2011/10/17 WangRamon ramon_w...@hotmail.com:
Hi Guys
We're going to evaluate how good a distributed (on Hadoop) recommender is;
I found Mahout provides some stand-alone implementation
as a starting point for implementing evaluation of
RecommenderJob.
--sebastian
On 17.10.2011 09:39, WangRamon wrote:
Hi Sean, do you mean that I should take the concept from the standalone one:
keep some real data, let's say 20% of all the data, do the recommendation
computation on the other 80
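A minimal sketch of scoring such a hold-out split: train the distributed job on the 80%, then compare its estimates for the held-out 20% with the true preferences, e.g. via RMSE. The maps here are illustrative data structures, not a Mahout API:

import java.util.Map;

// actual: held-out (user:item) -> true preference
// estimated: the RecommenderJob's predicted preference for the same pairs
static double rmse(Map<String, Double> actual, Map<String, Double> estimated) {
  double sumSq = 0.0;
  int n = 0;
  for (Map.Entry<String, Double> e : actual.entrySet()) {
    Double est = estimated.get(e.getKey());
    if (est == null) {
      continue; // as noted below, the job may not estimate every held-out pair
    }
    double diff = est - e.getValue();
    sumSq += diff * diff;
    n++;
  }
  return n == 0 ? Double.NaN : Math.sqrt(sumSq / n);
}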
for top items. You may not see an
estimate for the items in your test set.
Yes, also look at Sebastian's suggestion.
2011/10/17 WangRamon ramon_w...@hotmail.com:
Hi Sean, thanks for the quick reply. I'm running
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob on Hadoop, actually
Recommender
running on Hadoop?
From: sro...@gmail.com
To: user@mahout.apache.org
Yes, that's actually probably an easy and quick way to get what you want.
2011/10/17 WangRamon ramon_w...@hotmail.com:
Hi Sean
It seems that in order to get the estimated pref values for comparison in a
distributed
Hi Guys
We're going to evaluate how good a distributed (on Hadoop) recommender is; I
found Mahout provides some stand-alone implementations to evaluate a
recommender. So is there a distributed implementation we can use in a Hadoop
environment? Thanks a lot.
BTW, if there is not such an