Hi Sean.

I'll give those suggestions a try and let you know how it went.

Thanks,
Kris



2011/4/11 Sean Owen <sro...@gmail.com>

> Yes, the combiner phase can be a solid bottleneck in some cases here.
> It helps to increase both io.sort.factor and io.sort.mb so that it does
> a, say, 100-way merge instead of a 10-way one, and uses maybe half your
> worker's heap for in-memory spills rather than the default 100MB.
>
> RecommenderJob already does this on the aggregate-and-recommend step
> -- see that call to "setIOSort()". You could apply the same call to
> this step and see what happens. If it's helpful we can commit that.
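[Editor's note: a minimal self-contained sketch of the kind of tuning a setIOSort()-style helper applies, using java.util.Properties as a stand-in for Hadoop's Configuration so it runs anywhere. The half-heap heuristic and the helper's exact shape are illustrative assumptions, not Mahout's actual code; the cap below 2048MB reflects io.sort.mb being backed by an int byte count in Hadoop of this era.]

```java
import java.util.Properties;

public class IoSortTuning {

    // Mimic a setIOSort()-style helper: widen the merge factor so spills
    // are combined ~100 at a time instead of 10, and grow the in-memory
    // sort buffer from the 100MB default to a large slice of the child heap.
    static void setIOSort(Properties conf, long childHeapMb) {
        conf.setProperty("io.sort.factor", "100");
        // io.sort.mb must stay below 2048 (its byte count is stored in an int)
        long sortMb = Math.min(childHeapMb / 2, 2047);
        conf.setProperty("io.sort.mb", Long.toString(sortMb));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setIOSort(conf, 4096); // e.g. a 4GB child JVM heap
        System.out.println(conf.getProperty("io.sort.factor")
            + " " + conf.getProperty("io.sort.mb"));
    }
}
```

In the real job these properties would be set on the step's Configuration before submission.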
>
> There are a load of factors involved in tuning this, and yes, it seems
> like this is running way slower than it should. Having less than 1
> worker per machine can't be optimal, I think, so you may have to
> tinker with that setting too.
>
> I'd also up the child worker's memory if you have 24GB RAM on each
> machine. That, plus the change to io.sort.mb, lets it do much more
> merging in memory.
>
>
>
> On Mon, Apr 11, 2011 at 2:49 PM, Kris Jack <mrkrisj...@gmail.com> wrote:
> > Sure.  I called the Mahout 0.5 job with these parameters:
> >
> > $ hadoop jar mahout-core-0.5-SNAPSHOT-job.jar \
> >     org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
> >     -Dmapred.input.dir=/user/kjack/input/ \
> >     -Dmapred.output.dir=/user/kjack/output/ \
> >     --similarityClassname org.apache.mahout.math.hadoop.similarity.vector.DistributedUncenteredCosineVectorSimilarity \
> >     --booleanData
> >
> > The data set is quite sparse.  ~3M items have been rated by at least 2
> > users.  ~50K items have been rated by at least 10 users.  ~500 items have
> > been rated by at least 50 users.
> >
> > Running on a cluster with 15 nodes, 24GB ram each.
> >
> > It's not that the job didn't finish, it's just that it ran very slowly.
> > By letting Hadoop assign mappers and reducers automatically, only 9 mappers
> > were assigned and 1 reducer.  I might have to up these too.
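[Editor's note: for reference, this is roughly what passing those knobs on the command line might look like. The -D property names are Hadoop 0.20-era, the numeric values are illustrative guesses rather than recommendations, and mapred.map.tasks-style hints aside, mapred.reduce.tasks is the count the framework actually honors.]

```shell
hadoop jar mahout-core-0.5-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=/user/kjack/input/ \
  -Dmapred.output.dir=/user/kjack/output/ \
  -Dio.sort.factor=100 \
  -Dio.sort.mb=1000 \
  -Dmapred.child.java.opts=-Xmx4g \
  -Dmapred.reduce.tasks=30 \
  --similarityClassname org.apache.mahout.math.hadoop.similarity.vector.DistributedUncenteredCosineVectorSimilarity \
  --booleanData
```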
> >
> > I looked at the output from the MaybePruneRowsMapper and CooccurrencesMapper
> > jobs and can't find any counters with those names.  Just got the basic ones
> > (e.g. FILE_BYTES_READ, HDFS_BYTES_READ, FILE_BYTES_WRITTEN, etc.).  I'm
> > looking at the counters from the job output through the jobtracker.  Am I
> > looking in the wrong place?
> >
> > Thanks,
> > Kris
> >
> >
> >
> > 2011/4/11 Sebastian Schelter <s...@apache.org>
> >
> >> Could you give us some more details? Like all the parameters you called
> >> the job with, and some hints about the cluster you ran this on? And some
> >> hints about your data, especially how dense/sparse it is?
> >>
> >> My first suspicion would be that far too many items need to be compared
> >> with each other. Are you using Mahout 0.5 (you should be, because 0.4 has
> >> a bug in this job)? And can you give us the values of the following
> >> counters?
> >>
> >>  - MaybePruneRowsMapper Elements.USED and ELEMENTS.NEGLECTED
> >>  - CooccurrencesMapper Counter.COOCCURRENCES
> >>
> >> --sebastian
> >>
> >>
> >>
> >> On 11.04.2011 14:17, Kris Jack wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> I'm running the RecommenderJob with a fairly large dataset (e.g. 175K+
> >>> users, 3M+ items, 40M+ preferences).
> >>>
> >>> When it gets to the RowSimilarityJob-CooccurrencesMapper-SimilarityReducer
> >>> job, it takes a long time to run.
> >>>
> >>> I think that the main problem is that there's a lot of spilling going on
> >>> when the record buffer is full.  Here's an extract from the logs:
> >>>
> >>>
> >>> 2011-04-01 21:53:24,245 WARN org.apache.hadoop.conf.Configuration:
> >>> /srv/hadoop/disk3/cache/mapred/taskTracker/jobcache/job_201103291618_0156/attempt_201103291618_0156_m_000000_1/job.xml:a
> >>> attempt to override final parameter: mapred.local.dir;  Ignoring.
> >>> 2011-04-01 21:53:25,630 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> >>> Initializing JVM Metrics with processName=MAP, sessionId=
> >>> 2011-04-01 21:53:25,792 WARN org.apache.hadoop.conf.Configuration:
> >>> /srv/hadoop/disk3/cache/mapred/taskTracker/jobcache/job_201103291618_0156/attempt_201103291618_0156_m_000000_1/job.xml:a
> >>> attempt to override final parameter: mapred.local.dir;  Ignoring.
> >>> 2011-04-01 21:53:25,968 INFO org.apache.hadoop.mapred.MapTask:
> >>> io.sort.mb = 100
> >>> 2011-04-01 21:53:26,040 INFO org.apache.hadoop.mapred.MapTask: data
> >>> buffer = 79691776/99614720
> >>> 2011-04-01 21:53:26,041 INFO org.apache.hadoop.mapred.MapTask: record
> >>> buffer = 262144/327680
> >>> 2011-04-01 21:53:26,757 INFO org.apache.hadoop.mapred.MapTask:
> >>> Spilling map output: record full = true
> >>> 2011-04-01 21:53:26,757 INFO org.apache.hadoop.mapred.MapTask:
> >>> bufstart = 0; bufend = 10492787; bufvoid = 99614720
> >>> 2011-04-01 21:53:26,757 INFO org.apache.hadoop.mapred.MapTask: kvstart
> >>> = 0; kvend = 262144; length = 327680
> >>> 2011-04-01 21:53:31,420 INFO org.apache.hadoop.util.NativeCodeLoader:
> >>> Loaded the native-hadoop library
> >>> 2011-04-01 21:53:31,422 INFO
> >>> org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
> >>> initialized native-zlib library
> >>> 2011-04-01 21:53:31,424 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new compressor
> >>> 2011-04-01 21:53:32,747 INFO org.apache.hadoop.mapred.MapTask: Finished
> >>> spill 0
> >>> 2011-04-01 21:53:33,026 INFO org.apache.hadoop.mapred.MapTask:
> >>> Spilling map output: record full = true
> >>> 2011-04-01 21:53:33,026 INFO org.apache.hadoop.mapred.MapTask:
> >>> bufstart = 10492787; bufend = 21060166; bufvoid = 99614720
> >>> 2011-04-01 21:53:33,026 INFO org.apache.hadoop.mapred.MapTask: kvstart
> >>> = 262144; kvend = 196607; length = 327680
> >>> 2011-04-01 21:53:40,004 INFO org.apache.hadoop.mapred.MapTask: Finished
> >>> spill 1
> >>> 2011-04-01 21:53:40,235 INFO org.apache.hadoop.mapred.MapTask:
> >>> Spilling map output: record full = true
> >>> 2011-04-01 21:53:40,235 INFO org.apache.hadoop.mapred.MapTask:
> >>> bufstart = 21060166; bufend = 31021600; bufvoid = 99614720
> >>> 2011-04-01 21:53:40,235 INFO org.apache.hadoop.mapred.MapTask: kvstart
> >>> = 196607; kvend = 131070; length = 327680
> >>> 2011-04-01 21:53:54,636 INFO org.apache.hadoop.mapred.MapTask: Finished
> >>> spill 2
> >>> 2011-04-01 21:53:54,921 INFO org.apache.hadoop.mapred.MapTask:
> >>> Spilling map output: record full = true
> >>> 2011-04-01 21:53:54,922 INFO org.apache.hadoop.mapred.MapTask:
> >>> bufstart = 31021600; bufend = 41605937; bufvoid = 99614720
> >>> 2011-04-01 21:53:54,922 INFO org.apache.hadoop.mapred.MapTask: kvstart
> >>> = 131070; kvend = 65533; length = 327680
> >>> 2011-04-01 21:54:04,370 INFO org.apache.hadoop.mapred.MapTask: Finished
> >>> spill 3
> >>> 2011-04-01 21:54:04,667 INFO org.apache.hadoop.mapred.MapTask:
> >>> Spilling map output: record full = true
> >>> 2011-04-01 21:54:04,667 INFO org.apache.hadoop.mapred.MapTask:
> >>> bufstart = 41605937; bufend = 52550080; bufvoid = 99614720
> >>> 2011-04-01 21:54:04,667 INFO org.apache.hadoop.mapred.MapTask: kvstart
> >>> = 65533; kvend = 327677; length = 327680
> >>> 2011-04-01 21:54:17,717 INFO org.apache.hadoop.mapred.MapTask: Finished
> >>> spill 4
> >>> 2011-04-01 21:54:17,997 INFO org.apache.hadoop.mapred.MapTask:
> >>> Spilling map output: record full = true
> >>> 2011-04-01 21:54:17,997 INFO org.apache.hadoop.mapred.MapTask:
> >>> bufstart = 52550080; bufend = 63431004; bufvoid = 99614720
> >>> 2011-04-01 21:54:17,997 INFO org.apache.hadoop.mapred.MapTask: kvstart
> >>> = 327677; kvend = 262140; length = 327680
> >>> 2011-04-01 21:54:24,260 INFO org.apache.hadoop.mapred.MapTask: Finished
> >>> spill 5
> >>>
> >>> ...
> >>>
> >>> 2011-04-06 11:18:09,932 INFO org.apache.hadoop.mapred.MapTask:
> >>> Finished spill 42207
> >>> 2011-04-06 11:18:10,250 INFO org.apache.hadoop.mapred.MapTask:
> >>> Spilling map output: record full = true
> >>> 2011-04-06 11:18:10,250 INFO org.apache.hadoop.mapred.MapTask:
> >>> bufstart = 53614451; bufend = 64698505; bufvoid = 99614720
> >>> 2011-04-06 11:18:10,250 INFO org.apache.hadoop.mapred.MapTask: kvstart
> >>> = 97306; kvend = 31769; length = 327680
> >>> 2011-04-06 11:18:18,905 INFO org.apache.hadoop.mapred.MapTask:
> >>> Finished spill 42208
> >>> 2011-04-06 11:18:19,193 INFO org.apache.hadoop.mapred.MapTask:
> >>> Spilling map output: record full = true
> >>> 2011-04-06 11:18:19,193 INFO org.apache.hadoop.mapred.MapTask:
> >>> bufstart = 64698505; bufend = 75588890; bufvoid = 99614720
> >>> 2011-04-06 11:18:19,193 INFO org.apache.hadoop.mapred.MapTask: kvstart
> >>> = 31769; kvend = 293913; length = 327680
> >>> 2011-04-06 11:18:32,523 INFO org.apache.hadoop.mapred.MapTask:
> >>> Finished spill 42209
> >>> 2011-04-06 11:18:32,683 INFO org.apache.hadoop.mapred.MapTask:
> >>> Starting flush of map output
> >>> 2011-04-06 11:18:35,894 INFO org.apache.hadoop.mapred.MapTask:
> >>> Finished spill 42210
> >>> 2011-04-06 11:19:11,530 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 42211 sorted segments
> >>> 2011-04-06 11:19:11,595 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new decompressor
> >>> 2011-04-06 11:19:11,605 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new decompressor
> >>> 2011-04-06 11:19:11,621 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new decompressor
> >>> 2011-04-06 11:19:11,636 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new decompressor
> >>> 2011-04-06 11:19:11,646 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new decompressor
> >>> 2011-04-06 11:19:11,656 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new decompressor
> >>> 2011-04-06 11:19:11,667 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new decompressor
> >>> 2011-04-06 11:19:11,685 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new decompressor
> >>> 2011-04-06 11:19:11,686 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new decompressor
> >>> 2011-04-06 11:19:11,704 INFO org.apache.hadoop.io.compress.CodecPool:
> >>> Got brand-new decompressor
> >>> 2011-04-06 11:19:11,705 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 42211
> >>> 2011-04-06 11:19:37,409 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 42202
> >>> 2011-04-06 11:19:59,087 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 42193
> >>> 2011-04-06 11:20:19,060 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 42184
> >>> 2011-04-06 11:20:36,387 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 42175
> >>> 2011-04-06 11:21:00,826 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 42166
> >>> 2011-04-06 11:21:20,049 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 42157
> >>> 2011-04-06 11:21:42,120 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 42148
> >>> 2011-04-06 11:21:58,431 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 42139
> >>> 2011-04-06 11:22:15,594 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 42130
> >>>
> >>> ...
> >>>
> >>>
> >>> 2011-04-09 00:43:01,590 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 73
> >>> 2011-04-09 01:20:59,688 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 64
> >>> 2011-04-09 01:57:23,938 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 55
> >>> 2011-04-09 02:26:39,086 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 46
> >>> 2011-04-09 05:08:55,609 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 37
> >>> 2011-04-09 09:59:05,399 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 28
> >>> 2011-04-09 14:59:58,730 INFO org.apache.hadoop.mapred.Merger: Merging
> >>> 10 intermediate segments out of a total of 19
> >>> 2011-04-09 19:28:18,921 INFO org.apache.hadoop.mapred.Merger: Down to
> >>> the last merge-pass, with 10 segments left of total size: 24786384532
> >>> bytes
> >>> 2011-04-10 14:55:46,726 INFO org.apache.hadoop.mapred.TaskRunner:
> >>> Task:attempt_201103291618_0156_m_000000_1 is done. And is in the
> >>> process of commiting
> >>> 2011-04-10 14:55:46,763 INFO org.apache.hadoop.mapred.TaskRunner: Task
> >>> 'attempt_201103291618_0156_m_000000_1' done.
> >>>
> >>>
> >>> This task took 10 days to run.  I'm guessing that it should run faster?
> >>>
> >>> Any general advice for speeding this up?
> >>>
> >>> Should I increase the io.sort.mb parameter?  It's currently set at
> >>> 100; should it be more like 1,000?  I see that Mahout attempts to
> >>> increase the parameter for the last job in RecommenderJob, but it
> >>> doesn't seem to for the preceding jobs.  Is that right?
> >>>
> >>> Thanks,
> >>>
> >>> Kris
> >>>
> >>>
> >
> >
> > --
> > Dr Kris Jack,
> > http://www.mendeley.com/profiles/kris-jack/
> >
>



-- 
Dr Kris Jack,
http://www.mendeley.com/profiles/kris-jack/
