Hi Sean. I'll give those suggestions a try and let you know how it goes.
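For reference, this is roughly the re-run I have in mind, with the sort buffer, merge factor, and child heap raised from their defaults (the -D values below are a guess at a starting point, not settings I've verified):

```shell
# Same RecommenderJob invocation as before, with tuning knobs added.
# The io.sort.mb / io.sort.factor / heap values are illustrative guesses.
hadoop jar mahout-core-0.5-SNAPSHOT-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=/user/kjack/input/ \
  -Dmapred.output.dir=/user/kjack/output/ \
  -Dio.sort.mb=1000 \
  -Dio.sort.factor=100 \
  -Dmapred.child.java.opts=-Xmx4096m \
  --similarityClassname org.apache.mahout.math.hadoop.similarity.vector.DistributedUncenteredCosineVectorSimilarity \
  --booleanData
```

If the mapper and reducer counts stay low I may also set mapred.reduce.tasks explicitly rather than letting Hadoop pick.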
Thanks,
Kris

2011/4/11 Sean Owen <sro...@gmail.com>

> Yes, the combiner phase can be a solid bottleneck in some cases here.
> It helps to increase both io.sort.factor and io.sort.mb so that it does
> a, say, 100-way merge instead of a 10-way one, and uses maybe half your
> worker's heap for in-memory spills rather than the default 100MB.
>
> RecommenderJob already does this on the aggregate-and-recommend step --
> see the call to "setIOSort()". You could apply the same call to this
> step and see what happens. If it's helpful, we can commit that.
>
> There are a lot of factors involved in tuning this, and yes, it seems
> like this is running far slower than it should. Having less than one
> worker per machine can't be optimal, I think, so you may have to tinker
> with that setting too.
>
> I'd also raise the child worker's memory if you have 24GB of RAM on
> each machine. That, plus the change to io.sort.mb, lets it do much more
> merging in memory.
>
>
> On Mon, Apr 11, 2011 at 2:49 PM, Kris Jack <mrkrisj...@gmail.com> wrote:
> > Sure. I called the Mahout 0.5 job with these parameters:
> >
> > $ hadoop jar mahout-core-0.5-SNAPSHOT-job.jar \
> >     org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
> >     -Dmapred.input.dir=/user/kjack/input/ \
> >     -Dmapred.output.dir=/user/kjack/output/ \
> >     --similarityClassname org.apache.mahout.math.hadoop.similarity.vector.DistributedUncenteredCosineVectorSimilarity \
> >     --booleanData
> >
> > The data set is quite sparse: ~3M items have been rated by at least 2
> > users, ~50K items by at least 10 users, and ~500 items by at least 50
> > users.
> >
> > I'm running on a cluster with 15 nodes, each with 24GB of RAM.
> >
> > It's not that the job didn't finish; it's just that it ran very
> > slowly. Letting Hadoop assign mappers and reducers automatically, only
> > 9 mappers and 1 reducer were assigned. I might have to raise these
> > too.
> >
> > I looked at the output from the MaybePruneRowsMapper and
> > CooccurrencesMapper jobs and can't find any counters with those names.
> > I just get the basic ones (e.g. FILE_BYTES_READ, HDFS_BYTES_READ,
> > FILE_BYTES_WRITTEN, etc.). I'm looking at the counters from the job
> > output through the jobtracker. Am I looking in the wrong place?
> >
> > Thanks,
> > Kris
> >
> >
> > 2011/4/11 Sebastian Schelter <s...@apache.org>
> >
> >> Could you give us some more details? Like all the parameters you
> >> called the job with, some hints about the cluster you ran this on,
> >> and some hints about your data, especially how dense/sparse it is?
> >>
> >> My first suspicion would be that far too many items need to be
> >> compared with each other. Are you using Mahout 0.5 (you should,
> >> because 0.4 has a bug in this job), and can you give us the values
> >> of the following counters?
> >>
> >> - MaybePruneRowsMapper: Elements.USED and ELEMENTS.NEGLECTED
> >> - CooccurrencesMapper: Counter.COOCCURRENCES
> >>
> >> --sebastian
> >>
> >>
> >> On 11.04.2011 14:17, Kris Jack wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> I'm running the RecommenderJob with a fairly large dataset (e.g.
> >>> 175K+ users, 3M+ items, 40M+ preferences).
> >>>
> >>> When it gets to the
> >>> RowSimilarityJob-CooccurrencesMapper-SimilarityReducer job, it
> >>> takes a long time to run.
> >>>
> >>> I think that the main problem is that there's a lot of spilling
> >>> going on when the record buffer is full. Here's an extract from
> >>> the logs:
> >>>
> >>> 2011-04-01 21:53:24,245 WARN org.apache.hadoop.conf.Configuration:
> >>> /srv/hadoop/disk3/cache/mapred/taskTracker/jobcache/job_201103291618_0156/attempt_201103291618_0156_m_000000_1/job.xml:a
> >>> attempt to override final parameter: mapred.local.dir; Ignoring.
> >>> 2011-04-01 21:53:25,630 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
> >>> 2011-04-01 21:53:25,792 WARN org.apache.hadoop.conf.Configuration:
> >>> /srv/hadoop/disk3/cache/mapred/taskTracker/jobcache/job_201103291618_0156/attempt_201103291618_0156_m_000000_1/job.xml:a
> >>> attempt to override final parameter: mapred.local.dir; Ignoring.
> >>> 2011-04-01 21:53:25,968 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
> >>> 2011-04-01 21:53:26,040 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
> >>> 2011-04-01 21:53:26,041 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
> >>> 2011-04-01 21:53:26,757 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> >>> 2011-04-01 21:53:26,757 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 10492787; bufvoid = 99614720
> >>> 2011-04-01 21:53:26,757 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
> >>> 2011-04-01 21:53:31,420 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
> >>> 2011-04-01 21:53:31,422 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> >>> 2011-04-01 21:53:31,424 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
> >>> 2011-04-01 21:53:32,747 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
> >>> 2011-04-01 21:53:33,026 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> >>> 2011-04-01 21:53:33,026 INFO org.apache.hadoop.mapred.MapTask: bufstart = 10492787; bufend = 21060166; bufvoid = 99614720
> >>> 2011-04-01 21:53:33,026 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262144; kvend = 196607; length = 327680
> >>> 2011-04-01 21:53:40,004 INFO org.apache.hadoop.mapred.MapTask: Finished spill 1
> >>> 2011-04-01 21:53:40,235 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> >>> 2011-04-01 21:53:40,235 INFO org.apache.hadoop.mapred.MapTask: bufstart = 21060166; bufend = 31021600; bufvoid = 99614720
> >>> 2011-04-01 21:53:40,235 INFO org.apache.hadoop.mapred.MapTask: kvstart = 196607; kvend = 131070; length = 327680
> >>> 2011-04-01 21:53:54,636 INFO org.apache.hadoop.mapred.MapTask: Finished spill 2
> >>> 2011-04-01 21:53:54,921 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> >>> 2011-04-01 21:53:54,922 INFO org.apache.hadoop.mapred.MapTask: bufstart = 31021600; bufend = 41605937; bufvoid = 99614720
> >>> 2011-04-01 21:53:54,922 INFO org.apache.hadoop.mapred.MapTask: kvstart = 131070; kvend = 65533; length = 327680
> >>> 2011-04-01 21:54:04,370 INFO org.apache.hadoop.mapred.MapTask: Finished spill 3
> >>> 2011-04-01 21:54:04,667 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> >>> 2011-04-01 21:54:04,667 INFO org.apache.hadoop.mapred.MapTask: bufstart = 41605937; bufend = 52550080; bufvoid = 99614720
> >>> 2011-04-01 21:54:04,667 INFO org.apache.hadoop.mapred.MapTask: kvstart = 65533; kvend = 327677; length = 327680
> >>> 2011-04-01 21:54:17,717 INFO org.apache.hadoop.mapred.MapTask: Finished spill 4
> >>> 2011-04-01 21:54:17,997 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> >>> 2011-04-01 21:54:17,997 INFO org.apache.hadoop.mapred.MapTask: bufstart = 52550080; bufend = 63431004; bufvoid = 99614720
> >>> 2011-04-01 21:54:17,997 INFO org.apache.hadoop.mapred.MapTask: kvstart = 327677; kvend = 262140; length = 327680
> >>> 2011-04-01 21:54:24,260 INFO org.apache.hadoop.mapred.MapTask: Finished spill 5
> >>>
> >>> .
> >>> .
> >>> .
> >>>
> >>> 2011-04-06 11:18:09,932 INFO org.apache.hadoop.mapred.MapTask: Finished spill 42207
> >>> 2011-04-06 11:18:10,250 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> >>> 2011-04-06 11:18:10,250 INFO org.apache.hadoop.mapred.MapTask: bufstart = 53614451; bufend = 64698505; bufvoid = 99614720
> >>> 2011-04-06 11:18:10,250 INFO org.apache.hadoop.mapred.MapTask: kvstart = 97306; kvend = 31769; length = 327680
> >>> 2011-04-06 11:18:18,905 INFO org.apache.hadoop.mapred.MapTask: Finished spill 42208
> >>> 2011-04-06 11:18:19,193 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> >>> 2011-04-06 11:18:19,193 INFO org.apache.hadoop.mapred.MapTask: bufstart = 64698505; bufend = 75588890; bufvoid = 99614720
> >>> 2011-04-06 11:18:19,193 INFO org.apache.hadoop.mapred.MapTask: kvstart = 31769; kvend = 293913; length = 327680
> >>> 2011-04-06 11:18:32,523 INFO org.apache.hadoop.mapred.MapTask: Finished spill 42209
> >>> 2011-04-06 11:18:32,683 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
> >>> 2011-04-06 11:18:35,894 INFO org.apache.hadoop.mapred.MapTask: Finished spill 42210
> >>> 2011-04-06 11:19:11,530 INFO org.apache.hadoop.mapred.Merger: Merging 42211 sorted segments
> >>> 2011-04-06 11:19:11,595 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >>> 2011-04-06 11:19:11,605 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >>> 2011-04-06 11:19:11,621 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >>> 2011-04-06 11:19:11,636 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >>> 2011-04-06 11:19:11,646 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >>> 2011-04-06 11:19:11,656 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >>> 2011-04-06 11:19:11,667 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >>> 2011-04-06 11:19:11,685 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >>> 2011-04-06 11:19:11,686 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >>> 2011-04-06 11:19:11,704 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
> >>> 2011-04-06 11:19:11,705 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 42211
> >>> 2011-04-06 11:19:37,409 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 42202
> >>> 2011-04-06 11:19:59,087 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 42193
> >>> 2011-04-06 11:20:19,060 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 42184
> >>> 2011-04-06 11:20:36,387 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 42175
> >>> 2011-04-06 11:21:00,826 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 42166
> >>> 2011-04-06 11:21:20,049 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 42157
> >>> 2011-04-06 11:21:42,120 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 42148
> >>> 2011-04-06 11:21:58,431 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 42139
> >>> 2011-04-06 11:22:15,594 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 42130
> >>>
> >>> .
> >>> .
> >>> .
> >>>
> >>> 2011-04-09 00:43:01,590 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 73
> >>> 2011-04-09 01:20:59,688 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 64
> >>> 2011-04-09 01:57:23,938 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 55
> >>> 2011-04-09 02:26:39,086 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 46
> >>> 2011-04-09 05:08:55,609 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 37
> >>> 2011-04-09 09:59:05,399 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 28
> >>> 2011-04-09 14:59:58,730 INFO org.apache.hadoop.mapred.Merger: Merging 10 intermediate segments out of a total of 19
> >>> 2011-04-09 19:28:18,921 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 24786384532 bytes
> >>> 2011-04-10 14:55:46,726 INFO org.apache.hadoop.mapred.TaskRunner: Task:attempt_201103291618_0156_m_000000_1 is done. And is in the process of commiting
> >>> 2011-04-10 14:55:46,763 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_201103291618_0156_m_000000_1' done.
> >>>
> >>> This task took 10 days to run. I'm guessing that it should run
> >>> faster?
> >>>
> >>> Any general advice for speeding this up?
> >>>
> >>> Should I increase the io.sort.mb parameter? It's currently set at
> >>> 100; should it be more like 1,000? I see that Mahout attempts to
> >>> increase the parameter for the last job in RecommenderJob, but it
> >>> doesn't seem to for the preceding jobs. Is that right?
> >>>
> >>> Thanks,
> >>>
> >>> Kris
> >
> > --
> > Dr Kris Jack,
> > http://www.mendeley.com/profiles/kris-jack/

--
Dr Kris Jack,
http://www.mendeley.com/profiles/kris-jack/
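P.S. Putting rough numbers on why io.sort.factor matters so much for this task: the log reports 42211 spill segments, and each 10-way intermediate merge replaces 10 segments with 1, so the count only drops by 9 per round (42211 -> 42202 -> 42193 ..., exactly as logged), with spilled data reread and rewritten from disk on every round. Under that simplified model of the merger (the real one optimizes the first pass, so treat this as an estimate):

```shell
# Estimate the number of intermediate merge rounds for N sorted spill
# segments with merge factor F: each round turns F segments into 1,
# shrinking the count by F-1, until at most F segments remain.
N=42211   # spill segments reported in the log above
for F in 10 100; do
  rounds=$(( (N - 2) / (F - 1) ))  # integer form of ceil((N - F) / (F - 1))
  echo "io.sort.factor=$F: about $rounds intermediate merge rounds"
done
```

So going from a 10-way to a 100-way merge would cut the intermediate rounds from about 4689 to about 426 for this task, which is presumably why raising io.sort.factor (alongside io.sort.mb, which reduces N in the first place) should help here.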