The log shows that there are 2 map tasks and 10 reduce tasks.
How can there be 10 reduce tasks when I set parameter
I would like to increase the number of concurrent map tasks. Which
parameters would you suggest for that?

It seems that the configuration parameter
'' doesn't increase the number of
concurrently running map tasks...
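
For reference, here is a rough sketch of how I understand the number of map tasks to follow from the split size. It assumes the input format derives from Hadoop's FileInputFormat and that the input files are splittable. The block size and file sizes below are hypothetical values chosen for illustration, not numbers taken from this job.

```python
# Sketch of FileInputFormat-style split counting (an assumption about how
# this job's input is split; the function names here are illustrative).

SPLIT_SLOP = 1.1  # a final chunk up to 10% over the split size is not split again


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Split size is the block size clamped between min and max split size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size: int, split_size: int) -> int:
    """Number of splits (and so map tasks) produced for one splittable file."""
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits


# min and max split size as passed on the command line (3145728 bytes = 3 MB)
split_size = compute_split_size(
    block_size=64 * 1024 * 1024,  # hypothetical HDFS block size
    min_size=3145728,
    max_size=3145728,
)

# hypothetical file sizes in bytes
for size in (3_300_000, 6_000_000, 31_457_280):
    print(size, "->", count_splits(size, split_size), "split(s)")
```

If this model applies, the map task count depends on the size and number of the input files as much as on the split size settings, so a small input would produce only a few map tasks whatever the slot limits are.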

Some log lines from 'mahout cvb':

12/12/03 10:30:23 INFO mapred.JobClient: Job complete: job_201212011004_0432
12/12/03 10:30:23 INFO mapred.JobClient: Counters: 32
12/12/03 10:30:23 INFO mapred.JobClient:   File System Counters
12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of bytes
12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of bytes
12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of read
12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of large read
12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of write
12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of bytes
12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of bytes
12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of read
12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of large read
12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of write
12/12/03 10:30:23 INFO mapred.JobClient:   Job Counters
12/12/03 10:30:23 INFO mapred.JobClient:     Launched map tasks=2
12/12/03 10:30:23 INFO mapred.JobClient:     Launched reduce tasks=10
12/12/03 10:30:23 INFO mapred.JobClient:     Data-local map tasks=2
12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=456617
12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=108715
12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/03 10:30:23 INFO mapred.JobClient:   Map-Reduce Framework
12/12/03 10:30:23 INFO mapred.JobClient:     Map input records=77332
12/12/03 10:30:23 INFO mapred.JobClient:     Map output records=100
12/12/03 10:30:23 INFO mapred.JobClient:     Map output bytes=8075900
12/12/03 10:30:23 INFO mapred.JobClient:     Input split bytes=288
12/12/03 10:30:23 INFO mapred.JobClient:     Combine input records=100
12/12/03 10:30:23 INFO mapred.JobClient:     Combine output records=100
12/12/03 10:30:23 INFO mapred.JobClient:     Reduce input groups=50
12/12/03 10:30:23 INFO mapred.JobClient:     Reduce shuffle bytes=8076520
12/12/03 10:30:23 INFO mapred.JobClient:     Reduce input records=100
12/12/03 10:30:23 INFO mapred.JobClient:     Reduce output records=50
12/12/03 10:30:23 INFO mapred.JobClient:     Spilled Records=200
12/12/03 10:30:23 INFO mapred.JobClient:     CPU time spent (ms)=570850
12/12/03 10:30:23 INFO mapred.JobClient:     Physical memory (bytes)
12/12/03 10:30:23 INFO mapred.JobClient:     Virtual memory (bytes)
12/12/03 10:30:23 INFO mapred.JobClient:     Total committed heap usage
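
To compare counters such as the launched map and reduce tasks across runs, the log lines above can be parsed with a short script. This is only a sketch: the regular expression and the function name parse_counters are my own, and it assumes each counter sits on a single line in the "name=value" form shown above (lines wrapped by the mail client would need to be joined first).

```python
import re

# Matches JobClient counter lines of the form shown in the log, e.g.
# "12/12/03 10:30:23 INFO mapred.JobClient:     Launched map tasks=2"
COUNTER_RE = re.compile(
    r"INFO mapred\.JobClient:\s+(?P<name>[^=]+)=(?P<value>\d+)\s*$"
)


def parse_counters(lines):
    """Return a dict mapping counter name to integer value."""
    counters = {}
    for line in lines:
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


# A few lines copied from the log above; group headers have no "=" and are skipped.
sample = [
    "12/12/03 10:30:23 INFO mapred.JobClient:   Job Counters",
    "12/12/03 10:30:23 INFO mapred.JobClient:     Launched map tasks=2",
    "12/12/03 10:30:23 INFO mapred.JobClient:     Launched reduce tasks=10",
    "12/12/03 10:30:23 INFO mapred.JobClient:     Map input records=77332",
    "12/12/03 10:30:23 INFO mapred.JobClient:     CPU time spent (ms)=570850",
]

counters = parse_counters(sample)
print(counters["Launched map tasks"], counters["Launched reduce tasks"])  # prints: 2 10
```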

Cheers, Markus

2012/12/3 Markus Paaso <>

> Hi,
> I am having trouble utilizing all the available CPU power with the
> 'mahout cvb' command.
> CPU usage is only about 35% and IO wait is ~0%.
> I have 8 cores and 28 GB memory in a single computer that is running
> Mahout 0.7-cdh-4.1.2 with Hadoop 2.0.0-cdh4.1.2 in pseudo-distributed mode.
> How can I take advantage of all the CPU power for a single 'mahout cvb'
> task?
> I use the following parameters to run mahout cvb:
> mahout cvb
> -Ddfs.namenode.handler.count=32
> -Dmapred.job.tracker.handler.count=32
> -Dio.sort.factor=30
> -Dio.sort.mb=500
> -Dio.file.buffer.size=65536
> -Dmapred.job.reuse.jvm.num.tasks=-1
> -Dmapred.reduce.tasks=7
> -Dmapred.max.split.size=3145728
> -Dmapred.min.split.size=3145728
> -Dmapred.tasktracker.reduce.tasks.maximum=7
> -Dmapred.tasktracker.tasks.maximum=7
>   --input ~/mahout-files/mydatavectors_int
>   --output ~/mahout-files/topics
>   --num_terms 10078
>   --num_topics 50
>   --doc_topic_output ~/mahout-files/doc-topics
>   --maxIter 50
>   --num_update_threads 8
>   --num_train_threads 8
>   -block 1
>   --test_set_fraction 0.1
>   --convergenceDelta 0.0000001
>   --tempDir ~/mahout-files/cvb-temp
> The Linux top command reports:
> Cpu(s): 33.9%us,  1.1%sy,  0.0%ni, 65.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  28479224k total, 16398624k used, 12080600k free,   899576k buffers
> Swap: 28942332k total,        0k used, 28942332k free,  5733368k cached
> 19765 mapred    20   0 2811m 650m  16m S  129  2.3   3:59.06 java
> 19721 mapred    20   0 2812m 650m  16m S  125  2.3   3:53.70 java
> So only about 2.5 of the 8 cores are in use.
> Regards, Markus
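
The "2.5 of 8 cores" estimate in the quoted message can be checked from the top output, assuming the %CPU column counts 100% per fully busy core. The variable names below are only for illustration.

```python
CORES = 8  # core count given in the quoted message

# %CPU values of the two java processes (PIDs 19765 and 19721) in the top output
java_cpu_percent = [129, 125]

busy_cores = sum(java_cpu_percent) / 100  # 100% corresponds to one busy core
utilization = busy_cores / CORES          # fraction of the whole machine

print(f"about {busy_cores:.2f} of {CORES} cores busy")
print(f"fraction of the machine in use: {utilization:.4f}")
```

The result is in the same range as the roughly 35% (us + sy) in the Cpu(s) summary line.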

Markus Paaso
Developer, Sagire Software Oy

