The log shows that the job launched 2 map tasks and 10 reduce tasks.
How can there be 10 reduce tasks when I set the parameter
'-Dmapred.tasktracker.reduce.tasks.maximum=7'?
I would like to increase the number of concurrent map tasks. Are there any
parameters you would suggest for that?

It seems that the configuration parameter
'mapred.tasktracker.map.tasks.maximum' does not increase the number of
concurrently running map tasks.


Some log rows from mahout cvb:

12/12/03 10:30:23 INFO mapred.JobClient: Job complete: job_201212011004_0432
12/12/03 10:30:23 INFO mapred.JobClient: Counters: 32
12/12/03 10:30:23 INFO mapred.JobClient:   File System Counters
12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of bytes read=8076460
12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of bytes written=18396152
12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of read operations=0
12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of large read operations=0
12/12/03 10:30:23 INFO mapred.JobClient:     FILE: Number of write operations=0
12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of bytes read=14054985
12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of bytes written=4040120
12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of read operations=166
12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of large read operations=0
12/12/03 10:30:23 INFO mapred.JobClient:     HDFS: Number of write operations=91
12/12/03 10:30:23 INFO mapred.JobClient:   Job Counters
12/12/03 10:30:23 INFO mapred.JobClient:     Launched map tasks=2
12/12/03 10:30:23 INFO mapred.JobClient:     Launched reduce tasks=10
12/12/03 10:30:23 INFO mapred.JobClient:     Data-local map tasks=2
12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=456617
12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=108715
12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/03 10:30:23 INFO mapred.JobClient:   Map-Reduce Framework
12/12/03 10:30:23 INFO mapred.JobClient:     Map input records=77332
12/12/03 10:30:23 INFO mapred.JobClient:     Map output records=100
12/12/03 10:30:23 INFO mapred.JobClient:     Map output bytes=8075900
12/12/03 10:30:23 INFO mapred.JobClient:     Input split bytes=288
12/12/03 10:30:23 INFO mapred.JobClient:     Combine input records=100
12/12/03 10:30:23 INFO mapred.JobClient:     Combine output records=100
12/12/03 10:30:23 INFO mapred.JobClient:     Reduce input groups=50
12/12/03 10:30:23 INFO mapred.JobClient:     Reduce shuffle bytes=8076520
12/12/03 10:30:23 INFO mapred.JobClient:     Reduce input records=100
12/12/03 10:30:23 INFO mapred.JobClient:     Reduce output records=50
12/12/03 10:30:23 INFO mapred.JobClient:     Spilled Records=200
12/12/03 10:30:23 INFO mapred.JobClient:     CPU time spent (ms)=570850
12/12/03 10:30:23 INFO mapred.JobClient:     Physical memory (bytes) snapshot=3334303744
12/12/03 10:30:23 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=35329503232
12/12/03 10:30:23 INFO mapred.JobClient:     Total committed heap usage (bytes)=6070009856
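
As a rough cross-check of the counters above, here is a minimal Python sketch
that pulls the task counts out of JobClient output and derives the average
slot time per map task. The four sample lines are copied from the log above
(with wrapped lines rejoined). The names parse_counters and COUNTER_RE are
made up for this illustration and are not part of Hadoop or Mahout.

```python
import re

# Sample lines copied from the log above (wrapped lines rejoined).
LOG = """\
12/12/03 10:30:23 INFO mapred.JobClient:     Launched map tasks=2
12/12/03 10:30:23 INFO mapred.JobClient:     Launched reduce tasks=10
12/12/03 10:30:23 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=456617
12/12/03 10:30:23 INFO mapred.JobClient:     CPU time spent (ms)=570850
"""

# Hypothetical helper pattern: "<counter name>=<integer>" at the end of a
# JobClient line. Lines without "=<digits>" (headers, "Job complete") are skipped.
COUNTER_RE = re.compile(r"mapred\.JobClient:\s+(?P<name>.+?)=(?P<value>\d+)\s*$")


def parse_counters(text):
    """Return a dict mapping counter name to integer value."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


counters = parse_counters(LOG)
maps = counters["Launched map tasks"]
reduces = counters["Launched reduce tasks"]
map_slot_ms = counters["Total time spent by all maps in occupied slots (ms)"]

print("launched maps:", maps, "launched reduces:", reduces)
print("average slot time per map task (ms):", map_slot_ms / maps)
```

With only 2 map tasks launched, at most 2 map tasks can run at the same time,
whatever the slot limit is set to.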


Cheers, Markus


2012/12/3 Markus Paaso <markus.pa...@sagire.fi>

> Hi,
>
> I am having trouble utilizing all the available CPU power with the
> 'mahout cvb' command.
> CPU usage is only about 35% and IO wait is ~0%.
> I have 8 cores and 28 GB memory in a single computer that is running
> Mahout 0.7-cdh-4.1.2 with Hadoop 2.0.0-cdh4.1.2 in pseudo-distributed mode.
> How can I take advantage of all the CPU power for a single 'mahout cvb'
> task?
>
>
> I use the following parameters to run mahout cvb:
>
> mahout cvb
> -Ddfs.namenode.handler.count=32
> -Dmapred.job.tracker.handler.count=32
> -Dio.sort.factor=30
> -Dio.sort.mb=500
> -Dio.file.buffer.size=65536
> -Dmapred.child.java.opts=-Xmx2g
> -Dmapred.map.child.java.opts=-Xmx2g
> -Dmapred.reduce.child.java.opts=-Xmx2g
> -Dmapred.job.reuse.jvm.num.tasks=-1
> -Dmapred.map.tasks=7
> -Dmapred.reduce.tasks=7
> -Dmapred.max.split.size=3145728
> -Dmapred.min.split.size=3145728
> -Dmapred.tasktracker.map.tasks.maximum=7
> -Dmapred.tasktracker.reduce.tasks.maximum=7
> -Dmapred.tasktracker.tasks.maximum=7
>   --input ~/mahout-files/mydatavectors_int
>   --output ~/mahout-files/topics
>   --num_terms 10078
>   --num_topics 50
>   --doc_topic_output ~/mahout-files/doc-topics
>   --maxIter 50
>   --num_update_threads 8
>   --num_train_threads 8
>   -block 1
>   --test_set_fraction 0.1
>   --convergenceDelta 0.0000001
>   --tempDir ~/mahout-files/cvb-temp
>
>
> The Linux top command reports:
>
> Cpu(s): 33.9%us,  1.1%sy,  0.0%ni, 65.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  28479224k total, 16398624k used, 12080600k free,   899576k buffers
> Swap: 28942332k total,        0k used, 28942332k free,  5733368k cached
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 19765 mapred    20   0 2811m 650m  16m S  129  2.3   3:59.06 java
> 19721 mapred    20   0 2812m 650m  16m S  125  2.3   3:53.70 java
>
> So only about 2.5 of the 8 cores are in use.
>
>
> Regards, Markus
>
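
The "2.5 of 8 cores" figure in the quoted message can be reproduced from the
top output. The sketch below assumes top is in its default mode, where a
%CPU of 100 corresponds to one fully busy core. The function names are made
up for this illustration.

```python
# %CPU of the two java processes in the quoted top output.
java_cpu_percent = [129, 125]
total_cores = 8  # from the quoted message: "I have 8 cores"


def busy_cores(cpu_percents):
    """Convert per-process %CPU values into an equivalent number of busy cores."""
    return sum(cpu_percents) / 100.0


def utilization(cpu_percents, cores):
    """Fraction of the machine's cores kept busy by these processes."""
    return busy_cores(cpu_percents) / cores


cores_in_use = busy_cores(java_cpu_percent)
share = utilization(java_cpu_percent, total_cores)
print(f"about {cores_in_use:.2f} of {total_cores} cores busy, "
      f"roughly {share:.0%} of the machine")
```

This gives roughly 2.5 busy cores, or a little under a third of the machine,
which is in line with the 33.9%us + 1.1%sy shown in the Cpu(s) line.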



-- 
Markus Paaso
Developer, Sagire Software Oy
http://sagire.fi/
