Hi,

I have the same problem. I compared Hadoop 2.2.0 with Hadoop 1.0.3, and it
turned out that terasort on MR2 is about twice as slow as on MR1. I can
hardly believe it.

The cluster has 20 nodes with 19 data nodes.  My Hadoop 2.2.0 cluster
configurations are as follows.

        mapreduce.map.java.opts = "-Xmx512m";
        mapreduce.reduce.java.opts = "-Xmx1536m";
        mapreduce.map.memory.mb = "768";
        mapreduce.reduce.memory.mb = "2048";

        yarn.scheduler.minimum-allocation-mb = "256";
        yarn.scheduler.maximum-allocation-mb = "8192";
        yarn.nodemanager.resource.memory-mb = "12288";
        yarn.nodemanager.resource.cpu-vcores = "16";

        mapreduce.reduce.shuffle.parallelcopies = "20";
        mapreduce.task.io.sort.factor = "48";
        mapreduce.task.io.sort.mb = "200";
        mapreduce.map.speculative = "true";
        mapreduce.reduce.speculative = "true";
        mapreduce.framework.name = "yarn";
        yarn.app.mapreduce.am.job.task.listener.thread-count = "60";
        mapreduce.map.cpu.vcores = "1";
        mapreduce.reduce.cpu.vcores = "2";

        mapreduce.job.jvm.numtasks = "20";
        mapreduce.map.output.compress = "true";
        mapreduce.map.output.compress.codec =
"org.apache.hadoop.io.compress.SnappyCodec";

        yarn.resourcemanager.client.thread-count = "64";
        yarn.resourcemanager.scheduler.client.thread-count = "64";
        yarn.resourcemanager.resource-tracker.client.thread-count = "64";
        yarn.resourcemanager.scheduler.class =
"org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler";
        yarn.nodemanager.aux-services = "mapreduce_shuffle";
        yarn.nodemanager.aux-services.mapreduce.shuffle.class =
"org.apache.hadoop.mapred.ShuffleHandler";
        yarn.nodemanager.vmem-pmem-ratio = "5";
        yarn.nodemanager.container-executor.class =
"org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor";
        yarn.nodemanager.container-manager.thread-count = "64";
        yarn.nodemanager.localizer.client.thread-count = "20";
        yarn.nodemanager.localizer.fetch.thread-count = "20";
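
(For clarity, these live in mapred-site.xml / yarn-site.xml as plain property
entries; e.g. the map settings above translate to the following, with the -Xmx
heap sitting below the 768 MB container size:

        <property>
          <name>mapreduce.map.memory.mb</name>
          <value>768</value>
        </property>
        <property>
          <name>mapreduce.map.java.opts</name>
          <value>-Xmx512m</value>
        </property>
)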

My Hadoop 1.0.3 cluster has the same memory/disks/cores and almost the same
configuration otherwise. In MR1, the 1TB terasort took about 45 minutes, but it
took around 90 minutes in MR2.
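
(For reference, the "1TB terasort" is the stock teragen/terasort example, run
roughly as below on 2.2.0; the jar path and directories are only illustrative,
and 10,000,000,000 rows of 100 bytes gives the 1 TB input:

        hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
            teragen 10000000000 /terasort/input
        hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar \
            terasort /terasort/input /terasort/output

with the equivalent hadoop-examples jar on 1.0.3.)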

Does anyone know what is wrong here? Or do I need some special configuration
for terasort to perform better in MR2?

Thanks in advance,

John


On Tue, Jun 18, 2013 at 3:11 AM, Michel Segel <michael_se...@hotmail.com> wrote:

> Sam,
> I think your cluster is too small for any meaningful conclusions to be
> made.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jun 18, 2013, at 3:58 AM, sam liu <samliuhad...@gmail.com> wrote:
>
> Hi Harsh,
>
> Thanks for your detailed response! The efficiency of my Yarn cluster has
> improved a lot after increasing the reducer count (mapreduce.job.reduces)
> in mapred-site.xml. But I still have some questions about how Yarn
> executes an MRv1 job:
>
> 1. In Hadoop 1.x, a job is executed by map tasks and reduce tasks together,
> following the typical process (map > shuffle > reduce). In Yarn, as I
> understand it, an MRv1 job is executed only by an ApplicationMaster.
> - Yarn can run many kinds of jobs (MR, MPI, ...), but an MRv1 job has the
> special execution process (map > shuffle > reduce) of Hadoop 1.x, so how does
> Yarn execute an MRv1 job? Does it still include the MR-specific steps from
> Hadoop 1.x, like map, sort, merge, combine and shuffle?
> - Do the MRv1 parameters still work under Yarn, like
> mapreduce.task.io.sort.mb and mapreduce.map.sort.spill.percent?
> - What is the general process by which the Yarn ApplicationMaster executes a
> job?
>
> 2. In Hadoop 1.x, we can set the map/reduce slots by setting
> 'mapred.tasktracker.map.tasks.maximum' and
> 'mapred.tasktracker.reduce.tasks.maximum'.
> - In Yarn, the above two parameters no longer work, since Yarn uses
> containers instead, right?
> - In Yarn, we can set the total physical memory of a NodeManager using
> 'yarn.nodemanager.resource.memory-mb'. But how do we set the default
> physical memory size of a container?
> - How do we set the maximum physical memory of a container? Via the
> parameter 'mapred.child.java.opts'?
>
> Thanks as always!
>
> 2013/6/9 Harsh J <ha...@cloudera.com>
>
>> Hi Sam,
>>
>> > - How do I know the container number? Why do you say it will be 22
>> > containers due to 22 GB of memory?
>>
>> MR2's default configuration requests 1 GB of resource each for Map
>> and Reduce containers. It additionally requests 1.5 GB for the AM container
>> that runs the job. This is tunable using the properties Sandy mentioned in
>> his post.
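>>
>> For reference, the knobs in question (with their stock 2.x defaults, roughly)
>> are:
>>
>>   mapreduce.map.memory.mb           = 1024
>>   mapreduce.reduce.memory.mb        = 1024
>>   yarn.app.mapreduce.am.resource.mb = 1536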
>>
>> > - My machine has 32 GB memory; how much memory should be assigned to
>> > containers?
>>
>> This is a general question. You may use the same process you took to
>> decide optimal number of slots in MR1 to decide this here. Every
>> container is a new JVM, and you're limited by the CPUs you have there
>> (if not the memory). Either increase memory requests from jobs, to
>> lower # of concurrent containers at a given time (runtime change), or
>> lower NM's published memory resources to control the same (config
>> change).
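>>
>> (A quick worked example: with yarn.nodemanager.resource.memory-mb = 22000 and
>> 1 GB requests, that is ~22 concurrent containers per node; raise each request
>> to 2 GB, or lower the NM figure, and you roughly halve that.)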
>>
>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to 'yarn', will
>> > other parameters in mapred-site.xml still work under the Yarn framework?
>> > Like 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'?
>>
>> Yes, all of these properties will still work. Old properties specific
>> to JobTracker or TaskTracker (usually found as a keyword in the config
>> name) will not apply anymore.
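>> (For example, 'mapred.tasktracker.map.tasks.maximum' and
>> 'mapred.tasktracker.reduce.tasks.maximum' fall into the latter bucket and are
>> simply ignored under YARN.)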
>>
>> On Sun, Jun 9, 2013 at 2:21 PM, sam liu <samliuhad...@gmail.com> wrote:
>> > Hi Harsh,
>> >
>> > According to the above suggestions, I removed the duplicated setting and
>> > reduced the values of 'yarn.nodemanager.resource.cpu-cores',
>> > 'yarn.nodemanager.vcores-pcores-ratio' and
>> > 'yarn.nodemanager.resource.memory-mb' to 16, 8 and 12000. And then the
>> > efficiency improved by about 18%. I have some questions:
>> >
>> > - How do I know the container number? Why do you say it will be 22
>> > containers due to 22 GB of memory?
>> > - My machine has 32 GB memory; how much memory should be assigned to
>> > containers?
>> > - In mapred-site.xml, if I set 'mapreduce.framework.name' to 'yarn', will
>> > other parameters in mapred-site.xml still work under the Yarn framework?
>> > Like 'mapreduce.task.io.sort.mb' and 'mapreduce.map.sort.spill.percent'?
>> >
>> > Thanks!
>> >
>> >
>> >
>> > 2013/6/8 Harsh J <ha...@cloudera.com>
>> >>
>> >> Hey Sam,
>> >>
>> >> Did you get a chance to retry with Sandy's suggestions? The config
>> >> appears to be asking NMs to use roughly 22 total containers (as
>> >> opposed to 12 total tasks in MR1 config) due to a 22 GB memory
>> >> resource. This could have a big impact, given the CPU is still the same
>> >> for both test runs.
>> >>
>> >> On Fri, Jun 7, 2013 at 12:23 PM, Sandy Ryza <sandy.r...@cloudera.com>
>> >> wrote:
>> >> > Hey Sam,
>> >> >
>> >> > Thanks for sharing your results.  I'm definitely curious about what's
>> >> > causing the difference.
>> >> >
>> >> > A couple observations:
>> >> > It looks like you've got yarn.nodemanager.resource.memory-mb in there
>> >> > twice
>> >> > with two different values.
>> >> >
>> >> > Your max JVM memory of 1000 MB is (dangerously?) close to the default
>> >> > mapreduce.map/reduce.memory.mb of 1024 MB. Are any of your tasks
>> getting
>> >> > killed for running over resource limits?
>> >> >
>> >> > -Sandy
>> >> >
>> >> >
>> >> > On Thu, Jun 6, 2013 at 10:21 PM, sam liu <samliuhad...@gmail.com>
>> wrote:
>> >> >>
>> >> >> The terasort execution log shows that the reduce phase spent about 5.5
>> >> >> minutes going from 33% to 35%, as shown below.
>> >> >> 13/06/10 08:02:22 INFO mapreduce.Job:  map 100% reduce 31%
>> >> >> 13/06/10 08:02:25 INFO mapreduce.Job:  map 100% reduce 32%
>> >> >> 13/06/10 08:02:46 INFO mapreduce.Job:  map 100% reduce 33%
>> >> >> 13/06/10 08:08:16 INFO mapreduce.Job:  map 100% reduce 35%
>> >> >> 13/06/10 08:08:19 INFO mapreduce.Job:  map 100% reduce 40%
>> >> >> 13/06/10 08:08:22 INFO mapreduce.Job:  map 100% reduce 43%
>> >> >>
>> >> >> Anyway, below are my configurations for your reference. Thanks!
>> >> >> (A) core-site.xml
>> >> >> only defines 'fs.default.name' and 'hadoop.tmp.dir'
>> >> >>
>> >> >> (B) hdfs-site.xml
>> >> >>   <property>
>> >> >>     <name>dfs.replication</name>
>> >> >>     <value>1</value>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>dfs.name.dir</name>
>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_name_dir</value>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>dfs.data.dir</name>
>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/dfs_data_dir</value>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>dfs.block.size</name>
>> >> >>     <value>134217728</value><!-- 128MB -->
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>dfs.namenode.handler.count</name>
>> >> >>     <value>64</value>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>dfs.datanode.handler.count</name>
>> >> >>     <value>10</value>
>> >> >>   </property>
>> >> >>
>> >> >> (C) mapred-site.xml
>> >> >>   <property>
>> >> >>     <name>mapreduce.cluster.temp.dir</name>
>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_temp</value>
>> >> >>     <description>No description</description>
>> >> >>     <final>true</final>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>mapreduce.cluster.local.dir</name>
>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/mapreduce_local_dir</value>
>> >> >>     <description>No description</description>
>> >> >>     <final>true</final>
>> >> >>   </property>
>> >> >>
>> >> >> <property>
>> >> >>   <name>mapreduce.child.java.opts</name>
>> >> >>   <value>-Xmx1000m</value>
>> >> >> </property>
>> >> >>
>> >> >> <property>
>> >> >>     <name>mapreduce.framework.name</name>
>> >> >>     <value>yarn</value>
>> >> >>    </property>
>> >> >>
>> >> >>  <property>
>> >> >>     <name>mapreduce.tasktracker.map.tasks.maximum</name>
>> >> >>     <value>8</value>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>> >> >>     <value>4</value>
>> >> >>   </property>
>> >> >>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>mapreduce.tasktracker.outofband.heartbeat</name>
>> >> >>     <value>true</value>
>> >> >>   </property>
>> >> >>
>> >> >> (D) yarn-site.xml
>> >> >>  <property>
>> >> >>     <name>yarn.resourcemanager.resource-tracker.address</name>
>> >> >>     <value>node1:18025</value>
>> >> >>     <description>host is the hostname of the resource manager and
>> >> >>     port is the port on which the NodeManagers contact the Resource
>> >> >> Manager.
>> >> >>     </description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <description>The address of the RM web application.</description>
>> >> >>     <name>yarn.resourcemanager.webapp.address</name>
>> >> >>     <value>node1:18088</value>
>> >> >>   </property>
>> >> >>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.resourcemanager.scheduler.address</name>
>> >> >>     <value>node1:18030</value>
>> >> >>     <description>host is the hostname of the resourcemanager and port is
>> >> >>     the port on which the Applications in the cluster talk to the Resource
>> >> >>     Manager.
>> >> >>     </description>
>> >> >>   </property>
>> >> >>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.resourcemanager.address</name>
>> >> >>     <value>node1:18040</value>
>> >> >>     <description>the host is the hostname of the ResourceManager and the
>> >> >>     port is the port on which the clients can talk to the Resource
>> >> >>     Manager.</description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.nodemanager.local-dirs</name>
>> >> >>
>> >> >> <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_local_dir</value>
>> >> >>     <description>the local directories used by the
>> >> >> nodemanager</description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.nodemanager.address</name>
>> >> >>     <value>0.0.0.0:18050</value>
>> >> >>     <description>the nodemanagers bind to this port</description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>> >> >>     <value>10240</value>
>> >> >>     <description>the amount of memory on the NodeManager in MB</description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.nodemanager.remote-app-log-dir</name>
>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_app-logs</value>
>> >> >>     <description>directory on hdfs where the application logs are moved
>> >> >>     to</description>
>> >> >>   </property>
>> >> >>
>> >> >>    <property>
>> >> >>     <name>yarn.nodemanager.log-dirs</name>
>> >> >>     <value>/opt/hadoop-2.0.4-alpha/temp/hadoop/yarn_nm_log</value>
>> >> >>     <description>the directories used by Nodemanagers as log
>> >> >> directories</description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.nodemanager.aux-services</name>
>> >> >>     <value>mapreduce.shuffle</value>
>> >> >>     <description>shuffle service that needs to be set for Map Reduce to
>> >> >>     run</description>
>> >> >>   </property>
>> >> >>
>> >> >>   <property>
>> >> >>     <name>yarn.resourcemanager.client.thread-count</name>
>> >> >>     <value>64</value>
>> >> >>   </property>
>> >> >>
>> >> >>  <property>
>> >> >>     <name>yarn.nodemanager.resource.cpu-cores</name>
>> >> >>     <value>24</value>
>> >> >>   </property>
>> >> >>
>> >> >> <property>
>> >> >>     <name>yarn.nodemanager.vcores-pcores-ratio</name>
>> >> >>     <value>3</value>
>> >> >>   </property>
>> >> >>
>> >> >>  <property>
>> >> >>     <name>yarn.nodemanager.resource.memory-mb</name>
>> >> >>     <value>22000</value>
>> >> >>   </property>
>> >> >>
>> >> >>  <property>
>> >> >>     <name>yarn.nodemanager.vmem-pmem-ratio</name>
>> >> >>     <value>2.1</value>
>> >> >>   </property>
>> >> >>
>> >> >>
>> >> >>
>> >> >> 2013/6/7 Harsh J <ha...@cloudera.com>
>> >> >>>
>> >> >>> Not tuning the configurations at all is wrong. YARN uses memory-resource
>> >> >>> based scheduling, and hence MR2 requests a 1 GB minimum by default, which,
>> >> >>> on base configs, maxes out at 8 total containers (due to the 8 GB NM
>> >> >>> memory resource config). Do share your configs, as at this point none of
>> >> >>> us can tell what they are.
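>> >> >>> (That is, with the default yarn.nodemanager.resource.memory-mb of 8192 MB
>> >> >>> and 1024 MB per request, 8192 / 1024 = 8 concurrent containers per node.)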
>> >> >>>
>> >> >>> Obviously, it isn't our goal to make MR2 slower for users and to
>> not
>> >> >>> care about such things :)
>> >> >>>
>> >> >>> On Fri, Jun 7, 2013 at 8:45 AM, sam liu <samliuhad...@gmail.com>
>> >> >>> wrote:
>> >> >>> > At the beginning, I just wanted to do a quick comparison of MRv1 and
>> >> >>> > Yarn. But they have many differences, and to be fair I did not tune
>> >> >>> > their configurations at all. So I got the above test results. After
>> >> >>> > analyzing them, no doubt, I will tune the configurations and do the
>> >> >>> > comparison again.
>> >> >>> >
>> >> >>> > Do you have any idea about the current test results? I think, compared
>> >> >>> > with MRv1, Yarn is better in the Map phase (teragen test), but worse in
>> >> >>> > the Reduce phase (terasort test). Are there any detailed
>> >> >>> > suggestions/comments/materials on Yarn performance tuning?
>> >> >>> >
>> >> >>> > Thanks!
>> >> >>> >
>> >> >>> >
>> >> >>> > 2013/6/7 Marcos Luis Ortiz Valmaseda <marcosluis2...@gmail.com>
>> >> >>> >>
>> >> >>> >> Why not tune the configurations?
>> >> >>> >> Both frameworks have many areas to tune:
>> >> >>> >> - Combiners, Shuffle optimization, Block size, etc
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> 2013/6/6 sam liu <samliuhad...@gmail.com>
>> >> >>> >>>
>> >> >>> >>> Hi Experts,
>> >> >>> >>>
>> >> >>> >>> We are thinking about whether to use Yarn or not in the near future,
>> >> >>> >>> and I ran teragen/terasort on Yarn and MRv1 for comparison.
>> >> >>> >>>
>> >> >>> >>> My environment is a three-node cluster, and each node has similar
>> >> >>> >>> hardware: 2 CPUs (4 cores), 32 GB memory. Both the Yarn and MRv1
>> >> >>> >>> clusters are set up on the same environment. To be fair, I did not do
>> >> >>> >>> any performance tuning of their configurations, but used the default
>> >> >>> >>> configuration values.
>> >> >>> >>>
>> >> >>> >>> Before testing, I thought Yarn would be much better than MRv1 if they
>> >> >>> >>> both used the default configuration, because Yarn is a better framework
>> >> >>> >>> than MRv1. However, the test results show some differences:
>> >> >>> >>>
>> >> >>> >>> MRv1: Hadoop-1.1.1
>> >> >>> >>> Yarn: Hadoop-2.0.4
>> >> >>> >>>
>> >> >>> >>> (A) Teragen: generate 10 GB data:
>> >> >>> >>> - MRv1: 193 sec
>> >> >>> >>> - Yarn: 69 sec
>> >> >>> >>> Yarn is 2.8 times better than MRv1
>> >> >>> >>>
>> >> >>> >>> (B) Terasort: sort 10 GB data:
>> >> >>> >>> - MRv1: 451 sec
>> >> >>> >>> - Yarn: 1136 sec
>> >> >>> >>> Yarn is 2.5 times worse than MRv1
>> >> >>> >>>
>> >> >>> >>> After a quick analysis, I think the direct cause might be that Yarn is
>> >> >>> >>> much faster than MRv1 in the Map phase, but much worse in the Reduce
>> >> >>> >>> phase.
>> >> >>> >>>
>> >> >>> >>> Here I have two questions:
>> >> >>> >>> - Why do my tests show Yarn is worse than MRv1 for terasort?
>> >> >>> >>> - What's the strategy for tuning Yarn performance? Are there any
>> >> >>> >>> materials?
>> >> >>> >>>
>> >> >>> >>> Thanks!
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> --
>> >> >>> >> Marcos Ortiz Valmaseda
>> >> >>> >> Product Manager at PDVSA
>> >> >>> >> http://about.me/marcosortiz
>> >> >>> >>
>> >> >>> >
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> Harsh J
>> >> >>
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
