The amount of data that each map task will process is different from the
memory the task itself might require, which depends on the processing you
plan to do in the task.

A very trivial example: let us say your map gets 128 MB of input data, but
your task logic is such that it creates lots of String objects and ArrayList
objects. Wouldn't the memory requirement for the task then be greater than
the size of your input data?
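
To make that concrete, here is a minimal, purely hypothetical mapper (not
code from your job, just a sketch) whose retained String/ArrayList objects
can grow well past the size of the 128 MB split it reads:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Illustrative only: the mapper keeps every token it has ever seen, so
    // its heap footprint (Strings + per-object overhead + the growing list)
    // ends up far larger than the input split it processes.
    public class TokenHoardingMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final List<String> seenTokens = new ArrayList<>();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Each record creates fresh String objects, and all of them are
            // retained for the lifetime of the task.
            for (String token : value.toString().split("\\s+")) {
                seenTokens.add(token);
            }
            context.write(value, new IntWritable(seenTokens.size()));
        }
    }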

I think you are confusing the size of the input data to the map/task with
the actual memory required by the map/task itself to do its work.
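
That per-task memory requirement is what mapreduce.map.memory.mb (and
mapreduce.reduce.memory.mb) describe to YARN: how big a container the task
needs, independent of the split size. A minimal, hypothetical setup sketch
(values made up, not a recommendation for your cluster):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class MemoryHungryJobSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // YARN container size requested for each map / reduce task.
            conf.setInt("mapreduce.map.memory.mb", 2048);
            conf.setInt("mapreduce.reduce.memory.mb", 4096);
            // Heap of the JVM running inside each container; kept below the
            // container size (roughly 80% is a common rule of thumb).
            conf.set("mapreduce.map.java.opts", "-Xmx1638m");
            conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");
            Job job = Job.getInstance(conf, "memory-hungry-job");
            // ... set mapper/reducer classes, input/output paths, then
            // job.waitForCompletion(true)
        }
    }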

Regards,
Shahab

On Wed, Oct 15, 2014 at 9:44 AM, SACHINGUPTA <sac...@datametica.com> wrote:

>  It is still not clear to me.
> Let's suppose the block size of my HDFS is 128 MB, so every mapper will
> process only 128 MB of data. Then what is the meaning of setting the
> property mapreduce.map.memory.mb, if that is already known from the block
> size? Why this property?
>
>
>
> On Wednesday 15 October 2014 07:06 PM, Shahab Yunus wrote:
>
> Explanation here.
>
>
> http://stackoverflow.com/questions/24070557/what-is-the-relation-between-mapreduce-map-memory-mb-and-mapred-map-child-jav
>
> https://support.pivotal.io/hc/en-us/articles/201462036-Mapreduce-YARN-Memory-Parameters
>
> http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
> (scroll towards the end.)
>
>  Regards,
> Shahab
>
> On Wed, Oct 15, 2014 at 9:24 AM, SACHINGUPTA <sac...@datametica.com>
> wrote:
>
>>  I have one more doubt. I was reading this:
>>
>>
>> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
>>
>> and there are these two properties:
>>
>>   mapreduce.map.memory.mb    = 2 * 1024 MB
>>   mapreduce.reduce.memory.mb = 2 * 2 * 1024 MB = 4 * 1024 MB
>>
>> What are these properties mapreduce.map.memory.mb and
>> mapreduce.reduce.memory.mb?
>>
>> On Wednesday 15 October 2014 06:17 PM, Shahab Yunus wrote:
>>
>> It cannot run more mappers (tasks) in parallel than there are cores
>> available, just as it cannot run multiple mappers in parallel if each
>> mapper's (task's) memory requirement is greater than the container size
>> that is configured and available on each node.
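>>
>> As a rough, hypothetical illustration (numbers made up, not from any real
>> cluster): if a node offers YARN 8192 MB (yarn.nodemanager.resource.memory-mb)
>> and each map container asks for 2048 MB (mapreduce.map.memory.mb), at most
>> 8192 / 2048 = 4 map containers fit in memory on that node; with 4 cores and
>> 1 vcore per map task, the core limit is also 4, so roughly 4 map tasks can
>> run there in parallel.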
>>
>>  In the links that I provided earlier, see the following section in that
>> one:
>> Section: "Configuring YARN"
>>
>>  Also this:
>> http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/
>> Section "1. YARN Concurrency (aka “What Happened to Slots?”)"
>>
>>  This should help put things in perspective regarding how the resources
>> allocated to each task, the container size, and the resources available on
>> the node relate to each other.
>>
>>  Regards,
>> Shahab
>>
>> On Wed, Oct 15, 2014 at 8:18 AM, SACHINGUPTA <sac...@datametica.com>
>> wrote:
>>
>>>  But Shahab, if I have only a 4-core machine, then how can YARN run more
>>> than 4 mappers in parallel?
>>> On Wednesday 15 October 2014 05:45 PM, Shahab Yunus wrote:
>>>
>>> It depends on the memory settings as well, i.e. how many resources you
>>> want to assign to each container. YARN will then run as many mappers in
>>> parallel as possible.
>>>
>>>  See this:
>>> http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
>>>
>>> http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap1-11.html
>>>
>>>  Regards,
>>> Shahab
>>>
>>> On Wed, Oct 15, 2014 at 8:09 AM, SACHINGUPTA <sac...@datametica.com>
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I have a situation in which I have a machine with 4 processors and 5
>>>> containers, so does that mean I can have only 4 mappers running in
>>>> parallel at a time?
>>>>
>>>> And if the number of mappers does not depend on the number of containers
>>>> on a machine, then what is the use of the container concept?
>>>>
>>>> Sorry if I have asked anything obvious.
>>>>
>>>> --
>>>> Thanks
>>>> Sachin Gupta
>>>>
>>>>
>>>
>>> --
>>> Thanks
>>> Sachin Gupta
>>>
>>>
>>
>> --
>> Thanks
>> Sachin Gupta
>>
>>
>
> --
> Thanks
> Sachin Gupta
>
>
