Hi Fabian,

Thanks a lot for your response.

- How many task managers do you start? I assume more than one TM per
machine given that you assign only 4GB of memory out of 128GB to each TM.

Currently we start one TM per machine, each with 48 task slots.

- What is the maximum parallelism of your program?

Parallelism is between 30 and 40.

- How many processing slots do you configure for each TM?
We configure 48 slots (one per core) for each TM, with one TM per machine.
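
As a rough check on what these numbers imply, here is a small sketch that plugs them into the formula from your mail quoted below (p ^ 2 * t * 4). The function names and the 32 KiB buffer size are assumptions for illustration only; the actual buffer size depends on the Flink configuration in use, so substitute your own value.

```python
def network_buffers(max_parallelism: int, task_managers: int) -> int:
    """Number of buffers from the rule of thumb p^2 * t * 4."""
    return max_parallelism ** 2 * task_managers * 4


def buffer_pool_mib(buffers: int, buffer_size_bytes: int = 32 * 1024) -> float:
    """Memory needed for the buffer pool in MiB.

    buffer_size_bytes is an assumed value, not taken from this thread.
    """
    return buffers * buffer_size_bytes / (1024 * 1024)


TASK_MANAGERS = 16  # one TM per machine, 16 machines

# Parallelism of 30 and 40 (our jobs) and 48 (one task per core).
for p in (30, 40, 48):
    n = network_buffers(p, TASK_MANAGERS)
    print(f"p={p}: {n} buffers, about {buffer_pool_mib(n):.0f} MiB")
```

Under these assumptions, a parallelism of 40 already asks for a buffer pool of several GB, which would be consistent with the 4096 MB heap being tight.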

But I would like to ask another question: is it better to start 48 task
managers on one machine, each with a single task slot? Are there any
trade-offs we should know about?




On Mon, Feb 22, 2016 at 5:26 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Welly,
>
> sorry for the late response.
>
> The number of network buffers primarily depends on the maximum parallelism
> of your job.
> The given formula assumes a specific cluster configuration (1 task manager
> per machine, one parallel task per CPU).
> The formula can be translated to:
>
> taskmanager.network.numberOfBuffers: p ^ 2 * t * 4
>
> where p is the maximum parallelism of the job and t is the number of task
> managers.
> You can process more than one parallel task per TM if you configure more
> than one processing slot per machine (taskmanager.numberOfTaskSlots).
> The TM will divide its memory among all its slots. So it would be possible
> to start one TM for each machine with 100GB+ memory and 48 slots each.
>
> We can compute the number of network buffers if you give a few more
> details about your setup:
> - How many task managers do you start? I assume more than one TM per
> machine given that you assign only 4GB of memory out of 128GB to each TM.
> - What is the maximum parallelism of your program?
> - How many processing slots do you configure for each TM?
>
> In general, pipelined shuffles with a high parallelism require a lot of
> memory.
> If you configure batch instead of pipelined transfer, the memory
> requirement goes down
> (ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).
>
> Eventually, we want to merge the network buffer and the managed memory
> pools. So the "taskmanager.network.numberOfBuffers" configuration will
> hopefully disappear at some point in the future.
>
> Best, Fabian
>
> 2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05...@gmail.com>:
>
>> Hi All,
>>
>> We are trying to run our job on a cluster with the following specification:
>>
>> 1. # of machines: 16
>> 2. memory: 128 GB
>> 3. # of cores: 48
>>
>> However, when we try to run it, we get an exception:
>>
>> "insufficient number of network buffers. 48 required but only 10
>> available. the total number of network buffers is currently set to 2048"
>>
>> After looking at the documentation, we set the configuration as it suggests:
>>
>> taskmanager.network.numberOfBuffers: # core ^ 2 * # machine * 4
>>
>> However, we then hit another error from the JVM:
>>
>> java.io.IOException: Cannot allocate network buffer pool: Could not
>> allocate enough memory segments for NetworkBufferPool (required (Mb): 2304,
>> allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
>>
>> We then adjusted the heap setting to taskmanager.heap.mb: 4096
>>
>> Finally the cluster is running.
>>
>> However, I'm still not sure whether this configuration, and the adjusted
>> task manager heap in particular, is really well tuned. So my questions are:
>>
>>
>>    1. Am I setting numberOfBuffers correctly?
>>    2. How much should we allocate for taskmanager.heap.mb given the
>>    information above?
>>    3. Any suggestions on which configuration options we need to set to
>>    get the most out of the cluster?
>>    4. Is there any chance that this will be resolved automatically by
>>    the memory/network buffer manager?
>>
>> Thanks a lot for the help
>>
>> Cheers
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>
