Hi Welly,

Sorry for the late response.

The number of network buffers primarily depends on the maximum parallelism
of your job.
The given formula assumes a specific cluster configuration (1 task manager
per machine, one parallel task per CPU).
The formula can be translated to:

taskmanager.network.numberOfBuffers: p ^ 2 * t * 4

where p is the maximum parallelism of the job and t is the number of task
managers.
You can process more than one parallel task per TM if you configure more
than one processing slot per TM (taskmanager.numberOfTaskSlots). The
TM will divide its memory among all its slots. So it would be possible to
start one TM for each machine with 100GB+ memory and 48 slots each.
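
To make the arithmetic concrete, here is a rough sketch that plugs the numbers
from your mail (48 cores, 16 machines) into the formula above. The class and
method names are made up for illustration, and the 32 KiB buffer size is an
assumption on my side; please check the segment size of your installation
before relying on the memory figure.

```java
public class NetworkBufferEstimate {

    // Rule of thumb from this mail: p^2 * t * 4,
    // p = maximum parallelism, t = number of task managers.
    static long numberOfBuffers(long p, long t) {
        return p * p * t * 4;
    }

    // Memory needed for the buffers, in MiB.
    // bufferSizeBytes is an assumption; pass the value your setup uses.
    static long bufferMemoryMiB(long buffers, long bufferSizeBytes) {
        return buffers * bufferSizeBytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long p = 48;                       // one parallel task per CPU core
        long t = 16;                       // one task manager per machine
        long assumedBufferSize = 32 * 1024; // assumed 32 KiB per buffer

        long buffers = numberOfBuffers(p, t);
        System.out.println("numberOfBuffers = " + buffers);
        System.out.println("buffer memory (MiB) = "
                + bufferMemoryMiB(buffers, assumedBufferSize));
    }
}
```

The count grows quadratically with p, so lowering the parallelism has a much
larger effect on the buffer memory than changing the number of task managers.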

We can compute the number of network buffers if you give a few more details
about your setup:
- How many task managers do you start? I assume more than one TM per
machine given that you assign only 4GB of memory out of 128GB to each TM.
- What is the maximum parallelism of your program?
- How many processing slots do you configure for each TM?

In general, pipelined shuffles with a high parallelism require a lot of
memory.
If you configure batch instead of pipelined transfer, the memory
requirement goes down
(ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).

Eventually, we want to merge the network buffer and the managed memory
pools. So the "taskmanager.network.numberOfBuffers" configuration will
hopefully disappear at some point in the future.

Best, Fabian

2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05...@gmail.com>:

> Hi All,
>
> We are trying to running our job in cluster that has this information
>
> 1. # of machine: 16
> 2. memory : 128 gb
> 3. # of core : 48
>
> However when we try to run we have an exception.
>
> "insufficient number of network buffers. 48 required but only 10
> available. the total number of network buffers is currently set to 2048"
>
> After looking at the documentation we set configuration based on docs
>
> taskmanager.network.numberOfBuffers: # core ^ 2 * # machine * 4
>
> However we face another error from JVM
>
> java.io.IOException: Cannot allocate network buffer pool: Could not
> allocate enough memory segments for NetworkBufferPool (required (Mb): 2304,
> allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
>
> We fiddle the taskmanager.heap.mb: 4096
>
> Finally the cluster is running.
>
> However i'm still not sure about the configuration and fiddling in task
> manager heap really fine tune. So my question is
>
>
>    1. Am i doing it right for numberOfBuffers ?
>    2. How much should we allocate on taskmanager.heap.mb given the
>    information
>    3. Any suggestion which configuration we need to set to make it
>    optimal for the cluster ?
>    4. Is there any chance that this will get automatically resolve by
>    memory/network buffer manager ?
>
> Thanks a lot for the help
>
> Cheers
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>
