Hi Welly, sorry for the late response.

The number of network buffers primarily depends on the maximum parallelism of your job. The given formula assumes a specific cluster configuration (one task manager per machine, one parallel task per CPU). The formula can be translated to:

    taskmanager.network.numberOfBuffers: p ^ 2 * t * 4

where p is the maximum parallelism of the job and t is the number of task managers.

You can process more than one parallel task per TM if you configure more than one processing slot per machine (taskmanager.numberOfTaskSlots). The TM will divide its memory among all its slots. So it would be possible to start one TM on each machine with 100GB+ of memory and 48 slots each.

We can compute the number of network buffers if you give a few more details about your setup:

- How many task managers do you start? I assume more than one TM per machine, given that you assign only 4GB of the 128GB of memory to each TM.
- What is the maximum parallelism of your program?
- How many processing slots do you configure for each TM?

In general, pipelined shuffles with a high parallelism require a lot of memory. If you configure batch instead of pipelined transfer, the memory requirement goes down (ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).

Eventually, we want to merge the network buffer and the managed memory pools, so the "taskmanager.network.numberOfBuffers" configuration will hopefully disappear at some point in the future.

Best, Fabian

2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05...@gmail.com>:

> Hi All,
>
> We are trying to running our job in cluster that has this information
>
> 1. # of machine: 16
> 2. memory : 128 gb
> 3. # of core : 48
>
> However when we try to run we have an exception.
>
> "insufficient number of network buffers. 48 required but only 10
> available. the total number of network buffers is currently set to 2048"
>
> After looking at the documentation we set configuration based on docs
>
> taskmanager.network.numberOfBuffers: # core ^ 2 * # machine * 4
>
> However we face another error from JVM
>
> java.io.IOException: Cannot allocate network buffer pool: Could not
> allocate enough memory segments for NetworkBufferPool (required (Mb): 2304,
> allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
>
> We fiddle the taskmanager.heap.mb: 4096
>
> Finally the cluster is running.
>
> However i'm still not sure about the configuration and fiddling in task
> manager heap really fine tune. So my question is
>
>
> 1. Am i doing it right for numberOfBuffers ?
> 2. How much should we allocate on taskmanager.heap.mb given the
> information
> 3. Any suggestion which configuration we need to set to make it
> optimal for the cluster ?
> 4. Is there any chance that this will get automatically resolve by
> memory/network buffer manager ?
>
> Thanks a lot for the help
>
> Cheers
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
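
For reference, the formula from the reply above (p ^ 2 * t * 4) can be sketched as a small calculation. This is only a rough illustration, not Flink code: the function names are hypothetical, the 32 KiB buffer size is an assumption about the configured buffer size, and the example values (p = 48, t = 16) assume one TM per machine with one slot per core, which may not match the actual setup.

```python
def network_buffers(p: int, t: int) -> int:
    """Buffers per the formula p^2 * t * 4.

    p: maximum parallelism of the job
    t: number of task managers
    """
    return p ** 2 * t * 4


def buffer_memory_mib(num_buffers: int, buffer_size_bytes: int = 32 * 1024) -> float:
    """Memory taken by the buffers, in MiB.

    buffer_size_bytes is an ASSUMPTION (32 KiB); check the buffer size
    actually configured on your cluster.
    """
    return num_buffers * buffer_size_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Values from the quoted message, assuming 1 TM per machine, 1 slot per core.
    p, t = 48, 16
    n = network_buffers(p, t)
    print(f"numberOfBuffers = {n}")
    print(f"approx. buffer memory = {buffer_memory_mib(n):.0f} MiB")
```

Because the buffer count grows with the square of p, lowering the parallelism or switching to batch transfer has a much larger effect on memory than changing the number of task managers.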