Hi,

I am experimenting with a memory-intensive Giraph application on top of a
large graph (50 million nodes), on a 14-node cluster.

When I set the number of workers to a large value (500 in this example),
the job fails because the requested number of workers cannot be fulfilled
(please see the log excerpt below). To my understanding, this contradicts
how YARN/MR map tasks behave: if more map tasks are requested than the
cluster currently has resources for, only a subset of them is started, and
the remaining ones are assigned as slots become available. In other words,
as many map tasks as possible run concurrently, and the rest run as
resources free up. Is this not the case with Giraph workers? I would
expect it to be, since workers are essentially map tasks, so the same
scheduling should apply to them. However, the log below suggests
otherwise: given my resources, only 37 map tasks (workers) could be
created, and the application could not proceed without all 500 workers.
Could you please help me understand what is causing this?
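
In case it is relevant, this is essentially how the worker count is being
requested (a trimmed-down sketch rather than my actual job code;
GiraphConfiguration, GiraphJob, and setWorkerConfiguration() are the
standard Giraph classes/calls as I understand them, while the class name
and job name below are just placeholders):

    import org.apache.giraph.conf.GiraphConfiguration;
    import org.apache.giraph.job.GiraphJob;

    public class WorkerCountSketch {
      public static void main(String[] args) throws Exception {
        GiraphConfiguration conf = new GiraphConfiguration();
        // Computation class, vertex input/output formats and paths are
        // omitted here; only the worker-related setting is shown.

        // Request 500 workers: minWorkers = maxWorkers = 500, with 100%
        // of them required to report in before superstep -1 can start.
        conf.setWorkerConfiguration(500, 500, 100.0f);

        GiraphJob job = new GiraphJob(conf, "memory-intensive-app");
        System.exit(job.run(true) ? 0 : -1);
      }
    }

As far as I know, the command-line equivalent is passing -w 500 to
GiraphRunner.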

Thanks,

Walaa.


Only found 37 responses of 500 needed to start superstep -1.  Reporting
every 30000 msecs, 296929 more msecs left before giving up.


2015-01-20 01:29:49,007 ERROR [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Did not receive
enough processes in time (only 37 of 500 required) after waiting
600000msecs).  This occurs if you do not have enough map tasks available
simultaneously on your Hadoop instance to fulfill the number of requested
workers.


2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: failJob: Killing job
job_1421703431598_0006

2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: failJob: exception
java.lang.IllegalStateException: Not enough healthy workers to create input
splits
