Re: Number of concurrent workers

Lukas Nalezenec Tue, 27 Jan 2015 01:13:07 -0800

On 23.1.2015 00:40, Walaa Eldin Moustafa wrote:

Hi,
I am experimenting with a memory-intensive Giraph application on top of a large graph (50 million nodes), on a 14 node cluster.
When setting the number of workers to a large number (500 in this example), I get errors for not being able to fulfill the number of requested workers (please see the log excerpt below). To my understanding, this contradicts how Yarn/MR map tasks operate: if the number of map tasks is more than what is currently available in terms of resources, only a subset of the maps is started, and new ones are assigned as new slots become available. In other words, as many map tasks as possible run concurrently, and new ones are run as resources become available. Is this not the case with Giraph workers? I expect it to be the case, since workers are basically map tasks, so the same should apply to them. However, the log below suggests otherwise: based on my resources, 37 map tasks (workers) could be created, but the application could not proceed without creating all 500 workers. Could you please help explain what is causing this?

Hi,

Giraph is not a standard M/R job. It needs all mappers to run at the same moment. No computation is started before all mappers are running. It's hard to tell why it does not work. I guess you have already raised the timeout. Check whether there are enough slots in the queue where the job is running.
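
As a rough illustration of the point above, here is a minimal sketch of a capacity check done before choosing the worker count. It assumes memory is the only limiting resource and that every node offers the same amount to containers. The class WorkerSizing, its methods, and all the numbers are hypothetical and are not part of Giraph or Hadoop; the two reserved containers (one for the application master, one for the Giraph master task) are likewise an assumption about the setup.

```java
/** Hypothetical helper: estimates how many workers can run at the same moment. */
public final class WorkerSizing {
    private WorkerSizing() {}

    /** Containers that fit on the cluster at once, counting memory only (assumption). */
    static int concurrentContainers(int nodes, int nodeMemoryMb, int containerMemoryMb) {
        if (nodes <= 0 || nodeMemoryMb <= 0 || containerMemoryMb <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Integer division: a node cannot host a fraction of a container.
        return nodes * (nodeMemoryMb / containerMemoryMb);
    }

    /** Largest worker count that can start together after reserving non-worker containers. */
    static int maxWorkers(int nodes, int nodeMemoryMb, int containerMemoryMb,
                          int reservedContainers) {
        int total = concurrentContainers(nodes, nodeMemoryMb, containerMemoryMb);
        return Math.max(0, total - reservedContainers);
    }

    /** True if all requested workers could be running simultaneously. */
    static boolean canStartTogether(int requestedWorkers, int maxWorkers) {
        return requestedWorkers <= maxWorkers;
    }

    public static void main(String[] args) {
        // Illustrative values only: 14 nodes, 8 GB per node for containers,
        // 2 GB per map task, 2 containers reserved (assumed).
        int max = maxWorkers(14, 8192, 2048, 2);
        System.out.println("max concurrent workers: " + max);
        System.out.println("500 workers can start together: " + canStartTogether(500, max));
    }
}
```

If the requested worker count is above such an estimate, the job cannot start no matter how long the master waits, so the count has to be lowered or the per-task memory reduced.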

Lukas

Thanks,

Walaa.
Only found 37 responses of 500 needed to start superstep -1. Reporting every 30000 msecs, 296929 more msecs left before giving up.
2015-01-20 01:29:49,007 ERROR [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: checkWorkers: Did not receive enough processes in time (only 37 of 500 required) after waiting 600000msecs). This occurs if you do not have enough map tasks available simultaneously on your Hadoop instance to fulfill the number of requested workers.
2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: failJob: Killing job job_1421703431598_0006
2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread] org.apache.giraph.master.BspServiceMaster: failJob: exception java.lang.IllegalStateException: Not enough healthy workers to create input splits
