Just bumping up this thread. Thanks in advance for the help. Thanks, Walaa.
On Thu, Jan 22, 2015 at 3:40 PM, Walaa Eldin Moustafa <wa.moust...@gmail.com > wrote: > Hi, > > I am experimenting with a memory-intensive Giraph application on top of a > large graph (50 million nodes), on a 14 node cluster. > > When setting the number of workers to a large number (500 in this > example), I get errors for not being able to fulfill the number of > requested workers (Please see the log excerpt below). To my understanding, > this contradicts with how Yarn/MR map tasks operate, as if the number of > map tasks is more than what is currently available in terms of resources, > only a subset of the maps are started, and new ones are assigned as new > slots become available. In other words, as many map tasks as possible can > run concurrently, and new ones are run as resources become available. Is > not this the case with Giraph workers? I expect it to be the case, since > workers are basically map tasks, so the same should apply to them. However, > the log below suggests otherwise, as based on my resources, 37 map tasks > (workers) could be created, but the application could not proceed without > creating all the 500 workers. Could you please help explaining what is > causing this? > > Thanks, > > Walaa. > > > Only found 37 responses of 500 needed to start superstep -1. Reporting > every 30000 msecs, 296929 more msecs left before giving up. > > > 2015-01-20 01:29:49,007 ERROR [org.apache.giraph.master.MasterThread] > org.apache.giraph.master.BspServiceMaster: checkWorkers: Did not receive > enough processes in time (only 37 of 500 required) after waiting > 600000msecs). This occurs if you do not have enough map tasks available > simultaneously on your Hadoop instance to fulfill the number of requested > workers. > > > 2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread] > org.apache.giraph.master.BspServiceMaster: failJob: Killing job > job_1421703431598_0006 > > 2015-01-20 01:29:49,015 FATAL [org.apache.giraph.master.MasterThread] > org.apache.giraph.master.BspServiceMaster: failJob: exception > java.lang.IllegalStateException: Not enough healthy workers to create input > splits > >