I'm using giraph-1.3.0-SNAPSHOT and hadoop-2.8.4 on an Amazon EC2 cluster. My cluster is composed of 4 t2.large machines, each with 8 GB of RAM and 2 CPUs (in the future I'll have to use 20 c3.8xlarge machines, each with 60 GB of RAM and 32 CPUs). I'm stuck on this problem: "Giraph's estimated cluster heap xxxxMBs ask is greater than the current available cluster heap of 0MB. Aborting Job". I read this previous post https://stackoverflow.com/questions/28977337/giraphs-estimated-cluster-heap-4096mb-ask-is-greater-than-the-current-available but I didn't understand what causes the problem in my case, since I have configured yarn.resourcemanager.hostname (see below) and my security group is open to all traffic. Maybe I'm missing some settings (or some ports)?
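As I understand it (a rough sketch of the kind of check behind that message, not Giraph's actual code), Giraph compares its memory ask against the total memory of the NodeManagers that have actually registered with the ResourceManager. If the ResourceManager sees no live NodeManagers, that total is 0 MB no matter what the per-node settings say:

```python
# Hypothetical sketch of the sanity check behind "estimated cluster heap ...
# is greater than the current available cluster heap". Not Giraph's real code;
# the function names are mine.

def available_cluster_heap_mb(live_node_managers, memory_mb_per_node):
    """Memory YARN can hand out: only registered, healthy NodeManagers count."""
    return live_node_managers * memory_mb_per_node

def estimated_giraph_ask_mb(workers, mapper_memory_mb, am_memory_mb):
    """Giraph needs one container per worker plus the application master."""
    return workers * mapper_memory_mb + am_memory_mb

# With yarn.nodemanager.resource.memory-mb = 6144 on 4 registered nodes:
print(available_cluster_heap_mb(4, 6144))   # 24576
# If no NodeManager has registered with the ResourceManager:
print(available_cluster_heap_mb(0, 6144))   # 0 -> "available cluster heap of 0MB"
```

If that model is right, "0MB" would mean the ResourceManager cannot see any NodeManagers at all; `yarn node -list -all` (or the ResourceManager web UI, port 8088 by default) should show whether your nodes are registered.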
Furthermore, I have the following questions:

- Since Giraph uses only the map phase and no reduce, is it correct to assign less memory to mapreduce.reduce.memory.mb than to mapreduce.map.memory.mb? Could it even be right to assign 0 MB to mapreduce.reduce.memory.mb, since Giraph doesn't use reduce?
- I read in http://giraph.apache.org/quick_start.html that mapred.tasktracker.map.tasks.maximum and mapred.map.tasks must be set to 4 because "by default hadoop allows 2 mappers to run at once. Giraph's code, however, assumes that we can run 4 mappers at the same time." Must these two properties always be set to 4?

This is my configuration. I report only mapred-site.xml and yarn-site.xml because I'm fairly sure the other Hadoop configuration files are correct.

mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>{HOSTNAME}:54311</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>4</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>4608</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>512</value>
  </property>
</configuration>

yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>{HOSTNAME}</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>6144</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>6144</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>

-- Francesco Sclano
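For what it's worth, here is my back-of-the-envelope container math for the settings above (just floor division over the configured values; YARN's real scheduler also rounds requests up to multiples of yarn.scheduler.minimum-allocation-mb, which these values already are):

```python
# Container math for the posted configuration on 4 t2.large nodes.
# Variable names mirror the YARN/MapReduce properties; this is my own
# arithmetic, not an official YARN API.

node_memory_mb = 6144   # yarn.nodemanager.resource.memory-mb
map_memory_mb  = 4608   # mapreduce.map.memory.mb
am_memory_mb   = 2048   # yarn.app.mapreduce.am.resource.mb
nodes = 4

mappers_per_node = node_memory_mb // map_memory_mb
print(mappers_per_node)   # 1 -> at most one 4608 MB Giraph worker per node

# The node hosting the 2048 MB application master cannot also fit a
# 4608 MB mapper (2048 + 4608 = 6656 > 6144), so at most 3 of the
# 4 nodes can run a worker alongside the AM.
max_workers = nodes * mappers_per_node - 1
print(max_workers)        # 3
```

So if my reasoning is correct, even once the "0MB" issue is fixed, these settings allow at most 3 Giraph workers on this cluster.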