I am tasked with updating our ancient (circa 7/10/2012) Giraph to
giraph-release-1.0.0-RC3. Most jobs run fine but our largest job now runs
out of memory using the same AWS elastic-mapreduce configuration we have
always used. I have never tried to configure either Giraph or the AWS
Hadoop. We build for Hadoop 1.0.2 because that's closest to the 1.0.3 AWS
provides us. The 8 X m2.4xlarge cluster we use seems to provide 8*14=112
map tasks fitted out with 2GB heap each. Our code is completely unchanged
except as required to adapt to the new Giraph APIs. Our vertex, edge, and
message data are completely unchanged. On smaller jobs that do run, the
aggregate heap usage high-water mark seems about the same as before, but
the "committed heap" seems to run higher. I can't even make it work on a
cluster of 12. In that case one map task seems to end up with nearly
twice as many messages as most of the others, so it runs out of memory
anyway. A single failed task is enough to fail the job. Am I missing something
here? Should I be configuring the new Giraph in some way that the old
one did not require?
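
For reference, the rough heap arithmetic behind the numbers above, as a
minimal Python sketch. The 14 map tasks per node and 2GB per task are what
I observe on the 8-node cluster. Assuming the 12-node cluster gets the same
per-node settings is my guess, not something I have verified. The function
names are just for illustration.

```python
# Back-of-the-envelope heap budget; the figures come from the message above.
MAP_TASKS_PER_NODE = 14   # observed on m2.4xlarge under elastic-mapreduce
HEAP_PER_TASK_GB = 2      # observed heap per map task


def cluster_heap_gb(nodes):
    """Total heap across all map tasks, assuming identical nodes."""
    return nodes * MAP_TASKS_PER_NODE * HEAP_PER_TASK_GB


def max_average_load_gb(skew):
    """Largest average per-task load (GB) that still lets a task
    carrying `skew` times the average fit in its own heap."""
    return HEAP_PER_TASK_GB / skew


for nodes in (8, 12):
    tasks = nodes * MAP_TASKS_PER_NODE
    print(nodes, "nodes:", tasks, "tasks,", cluster_heap_gb(nodes), "GB heap")

# One task with roughly twice the messages of the others:
print("average load limit at 2x skew:", max_average_load_gb(2.0), "GB")
```

If that is right, adding nodes raises the total heap, but a task holding
about twice the average share still has only its own 2GB to work with, which
would explain why the larger cluster does not help.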
