OK, thanks Avery, I'll try it. I'm not sure how I would do that on a running
AWS EMR instance, but I can do it on a local stand-alone Hadoop installation
running a smaller version of the job and see if anything jumps out.
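For the record, the kind of dump I have in mind is the standard JDK class
histogram taken from a live task JVM with jps/jmap; the pid placeholder below
is just an illustration, not from an actual run:

    jps -l                           # locate the map task child JVM and note its pid
    jmap -histo <pid> | head -30     # top classes by instance count and shallow bytes
    jmap -histo:live <pid>           # same, but forces a full GC first so only live objects show

Comparing a couple of those snapshots between supersteps should show whether
it's message objects or something else that keeps growing.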


On Wed, Aug 28, 2013 at 4:57 PM, Avery Ching <ach...@apache.org> wrote:

> Try dumping a histogram of memory usage from a running JVM and see where
> the memory is going.  I can't think of anything in particular that
> changed...
>
>
> On 8/28/13 4:39 PM, Jeff Peters wrote:
>
>>
>> I am tasked with updating our ancient (circa 7/10/2012) Giraph to
>> giraph-release-1.0.0-RC3. Most jobs run fine, but our largest job now runs
>> out of memory using the same AWS elastic-mapreduce configuration we have
>> always used. I have never tried to configure either Giraph or the Hadoop
>> that AWS provides. We build for Hadoop 1.0.2 because that's closest to the
>> 1.0.3 AWS gives us. The 8 x m2.4xlarge cluster we use seems to provide
>> 8*14=112 map tasks, each fitted out with a 2GB heap. Our code is completely
>> unchanged except as required to adapt to the new Giraph APIs. Our vertex,
>> edge, and message data are completely unchanged. On smaller jobs that do
>> work, the aggregate heap usage high-water mark seems about the same as
>> before, but the "committed heap" seems to run higher. I can't even make the
>> big job work on a cluster of 12. In that case I get one map task that ends
>> up with nearly twice as many messages as most of the others, so it runs out
>> of memory anyway, and it only takes one failed task to fail the job. Am I
>> missing something here? Should I be configuring the new Giraph in some way
>> that I didn't need to with the old one?
>>
>>
>
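One thing I may also try, if the histogram points at message buffers, is the
out-of-core support that I believe the 1.0 release added. The property names
below are from my reading of the docs and the jar name is just a stand-in for
our own job jar, so treat this as an unverified sketch rather than our actual
invocation:

    hadoop jar our-giraph-job.jar org.apache.giraph.GiraphRunner \
        -Dgiraph.useOutOfCoreMessages=true \
        -Dgiraph.maxMessagesInMemory=1000000 \
        -Dgiraph.useOutOfCoreGraph=true \
        -Dgiraph.maxPartitionsInMemory=8 \
        <our usual computation class and arguments>

That would at least tell us whether the skewed task is drowning in messages
or whether the extra memory is going somewhere else entirely.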
