Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

Lukas Nalezenec Wed, 04 Sep 2013 03:34:53 -0700


Thanks,
I was not sure if it really works as I described.

> Facebook can't be using it like this if, as described, they havebillions of vertices and a trillion edges.

Yes, its strange. I guess configuration does not help so much on largecluster. What might help are properties of input data.

> So do you, or Avery, have any idea how you might initialize this is amore reasonable way, and how???

Fast workaround is to set number of partitions to from W^2 to W or 2*W. It will help if you dont have very large number of workers.

I would not change MAX_*_REQUEST_SIZE much since it may hurt performance.
You can do some preprocessing before loading data to Giraph.



How to change Giraph:

The caches could be flushed if total sum of vertexes/edges in all cachesexceeds some number. Ideally, it should prevent not only OutOfMemoryerrors but also raising high water mark. Not sure if it (preventingraising HWM) is easy to do.I am going to use almost-prebuild partitions. For my use case it wouldbe ideal to detect if some cache is abandoned and i would not be usedanymore. It would cut memory usage in caches from ~O(n^3) to ~O(n). Itcould be done by counting number of cache flushes or cache insertionsand if some cache was not touched for long time it would be flushed.

There could be separated configuration MAX_*_REQUEST_SIZE for perpartition caches during loading data.

I guess there should be simple but efficient way how to trace memoryhigh-water mark. It could look like:


Loading data: Memory high-water mark: start: 100 Gb end: 300 Gb
Iteration 1 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
Iteration 1 XYZ ....
Iteration 2 Computation: Memory high-water mark: start: 300 Gb end: 300 Gb
.
.
.

Lukas





On 09/04/13 01:12, Jeff Peters wrote:

Thank you Lukas!!! That's EXACTLY the kind of model I was building inmy head over the weekend about why this might be happening, and whyincreasing the number of AWS instances (and workers) does not solvethe problem without increasing each worker's VM. Surely Facebook can'tbe using it like this if, as described, they have billions of verticesand a trillion edges. So do you, or Avery, have any idea how you mightinitialize this is a more reasonable way, and how???

On Mon, Sep 2, 2013 at 6:08 AM, Lukas Nalezenec<lukas.naleze...@firma.seznam.cz<mailto:lukas.naleze...@firma.seznam.cz>> wrote:


    Hi

    I wasted few days on similar problem.

    I guess the problem was that during loading - if you have got W
    workers and W^2 partitions there are W^2 partition caches in each
    worker.
    Each cache can hold 10 000 vertexes by default.
    I had 26 000 000 vertexes, 60 workers -> 3600 partitions. It means
    that there can be up to 36 000 000 vertexes in caches in each
    worker if input files are random.
    Workers were assigned 450 000 vertexes but failed when they had
    900 000 vertexes in memory.

    Btw: Why default number of partitions is W^2 ?

    (I can be wrong)
    Lukas



    On 08/31/13 01:54, Avery Ching wrote:

    Ah, the new caches. =)  These make things a lot faster (bulk data
    sending), but do take up some additional memory.  if you look at
    GiraphConstants, you can find ways to change the cache sizes
    (this will reduce that memory usage).
    For example, MAX_EDGE_REQUEST_SIZE will affect the size of the
    edge cache.  MAX_MSG_REQUEST_SIZE will affect the size of the
    message cache.  The caches are per worker, so 100 workers would
    require 50 MB  per worker by default.  Feel free to trim it if
    you like.

    The byte arrays for the edges are the most efficient storage
    possible (although not as performance as the native edge stores).

    Hope that helps,

    Avery

    On 8/29/13 4:53 PM, Jeff Peters wrote:

    Avery, it would seem that optimizations to Giraph have,
    unfortunately, turned the majority of the heap into "dark
    matter". The two snapshots are at unknown points in a superstep
    but I waited for several supersteps so that the activity had
    more or less stabilized. About the only thing comparable between
    the two snapshots are the vertexes, 192561 X "RecsVertex" in the
    new version and 191995 X "Coloring" in the old system. But with
    the new Giraph 672710176 out of 824886184 bytes are stored as
    primitive byte arrays. That's probably indicative of some very
    fine performance optimization work, but it makes it extremely
    difficult to know what's really out there, and why. I did notice
    that a number of caches have appeared that did not exist before,
    namely SendEdgeCache, SendPartitionCache, SendMessageCache
    and SendMutationsCache.

    Could any of those account for a larger per-worker footprint in
    a modern Giraph? Should I simply assume that I need to force AWS
    to configure its EMR Hadoop so that each instance has fewer map
    tasks but with a somewhat larger VM max, say 3GB instead of 2GB?


    On Wed, Aug 28, 2013 at 4:57 PM, Avery Ching <ach...@apache.org
    <mailto:ach...@apache.org>> wrote:

        Try dumping a histogram of memory usage from a running JVM
        and see where the memory is going.  I can't think of
        anything in particular that changed...


        On 8/28/13 4:39 PM, Jeff Peters wrote:


            I am tasked with updating our ancient (circa 7/10/2012)
            Giraph to giraph-release-1.0.0-RC3. Most jobs run fine
            but our largest job now runs out of memory using the
            same AWS elastic-mapreduce configuration we have always
            used. I have never tried to configure either Giraph or
            the AWS Hadoop. We build for Hadoop 1.0.2 because that's
            closest to the 1.0.3 AWS provides us. The 8 X m2.4xlarge
            cluster we use seems to provide 8*14=112 map tasks
            fitted out with 2GB heap each. Our code is completely
            unchanged except as required to adapt to the new Giraph
            APIs. Our vertex, edge, and message data are completely
            unchanged. On smaller jobs, that work, the aggregate
            heap usage high-water mark seems about the same as
            before, but the "committed heap" seems to run higher. I
            can't even make it work on a cluster of 12. In that case
            I get one map task that seems to end up with nearly
            twice as many messages as most of the others so it runs
            out of memory anyway. It only takes one to fail the job.
            Am I missing something here? Should I be configuring my
            new Giraph in some way I didn't used to need to with the
            old one?

Re: Out of memory with giraph-release-1.0.0-RC3, used to work on old Giraph

Reply via email to