Hi everyone,

I'm working on a community detection algorithm for Giraph and I'm trying to
execute it on the Friendster graph, which has about 65M nodes and about 1.8
billion edges. Running on 16 machines, each worker uses about 50 GB of RAM
before doing ANY processing; that's about 800 GB total for this graph, which
seems excessive. I'm using the 1.1.0 stable release and Hadoop 0.20.203
(should I upgrade to a newer Hadoop?). This is the command I'm running:

$HADOOP_HOME/bin/hadoop --config $CONF jar $GIR_JAR \
  org.apache.giraph.GiraphRunner \
  -D 'mapred.child.java.opts=-Xms80G -Xmx80G' \
  -D 'mapred.tasktracker.map.tasks.maximum=1' \
  -D-Xmx60000m \
  -libjars $LIBJARS \
  computation.StartComputation $GIRAPH_OPTIONS \
  -eif org.apache.giraph.io.formats.IntNullReverseTextEdgeInputFormat \
  -eip $INPUT \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op $OUTPUT \
  -w $N_WORKERS \
  -mc computation.WccMasterCompute

where N_THREADS=12 (the number of available threads per machine),
N_PARTITIONS=3*N_THREADS*N_WORKERS, and

GIRAPH_OPTIONS="-ca giraph.useSuperstepCounters=false
-ca giraph.numComputeThreads=$N_THREADS
-ca giraph.numInputThreads=$N_THREADS
-ca giraph.numOutputThreads=$N_THREADS
-ca giraph.oneToAllMsgSending=true
-ca giraph.metrics.enable=true
-ca giraph.maxPartitionsInMemory=$N_THREADS
-ca giraph.userPartitionCount=$N_PARTITIONS
-ca giraph.outEdgesClass=utils.IntNullHashSetEdges"

I know IntNullHashSetEdges uses somewhat more memory than IntNullArrayEdges,
but switching between the two doesn't make that big of a difference.
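For what it's worth, here is the rough back-of-envelope I did for the edge storage itself. The per-edge overhead constants (primitive-int storage, an assumed 0.75 load factor for the hash-set variant) are my assumptions, not measured figures:

```java
// Rough per-edge memory estimate for int-id, null-value out-edges.
// Constants below are assumptions, not measurements.
public class EdgeMemoryEstimate {

    // An array-backed edge list stores neighbor ids in a primitive
    // int[]: ~4 bytes per edge, plus negligible per-vertex overhead.
    static long arrayBytes(long edges) {
        return edges * 4L;
    }

    // An open-addressing primitive-int hash set keeps the same 4-byte
    // ids but needs empty slots to stay under its load factor;
    // assuming a 0.75 load factor, that's 4 / 0.75 bytes per edge
    // (computed in long arithmetic as edges * 16 / 3).
    static long hashSetBytes(long edges) {
        return edges * 16L / 3L;
    }

    public static void main(String[] args) {
        long edges = 1_800_000_000L; // Friendster edge count from above
        System.out.printf("array:    ~%.1f GB%n", arrayBytes(edges) / 1e9);
        System.out.printf("hash set: ~%.1f GB%n", hashSetBytes(edges) / 1e9);
    }
}
```

If those assumptions are anywhere near right, the raw edge ids are only on the order of 7-10 GB cluster-wide either way, which is why I don't think the edge representation explains the 800 GB.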

Each vertex value holds 7 ints, a double, three arrays, and a map, but all
of these are empty when the graph is loaded, and it still takes that much
memory.
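I also tried to sanity-check whether those (empty) per-vertex structures could account for it. The per-object overheads below are assumptions for a 64-bit JVM with compressed oops, not measurements:

```java
// Rough per-vertex heap estimate for a value holding 7 ints, a double,
// three empty-array references, and one empty-map reference.
// All overhead constants are assumed, not measured.
public class VertexMemoryEstimate {

    static long perVertexBytes() {
        long header      = 16;     // object header + padding (assumed)
        long ints        = 7 * 4;  // seven int fields
        long dbl         = 8;      // one double field
        long refs        = 4 * 4;  // three array refs + one map ref
        long emptyArrays = 3 * 16; // three empty array objects (assumed)
        long emptyMap    = 48;     // one empty map object (assumed)
        return header + ints + dbl + refs + emptyArrays + emptyMap;
    }

    public static void main(String[] args) {
        long vertices = 65_000_000L; // Friendster vertex count from above
        long total = vertices * perVertexBytes();
        System.out.printf("~%d bytes/vertex, ~%.1f GB total%n",
                perVertexBytes(), total / 1e9);
    }
}
```

Even with generous padding that comes out around 10-15 GB cluster-wide, nowhere near 800 GB, so the vertex values alone don't seem to explain it either.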

I feel like I must be doing something wrong, or missing a configuration
option. Thanks in advance for any help you might be able to offer.

Best,
Matthew
