That depends on your cluster configuration. What is the maximum number of mappers you can run concurrently on each node?
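For reference: whatever you give the child JVM gets multiplied by the number of map slots on the node, and on a 16 GB machine the current -Xmx6144m leaves room for at most two concurrent workers plus the daemons. A rough example of the two settings involved, in mapred-site.xml (MRv1; the slot count and heap size below are only illustrative, size them for your own cluster):

  <!-- limit concurrent map tasks (Giraph workers) per TaskTracker -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>

  <!-- 4 slots x 3 GB = 12 GB, leaving headroom for the DataNode and
       TaskTracker daemons on a 16 GB node -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx3072m -XX:+UseParallelGC</value>
  </property>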
On Fri, Mar 7, 2014 at 4:42 PM, Suijian Zhou <suijian.z...@gmail.com> wrote:

> The current setting is:
> <name>mapred.child.java.opts</name>
> <value>-Xmx6144m -XX:+UseParallelGC -mx1024m -XX:MaxHeapFreeRatio=10
> -XX:MinHeapFreeRatio=10</value>
>
> Is 6144 MB enough (for each task tracker)? I.e., I have 39 nodes to process
> the 8*2GB input files.
>
> Best Regards,
> Suijian
>
>
> 2014-03-07 9:21 GMT-06:00 Claudio Martella <claudio.marte...@gmail.com>:
>
>> This setting won't be used by Giraph (or by any MapReduce application),
>> but by the Hadoop infrastructure itself.
>> You should use mapred.child.java.opts instead.
>>
>>
>> On Fri, Mar 7, 2014 at 4:19 PM, Suijian Zhou <suijian.z...@gmail.com> wrote:
>>
>>> Hi, Claudio,
>>> I set the following when running the program:
>>> export HADOOP_DATANODE_OPTS="-Xmx10g"
>>> and
>>> export HADOOP_HEAPSIZE=30000
>>>
>>> in hadoop-env.sh and restarted Hadoop.
>>>
>>> Best Regards,
>>> Suijian
>>>
>>>
>>> 2014-03-06 17:29 GMT-06:00 Claudio Martella <claudio.marte...@gmail.com>:
>>>
>>>> Did you actually increase the heap?
>>>>
>>>>
>>>> On Thu, Mar 6, 2014 at 11:43 PM, Suijian Zhou <suijian.z...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I tried to process only 2 of the input files, i.e., 2GB + 2GB input,
>>>>> and the program finished successfully in 6 minutes. But as I have 39 nodes,
>>>>> they should be enough to load and process the 8*2GB = 16GB graph? Can
>>>>> somebody give some hints (will all the nodes participate in loading the
>>>>> graph from HDFS, or does only the master node load it)? Thanks!
>>>>>
>>>>> Best Regards,
>>>>> Suijian
>>>>>
>>>>>
>>>>> 2014-03-06 16:24 GMT-06:00 Suijian Zhou <suijian.z...@gmail.com>:
>>>>>
>>>>>> Hi, Experts,
>>>>>> I'm trying to run PageRank on a graph in Giraph, but the
>>>>>> program always gets stuck.
>>>>>> There are 8 input files, each about 2GB in size, all copied
>>>>>> onto HDFS. I use 39 nodes, and each node has 16GB of memory and 8 cores. It
>>>>>> keeps printing the same info (as below) on the screen after 2 hours, with
>>>>>> no progress at all. What are the possible reasons? Small test
>>>>>> files run without problems. Thanks!
>>>>>>
>>>>>> 14/03/06 16:17:42 INFO job.JobProgressTracker: Data from 39 workers -
>>>>>> Compute superstep 0: 5854829 out of 49200000 vertices computed; 181 out of
>>>>>> 1521 partitions computed
>>>>>> 14/03/06 16:17:47 INFO job.JobProgressTracker: Data from 39 workers -
>>>>>> Compute superstep 0: 5854829 out of 49200000 vertices computed; 181 out of
>>>>>> 1521 partitions computed
>>>>>>
>>>>>> Best Regards,
>>>>>> Suijian
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Claudio Martella
>>>>
>>>
>>
>> --
>> Claudio Martella
>>
>

--
Claudio Martella