I tried to set "mapred.tasktracker.map.tasks.maximum" to 1, but then Giraph gets stuck even on the tiny test input graph. Setting it to 2 works, but processing the big graph (5*2GB input files) still gets stuck (now with -Xmx16g and mapred.job.tracker.handler.count=8):

....
14/03/07 16:42:17 INFO job.JobProgressTracker: Data from 39 workers - Compute superstep 0: 23570813 out of 33600000 vertices computed; 1067 out of 1521 partitions computed
14/03/07 16:42:22 INFO job.JobProgressTracker: Data from 39 workers - Compute superstep 0: 23570813 out of 33600000 vertices computed; 1067 out of 1521 partitions computed
....
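For reference, the one-mapper-per-machine setup suggested below would translate into roughly this mapred-site.xml fragment. This is a sketch only: the heap size is an assumption (-Xmx14g rather than the full 16GB, to leave headroom on a 16GB node for the OS and the DataNode/TaskTracker daemons).

```xml
<!-- Sketch: one map slot per TaskTracker, one large heap per task.
     -Xmx14g is illustrative; leave headroom for the OS and the
     DataNode/TaskTracker daemons on a 16GB node. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx14g</value>
</property>
```

Parallelism then comes from inside the single mapper, e.g. by passing `-ca giraph.numComputeThreads=8` to the Giraph job.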
2014-03-07 10:36 GMT-06:00 Claudio Martella <claudio.marte...@gmail.com>:

> then you're most likely swapping, which is why your tasks are not
> responding. you're dedicating 6GB of memory to each of the 7 tasks on a
> 16GB machine. You can do the math.
> If you have control over your cluster configuration, I suggest you set max
> 1 (one) mapper per machine, then assign ~16GB to each mapper, and use 8
> compute threads.
>
>
> On Fri, Mar 7, 2014 at 5:16 PM, Suijian Zhou <suijian.z...@gmail.com> wrote:
>
>> 7, each node has a datanode and a tasktracker running on it. I attach the
>> full file here:
>>
>> 2014.03.07|10:13:17 ~/HadoopSetupTest/hadoop-1.2.1/conf> cat mapred-site.xml
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <!-- Put site-specific property overrides in this file. -->
>>
>> <configuration>
>>
>> <property>
>>   <name>mapred.job.tracker</name>
>>   <value>compute-1-23:50331</value>
>>   <description>The host and port at which the MapReduce job tracker runs.
>>   If "local", then jobs are run in-process as a single map and reduce task.
>>   </description>
>> </property>
>>
>> <property>
>>   <name>mapred.job.tracker.http.address</name>
>>   <value>0.0.0.0:50332</value>
>>   <description>The port at which the MapReduce task tracker runs.
>>   </description>
>> </property>
>>
>> <property>
>>   <name>mapred.task.tracker.http.address</name>
>>   <value>0.0.0.0:50333</value>
>>   <description>The port at which the MapReduce task tracker runs.
>>   </description>
>> </property>
>>
>> <property>
>>   <name>mapred.tasktracker.map.tasks.maximum</name>
>>   <value>7</value>
>>   <description>The maximum number of map tasks that will run
>>   simultaneously by a task tracker.
>>   </description>
>> </property>
>>
>> <property>
>>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>>   <value>7</value>
>>   <description>The maximum number of reduce tasks that will run
>>   simultaneously by a task tracker.
>>   </description>
>> </property>
>>
>> <property>
>>   <name>mapred.jobtracker.taskScheduler</name>
>>   <value>org.apache.hadoop.mapred.FairScheduler</value>
>> </property>
>>
>> <property>
>>   <name>mapred.fairscheduler.poolnameproperty</name>
>>   <value>pool.name</value>
>>   <description>pool name property can be specified in jobconf</description>
>> </property>
>>
>> <property>
>>   <name>mapred.local.dir</name>
>>   <value>${hadoop.tmp.dir}/mapred/local</value>
>>   <description>The local directory where MapReduce stores intermediate
>>   data files. May be a comma-separated list of directories on different
>>   devices in order to spread disk I/O. Directories that do not exist are
>>   ignored.
>>   </description>
>> </property>
>>
>> <property>
>>   <name>mapred.system.dir</name>
>>   <value>${hadoop.tmp.dir}/system/mapred</value>
>>   <description>The shared directory where MapReduce stores control files.
>>   </description>
>> </property>
>>
>> <property>
>>   <name>mapred.tasktracker.dns.interface</name>
>>   <value>default</value>
>>   <description>The name of the Network Interface from which a task
>>   tracker should report its IP address. (e.g. eth0)
>>   </description>
>> </property>
>>
>> <property>
>>   <name>mapred.child.java.opts</name>
>>   <value>-Xmx3600m -XX:+UseParallelGC -mx1024m -XX:MaxHeapFreeRatio=10
>>   -XX:MinHeapFreeRatio=10</value>
>>   <description>Java opts for the task tracker child processes.
>>   The following symbol, if present, will be interpolated: @taskid@ is
>>   replaced by current TaskID. Any other occurrences of '@' will go
>>   unchanged. For example, to enable verbose gc logging to a file named
>>   for the taskid in /tmp and to set the heap maximum to be a gigabyte,
>>   pass a 'value' of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
>>   The configuration variable mapred.child.ulimit can be used to control
>>   the maximum virtual memory of the child processes.
>>   </description>
>> </property>
>>
>> <property>
>>   <name>mapred.job.reuse.jvm.num.tasks</name>
>>   <value>-1</value>
>>   <description>How many tasks to run per jvm. If set to -1, there is no
>>   limit.</description>
>> </property>
>>
>> <property>
>>   <name>mapred.job.tracker.handler.count</name>
>>   <value>40</value>
>>   <description>The number of server threads for the JobTracker. This
>>   should be roughly 4% of the number of tasktracker nodes.</description>
>> </property>
>>
>> <property>
>>   <name>mapred.jobtracker.maxtasks.per.job</name>
>>   <value>-1</value>
>>   <description>The maximum number of tasks for a single job. A value of
>>   -1 indicates that there is no maximum.</description>
>> </property>
>>
>> <property>
>>   <name>mapred.tasktracker.expiry.interval</name>
>>   <value>600000</value>
>>   <description>Time to wait get progress report from a task tracker so
>>   that jobtracker decides the task is in progress. default is 1000*60*10
>>   i.e. 10 minutes</description>
>> </property>
>>
>> <property>
>>   <name>mapred.task.timeout</name>
>>   <value>0</value>
>>   <description>Time to wait get progress report from a task tracker so
>>   that jobtracker decides the task is in progress. default is 1000*60*10
>>   i.e.
>> 10 minutes</description>
>> </property>
>>
>> <property>
>>   <name>mapred.map.tasks.speculative.execution</name>
>>   <value>false</value>
>>   <description>set the speculative execution for map tasks</description>
>> </property>
>>
>> <property>
>>   <name>mapred.reduce.tasks.speculative.execution</name>
>>   <value>false</value>
>>   <description>set the speculative execution for reduce tasks</description>
>> </property>
>>
>> <property>
>>   <name>mapred.hosts.exclude</name>
>>   <value>conf/excludes</value>
>> </property>
>>
>> <property>
>>   <name>mapred.job.tracker.handler.count</name>
>>   <value>40</value>
>> </property>
>>
>> </configuration>
>>
>>
>> 2014-03-07 9:59 GMT-06:00 Claudio Martella <claudio.marte...@gmail.com>:
>>
>>> that depends on your cluster configuration. what is the maximum number
>>> of mappers you can have concurrently on each node?
>>>
>>>
>>> On Fri, Mar 7, 2014 at 4:42 PM, Suijian Zhou <suijian.z...@gmail.com> wrote:
>>>
>>>> The current setting is:
>>>> <name>mapred.child.java.opts</name>
>>>> <value>-Xmx6144m -XX:+UseParallelGC -mx1024m -XX:MaxHeapFreeRatio=10
>>>> -XX:MinHeapFreeRatio=10</value>
>>>>
>>>> Is 6144MB enough (for each task tracker)? I.e. I have 39 nodes to
>>>> process the 8*2GB input files.
>>>>
>>>> Best Regards,
>>>> Suijian
>>>>
>>>>
>>>> 2014-03-07 9:21 GMT-06:00 Claudio Martella <claudio.marte...@gmail.com>:
>>>>
>>>>> this setting won't be used by Giraph (or by any mapreduce application),
>>>>> but by the hadoop infrastructure itself.
>>>>> you should use mapred.child.java.opts instead.
>>>>>
>>>>>
>>>>> On Fri, Mar 7, 2014 at 4:19 PM, Suijian Zhou <suijian.z...@gmail.com> wrote:
>>>>>
>>>>>> Hi, Claudio,
>>>>>> I set the following when I ran the program:
>>>>>> export HADOOP_DATANODE_OPTS="-Xmx10g"
>>>>>> and
>>>>>> export HADOOP_HEAPSIZE=30000
>>>>>>
>>>>>> in hadoop-env.sh and restarted hadoop.
>>>>>>
>>>>>> Best Regards,
>>>>>> Suijian
>>>>>>
>>>>>>
>>>>>> 2014-03-06 17:29 GMT-06:00 Claudio Martella <claudio.marte...@gmail.com>:
>>>>>>
>>>>>>> did you actually increase the heap?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 6, 2014 at 11:43 PM, Suijian Zhou <suijian.z...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I tried to process only 2 of the input files, i.e. 2GB + 2GB input,
>>>>>>>> and the program finished successfully in 6 minutes. But since I have
>>>>>>>> 39 nodes, shouldn't they be enough to load and process the
>>>>>>>> 8*2GB=16GB graph? Can somebody give some hints? (Will all the nodes
>>>>>>>> participate in loading the graph from HDFS, or does only the master
>>>>>>>> node load it?) Thanks!
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Suijian
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-03-06 16:24 GMT-06:00 Suijian Zhou <suijian.z...@gmail.com>:
>>>>>>>>
>>>>>>>>> Hi, Experts,
>>>>>>>>> I'm trying to run PageRank on a graph in Giraph, but the program
>>>>>>>>> always gets stuck. There are 8 input files, each one ~2GB in size,
>>>>>>>>> all copied onto HDFS. I use 39 nodes, and each node has 16GB of
>>>>>>>>> memory and 8 cores. It keeps printing the same info (as follows) on
>>>>>>>>> the screen after 2 hours, with no progress at all. What are the
>>>>>>>>> possible reasons? Small test example files run without problems.
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> 14/03/06 16:17:42 INFO job.JobProgressTracker: Data from 39 workers - Compute superstep 0: 5854829 out of 49200000 vertices computed; 181 out of 1521 partitions computed
>>>>>>>>> 14/03/06 16:17:47 INFO job.JobProgressTracker: Data from 39 workers - Compute superstep 0: 5854829 out of 49200000 vertices computed; 181 out of 1521 partitions computed
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Suijian
>
>
> --
>    Claudio Martella
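For completeness, a PageRank run along the lines discussed in this thread might be launched as sketched below. GiraphRunner and SimplePageRankComputation do ship with the giraph-examples module, but the input/output format classes, paths, and jar name here are illustrative assumptions that must be adjusted to match the actual data format.

```shell
# Sketch only: format classes, HDFS paths, and jar name are placeholders.
hadoop jar giraph-examples-with-dependencies.jar \
    org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.SimplePageRankComputation \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/suijian/input \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/suijian/output \
    -w 39 \
    -ca giraph.numComputeThreads=8
```

With one map slot per TaskTracker, `-w 39` requests one worker per node, and `giraph.numComputeThreads` restores per-node parallelism inside that single JVM.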