7, each node has a datanode and a tasktracker running on it. I attach the
full file here:

2014.03.07|10:13:17~/HadoopSetupTest/hadoop-1.2.1/conf>cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->


  <description>The host and port at which the MapReduce job tracker runs.
  If "local", then jobs are run in-process as a single map
  and reduce task.

  <description>The port at which the MapReduce task tracker runs.

  <description>The port at which the MapReduce task tracker runs.

  <description>The maximum number of map tasks that will run
  simultaneously by a task tracker.

  <description>The maximum number of reduce tasks that will run
  simultaneously by a task tracker.


  <description>pool name property can be specified in jobconf</description>

  <description>The local directory where MapReduce stores intermediate
  data files.  May be a comma-separated list of
  directories on different devices in order to spread disk I/O.
  Directories that do not exist are ignored.

  <description>The shared directory where MapReduce stores control files.

  <description>The name of the Network Interface from which a task
  tracker should report its IP address. (e.g. eth0)

  <value>-Xmx3600m -XX:+UseParallelGC -mx1024m -XX:MaxHeapFreeRatio=10
  <description>Java opts for the task tracker child processes.
  The following symbol, if present, will be interpolated: @taskid@ is
  by current TaskID. Any other occurrences of '@' will go unchanged.
  For example, to enable verbose gc logging to a file named for the taskid
  /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
        -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
  The configuration variable mapred.child.ulimit can be used to control the
  maximum virtual memory of the child processes.

    <description>How many tasks to run per jvm. If set to -1, there is no

    <description>The number of server threads for the JobTracker. This
should be roughly 4% of the number of tasktracker nodes.</description>

    <description>The maximum number of tasks for a single job. A value of
-1 indicates that there is no maximum.</description>

    <description>Time to wait get progress report from a task tracker so
that jobtracker decides the task is in progress. default is 1000*60*10 i.e.
10 minutes</description>

    <description>Time to wait get progress report from a task tracker so
that jobtracker decides the task is in progress. default is 1000*60*10 i.e.
10 minutes</description>

    <description>set the speculative execution for map tasks</description>

    <description>set the speculative execution for reduce




2014-03-07 9:59 GMT-06:00 Claudio Martella <claudio.marte...@gmail.com>:

> that depends on your cluster configuration. what is the maximum number of
> mappers you can have concurrently on each node?
> On Fri, Mar 7, 2014 at 4:42 PM, Suijian Zhou <suijian.z...@gmail.com> wrote:
>> The current setting is:
>>   <name>mapred.child.java.opts</name>
>>   <value>-Xmx6144m -XX:+UseParallelGC -mx1024m -XX:MaxHeapFreeRatio=10
>> -XX:MinHeapFreeRatio=10</value>
>> Is 6144 MB enough (for each task tracker)? I.e., I have 39 nodes to process
>> the 8*2GB input files.
>>   Best Regards,
>>   Suijian
>> 2014-03-07 9:21 GMT-06:00 Claudio Martella <claudio.marte...@gmail.com>:
>>> this setting won't be used by Giraph (or by any mapreduce application),
>>> but by the hadoop infrastructure itself.
>>> you should use mapred.child.java.opts instead.
>>> On Fri, Mar 7, 2014 at 4:19 PM, Suijian Zhou <suijian.z...@gmail.com> wrote:
>>>> Hi, Claudio,
>>>>   I set the following when I ran the program:
>>>> export HADOOP_DATANODE_OPTS="-Xmx10g"
>>>> and
>>>> export HADOOP_HEAPSIZE=30000
>>>> in hadoop-env.sh and restarted hadoop.
>>>>   Best Regards,
>>>>   Suijian
>>>> 2014-03-06 17:29 GMT-06:00 Claudio Martella <claudio.marte...@gmail.com>:
>>>>> did you actually increase the heap?
>>>>> On Thu, Mar 6, 2014 at 11:43 PM, Suijian Zhou <suijian.z...@gmail.com> wrote:
>>>>>> Hi,
>>>>>>   I tried to process only 2 of the input files, i.e., 2GB + 2GB of input,
>>>>>> and the program finished successfully in 6 minutes. As I have 39 nodes,
>>>>>> shouldn't they be enough to load and process the full 8*2GB = 16GB graph?
>>>>>> Can somebody give some hints? (Do all the nodes participate in loading
>>>>>> the graph from HDFS, or does only the master node load it?) Thanks!
>>>>>>   Best Regards,
>>>>>>   Suijian
>>>>>> 2014-03-06 16:24 GMT-06:00 Suijian Zhou <suijian.z...@gmail.com>:
>>>>>>> Hi, Experts,
>>>>>>>   I'm trying to run PageRank on a graph in Giraph, but the program
>>>>>>> always gets stuck.
>>>>>>> There are 8 input files, each ~2GB in size, all copied onto HDFS.
>>>>>>> I use 39 nodes, and each node has 16GB of memory and 8 cores. After
>>>>>>> 2 hours it keeps printing the same info (as follows) on the screen,
>>>>>>> with no apparent progress at all. What are the possible reasons?
>>>>>>> Small example files run without problems. Thanks!
>>>>>>> 14/03/06 16:17:42 INFO job.JobProgressTracker: Data from 39 workers
>>>>>>> - Compute superstep 0: 5854829 out of 49200000 vertices computed; 181 out
>>>>>>> of 1521 partitions computed
>>>>>>> 14/03/06 16:17:47 INFO job.JobProgressTracker: Data from 39 workers
>>>>>>> - Compute superstep 0: 5854829 out of 49200000 vertices computed; 181 out
>>>>>>> of 1521 partitions computed
>>>>>>>   Best Regards,
>>>>>>>   Suijian
>>>>> --
>>>>>    Claudio Martella
>>> --
>>>    Claudio Martella
> --
>    Claudio Martella
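
A rough sanity check of the heap settings discussed in this thread, as a minimal Python sketch rather than anything Hadoop or Giraph provides. It rests on two assumptions that are not confirmed anywhere above: (1) the JVM treats -mx as an older spelling of -Xmx and the last such flag on the command line takes effect, which is worth verifying on the actual cluster, e.g. with `java -XX:+PrintFlagsFinal`; (2) the "7" at the top of this message is the number of tasks that may run concurrently on one node. The helper names (effective_max_heap_mb, node_heap_demand_mb) are made up for illustration.

```python
import re

# Matches -Xmx<size> or the older -mx<size> spelling, e.g. -Xmx3600m, -mx1024m.
_HEAP_FLAG = re.compile(r"^-(?:Xmx|mx)(\d+)([kKmMgG]?)$")
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def effective_max_heap_mb(java_opts):
    """Return the max heap (in MB) implied by a JVM option string.

    ASSUMPTION: when several max-heap flags are present, the last one wins.
    Returns None if the string contains no max-heap flag.
    """
    heap_mb = None
    for token in java_opts.split():
        match = _HEAP_FLAG.match(token)
        if match:
            size, unit = match.groups()
            heap_mb = int(size) * _UNIT_BYTES[unit.lower()] / 1024 ** 2
    return heap_mb


def node_heap_demand_mb(java_opts, concurrent_tasks):
    """Worst-case heap (in MB) requested on one node if every slot is busy."""
    heap_mb = effective_max_heap_mb(java_opts)
    return None if heap_mb is None else heap_mb * concurrent_tasks


if __name__ == "__main__":
    # Option string as posted (the value is cut off in the file dump above).
    opts = "-Xmx3600m -XX:+UseParallelGC -mx1024m -XX:MaxHeapFreeRatio=10"
    slots = 7                 # ASSUMPTION: concurrent tasks per node
    node_ram_mb = 16 * 1024   # 16GB per node, from the first mail in the thread

    print("effective max heap per task (MB):", effective_max_heap_mb(opts))
    print("heap demand if -Xmx3600m applied (MB):",
          node_heap_demand_mb("-Xmx3600m", slots), "of", node_ram_mb)
```

If assumption (1) holds, the trailing -mx1024m would cap every child JVM at 1GB no matter what -Xmx says, so it may be worth dropping one of the two flags. If instead the 3600m value applies, 7 tasks at 3600MB each ask for more than the 16GB a node has.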
