I checked several other slave machines.
Basically, the map task is blocked at this stack trace:

"main" prio=10 tid=0x00000000098ed000 nid=0x7beb in Object.wait()
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x0000000400108530> (a
        - locked <0x0000000400108530> (a
        at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)

Is it because I am missing some setting?
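
For readers unfamiliar with the state shown in the trace above: TIMED_WAITING (on object monitor) is what a thread reports while it sits inside Object.wait(timeout). The thread is idle until it is notified or the timeout expires, which would be consistent with the unused CPU. The sketch below only reproduces that state in isolation; the class and method names are hypothetical, and it says nothing about what Giraph itself is waiting for, since the trace does not show that.

```java
public class TimedWaitDemo {

    /**
     * Starts a helper thread that calls lock.wait(timeoutMs) and returns the
     * state observed for that thread while it is parked inside wait().
     */
    static Thread.State observeWaitingState(final long timeoutMs) throws InterruptedException {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    // Same call that appears at the top of the trace.
                    lock.wait(timeoutMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "waiter");
        waiter.setDaemon(true);
        waiter.start();

        // Poll (for at most 5 seconds) until the helper has entered wait().
        Thread.State state = waiter.getState();
        long deadline = System.currentTimeMillis() + 5000;
        while (state != Thread.State.TIMED_WAITING && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
            state = waiter.getState();
        }

        // Wake the helper so the demo does not sit out the full timeout.
        synchronized (lock) {
            lock.notifyAll();
        }
        waiter.join(5000);
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observeWaitingState(60_000L));
    }
}
```

A thread dump taken while the helper is parked would show it the same way the map task's "main" thread appears above.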


On Thu, Sep 26, 2013 at 3:16 PM, Yingyi Bu <buyin...@gmail.com> wrote:

> I have 61 slave machines. Each slave machine has 16GB memory and 4 cores.
> I tried two configurations:
> 1.   Set mapred.map.child.java.opts to -Xmx4g, and run the job with 4
> workers per machine on average (-w 240, to try to use all the cores).
> 2.   Set mapred.map.child.java.opts to -Xmx16g, and run the job with 1
> worker per machine on average (-w 60).
> I used the combiner.
> Here are the behaviors of the two configurations:
> 1. Configuration 1 fails with an OutOfMemoryError (GC overhead limit
> exceeded) during superstep -1.
> 2. Configuration 2 finishes superstep -1 but hangs at superstep 0 for a
> long time (more than 40 minutes).  The status of each map task is
> "startSuperstep: WORKER_ONLY - Attempt=0, Superstep=0".  I checked several
> slave machines; the CPU is not being used.  Attached is the dumped stacktrace.
> Does anyone have experience with similar situations?
> Another question: how can I effectively use all the cores on the slave
> machines?   Is each worker multi-threaded?
> Thanks a lot!
> Yingyi
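
The memory arithmetic behind the two configurations above can be written out as a small sketch. The class and method names are hypothetical, and the numbers are only those quoted in the message (61 machines, 16GB each). Note that -Xmx caps only the Java heap; each JVM also uses some native memory beyond that, so the real footprint is somewhat larger than these totals. This is a back-of-the-envelope check, not a diagnosis of the hang.

```java
public class HeapBudget {

    /** Total heap, in GB, that all worker JVMs on one machine may claim. */
    static int totalHeapGb(int workersPerMachine, int heapPerWorkerGb) {
        return workersPerMachine * heapPerWorkerGb;
    }

    /** Memory, in GB, left over for the OS and the Hadoop daemons. */
    static int headroomGb(int machineMemoryGb, int workersPerMachine, int heapPerWorkerGb) {
        return machineMemoryGb - totalHeapGb(workersPerMachine, heapPerWorkerGb);
    }

    /** Average number of workers per machine for a given -w value. */
    static double avgWorkersPerMachine(int workers, int machines) {
        return (double) workers / machines;
    }

    public static void main(String[] args) {
        int machineMemoryGb = 16;
        int machines = 61;

        // Configuration 1: -Xmx4g, -w 240
        System.out.printf("config 1: %.2f workers/machine, headroom %d GB%n",
                avgWorkersPerMachine(240, machines), headroomGb(machineMemoryGb, 4, 4));

        // Configuration 2: -Xmx16g, -w 60
        System.out.printf("config 2: %.2f workers/machine, headroom %d GB%n",
                avgWorkersPerMachine(60, machines), headroomGb(machineMemoryGb, 1, 16));
    }
}
```

In both configurations the heap limits alone add up to the full 16GB of a machine, leaving no headroom for the operating system or the Hadoop daemons running on the same node.
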
> On Thu, Sep 26, 2013 at 1:08 PM, Avery Ching <ach...@apache.org> wrote:
>>  Hopefully you are using combiners and also re-using objects.  This can
>> keep memory usage much lower.  Also implementing your own OutEdges can make
>> it much more efficient.
>> How much memory do you have?
>> Avery
>> On 9/26/13 12:51 PM, Yingyi Bu wrote:
>> >> I think you may have added the same vertex 2x?
>> I ran the job over roughly half of the graph and saw this.  However, that
>> input is not a set of complete connected components, so there might be
>> target vertex ids that do not exist.
>> When I ran the job over the entire graph, I did not see this, but the job
>> failed by exceeding the GC limit (trying out-of-core now).
>>  Yingyi
>> On Thu, Sep 26, 2013 at 12:05 PM, Avery Ching <ach...@apache.org> wrote:
>>>  I think you may have added the same vertex 2x?  That being said, I
>>> don't see why the code is this way.  It should be fine.  We should file a
>>> JIRA.
>>> On 9/26/13 11:02 AM, Yingyi Bu wrote:
>>>  Thanks, Lukas!
>>>  I think the reason for this exception is that I ran the job over part of
>>> the graph, where some target ids do not exist.
>>>  Yingyi
>>> On Thu, Sep 26, 2013 at 1:13 AM, Lukas Nalezenec <
>>> lukas.naleze...@firma.seznam.cz> wrote:
>>>>  Hi,
>>>> Do you use partition balancing?
>>>>  Lukas
>>>> On 09/26/13 05:16, Yingyi Bu wrote:
>>>>  Hi,
>>>> I ran a Giraph-1.0.0 PageRank job over a 60-machine cluster with 28GB of
>>>> input data and got this exception:
>>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception resolveMutations: Already has missing vertex on this worker for 20464109
>>>>    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:102)
>>>>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>>    at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>    at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>> Caused by: java.lang.IllegalStateException: resolveMutations: Already has missing vertex on this worker for 20464109
>>>>    at org.apache.giraph.comm.netty.NettyWorkerServer.resolveMutations(NettyWorkerServer.java:184)
>>>>    at org.apache.giraph.comm.netty.NettyWorkerServer.prepareSuperstep(NettyWorkerServer.java:152)
>>>>    at org.apache.giraph.worker.BspServiceWorker.startSuperstep(BspServiceWorker.java:677)
>>>>    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:249)
>>>>    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
>>>>    ... 7 more
>>>> Does anyone know the possible cause of this exception?
>>>> Thanks!
>>>> Yingyi
