Nate,

Great, glad to hear it works! We resend open requests after 10 minutes, so 
that's why you were seeing supersteps taking that long.

Have fun with Giraph and let us know if you have any other questions.

Maja

From: Nate <touring_...@msn.com>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Thursday, February 21, 2013 1:32 PM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: RE: Waiting for times required to be 19 (currently 18)

Maja,

Success!

I checked and saw that the Giraph jar in use was dated 6-Feb, many hours 
before your fix made it into the source tree.  I probably forgot to put the 
new jar that I built earlier this week into the right place.  How 
frustrating.

I recompiled the very latest code, put the jar into the right place, and have 
been able to execute the Giraph job multiple times successfully.  It even 
executes much faster than before, and the time to execute is reliable too.  
Time to execute used to vary between 10 and 20 minutes when Giraph was able to 
complete, but it now takes between 70 and 80 seconds every time without any 
problems.

Many thanks for fixing the original issue, and for replying to my email to the 
list.

Nate

________________________________
From: majakabi...@fb.com
To: user@giraph.apache.org
Subject: Re: Waiting for times required to be 19 (currently 18)
Date: Thu, 21 Feb 2013 20:04:53 +0000

Nate,

Are all the workers waiting for a request from the same worker? (In the 
"waitSomeRequests: Waiting for request" log lines, destTask is what you should 
look at.) If so, check whether there is an exception on that worker. You can 
also try decreasing giraph.maxRequestMilliseconds and see what happens after 
the request gets resent. Please let us know what you find out!
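
One way to check the destTask values is to tally them across the collected worker logs. The following is a minimal sketch, not part of Giraph: the regular expression is modeled on the log excerpt quoted later in this thread and may need adjusting for other Giraph versions, and the function name and sample lines are hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the "waitSomeRequests" excerpt quoted in this thread.
# Assumption: the real log format may differ between Giraph versions.
WAITING = re.compile(
    r"waitSomeRequests: Waiting for request\s*\(destTask=(\d+)"
)


def count_dest_tasks(log_text):
    """Count 'Waiting for request' entries per destTask id."""
    return Counter(int(m.group(1)) for m in WAITING.finditer(log_text))


# Hypothetical sample lines, for illustration only.
sample = (
    "NettyClient: waitSomeRequests: Waiting for request (destTask=17, reqId=5032)\n"
    "NettyClient: waitSomeRequests: Waiting for request (destTask=17, reqId=5033)\n"
    "NettyClient: waitSomeRequests: Waiting interval of 15000 msecs, 1 open requests\n"
)

counts = count_dest_tasks(sample)
print(counts.most_common())
```

If a single destTask dominates the tally, that worker's log is the first place to look for an exception.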

Maja

From: Nate <touring_...@msn.com>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Thursday, February 21, 2013 11:16 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: RE: Waiting for times required to be 19 (currently 18)

Hello Maja,

Thank you for your reply and the link to the issue.
I last updated the code this week, and do in fact have that issue checked out 
in my local copy of the source.  My compiled jar file of giraph-core is dated 
Feb 18th (three days ago).

I will do another update from Git very soon, then build and test again to be 
sure that the fix is in place, and report back if the behavior changes.

Thank you,
Nate

________________________________
From: majakabi...@fb.com
To: user@giraph.apache.org
Subject: Re: Waiting for times required to be 19 (currently 18)
Date: Thu, 21 Feb 2013 17:48:24 +0000

Hi Nate,

When did you take the new Giraph code? Please check whether you have the 
GIRAPH-506 patch in; if not, that's probably the reason for the issue.

Maja

From: Nate <touring_...@msn.com>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Thursday, February 21, 2013 8:06 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Waiting for times required to be 19 (currently 18)

I recently upgraded older Giraph code built against CDH3 to a git checkout from 
a few days ago that builds against CDH4.1.0 (MRv1) libraries.  All of the 
Giraph tests pass.

When running my Giraph job with 20 workers, I usually get the above error in 
19 map processes:

org.apache.giraph.utils.ExpectedBarrier: waitForRequiredPermits: Waiting for 
times required to be 19 (currently 18)

One map worker always shows something like:

org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting interval of 
15000 msecs, 1 open requests, waiting for it to be <= 0,and some metrics ....
org.apache.giraph.comm.netty.NettyClient: waitSomeRequests: Waiting for request 
(destTask=17, reqId=5032) - (reqId=5326,destAddr=host1:30017,elapsedNanos=..., 
started=..., writeDone=true, writeSuccess=true)
repeats...

I say "usually" because the same Giraph job does complete, but only rarely.  
I have a timeout of 100 minutes set, and the job is killed after that much 
time has elapsed.

Also, the started field in the above output in this past run reads: "Wed Jan 21 
14:21:31 EST 1970".  All machines are synchronized by a single time server and 
currently read accurate times.  I don't think it affected the execution, but it 
still seems erroneous.
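
A date in January 1970 is what appears when a small number that is not a wall-clock timestamp is formatted as milliseconds since the Unix epoch. Whether that is what happens to the started field is an assumption; nothing in this thread confirms it. The sketch below only shows that a value of about 1.8 billion milliseconds (roughly 20.8 days) renders as the exact string quoted above. The function name and the fixed EST offset are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset labeled "EST", matching the zone in the quoted log line.
EST = timezone(timedelta(hours=-5), "EST")


def render_as_epoch_millis(value_ms):
    """Format value_ms as if it were milliseconds since the Unix epoch."""
    moment = datetime.fromtimestamp(value_ms / 1000, tz=EST)
    return moment.strftime("%a %b %d %H:%M:%S %Z %Y")


# 1,797,691,000 ms is about 20.8 days, far too small to be a 2013 timestamp.
print(render_as_epoch_millis(1_797_691_000))  # Wed Jan 21 14:21:31 EST 1970
```

Under that assumption the odd date would be a display issue only, which fits the observation that it did not seem to affect execution.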

I also don't see Hadoop maps having status messages set on them.  I see the 
GraphMapper giving the Context object to the GraphTaskManager instance, and I 
can see it calling "context.setStatus(...)" but those messages never show up in 
the map status column in the job tracker page.

Is there something I've missed while upgrading the old code?
