As I said, failures on specific supersteps *might* happen, but not necessarily.
Did you run the minimum spanning tree job again? Did it finish successfully?

On a different note, what do you mean by "submitted a job of 90 supersteps"? I don't think you can specify the number of supersteps; that number is determined by the total number of iterations required before all vertices vote to halt.

On Thu, Aug 23, 2012 at 7:58 AM, Amani Alonazi <amani.alon...@kaust.edu.sa> wrote:

> Thank you Vishal.
>
> But I submitted a PageRank job of 90 supersteps, 20 workers, 4,000,000
> vertices and 30 edges per vertex. The job completed successfully. I'm
> really confused.
>
> On Wed, Aug 22, 2012 at 7:33 PM, Vishal Patel <write2vis...@gmail.com> wrote:
>
>> After several supersteps, sometimes a worker thread dies (say it ran out
>> of memory). Zookeeper waits for ~10 mins (600 seconds) and then decides that
>> the worker is not responsive and fails the entire job. At this point, if you
>> have a checkpoint saved, it will resume from there; otherwise you have to
>> start from scratch.
>>
>> If you run the job again it should finish successfully (or it might error
>> at some other superstep / worker combination).
>>
>> Vishal
>>
>> On Tue, Aug 21, 2012 at 10:12 PM, Amani Alonazi
>> <amani.alon...@kaust.edu.sa> wrote:
>>
>>> Hi all,
>>>
>>> I'm running a minimum spanning tree compute function on a Hadoop cluster
>>> (20 machines). After certain supersteps (e.g. superstep 47 for a graph of
>>> 4,194,304 vertices and 181,566,970 edges), the execution time increased
>>> dramatically. This is not the only problem: the job has been killed with
>>> "Task attempt_* failed to report status for 601 seconds. Killing!"
>>>
>>> I disabled the checkpoint feature by setting
>>> "CHECKPOINT_FREQUENCY_DEFAULT = 0" in GiraphJob.java. I don't need to write
>>> any data to disk, neither snapshots nor output. I tested the algorithm on
>>> a sample graph of 7 vertices and it works well.
>>>
>>> Is there any way to profile or debug a Giraph job?
>>> In the Giraph Stats, is the "Aggregate finished vertices" counter for
>>> the vertices which voted to halt? Also, is the "sent messages" counter
>>> per superstep or the total number of messages?
>>> If a vertex votes to halt, will it be activated upon receiving messages?
>>>
>>> Thanks a lot!
>>>
>>> Best,
>>> Amani AlOnazi
>>> MSc Computer Science
>>> King Abdullah University of Science and Technology
>>> Kingdom of Saudi Arabia
>>>
>>> ------------------------------
>>> This message and its contents, including attachments are intended solely
>>> for the original recipient. If you are not the intended recipient or have
>>> received this message in error, please notify me immediately and delete
>>> this message from your computer system. Any unauthorized use or
>>> distribution is prohibited. Please consider the environment before printing
>>> this email.
>
> --
> Amani AlOnazi
> MSc Computer Science
> King Abdullah University of Science and Technology
> Kingdom of Saudi Arabia
> amani.alon...@kaust.edu.sa | +966 (0) 555 191 795
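
P.S. To make the point about the superstep count concrete, here is a rough sketch in plain Java. It is not the Giraph API: the class `HaltDemo`, the method `run`, and the max-value propagation algorithm are all made up for illustration, and the loop is only a toy model of the vote-to-halt scheme as I understand it. Every vertex votes to halt at the end of each superstep, an incoming message reactivates it, and the run ends when no vertex is active and no messages are pending. The number of supersteps falls out of the data instead of being passed in.

```java
import java.util.ArrayList;
import java.util.List;

public class HaltDemo {

    // Creates one empty message list per vertex.
    private static List<List<Integer>> emptyBoxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    /**
     * Toy vote-to-halt loop: propagates the maximum value through the graph.
     * neighbors[v] lists the vertices that v sends messages to.
     * values is updated in place; the return value is the number of
     * supersteps in which at least one vertex ran.
     */
    public static int run(int[][] neighbors, int[] values) {
        int n = values.length;
        boolean[] halted = new boolean[n];
        List<List<Integer>> inbox = emptyBoxes(n);
        int superstep = 0;

        while (true) {
            List<List<Integer>> outbox = emptyBoxes(n);
            boolean anyRan = false;

            for (int v = 0; v < n; v++) {
                List<Integer> messages = inbox.get(v);
                if (!messages.isEmpty()) {
                    halted[v] = false; // a message reactivates a halted vertex
                }
                if (halted[v]) {
                    continue;
                }
                anyRan = true;

                // In superstep 0 every vertex announces its own value.
                boolean changed = (superstep == 0);
                for (int m : messages) {
                    if (m > values[v]) {
                        values[v] = m;
                        changed = true;
                    }
                }
                if (changed) {
                    for (int u : neighbors[v]) {
                        outbox.get(u).add(values[v]);
                    }
                }
                halted[v] = true; // vote to halt until a new message arrives
            }

            if (!anyRan) {
                break; // all vertices halted and no messages pending
            }
            inbox = outbox;
            superstep++;
        }
        return superstep;
    }

    public static void main(String[] args) {
        // Chain 0 - 1 - 2 - 3, edges in both directions.
        int[][] chain = {{1}, {0, 2}, {1, 3}, {2}};
        System.out.println(run(chain, new int[] {3, 6, 2, 1})); // prints 4
        System.out.println(run(chain, new int[] {6, 3, 2, 1})); // prints 5
    }
}
```

The same graph takes 4 supersteps with one set of starting values and 5 with another, only because the largest value starts at a different vertex. A job that always stops after a fixed number of supersteps would have to get that limit from its own compute logic.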