I finally made a bash script that waits a fixed amount of time before killing any app :(
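The wait-then-kill approach can be sketched as follows. This is a Java sketch under stated assumptions, not the author's bash script (which is not included in the thread). Each job command is started with `ProcessBuilder`, given a fixed time budget through `Process.waitFor(long, TimeUnit)`, and destroyed with `destroyForcibly()` if it overruns, so the next job in the sequence can start. The class name `TimedJobLauncher` and the `sleep` commands are hypothetical placeholders for the real Giraph launch commands.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: run job commands one after another, killing any job that
// exceeds a fixed wall-clock budget so the next one can start.
public class TimedJobLauncher {

    /**
     * Starts the command and waits at most `limit` for it to exit.
     * Returns true if it exited on its own, false if it was killed at the deadline.
     */
    static boolean runWithLimit(List<String> command, long limit, TimeUnit unit)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (process.waitFor(limit, unit)) {
            return true; // exited (with any exit code) before the deadline
        }
        process.destroyForcibly(); // deadline passed: kill the local process
        process.waitFor();         // reap it before the next job is launched
        return false;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder commands; in practice each entry would be one Giraph job launch.
        List<List<String>> jobs = List.of(
                List.of("sleep", "1"),
                List.of("sleep", "10"));
        for (List<String> job : jobs) {
            // The thread mentions a 5-minute cap; 3 seconds keeps this demo short.
            boolean finished = runWithLimit(job, 3, TimeUnit.SECONDS);
            // prints "sleep 1 -> finished", then "sleep 10 -> killed"
            System.out.println(String.join(" ", job) + " -> " + (finished ? "finished" : "killed"));
        }
    }
}
```

One caveat: with Giraph running on YARN, killing the local client process may not stop the application on the cluster. In that case the timeout branch would also need to kill it there, for example with `yarn application -kill <applicationId>`, which requires knowing the application ID.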
-- 
*José Luis Larroque*
University Programmer Analyst - Facultad de Informática - UNLP
Java Developer at LIFIA

2017-01-26 23:32 GMT-03:00 José Luis Larroque <[email protected]>:
> I believe I found something related to this issue.
>
> The behavior when maxAllowedJobTimeMilliseconds is set is strongly
> related to the giraph.trackJobProgressOnClient option, which is set to *false*
> by default.
>
> For the job to be stopped when the elapsed time reaches the
> maxAllowedJobTimeMilliseconds value, the mapperStarted() method of
> JobProgressTrackerService has to be executed.
>
> In Giraph 1.1, the giraph.trackJobProgressOnClient configuration option
> is false by default. In that case, a JobProgressTrackerClientNoOp
> is created for tracking progress on the client. This class implements the
> mapperStarted() method, but with an *empty* body: nothing is done, so the
> thread that should be created to kill the job after the maximum amount of
> time is never created at all. That's why I'm not seeing any LOG output
> related to this option.
>
> I tried running a job with giraph.trackJobProgressOnClient set to true,
> but when I did, all my containers got this exception:
> java.lang.NoClassDefFoundError: org/apache/thrift/transport/TTransport
>
> Apparently, when I set giraph.trackJobProgressOnClient to true,
> a RetryableJobProgressTrackerClient is created instead of
> JobProgressTrackerClientNoOp, and RetryableJobProgressTrackerClient uses
> classes that I don't have on my classpath, like
> org/apache/thrift/transport/TTransport. Should I start downloading jar
> dependencies until the NoClassDefFoundError is solved, or is there a
> better workaround for this problem?
>
> Any help will be greatly appreciated.
>
> bye!
> José
>
> 2017-01-25 22:51 GMT-03:00 José Luis Larroque <[email protected]>:
>
>> Sorry, I forgot to attach the log files, here they are:
>>
>> 2017-01-25 22:50 GMT-03:00 José Luis Larroque <[email protected]>:
>>
>>> Hi Sergey, thanks for your answer and sorry for my delay.
>>>
>>> I'm using Hadoop 2.4.0 and Giraph 1.1. With this version of Giraph, I
>>> believe I'm using this code:
>>> https://github.com/apache/giraph/blob/release-1.1/giraph-core/src/main/java/org/apache/giraph/job/JobProgressTrackerService.java#L136
>>>
>>> I'm using these job parameters:
>>> -w 4 -yh 5700 -ca giraph.metrics.enable=true,giraph.useOutOfCoreMessages=true,giraph.isStaticGraph=true,giraph.maxAllowedJobTimeMilliseconds=10000
>>>
>>> I'm using a cluster of 1 master and 4 slaves in AWS.
>>>
>>> I'm attaching logs from three different containers. I have a superstep
>>> that took 12 seconds, and the entire Giraph application doesn't get stopped.
>>>
>>> Thanks in advance!
>>>
>>> 2017-01-25 1:33 GMT-03:00 Sergey Edunov <[email protected]>:
>>>
>>>> Hello José,
>>>>
>>>> giraph.maxAllowedJobTimeMilliseconds is supposed to do exactly what
>>>> you want, see the code here:
>>>> https://github.com/apache/giraph/blob/trunk/giraph-core/src/main/java/org/apache/giraph/job/DefaultJobProgressTrackerService.java#L123
>>>>
>>>> However, I have never tested it with any Hadoop distro other than
>>>> Hadoop 1.0, so maybe it doesn't work in your environment.
>>>>
>>>> Can you share the exact configuration (job parameters and Hadoop version)
>>>> and what messages you see in the log?
>>>>
>>>> Regards,
>>>> Sergey Edunov
>>>>
>>>> On Tue, Jan 24, 2017 at 7:26 PM, José Luis Larroque
>>>> <[email protected]> wrote:
>>>> > I have to execute several Giraph processes in AWS. To do it, I have a
>>>> > script that launches one process after another until all processes are
>>>> > finished. The problem is that sometimes a container gets killed, and I
>>>> > spend a lot of time waiting for the entire Giraph app to get killed so
>>>> > the next one can start. I'm trying to reduce this time, because I know
>>>> > that a process that takes more than 5 minutes isn't going to finish
>>>> > (I'd rather have a few Giraph processes killed if the total time for
>>>> > executing all of them gets reduced significantly).
>>>> >
>>>> > I already tried setting a "maximum amount of time" with the following
>>>> > options, using a really low value (1 millisecond):
>>>> >
>>>> > giraph.waitTaskDoneTimeoutMs -> This option makes the container throw an
>>>> > IllegalStateException but doesn't stop the Giraph app from running. I know
>>>> > that this option has a reported bug, but I hope that is not the case here.
>>>> > giraph.maxAllowedJobTimeMilliseconds -> With the LOG level at DEBUG, I
>>>> > couldn't see any impact from using this option.
>>>> >
>>>> > Still, I'm not getting the expected result, and I have Giraph applications
>>>> > that take around 12000 seconds or more (a big waste of time, resources and
>>>> > money).
>>>> >
>>>> > Any help will be greatly appreciated.
>>>> >
>>>> > Thanks!
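The mechanism described in the 2017-01-26 message can be illustrated with a small sketch: the time limit is enforced by a watchdog thread that is started from mapperStarted(), so a tracker whose mapperStarted() has an empty body never starts it and the limit is silently ignored. All names here (`ProgressTracker`, `NoOpTracker`, `TimeLimitedTracker`, the `killJob` callback) are hypothetical stand-ins, not Giraph's actual classes or the release-1.1 code; starting only one watchdog per job is also an assumption of this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, NOT Giraph's real classes: a job time limit enforced
// by a watchdog thread started from mapperStarted(). If the tracker in use has an
// empty mapperStarted(), the watchdog never starts and the limit has no effect.
public class JobTimeLimitSketch {

    /** Hypothetical stand-in for a client-side progress tracker. */
    interface ProgressTracker {
        void mapperStarted();
    }

    /** Like the no-op client described in the thread: empty body, so no watchdog. */
    static final class NoOpTracker implements ProgressTracker {
        @Override
        public void mapperStarted() {
            // intentionally empty
        }
    }

    /** Starts a single daemon watchdog that runs killJob after maxMillis. */
    static final class TimeLimitedTracker implements ProgressTracker {
        private final long maxMillis;
        private final Runnable killJob;
        private boolean watchdogStarted;

        TimeLimitedTracker(long maxMillis, Runnable killJob) {
            this.maxMillis = maxMillis;
            this.killJob = killJob;
        }

        @Override
        public synchronized void mapperStarted() {
            if (watchdogStarted) {
                return; // many mappers report in; one watchdog per job is enough
            }
            watchdogStarted = true;
            Thread watchdog = new Thread(() -> {
                try {
                    Thread.sleep(maxMillis);
                    killJob.run(); // time budget exhausted: kill the job
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // interrupted: exit without killing
                }
            }, "job-time-limit-watchdog");
            watchdog.setDaemon(true); // must not keep the JVM alive on its own
            watchdog.start();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch killed = new CountDownLatch(1);
        ProgressTracker limited = new TimeLimitedTracker(100, killed::countDown);
        limited.mapperStarted();
        // prints "time-limited tracker killed job: true"
        System.out.println("time-limited tracker killed job: " + killed.await(2, TimeUnit.SECONDS));

        CountDownLatch neverKilled = new CountDownLatch(1);
        ProgressTracker noOp = new NoOpTracker();
        noOp.mapperStarted();
        // prints "no-op tracker killed job: false" (nothing ever counts this latch down)
        System.out.println("no-op tracker killed job: " + neverKilled.await(300, TimeUnit.MILLISECONDS));
    }
}
```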
