Okay, so I built the source and used the target JARs to compile my project, but I'm not seeing any improvement in the behavior. What is the expected behavior if I set the session min held containers property? It still doesn't start up the containers on session start and the failed containers still get shut down. Thoughts?
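For reference, a minimal sketch of the settings under discussion. It uses plain java.util.Properties so it runs without Tez on the classpath. The key names for container reuse and min held containers are assumptions based on the 0.5-era TezConfiguration and should be checked against the constants in the build actually compiled against; the linger key is the one quoted in the original message below. The class name, method name, and values are hypothetical.

```java
import java.util.Properties;

public class SessionConfSketch {
    // Assumed key names; verify against TezConfiguration in the build in use.
    static final String REUSE_ENABLED = "tez.am.container.reuse.enabled";
    static final String MIN_HELD_CONTAINERS = "tez.am.session.min.held-containers";
    // Key quoted in the original message of this thread.
    static final String SESSION_DELAY_MILLIS =
        "tez.am.container.session.delay-allocation-millis";

    // Collects the three settings; the values passed in are illustrative only.
    static Properties build(int minHeld, long lingerMillis) {
        Properties p = new Properties();
        p.setProperty(REUSE_ENABLED, "true");
        p.setProperty(MIN_HELD_CONTAINERS, Integer.toString(minHeld));
        p.setProperty(SESSION_DELAY_MILLIS, Long.toString(lingerMillis));
        return p;
    }

    public static void main(String[] args) {
        Properties p = build(4, 600_000L);
        for (String name : p.stringPropertyNames()) {
            System.out.println(name + "=" + p.getProperty(name));
        }
    }
}
```

In a real client the same key/value pairs would presumably be set on the configuration object handed to the session before it starts, rather than on a Properties instance.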
On Fri, Aug 1, 2014 at 3:43 PM, Thaddeus Diamond <[email protected]> wrote:

> Okay. Is there a place I can get the latest JARs to compile my code
> against? I need this and other configurations for development but the
> latest maven central artifacts are 0.4.1-incubating. Don't worry about
> being unstable, I'm still in development with this project.
>
> On Fri, Aug 1, 2014 at 1:41 PM, Bikas Saha <[email protected]> wrote:
>
>> Warning. Master is tracking the 0.5 API stability release. Hence
>> transferring to master would mean work. But your code would be a lot
>> cleaner. Master is expected to be unstable until next week or so.
>>
>> Bikas
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Wednesday, July 30, 2014 9:27 PM
>> *To:* [email protected]
>> *Subject:* Re: Reusing Containers Of Failed Tasks
>>
>> Nevermind, I was not on master. I'll investigate that.
>>
>> Thanks!
>>
>> On Thu, Jul 31, 2014 at 12:14 AM, Thaddeus Diamond <[email protected]> wrote:
>>
>> I don't see that setting in TezConfiguration.java. Do you happen to know
>> it offhand?
>>
>> On Thu, Jul 31, 2014 at 12:10 AM, Bikas Saha <[email protected]> wrote:
>>
>> There is no workaround without code change in Tez.
>>
>> The simplest code change would be to make this behavior configurable and
>> have the current behavior as default.
>>
>> Btw, you can also try the session min held containers configuration that
>> was recently added. This ensures that your session will retain some minimum
>> resources. You can use the session min/max timeouts to decay excess
>> containers.
>>
>> Bikas
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Wednesday, July 30, 2014 8:51 PM
>> *To:* [email protected]
>> *Subject:* Re: Reusing Containers Of Failed Tasks
>>
>> I see. Is there a manual workaround you suggest for this?
>>
>> The motivation is this: I have an application with low latency and max
>> concurrency SLAs. The way we are trying to solve this with Tez is to keep
>> an application-level pool of Tez sessions and configure each to have
>> long-lived containers. When users submit DAGs the application grabs an
>> idle Tez session from the pool and submits to that one. After the DAG
>> completes (successful or not) it is returned to the pool in an idle state.
>>
>> If a session gets returned to the pool but no containers are spun up in
>> it because the DAG failed, I will fail to meet my SLAs on the next DAG
>> submission.
>>
>> On Wed, Jul 30, 2014 at 8:05 PM, Bikas Saha <[email protected]> wrote:
>>
>> Currently, failed tasks make the JVM exit. There is no work around for
>> that. Before we can change that we would need to be able to check the task
>> execution is isolated such that a task failure does not end up “corrupting”
>> the host.
>>
>> Bikas
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Wednesday, July 30, 2014 3:15 PM
>> *To:* [email protected]
>> *Subject:* Reusing Containers Of Failed Tasks
>>
>> Hi,
>>
>> I turned on container reuse and upped the time that containers linger
>> after task vertex completion
>> (tez.am.container.session.delay-allocation-millis), but I'm still having an
>> issue. Sometimes, the Processor I created will fail due to application
>> logic in one DAG but not the next. The trivial example is:
>>
>> class MyProcessor implements LogicalIOProcessor {
>>   // Other non-application logic code
>>   public void run(...) {
>>     if (new Random().nextBoolean()) {
>>       throw new FooBarBazException();
>>     }
>>   }
>> }
>>
>> In this case I don't want the task JVM to be deallocated because it was
>> application logic that caused the failure and next time I start a DAG I
>> will have the long JVM task startup delay.
>>
>> I see the following code in the source
>> (TaskScheduler#deallocateTask(...)) that I think is the cause of this:
>>
>> if (!taskSucceeded || !shouldReuseContainers) {
>>   if (LOG.isDebugEnabled()) {
>>     LOG.debug("Releasing container, containerId=" + container.getId()
>>         + ", taskSucceeded=" + taskSucceeded
>>         + ", reuseContainersFlag=" + shouldReuseContainers);
>>   }
>>   releaseContainer(container.getId());
>> }
>>
>> Is this something that can be fixed in master? Or is there a
>> workaround/conf I can set to get this working?
>>
>> Thanks,
>>
>> Thad
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
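The release condition quoted from TaskScheduler#deallocateTask(...) in the original message, together with the suggestion to make that behavior configurable with the current behavior as the default, can be sketched as a pure decision function. This is only an illustration under stated assumptions: the class name, the three-argument overload, and the keepOnFailure flag are hypothetical and are not part of Tez.

```java
public class ReleasePolicySketch {
    // Behavior as quoted in the thread: release the container when the task
    // failed or when container reuse is disabled.
    static boolean shouldRelease(boolean taskSucceeded, boolean shouldReuseContainers) {
        return !taskSucceeded || !shouldReuseContainers;
    }

    // Hypothetical configurable variant. With keepOnFailure=false it returns
    // the same result as the two-argument version above (current default).
    static boolean shouldRelease(boolean taskSucceeded,
                                 boolean shouldReuseContainers,
                                 boolean keepOnFailure) {
        if (!shouldReuseContainers) {
            return true; // reuse disabled: always release
        }
        return !taskSucceeded && !keepOnFailure;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean succeeded : flags) {
            for (boolean reuse : flags) {
                System.out.println("succeeded=" + succeeded + " reuse=" + reuse
                    + " current=" + shouldRelease(succeeded, reuse)
                    + " keepOnFailure=" + shouldRelease(succeeded, reuse, true));
            }
        }
    }
}
```

Note that, as stated earlier in the thread, failed tasks currently make the JVM exit, so a flag like this would not by itself keep a failed task's JVM alive; it only models the scheduler-side release decision.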
