Tez does not close Inputs/Outputs/Processors if an error occurs during task execution. We haven't spent much time defining the semantics for such cases, since the expectation is that the container will not be re-used. It looks like this needs to be figured out, both for such cases and for LocalMode.
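To make the point about closing on the error path concrete, here is a minimal sketch. It assumes hypothetical `Component` and `TaskBody` interfaces as stand-ins for Tez Inputs/Outputs/Processors and the task's run logic; these are not the actual Tez types, and this is not how Tez currently behaves. It only illustrates one possible semantic: always close, and surface the original failure.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskCleanupSketch {
    /** Hypothetical stand-in for a Tez Input, Output, or Processor (not the real Tez API). */
    interface Component {
        void close() throws Exception;
    }

    /** Hypothetical stand-in for the task's run logic. */
    interface TaskBody {
        void run() throws Exception;
    }

    /**
     * Runs the body, then closes every component even if the body threw.
     * The first failure is rethrown; later close() failures are attached as suppressed.
     */
    static void runAndClose(TaskBody body, List<Component> components) throws Exception {
        Exception failure = null;
        try {
            body.run();
        } catch (Exception e) {
            failure = e;
        }
        for (Component c : components) {
            try {
                c.close();
            } catch (Exception e) {
                if (failure == null) {
                    failure = e;
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }

    public static void main(String[] args) {
        List<String> closed = new ArrayList<>();
        List<Component> components = new ArrayList<>();
        components.add(() -> closed.add("input"));
        components.add(() -> closed.add("output"));
        components.add(() -> closed.add("processor"));
        try {
            // Simulate a failure caused by application logic.
            runAndClose(() -> { throw new IllegalStateException("application logic failed"); },
                        components);
        } catch (Exception e) {
            System.out.println("task failed: " + e.getMessage());
        }
        // All three components were closed despite the failure.
        System.out.println("closed: " + closed);
    }
}
```

Whether a close() failure should mask, or be suppressed under, the original task error is exactly the kind of semantic that would still need to be decided.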
On Sun, Aug 3, 2014 at 5:51 PM, Thaddeus Diamond <[email protected]> wrote:

> Thanks. Created https://issues.apache.org/jira/browse/TEZ-1369 and
> uploaded a patch.
>
> On Sat, Aug 2, 2014 at 3:33 PM, Bikas Saha <[email protected]> wrote:
>
>> Session min held containers was orthogonal to your main issue about
>> failed task causing containers to get lost.
>>
>> It was more of a suggestion to your use case of maintaining an allocated
>> session pool for low latency. Min held containers will maintain that
>> minimum pool of containers (best effort) that is distributed evenly across
>> your cluster (best effort) such that subsequent DAGs are assured of some
>> min capacity.
>>
>> For your failed task to not fail the container, that would still need
>> minor code change in Tez to add a config to change that behavior. Please
>> feel free to create a jira and if possible provide a patch.
>>
>> Bikas
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Friday, August 01, 2014 8:54 PM
>> *To:* [email protected]
>> *Subject:* Re: Reusing Containers Of Failed Tasks
>>
>> Okay, so I built the source and used the target JARs to compile my
>> project, but I'm not seeing any improvement in the behavior. What is the
>> expected behavior if I set the session min held containers property? It
>> still doesn't start up the containers on session start and the failed
>> containers still get shut down. Thoughts?
>>
>> On Fri, Aug 1, 2014 at 3:43 PM, Thaddeus Diamond <[email protected]> wrote:
>>
>> Okay. Is there a place I can get the latest JARs to compile my code
>> against? I need this and other configurations for development but the
>> latest maven central artifacts are 0.4.1-incubating. Don't worry about
>> being unstable, I'm still in development with this project.
>>
>> On Fri, Aug 1, 2014 at 1:41 PM, Bikas Saha <[email protected]> wrote:
>>
>> Warning. Master is tracking the 0.5 API stability release. Hence
>> transferring to master would mean work. But your code would be a lot
>> cleaner. Master is expected to be unstable until next week or so.
>>
>> Bikas
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Wednesday, July 30, 2014 9:27 PM
>> *To:* [email protected]
>> *Subject:* Re: Reusing Containers Of Failed Tasks
>>
>> Nevermind, I was not on master. I'll investigate that.
>>
>> Thanks!
>>
>> On Thu, Jul 31, 2014 at 12:14 AM, Thaddeus Diamond <[email protected]> wrote:
>>
>> I don't see that setting in TezConfiguration.java. Do you happen to know
>> it offhand?
>>
>> On Thu, Jul 31, 2014 at 12:10 AM, Bikas Saha <[email protected]> wrote:
>>
>> There is no workaround without code change in Tez.
>>
>> The simplest code change would be to make this behavior configurable and
>> have the current behavior as default.
>>
>> Btw, you can also try the session min held containers configuration that
>> was recently added. This ensures that your session will retain some minimum
>> resources. You can use the session min/max timeouts to decay excess
>> containers.
>>
>> Bikas
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Wednesday, July 30, 2014 8:51 PM
>> *To:* [email protected]
>> *Subject:* Re: Reusing Containers Of Failed Tasks
>>
>> I see. Is there a manual workaround you suggest for this?
>>
>> The motivation is this: I have an application with low latency and max
>> concurrency SLAs. The way we are trying to solve this with Tez is to keep
>> an application-level pool of Tez sessions and configure each to have
>> long-lived containers. When users submit DAGs the application grabs an
>> idle Tez session from the pool and submits to that one. After the DAG
>> completes (successful or not) it is returned to the pool in an idle state.
>>
>> If a session gets returned to the pool but no containers are spun up in
>> it because the DAG failed, I will fail to meet my SLAs on the next DAG
>> submission.
>>
>> On Wed, Jul 30, 2014 at 8:05 PM, Bikas Saha <[email protected]> wrote:
>>
>> Currently, failed tasks make the JVM exit. There is no work around for
>> that. Before we can change that we would need to be able to check the task
>> execution is isolated such that a task failure does not end up “corrupting”
>> the host.
>>
>> Bikas
>>
>> *From:* Thaddeus Diamond [mailto:[email protected]]
>> *Sent:* Wednesday, July 30, 2014 3:15 PM
>> *To:* [email protected]
>> *Subject:* Reusing Containers Of Failed Tasks
>>
>> Hi,
>>
>> I turned on container reuse and upped the time that containers linger
>> after task vertex completion
>> (tez.am.container.session.delay-allocation-millis), but I'm still having an
>> issue. Sometimes, the Processor I created will fail due to application
>> logic in one DAG but not the next. The trivial example is:
>>
>> class MyProcessor implements LogicalIOProcessor {
>>   // Other non-application logic code
>>   public void run(...) {
>>     if (new Random().nextBoolean()) {
>>       throw new FooBarBazException();
>>     }
>>   }
>> }
>>
>> In this case I don't want the task JVM to be deallocated because it was
>> application logic that caused the failure and next time I start a DAG I
>> will have the long JVM task startup delay.
>>
>> I see the following code in the source
>> (TaskScheduler#deallocateTask(...)) that I think is the cause of this:
>>
>> if (!taskSucceeded || !shouldReuseContainers) {
>>   if (LOG.isDebugEnabled()) {
>>     LOG.debug("Releasing container, containerId=" + container.getId()
>>         + ", taskSucceeded=" + taskSucceeded
>>         + ", reuseContainersFlag=" + shouldReuseContainers);
>>   }
>>   releaseContainer(container.getId());
>> }
>>
>> Is this something that can be fixed in master? Or is there a
>> workaround/conf I can set to get this working?
>>
>> Thanks,
>> Thad
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
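For reference, the scheduler-side part of the change discussed in the thread (making the release decision configurable, with the current behavior as the default) could look roughly like the sketch below. The `reuseContainersOfFailedTasks` flag and the `shouldRelease` helper are hypothetical names invented for illustration; they are not taken from Tez or from the TEZ-1369 patch. This covers only the release decision, not the separate concern raised in the thread that failed tasks currently make the JVM exit.

```java
public class ReleaseDecisionSketch {
    /**
     * Decides whether a container should be released after a task finishes.
     * With reuseContainersOfFailedTasks == false this reduces to the quoted
     * condition: !taskSucceeded || !shouldReuseContainers.
     *
     * @param reuseContainersOfFailedTasks hypothetical opt-in flag (assumption)
     */
    static boolean shouldRelease(boolean taskSucceeded,
                                 boolean shouldReuseContainers,
                                 boolean reuseContainersOfFailedTasks) {
        if (!shouldReuseContainers) {
            return true;  // reuse disabled: always release
        }
        if (taskSucceeded) {
            return false; // successful task with reuse enabled: keep the container
        }
        // Failed task: keep the container only if the opt-in flag is set.
        return !reuseContainersOfFailedTasks;
    }

    public static void main(String[] args) {
        // Default behavior: a failed task releases its container.
        System.out.println(shouldRelease(false, true, false));
        // With the hypothetical flag enabled, the container is kept.
        System.out.println(shouldRelease(false, true, true));
    }
}
```

Keeping the flag off by default preserves the existing behavior for anyone who does not opt in.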
