Gunther Hagleitner created TEZ-1447:
---------------------------------------
Summary: Handle parallelism updates and versioning w/ custom InputInitializerEvents
Key: TEZ-1447
URL: https://issues.apache.org/jira/browse/TEZ-1447
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Gunther Hagleitner
Priority: Blocker
Fix For: 0.5.0

I'm trying to do dynamic partition pruning through input initializer events in Hive. That means that the initializer of a table scan vertex has to receive events from all tasks in another vertex (the events contain the pruning info) before generating tasks to run.

The problems I ran into with the current API:

getNumTasks: I'm currently using a busy loop to wait for the number of tasks for a vertex to be decided (-1 -> x). There's no way around it, because it's the only way to find out how many events to expect (0 is a valid number of tasks, so I can't wait for the first task to complete).

With auto-reducer parallelism I have to employ another busy loop, because I might initially be expecting 10 events, a number that later gets knocked down to 5. Since there's no event associated with this change, I have to periodically check whether I have enough events.

Versioning: Events have a version number, but I don't know which task they are coming from. Thus I can't de-dup events.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
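For illustration only, here is a minimal Java sketch of the bookkeeping the description above asks for. It assumes two things that the issue says the current API does not provide: a notification when the source vertex's task count is decided or reduced, and a source task index on every event. All names (PruningEventCollector, onParallelismUpdate, onEvent) are hypothetical and are not part of the Tez API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical collector; none of these names come from Tez. */
final class PruningEventCollector {
    // Highest event version seen so far, keyed by source task index (assumed to be available).
    private final Map<Integer, Integer> latestVersionByTask = new HashMap<>();
    // -1 means the source vertex has not decided its number of tasks yet.
    private int expectedTasks = -1;

    /** Assumed callback: the task count was decided (-1 -> x) or later reduced (e.g. 10 -> 5). */
    synchronized void onParallelismUpdate(int numTasks) {
        expectedTasks = numTasks;
        notifyAll(); // wake up waiters instead of having them poll
    }

    /** Returns false when the same or a newer version from this task was already recorded. */
    synchronized boolean onEvent(int sourceTaskIndex, int version) {
        Integer seen = latestVersionByTask.get(sourceTaskIndex);
        if (seen != null && seen >= version) {
            return false; // duplicate or stale attempt, ignore it
        }
        latestVersionByTask.put(sourceTaskIndex, version);
        notifyAll();
        return true;
    }

    /** True once the task count is known and every expected task has reported. */
    synchronized boolean isComplete() {
        return expectedTasks >= 0 && latestVersionByTask.size() >= expectedTasks;
    }

    /** Blocks without a busy loop until isComplete() holds. */
    synchronized void awaitComplete() throws InterruptedException {
        while (!isComplete()) {
            wait();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PruningEventCollector collector = new PruningEventCollector();
        collector.onParallelismUpdate(10); // initially expecting 10 events
        collector.onEvent(0, 0);
        collector.onEvent(1, 0);
        collector.onEvent(1, 0);           // duplicate from task 1, dropped
        collector.onParallelismUpdate(2);  // parallelism knocked down
        collector.awaitComplete();         // returns immediately: both tasks reported
        System.out.println(collector.isComplete());
    }
}
```

With a task count of 0 the collector is complete as soon as onParallelismUpdate(0) is called, which covers the case where no events will ever arrive. The sketch also assumes that a reduction in parallelism happens before the source tasks send their events, so no recorded events need to be discarded.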