Gunther Hagleitner created TEZ-1447:
---------------------------------------

             Summary: Handle parallelism updates and versioning w/ custom InputInitializerEvents
                 Key: TEZ-1447
                 URL: https://issues.apache.org/jira/browse/TEZ-1447
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.5.0
            Reporter: Gunther Hagleitner
            Priority: Blocker
             Fix For: 0.5.0


I'm trying to do dynamic partition pruning through input initializer events in 
Hive. That means the initializer of a table scan vertex has to receive events 
(which carry the pruning info) from all tasks in another vertex before 
generating tasks to run.

The problems I ran into with the current API:

getNumTasks: I'm currently using a busy loop to wait for the number of tasks 
for a vertex to be decided (-1 -> x). There's no way around it, because it's 
the only way to find out how many events to expect (0 is a valid number of 
tasks, so I can't simply wait for the first task to complete).
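To make the first workaround concrete, here is a minimal sketch of such a busy loop. It assumes nothing about the real Tez API: the IntSupplier is a hypothetical stand-in for the getNumTasks call, and NumTasksPoller/awaitNumTasks are illustrative names only. It shows why polling is unavoidable: -1 means "undecided", and 0 is a legitimate final answer.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch of the busy-wait workaround. The IntSupplier stands in
 * for the call that reports a vertex's task count; -1 means "not decided yet".
 */
final class NumTasksPoller {
    static final int UNDECIDED = -1;

    /** Polls until the task count is decided (0 included) or the timeout expires. */
    static int awaitNumTasks(IntSupplier numTasks, long pollMillis, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            int n = numTasks.getAsInt();
            if (n != UNDECIDED) {
                return n; // 0 is valid: no events will ever arrive
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("vertex parallelism still undecided");
            }
            // The wasteful part: there is no notification, so we sleep and re-check.
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated vertex whose parallelism is decided as 0 on the third poll.
        IntSupplier fake = () -> ++calls[0] < 3 ? UNDECIDED : 0;
        System.out.println(awaitNumTasks(fake, 1, 1000)); // prints 0
    }
}
```

A notification when the task count is decided would replace the sleep/re-check loop with a single wait.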

With auto-reducer parallelism I have to employ another busy loop, because I 
might initially expect 10 events and that number later gets knocked down to 5. 
Since no event is associated with this change, I have to periodically check 
whether I have enough events.
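A minimal sketch of the second workaround follows. All names are hypothetical, not Tez API. The expected count is re-read on every check rather than captured once, because auto-reducer parallelism can lower it (10 -> 5) without any event. The caller still has to invoke isComplete() periodically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/**
 * Hypothetical collector whose completion target can shrink. expectedCount
 * stands in for the source vertex's current task count (-1 = undecided).
 */
final class ShrinkingExpectationCollector<E> {
    private final IntSupplier expectedCount;
    private final List<E> received = new ArrayList<>();

    ShrinkingExpectationCollector(IntSupplier expectedCount) {
        this.expectedCount = expectedCount;
    }

    synchronized void onEvent(E event) {
        received.add(event);
    }

    /** Must be polled: a parallelism change does not trigger any callback. */
    synchronized boolean isComplete() {
        int expected = expectedCount.getAsInt(); // re-read every time
        return expected >= 0 && received.size() >= expected;
    }

    public static void main(String[] args) {
        int[] parallelism = {10};
        ShrinkingExpectationCollector<String> c =
                new ShrinkingExpectationCollector<>(() -> parallelism[0]);
        for (int i = 0; i < 5; i++) {
            c.onEvent("pruning-info-" + i);
        }
        System.out.println(c.isComplete()); // prints false: still expecting 10
        parallelism[0] = 5;                 // auto-reducer lowers parallelism
        System.out.println(c.isComplete()); // prints true
    }
}
```

Counting raw events like this is only correct if no task sends its event more than once, which ties into the versioning problem.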

Versioning: Events have a version number, but I don't know which task they 
come from, so I can't de-dup them.
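For illustration, this is the kind of de-duplication that would become possible if each event identified its source task. The sketch ASSUMES a hypothetical InitEvent carrying a source task index (the piece the report says is missing) and assumes a higher version supersedes a lower one from the same task. Neither the type nor its fields are part of the Tez API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical de-duplication keyed by source task index. Keeps only the
 * highest-versioned event per task; duplicates and stale versions are dropped.
 */
final class VersionedEventDeduper {
    /** Hypothetical event: which task sent it, its version, and its payload. */
    static final class InitEvent {
        final int sourceTaskIndex;
        final int version;
        final String payload;

        InitEvent(int sourceTaskIndex, int version, String payload) {
            this.sourceTaskIndex = sourceTaskIndex;
            this.version = version;
            this.payload = payload;
        }
    }

    private final Map<Integer, InitEvent> latestByTask = new HashMap<>();

    /** Returns true if the event was kept, false if it was a duplicate or stale. */
    synchronized boolean accept(InitEvent e) {
        InitEvent current = latestByTask.get(e.sourceTaskIndex);
        if (current != null && current.version >= e.version) {
            return false;
        }
        latestByTask.put(e.sourceTaskIndex, e);
        return true;
    }

    /** Number of distinct source tasks heard from; compare this to the task count. */
    synchronized int distinctTasks() {
        return latestByTask.size();
    }

    synchronized String payloadFor(int taskIndex) {
        InitEvent e = latestByTask.get(taskIndex);
        return e == null ? null : e.payload;
    }

    public static void main(String[] args) {
        VersionedEventDeduper d = new VersionedEventDeduper();
        d.accept(new InitEvent(0, 0, "partitions from task 0"));
        d.accept(new InitEvent(0, 0, "partitions from task 0")); // duplicate, dropped
        d.accept(new InitEvent(1, 0, "partitions from task 1"));
        System.out.println(d.distinctTasks()); // prints 2
    }
}
```

With a per-task key, completion can be judged by distinct tasks rather than raw event count, which also makes re-sent events harmless.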



--
This message was sent by Atlassian JIRA
(v6.2#6252)
