Tez processor with multiple inputs

2015-05-18 Thread Oleg Zhurakousky
Is it possible to allow a Tez processor implementation that has multiple inputs to become available as soon as at least one input is available to be read? This could allow some computation to begin while waiting for other inputs. Other inputs could (if logic allows) be processed as they become

Re: Tez processor with multiple inputs

2015-05-18 Thread Oleg Zhurakousky
Also, while trying something related to this, I’ve noticed the following: “A vertex with an Initial Input and a Shuffle Input are not supported at the moment”. Is there a target timeframe for this? A JIRA? Thanks, Oleg

Re: [DISCUSS] Drop Java 6 support in 0.8

2015-05-18 Thread Siddharth Seth
Doing a quick search - Hadoop 2.2 and beyond seem to work with Java 7 (http://wiki.apache.org/hadoop/HadoopJavaVersions). Vendors have been shipping with Java 7 support for quite some time. That should take care of allowing Tez with Hadoop 2.2 libraries to work on older clusters, unless the users

RE: Tez processor with multiple inputs

2015-05-18 Thread Bikas Saha
All inputs have a waitForReady() method (with flavors) that can be used by the processor to wait as it deems fit.

Re: Tez processor with multiple inputs

2015-05-18 Thread Siddharth Seth
There are APIs on the ProcessorContext - waitForAllInputsReady, waitForAnyInputReady - which can be used to figure out when a specific Input is ready for consumption. That should solve the first question. Regarding vertices with multiple Inputs and Shuffle - that requires a custom VertexManager plug

Re: Tez processor with multiple inputs

2015-05-18 Thread Oleg Zhurakousky
Thanks, Sid. So, any pointers on how one would interact with it? I mean, all I do is assemble the DAG, and I can’t seem to see anything on the Vertex that would allow me to do that. Thanks, Oleg

Re: Tez processor with multiple inputs

2015-05-18 Thread Hitesh Shah
There is nothing that prevents a processor running and finishing without even reading any data from any input. The only point when the processor blocks is when it tries to read data from a particular input that has not yet finished fetching all of its data. That said, a processor cannot yet que

Re: Tez processor with multiple inputs

2015-05-18 Thread Hitesh Shah
This is with respect to how work is assigned to a Task. For a shuffle edge, a Task’s input is determined based on the partitions and how partitions are assigned to a Task. For a vertex reading data from HDFS (initial input), this is effectively random as the input data is split up and then ass

Re: [DISCUSS] Drop Java 6 support in 0.8

2015-05-18 Thread Hitesh Shah
To clarify, my main point was that running Tez 0.8 with Java 7 features would need the Hadoop cluster to be running with Java 7 (or at least support a way for the YARN containers to use a different Java version, i.e. Java 7). Users on older versions of Hadoop running Java 6 would likely not be able to

Re: Tez processor with multiple inputs

2015-05-18 Thread Siddharth Seth
These APIs are available during execution of the Processor. They're a mechanism to get notified and wait till certain Inputs are ready, or get notified on an Input being ready while another is being processed. There's nothing on the DAG API for this. What are you looking to do? One thing to note t