Reading each input in its own vertex gives better control and allows each
one to be tuned differently. That is why MRInput is generally used. MultiMRInput
<https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/input/MultiMRInput.java>
was added for Hive SMB (sort-merge-bucket) joins and is used in the Hive code:
https://www.codatlas.com/search?q=MultiMRInput&projid=github.com%2Fapache%2Fhive&searchType=code
So it should be safe for public consumption.

On Mon, Feb 13, 2017 at 5:37 PM, Piyush Narang <[email protected]> wrote:

> hi folks,
>
> While debugging the DAG generated by a Scalding / Cascading job, I noticed
> that in Tez we end up with two input vertices, one for each input path. In
> Hadoop, on the other hand, our map phase ends up reading from both input
> datasets. Is this supported in Tez? I noticed that Cascading currently uses
> MRInput
> <https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/input/MRInput.java>
> to set up its Tez inputs. I wasn't sure whether we could use MultiMRInput
> <https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/input/MultiMRInput.java>
> to read from multiple input directories in the same vertex in Tez, or
> whether it has a different purpose. If we can use it, is it safe for public
> consumption? (I noticed it is still annotated with @Evolving.)
>
> Thanks,
>
> --
> - Piyush
>
