Re:Re: What's the application scenario of Apache TEZ

LLBian Wed, 20 Jan 2016 23:11:24 -0800

Thank you so much for your timely responses, Hitesh, Bikas and Rohini.
Your explanations are very clear. I will try using Tez in some scenarios to
understand it better.


Best Regards.

---LLBian

At 2016-01-21 02:45:29, "Hitesh Shah" <[email protected]> wrote:
>Couple of other points to add to Bikas’s email: 
>
>Regarding your question on small data: No - Tez is geared to work in both 
>small data and extremely large data cases. Hive should likely perform better 
>with Tez regardless of data size unless there is a bad query plan created that 
>is non-optimal for Tez.
>
>For 3): Hive/Pig/Cascading, when used with MR, would deconstruct a single Hive 
>query/Pig script into multiple MR jobs. This would end up reading/writing 
>from/to HDFS multiple times. Furthermore, with MR, you are stuck fitting 
>all your code into a Mapper and Reducer (each with only a single input and 
>output) and using Shuffle for data transfer. This introduces additional 
>inefficiencies. With Tez, a single Hive query can be converted into a single 
>DAG. Vertices can run any kind of logic, and the edges between vertices are not 
>restricted to “shuffle-like” data transfer, which allows more optimizations at 
>the query planning stages. The fact that Tez allows Hive/Pig to use smarter 
>ways of processing queries/scripts is usually the biggest win in terms 
>of performance. Spark is similarly better than MR, as it provides a richer 
>operator library in some sense. As for comparing Spark vs Tez, to some extent 
>it is like comparing apples to oranges, as Tez is quite a low-level library. 
>Depending on how an application is written to make use of Tez vs Spark, you 
>will find different cases where one is faster than the other. 
> 
>— Hitesh
>
>On Jan 20, 2016, at 8:44 AM, LLBian <[email protected]> wrote:
>
>> 
>> Hello, Tez experts:
>>       I understand that Tez is used for DAG-style workloads.
>>       Because it can avoid writing intermediate results to disk and supports 
>> container reuse, it is more efficient than MR at processing small amounts of 
>> data. So I would guess that Hive on Tez is better than Hive on MR for 
>> processing small amounts of data. Am I right?
>>      Well, now, my questions are:
>> (1) Even though the main design themes are listed at https://tez.apache.org/, 
>> I am still not very clear about its application scenarios. If there are 
>> some real, major enterprise applications, so much the better.
>> (2) I am still not very clear about what problem it is mainly used to solve. 
>> (3) Why is it used for Hive and Pig? How is it better than Spark or MR?
>> (4) I looked at your official slides and the paper "Apache Tez: A Unifying 
>> Framework for Modeling and Building Data Processing Applications", but it is 
>> still not very clear to me. 
>> How should I understand this: "Don’t solve problems that have already been 
>> solved. Or else you will have to solve them again!"? Is there a real example?
>> 
>>      Apache Tez is a great product, and I hope to learn more about it.
>> 
>> Any reply is very much appreciated.
>> 
>> Thank you & Best Regards.
>> 
>> ---LLBian
>> 
>>   
>
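
As a rough illustration of point 3 in Hitesh's reply above (multiple MR jobs each going through HDFS versus a single DAG), here is a toy Python sketch. It is not Tez or MapReduce API code: the function names are made up, and it assumes a simple linear pipeline where, in the DAG case, only the first vertex reads from HDFS and only the last one writes to it.

```python
# Toy model only (hypothetical names, no Tez/MapReduce API involved):
# list the HDFS accesses for a linear pipeline under two execution plans.

def plan_as_mr_chain(num_jobs):
    """Chained MR jobs: every job reads its input from HDFS and writes
    its output back to HDFS so the next job can pick it up."""
    ops = []
    for i in range(num_jobs):
        ops.append(("read", f"job{i}"))
        ops.append(("write", f"job{i}"))
    return ops


def plan_as_single_dag(num_vertices):
    """Single DAG (assumption: linear pipeline): only the first vertex
    reads from HDFS and only the last vertex writes to it; intermediate
    data moves along the edges between vertices instead."""
    if num_vertices < 1:
        return []
    return [("read", "vertex0"), ("write", f"vertex{num_vertices - 1}")]


def count_ops(ops, kind):
    """Count how many operations of the given kind ('read' or 'write') a plan has."""
    return sum(1 for op, _ in ops if op == kind)


if __name__ == "__main__":
    stages = 4
    mr = plan_as_mr_chain(stages)
    dag = plan_as_single_dag(stages)
    print("MR chain HDFS reads/writes:", count_ops(mr, "read"), count_ops(mr, "write"))
    print("Single DAG HDFS reads/writes:", count_ops(dag, "read"), count_ops(dag, "write"))
```

Real query plans are not linear and intermediate data may still touch local disk, so this only shows why the number of HDFS round trips grows with the number of MR jobs but not with the number of vertices in one DAG.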
