Thank you so much for your timely responses, Hitesh, Bikas, and Rohini. Your explanations are very clear. I will try using Tez in some scenarios to understand it better.
Best Regards.

---LLBian

At 2016-01-21 02:45:29, "Hitesh Shah" <[email protected]> wrote:
>Couple of other points to add to Bikas’s email:
>
>Regarding your question on small data: No - Tez is geared to work in both
>small data and extremely large data cases. Hive should likely perform better
>with Tez regardless of data size unless there is a bad query plan created that
>is non-optimal for Tez.
>
>For 3). Hive/Pig/Cascading when used with MR would deconstruct a single hive
>query/pig script into multiple MR jobs. This would end up reading/writing
>from/to HDFS multiple times. Furthermore, with MR, you are stuck to fitting
>all your code into a Mapper and Reducer ( each with only a single input and
>output ) and using Shuffle for data transfer. This introduces additional
>inefficiencies. With Tez, a single hive query can be converted into a single
>DAG. Vertices can run any kind of logic and the edges between vertices are not
>restricted to “shuffle-like” data transfer which allows more optimizations at
>the query planning stages. The fact that Tez allows Hive/Pig to use smarter
>ways of processing queries/scripts is what is usually the biggest win in terms
>of performance. Spark is similarly better than MR as it provides a richer
>operator library in some sense. As for comparing Spark vs Tez, to some extent,
>it is likely comparing apples to oranges as Tez is quite a low-level library.
>Depending on how an application is written to make use of Tez vs Spark, you
>will find different cases where one is faster than the other.
>
>-- Hitesh
>
>On Jan 20, 2016, at 8:44 AM, LLBian <[email protected]> wrote:
>
>>
>> Hello, Tez experts:
>> I understand that Tez is used for DAG workloads.
>> Because it can avoid writing intermediate results to disk and supports
>> container reuse, it is more efficient than MR at processing small amounts
>> of data. So maybe Hive on Tez is better than Hive on MR for processing
>> small amounts of data, am I right?
>> Well, now, my questions are:
>> (1) Even though the main design themes are listed at https://tez.apache.org/ ,
>> I am still not very clear about its application scenarios. If there are
>> some real, major enterprise applications, so much the better.
>> (2) I am still not very clear about which problem it is mainly meant to solve.
>> (3) Why is it used for Hive and Pig? How is it better than Spark or MR?
>> (4) I looked at your official PPT and the paper “Apache Tez: A Unifying
>> Framework for Modeling and Building Data Processing Applications", but it
>> is still not very clear to me.
>> How should I understand this: "Don’t solve problems that have already been
>> solved. Or else you will have to solve them again!"? Is there a real example?
>>
>> Apache Tez is a great product, and I hope to learn more about it.
>>
>> Any reply is much appreciated.
>>
>> Thank you & Best Regards.
>>
>> ---LLBian
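
Hitesh's point about HDFS round trips can be sketched with a toy counting model. This is only an illustration, assuming a simple linear pipeline with one input and one output. The function names are hypothetical and nothing here is the Tez or MapReduce API; it also ignores whatever happens to intermediate data between vertices inside a DAG.

```python
# Toy model only: counts HDFS reads/writes for a linear pipeline.
# Function names are hypothetical; this is not the Tez or MapReduce API.

def hdfs_io_chained_mr(num_jobs: int) -> dict:
    """A query split into `num_jobs` MR jobs run back to back.

    Assumption: every job reads its input from HDFS and writes its
    output to HDFS, so each job costs one read and one write.
    """
    if num_jobs < 1:
        raise ValueError("need at least one job")
    return {"reads": num_jobs, "writes": num_jobs}


def hdfs_io_single_dag(num_vertices: int) -> dict:
    """The same query expressed as one DAG with `num_vertices` vertices.

    Assumption: only the first vertex reads from HDFS and only the last
    vertex writes to HDFS; data between vertices moves along DAG edges.
    """
    if num_vertices < 1:
        raise ValueError("need at least one vertex")
    return {"reads": 1, "writes": 1}


if __name__ == "__main__":
    stages = 4  # hypothetical number of processing stages in one query
    print("chained MR jobs:", hdfs_io_chained_mr(stages))
    print("single DAG:     ", hdfs_io_single_dag(stages))
```

Under these assumptions the chained-MR cost grows with the number of stages, while the single-DAG cost stays constant. That is one part of the inefficiency described above, separate from the gains of better query planning.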
