It always depends on what you want to do, so from experience I cannot agree with your comment. Do you have any reasoning behind this statement?
> On 02 Mar 2016, at 19:14, Dayong <will...@gmail.com> wrote:
>
> Tez is kind of outdated and ORC is so tied to Hive. In addition, the Hive metastore can be decoupled from Hive as well. In reality, we do suffer from Hive's performance even for ETL jobs. As a result, we'll switch to Impala + Spark/Flink.
>
> Thanks,
> Dayong
>
>> On Mar 2, 2016, at 10:35 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> I forgot: besides LLAP you are going to have Hive Hybrid Procedural SQL On Hadoop (HPL/SQL), which is going to add another dimension to Hive.
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>> On 2 March 2016 at 15:30, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>> SQL plays an increasingly important role on Hadoop. As of today, Hive IMO provides the best and most robust solution to anything resembling a Data Warehouse "solution" on Hadoop, chiefly by means of its powerful metastore, which can be hosted on a variety of mission-critical databases, plus Hive's ever-increasing support for a variety of file types on HDFS, from humble text files to ORC. The remaining tools are little more than query tools that crucially rely on the Hive metastore for their needs. Take away the Hive component and they are more or less lame ducks.
>>>
>>> Hive on MR was perceived to be slow, but what the heck, we are talking about a Data Warehouse here, which for the most part should be batch-oriented rather than user-facing. In Hive 0.14 and 2.0 you can use Spark and Tez as the execution engine, and if you are well into functional programming, you can deploy Spark on Hive. If you look around, from Impala to Spark, the architecture is essentially a query tool.
>>>
>>>> On 2 March 2016 at 13:52, Dayong <will...@gmail.com> wrote:
>>>> As I remember, a few weeks ago in the Hadoop Weekly news feed, Cloudera had a benchmark showing Impala is a little better than Spark SQL and Hive on Tez. You can check that. From my experience, Hive is still the leading tool for regular ETL jobs since it is stable. The other tools are better for ad hoc and interactive query use cases. Cloudera is betting on Impala, especially with its new Kudu project.
>>>>
>>>> Thanks,
>>>> Dayong
>>>>
>>>>> On Mar 1, 2016, at 5:14 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>>>>
>>>>> My knocks on Impala (not intended to be a post knocking Impala):
>>>>>
>>>>> Impala really has not delivered on the complex types that Hive has (after promising them for quite a while); also, it only works with the 'blessed' input formats: Parquet, Avro, text.
>>>>>
>>>>> It is very annoying to work with Impala. In my version, if you create a partition in Hive, Impala does not see it. You have to run "refresh".
>>>>>
>>>>> In Impala I do not have all the UDFs that Hive has, like percentile, etc.
>>>>>
>>>>> Impala is fast. Many data-analyst / data-scientist types can't wait 10 seconds for a query, so when I need to produce something for them I make sure the data has no complex types and uses a table type that Impala understands.
>>>>>
>>>>> But for my work I still work primarily in Hive, because I do not want to deal with all the things that Impala does not have / might have, and when I need something special like my own UDFs it is easier to whip up the solution in Hive.
>>>>>
>>>>> Having worked with M$ SQL Server and Vertica, I find Impala on par with them, but I don't think of it the way I think of Hive.
>>>>> To me it just feels like a Vertica that I can sometimes cheat on loading because it is backed by HDFS.
>>>>>
>>>>> Hive is something different: I am making pipelines, I am transforming data, doing streaming, writing custom UDFs, querying JSON directly. It's != Impala.
>>>>>
>>>>> ::random message of the day::
>>>>>
>>>>>> On Tue, Mar 1, 2016 at 4:38 PM, Ashok Kumar <ashok34...@yahoo.com> wrote:
>>>>>>
>>>>>> Dr Mich,
>>>>>>
>>>>>> My two cents here.
>>>>>>
>>>>>> I don't have direct experience of Impala, but in my humble opinion I share your view that Hive provides the best metastore of all Big Data systems. Looking around, almost every product in one form or another uses Hive code somewhere. My colleagues inform me that Hive is one of the most stable Big Data products.
>>>>>>
>>>>>> With the capabilities of Spark on Hive and Hive on Spark or Tez, plus of course MR, there is really little need for many other products in the same space. It is good to keep things simple.
>>>>>>
>>>>>> Warmest
>>>>>>
>>>>>>
>>>>>> On Tuesday, 1 March 2016, 11:33, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> I have not heard much of Impala lately. I saw an article on LinkedIn titled
>>>>>>
>>>>>> "Apache Hive Or Cloudera Impala? What is Best for me?"
>>>>>>
>>>>>> "We can access all objects from Hive data warehouse with HiveQL which leverages the map-reduce architecture in background for data retrieval and transformation and this results in latency."
>>>>>>
>>>>>> My response was
>>>>>>
>>>>>> This statement is no longer valid, as you now have a choice of three engines: MR, Spark and Tez. I have not used Impala myself, as I don't think there is a need for it with Hive on Spark or Spark using the Hive metastore providing whatever is needed. Hive is for the Data Warehouse and does what it says on the tin.
>>>>>> Please also bear in mind that Hive offers the ORC storage format, which provides storage index capabilities, further optimizing queries with additional statistics at the file, stripe and row-group levels.
>>>>>>
>>>>>> Anyway, the question is: with Hive on Spark, or Spark using the Hive metastore, what can we not achieve that we can achieve with Impala?
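The ORC storage-index point in the thread can be illustrated with a toy model. The following is a minimal Python sketch, not ORC's actual file layout or reader API: it assumes a single integer column split into fixed-size row groups (the names `RowGroup`, `build_row_groups` and `scan_equal` are hypothetical), records the min/max of each group, and skips any group whose range cannot contain the value being searched for. Real ORC files keep comparable statistics at file, stripe and row-group level, which a reader can use to avoid touching data.

```python
# Hypothetical toy model of min/max "storage index" skipping, in the spirit
# of ORC row-group statistics. Not ORC's real format or API.
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RowGroup:
    rows: List[int]
    min_value: int
    max_value: int


def build_row_groups(values: Sequence[int], group_size: int) -> List[RowGroup]:
    """Split a column into fixed-size groups and record min/max for each."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups = []
    for start in range(0, len(values), group_size):
        chunk = list(values[start:start + group_size])
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_equal(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return the matching rows and how many groups had to be read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # Statistics check: skip the whole group if target is outside [min, max].
        if target < group.min_value or target > group.max_value:
            continue
        groups_read += 1
        matches.extend(v for v in group.rows if v == target)
    return matches, groups_read
```

For example, with the values 0 to 99 split into groups of 10, `scan_equal(groups, 42)` reads only one of the ten groups; the benefit depends on the data being laid out so that group ranges do not all overlap.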
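Edward's observation that Impala does not see a partition created in Hive until you run "refresh" is generally explained by Impala caching table metadata from the metastore instead of re-reading it on every query. The sketch below is a toy model of that behaviour in Python, not Impala's implementation; `Metastore`, `CachingEngine` and its `refresh` method are hypothetical stand-ins.

```python
# Hypothetical toy model: a query engine that caches metastore metadata
# goes stale when another tool (e.g. Hive) adds a partition.
from typing import Dict, Set


class Metastore:
    """Shared source of truth, standing in for the Hive metastore."""

    def __init__(self) -> None:
        self.partitions: Dict[str, Set[str]] = {}

    def add_partition(self, table: str, partition: str) -> None:
        self.partitions.setdefault(table, set()).add(partition)


class CachingEngine:
    """Engine that snapshots a table's partitions on first use."""

    def __init__(self, metastore: Metastore) -> None:
        self.metastore = metastore
        self.cache: Dict[str, Set[str]] = {}

    def _load(self, table: str) -> None:
        # Copy the metastore's current view into the local cache.
        self.cache[table] = set(self.metastore.partitions.get(table, set()))

    def visible_partitions(self, table: str) -> Set[str]:
        if table not in self.cache:
            self._load(table)
        return set(self.cache[table])  # return a copy of the cached view

    def refresh(self, table: str) -> None:
        # Analogous in spirit to running "refresh": reload from the metastore.
        self._load(table)
```

In use, a partition added through `Metastore.add_partition` after the engine has cached the table stays invisible to `visible_partitions` until `refresh` is called for that table.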