Thanks Edward, you summed it all up in your reply.

On Sun, Jan 4, 2015 at 4:15 AM, Edward Capriolo <[email protected]> wrote:
> "how does vertica compare with Hive in similar settings"
>
> Vertica and Hive do not have similar settings. Vertica is a columnar MPP
> analytic database. Hive is a SQL-on-Hadoop platform. Depending on your
> usage patterns you can use them interchangeably in some cases, but not all.
>
> Vertica can do low-latency (< 1 second) select queries based on
> "PROJECTIONS", which are something like materialized views. Hive is a batch
> processing system and not built for low-latency queries.
>
> Hive use cases typically chop up very large datasets and produce other
> large datasets, or smaller datasets which are put into online systems for
> low-latency analysis.
>
> Even if you use one of the columnar-style formats like Parquet and the
> faster execution engines, you are not going to get the type of experience
> you get from Vertica.
>
> Compare Vertica to MySQL. They both do queries, but Vertica does not do
> primary key enforcement and is non-optimal for small row-at-a-time
> insertions. For certain datasets MySQL will query faster; for others
> Vertica will query faster.
>
> Basically Hive and Vertica are very different even though they are both
> considered data warehouses.
>
> On Sat, Jan 3, 2015 at 4:30 PM, Shashidhar Rao <[email protected]>
> wrote:
>
>> Sorry Edward, I mentioned that I didn't have access to Vertica, but yes,
>> I was given the Vertica query retrieval time. The query is something like
>> select a.x, b.y from t as a, t1 as b where a.id = b.id etc., and the
>> schemas for the tables required for the join were given.
>> A subset of the data was given, and I just need to find an open source
>> framework like Hive and simulate it by storing the data and executing
>> the query.
>> Yes, you are right in saying that it is impossible, and I do agree, but
>> unfortunately, I have to come up with this difficult task.
>>
>> The hard part is that Hive will not hold the same number of rows as
>> Vertica; it is only a subset. It seems the query time will be derived
>> through some calculation based on the rows in Hive compared to the rows
>> in Vertica.
>>
>> I only wanted to know how Vertica compares with Hive in similar
>> settings, in case someone has done some benchmarking; it need not be my
>> case. Links to some blogs would certainly help.
>>
>> But thanks anyway for your reply.
>>
>> Thanks
>> shashi
>>
>> On Sun, Jan 4, 2015 at 2:21 AM, Edward Capriolo <[email protected]>
>> wrote:
>>
>>> Shashi,
>>>
>>> Your questions are too broad, and you are asking questions that are
>>> impossible to answer.
>>> Q. "What is faster, X or Y?"
>>> A. "This depends on countless variables and cannot be answered."
>>>
>>> For example, even databases that are very similar in nature, like
>>> MySQL/Postgres, might execute a query a different way based on their
>>> query planner or even the characteristics of the data.
>>>
>>> How can you show that a query is "faster than Vertica" if you do not
>>> have access to Vertica to prove it?
>>>
>>> I understand some of what you are trying to determine, but you should
>>> really attempt to install these things and build a prototype to determine
>>> what is the best fit for your application. This will grow your
>>> understanding of the systems, help you ask better questions, and
>>> potentially give you the ability to answer those questions yourself and
>>> make better decisions.
>>>
>>> The right way to ask this question might be: "Hello, I have loaded
>>> 50 million rows of data into Hive and I am running this query 'select X,
>>> from bla bla'. My Vertica instance runs this query in X seconds and Hive
>>> runs this in Y seconds. Can this be optimized further?"
>>>
>>> The software license for Impala is included here:
>>> https://github.com/cloudera/Impala/blob/master/LICENSE.txt
>>>
>>> Edward
>>>
>>> On Sat, Jan 3, 2015 at 3:29 PM, Shashidhar Rao <
>>> [email protected]> wrote:
>>>
>>>> Edward,
>>>>
>>>> Thanks for your reply.
>>>> Can you please tell me the query performance of Hive-Parquet against
>>>> Vertica? Can Hive-Parquet match Vertica's retrieval performance, as I
>>>> have been told Vertica also uses a compressed columnar format and is
>>>> fast? What if I query against some 50 million rows; which one will be
>>>> faster?
>>>>
>>>> And moreover, is Impala open source? In some blogs I have seen Impala
>>>> described as open source, but in others it is described as a Cloudera
>>>> proprietary engine.
>>>>
>>>> Ultimately, I want to use Hive-Parquet but need to show that it is
>>>> better than Vertica; a few microseconds here and there would be fine. I
>>>> don't have access to Vertica.
>>>>
>>>> Thanks
>>>> shashi
>>>>
>>>> On Sun, Jan 4, 2015 at 1:07 AM, Edward Capriolo <[email protected]>
>>>> wrote:
>>>>
>>>>> Of these systems, Hive is the only one that can store and query XML
>>>>> directly, with the help of different SerDes or input formats.
>>>>>
>>>>> Impala and Vertica have more standard schema systems that do not
>>>>> support collections like List, Map, Struct, or the nested collections
>>>>> you might need to store and process a complex XML document.
>>>>>
>>>>> Parquet (a storage format that works with Hive and Impala) can support
>>>>> List, Map, and Struct, but the Impala engine cannot access these at the
>>>>> moment. Last I checked, Impala refuses to read tables that have one of
>>>>> these elements (instead of skipping them).
>>>>>
>>>>> It sounds like you want to do one of a few things:
>>>>> 1) Normalize your XML into a table, and then you can use Vertica,
>>>>> Hive, or Impala.
>>>>> 2) Write your data using Parquet (to handle nested objects) and use
>>>>> Hive to query it. (Hopefully, when Impala adds collection support,
>>>>> you can switch over.)
>>>>>
>>>>> But mostly you need to do more research.
>>>>>
>>>>> Edward
>>>>>
>>>>> On Sat, Jan 3, 2015 at 2:15 PM, Shashidhar Rao <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Can someone help me with insights into a Hive with Parquet vs Vertica
>>>>>> comparison?
>>>>>>
>>>>>> I need to store large XML data in one of these databases, so please
>>>>>> help me with query performance.
>>>>>>
>>>>>> Is Impala open source, and can we use it without a Cloudera license?
>>>>>>
>>>>>> Thanks
>>>>>> Shashi
