Of the systems you mention, Hive is the only one that can store and query XML directly, with the help of various SerDes or input formats.
Impala and Vertica have more standard schema systems that do not support collections such as List, Map, Struct, or the nested collections you might need to store and process a complex XML document. Parquet (a storage format that works with both Hive and Impala) can store Lists, Maps, and Structs, but the Impala engine cannot access them at the moment. Last I checked, Impala refuses to read tables that have one of these elements (instead of skipping them).

It sounds like you want to do one of a few things:

1) Normalize your XML into a table; then you can use Vertica, Hive, or Impala.
2) Write your data using Parquet (to handle nested objects) and use Hive to query it. (Hopefully, when Impala adds collection support, you can switch over.)

But mostly you need to do more research.

Edward

On Sat, Jan 3, 2015 at 2:15 PM, Shashidhar Rao <raoshashidhar...@gmail.com> wrote:
> Hi,
>
> Can someone help me with insights into Hive with Parquet vs Vertica
> comparison.
>
> I need to store large XML data into one of these databases, so please help me
> with query performance.
>
> Is Impala open source, and can we use it without a Cloudera license?
>
> Thanks
> Shashi
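
A footnote on option 1 in the reply above: "normalizing" XML usually means flattening each repeated nested element into its own row and repeating the parent's fields on every row, so the result fits a plain relational table that Vertica, Hive, or Impala can all load. The sketch below is only an illustration using Python's standard library. The sample document, the element and attribute names (orders, order, item, sku, qty), and the helper names flatten_orders and rows_to_csv are all hypothetical and would need to be adapted to the real XML.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real XML will have a different shape.
SAMPLE_XML = """
<orders>
  <order id="1" customer="alice">
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </order>
  <order id="2" customer="bob">
    <item sku="A1" qty="5"/>
  </order>
</orders>
"""

COLUMNS = ["order_id", "customer", "sku", "qty"]


def flatten_orders(xml_text):
    """Return one flat row per <item>, repeating the parent order's fields."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for order in root.findall("order"):
        for item in order.findall("item"):
            rows.append({
                "order_id": int(order.get("id")),
                "customer": order.get("customer"),
                "sku": item.get("sku"),
                "qty": int(item.get("qty")),
            })
    return rows


def rows_to_csv(rows):
    """Serialize the flat rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(flatten_orders(SAMPLE_XML)))
```

The CSV output is just one convenient intermediate; the same flat rows could instead be written to whatever load format the target database prefers.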