Of the systems you mention, Hive is the only one that can store and query
XML directly, with the help of various SerDes or input formats.

Impala and Vertica have more standard schema systems that do not support
the collections (List, Map, Struct, or nested collections) you might need
to store and process a complex XML document.

Parquet (a storage format that works with Hive and Impala) can support
List, Map, and Struct, but the Impala engine cannot access these at the
moment. Last I checked, Impala refuses to read tables that have one of
these elements (instead of skipping them).

It sounds like you want to do one of a few things:
1) Normalize your XML into a table; then you can use Vertica, Hive, or
Impala.
2) Write your data using Parquet (to handle nested objects) and use Hive
to query it. (Hopefully, when Impala adds collection support, you can
switch over.)
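
As a rough illustration of option 1, here is a minimal Python sketch of
flattening a nested XML document into plain rows that any of these systems
could load. The sample document, the element and attribute names, and the
flatten_orders helper are all made up for the example; your real XML will
need its own mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented
# purely for illustration.
SAMPLE_XML = """
<orders>
  <order id="1" customer="alice">
    <item sku="A1" qty="2"/>
    <item sku="B7" qty="1"/>
  </order>
  <order id="2" customer="bob">
    <item sku="A1" qty="5"/>
  </order>
</orders>
"""


def flatten_orders(xml_text):
    """Return one flat row per <item>, repeating the parent order's fields."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for order in root.findall("order"):
        for item in order.findall("item"):
            rows.append({
                "order_id": int(order.get("id")),
                "customer": order.get("customer"),
                "sku": item.get("sku"),
                "qty": int(item.get("qty")),
            })
    return rows


rows = flatten_orders(SAMPLE_XML)
for row in rows:
    print(row)
```

Each nested item becomes its own row with the order fields repeated, so
the result fits a flat table with no List, Map, or Struct columns.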

But mostly you need to do more research.

Edward

On Sat, Jan 3, 2015 at 2:15 PM, Shashidhar Rao <raoshashidhar...@gmail.com>
wrote:

> Hi,
>
> Can someone help me with insights into Hive with parquet vs Vertica
> comparison.
>
> I need to store large xml data into one these database so please help me
> with query performance.
>
> Is Impala opensource and can we use it without Cloudera license.
>
> Thanks
> Shashi
>
>
>
