Hi Alan,

Thank you for your reply. The loose idea I had was to store one row in the
RDBMS per Hive partition, so I don't think size will be an issue (I expect
3000 partitions or so). The end goal is to help decide which partitions are
relevant for a query, something like adding partition info to the WHERE
clause behind the scenes. The way the data is structured, we currently have
to look up which partitions to use elsewhere.
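
To make the idea concrete, here is a rough sketch of the lookup and the
rewrite, not anything Hive or HCatalog provides. Everything in it is a
hypothetical stand-in: an in-memory SQLite database plays the RDBMS, the
table partition_index holds one row per partition with an assumed key range
(min_id, max_id), and the names events, part, relevant_partitions and
add_partition_filter are made up for illustration.

```python
import sqlite3


def relevant_partitions(conn, record_id):
    """Return the partitions whose (assumed) key range covers record_id."""
    rows = conn.execute(
        "SELECT part FROM partition_index "
        "WHERE min_id <= ? AND max_id >= ? ORDER BY part",
        (record_id, record_id),
    ).fetchall()
    return [row[0] for row in rows]


def add_partition_filter(query, partition_column, partitions):
    """Append a partition predicate to a query that already has a WHERE clause."""
    if not partitions:
        # No partition can hold matching rows, so make the query match nothing.
        return f"{query} AND 1 = 0"
    quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in partitions)
    return f"{query} AND {partition_column} IN ({quoted})"


# Hypothetical index: one row per Hive partition.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE partition_index (part TEXT PRIMARY KEY, min_id INTEGER, max_id INTEGER)"
)
conn.executemany(
    "INSERT INTO partition_index VALUES (?, ?, ?)",
    [("p1", 1, 100), ("p2", 101, 200), ("p3", 150, 300)],
)

query = "SELECT * FROM events WHERE id = 150"
rewritten = add_partition_filter(query, "part", relevant_partitions(conn, 150))
print(rewritten)
```

With roughly 3000 rows the lookup itself is cheap; the open question is
where in Hive such a rewrite would hook in.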

I'll look into ORC for sure. Currently we do not use any of the provided
file formats but have implemented our own InputFormat that reads gzipped
protobufs. I suspect that later on we should investigate the possible
performance gain from moving to another file format.

Petter


2014/1/22 Alan Gates <ga...@hortonworks.com>

> HCatalog is definitely not designed for this purpose.  Could you explain
> your use case more fully?  Is this indexing for better query planning or
> faster file access?  If so, you might look at some of the work going on in
> ORC, which is storing indices of its data in the format itself for these
> purposes.  Also, how much data do you need to store?  Even index size on a
> Hadoop scale data can quickly overwhelm MySQL or Postgres (which is what
> most people use for their metastores) if you are keeping per row
> information.  If you truly want to access an RDBMS as if it were an
> external data store, you could implement a HiveStorageHandler for your
> RDBMS.
>
> Alan.
>
> On Jan 22, 2014, at 2:02 AM, Petter von Dolwitz (Hem) <
> petter.von.dolw...@gmail.com> wrote:
>
> > Hi,
> >
> > I have a case where I would like to extend Hive to use information from
> > a regular RDBMS. To limit the complexity of the installation I thought I
> > could piggyback on the already existing metastore.
> >
> > As I understand it, HCatalog is not built for this purpose. Is there
> > anyone out there who has a similar use case or has any input on how this
> > is done or whether it should be avoided?
> >
> > The use case is to look up which partitions contain certain data.
> >
> > Thanks,
> > Petter
> >
> >
>
>
