We do these jobs in Cascading/Scalding.
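
Roughly, a pre-aggregation job of the kind you describe under a) can look like
the sketch below. Everything specific in it is a placeholder: the row-key
layout (<entityId>|<metric>|<epochMillis>), the JSON field name ("amount"),
and the input/output paths are made up, and instead of the TSV used here to
keep the snippet self-contained you would read from an HBase tap/source
(e.g. SpyGlass' HBaseSource), whose exact wiring depends on your setup.

import com.twitter.scalding._
import com.fasterxml.jackson.databind.ObjectMapper

// Sketch: pre-aggregate (rowKey, jsonValue) records by entity, metric and day.
// The HBase wiring is omitted; replace the TypedTsv input with whatever
// tap/source you use to scan the table.
class PreAggregateJob(args: Args) extends Job(args) {

  val rows: TypedPipe[(String, String)] =
    TypedPipe.from(TypedTsv[(String, String)](args("input")))

  rows
    .map { case (rowKey, json) =>
      // Hypothetical composite key: <entityId>|<metric>|<epochMillis>
      val Array(entityId, metric, ts) = rowKey.split('|')
      val day = ts.toLong / 86400000L // bucket the time dimension by day
      // Pull one numeric field out of the JSON-encoded column
      val amount = new ObjectMapper().readTree(json).get("amount").asDouble()
      ((entityId, metric, day), amount)
    }
    .sumByKey
    .toTypedPipe
    .map { case ((entityId, metric, day), total) =>
      (entityId, metric, day, total)
    }
    .write(TypedTsv[(String, String, Long, Double)](args("output")))
}

The nice part compared to Hive's HBase integration is that key parsing and
JSON mapping are just ordinary functions in the pipeline. Your b)-style
heuristic reports follow the same pattern, only with more involved map/group
steps.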
On Apr 9, 2014 5:56 AM, "Henning Blohm" <henning.bl...@zfabrik.de> wrote:

> We operate a solution that stores large amounts of data in HBase that needs
> to be available for online access.
>
> For efficient scanning, three pieces of data are encoded in the row keys
> (in particular a time dimension), and for other reasons some columns hold
> JSON-encoded data.
>
> Currently, analytics data is created in two ways:
>
> a) a non-trivial M/R job that computes pre-aggregated data sets and
> offloads them into an analytical database for interactive reporting
> b) other M/R jobs that create specialized reports (heuristics) that cannot
> be computed from pre-aggregated data
>
> In particular for b), but possibly also for variations of a), I would like
> to find more "user-friendly" ways than Java-implemented M/R jobs - at least
> for some cases.
>
> So this is not about interactive querying of data directly from HBase
> tables. It is rather about pre-processing large, HBase-stored data sets
> either into input for interactive query engines (some other DB, Phoenix, ...)
> or into some other specialized format.
>
> I spent some time with Hive but found that the HBase integration simply
> doesn't cut it (parsing a row key, mapping JSON column content). I know
> there are more options out there, but before spending an eternity trying
> out various methods, I am shamelessly trying to benefit from your expertise
> by asking for some good pointers.
>
> Thanks,
> Henning
>
