We do these jobs in Cascading/Scalding.

On Apr 9, 2014 5:56 AM, "Henning Blohm" <henning.bl...@zfabrik.de> wrote:
> We operate a solution that stores large amounts of data in HBase that needs
> to be available for online access.
>
> For efficient scanning, there are three pieces of data encoded in row keys
> (in particular a time dimension) and for other reasons some columns hold
> JSON-encoded data.
>
> Currently, analytics data is created in two ways:
>
> a) a non-trivial M/R job that computes pre-aggregated data sets and
> offloads them into an analytical database for interactive reporting
> b) other M/R jobs that create specialized reports (heuristics) that cannot
> be computed from pre-aggregated data
>
> In particular for b) but possibly also for variations of a) I would like to
> find more "user friendly" ways than Java-implemented M/R jobs, at least
> for some cases.
>
> So this is not about interactive querying of data directly from HBase
> tables. It is rather about pre-processing HBase-stored (large) data sets
> into either input to interactive query engines (some other DB, Phoenix, ...)
> or into some other specialized format.
>
> I spent some time with Hive but found that the HBase integration simply
> doesn't cut it (parsing a row key, mapping JSON column content). I know
> there is some more out there, but before spending an eternity trying out
> various methods, I am shamelessly trying to benefit from your expertise by
> asking for some good pointers.
>
> Thanks,
> Henning