Suggestions to define schema for storing aggregates

universal localhost Wed, 26 Mar 2014 00:20:08 -0700

Hey All,

I am using Phoenix Jdbc Api to insert raw data, do OLAP processing (mainly
groupby aggregations) and insert the aggregated data back in HBase.


Sample phoenix table:
CREATE TABLE ABCD (TXID INTEGER NOT NULL, INSTANCEID INTEGER NOT NULL,
DATELOGGED TIMESTAMP NOT NULL, ORGNAME VARCHAR, MERCHANT_NAME VARCHAR,
COUNTRY VARCHAR, AMOUNT FLOAT, CODE INTEGER, CONTYPE VARCHAR,
IP_ROUTINGTYPE VARCHAR, SCORE INTEGER,RULE VARCHAR, STATUS INTEGER, ACTION
VARCHAR CONSTRAINT pk PRIMARY KEY (DATELOGGED, ORGNAME, TXID, INSTANCEID ))

I want to do aggregations on many dimensions over the data in this table
and store the output in a Phoenix table.

Sample aggregation query:
1.  Select TRUNC(DATELOGGED, 'hour'), ORGNAME, 'fraudrep1' as
reportid, count(*) from ABCD GROUP BY TRUNC(DATELOGGED, 'hour'), ORGNAME,
STATUS WHERE DATELOGGED > X && <Y

2. Select TRUNC(DATELOGGED, 'minute'), ORGNAME, 'fraudrep2' ' as reportid,
STATUS, count(*) from ABCD GROUP BY TRUNC(DATELOGGED, 'minute'), ORGNAME,
STATUS  WHERE DATELOGGED > X && <Y

3. Select TRUNC(DATELOGGED, 'minute'), ORGNAME,  'fraudrep3'' as reportid,
CODE, count(*) from ABCD GROUP BY TRUNC(DATELOGGED, 'minute'), ORGNAME,
CODE WHERE DATELOGGED > X && <Y;

4.  Select TRUNC(DATELOGGED, 'hour'), ORGNAME, 'fraudrep4'' as reportid,
STATUS, CODE, count(*) from ABCD GROUP BY TRUNC(DATELOGGED, 'hour'),
ORGNAME, STATUS, CODE  WHERE DATELOGGED > X && <Y;

5. Select TRUNC(DATELOGGED, 'hour'), ORGNAME, 'fraudrep5'' as reportid,
CONTYPE, count(*) from ABCD GROUP BY TRUNC(DATELOGGED, 'hour'), ORGNAME,
CONTYPE  WHERE DATELOGGED > X && <Y;
There can be many queries with 2-4 dimensions clubbed.

*How should I design the schema of the table to store the aggregated data ?*
FYI, When these processed data is queried then Datelogged, reportId,
timedimension(hour mor minute or day) are known.


--Unilocal

Suggestions to define schema for storing aggregates

Reply via email to