Rolling hourly data

Monil Gandhi Thu, 15 Nov 2018 17:02:41 -0800

Hello,
Currently we have hourly data in our phoenix table. However, the schema was
designed to perform well for daily data. Increasing the number of rows by
24X has lead to degradation of our service over time.
Our current schema is as follows


CREATE TABLE IF NOT EXISTS T1 (sid BIGINT NOT NULL, day DATE NOT NULL, cid
BIGINT NOT NULL, s.count INTEGER, CONSTRAINT PK PRIMARY KEY (sid, day, cid)
) COMPRESSION='SNAPPY'

The query we run is something along the lines of
SELECT sid, cid, CAST(SUM(count) AS BIGINT) AS count FROM PUBLISHER_V4
WHERE sid IN (34535) AND day BETWEEN TO_DATE('2018-07-01', 'yyyy-MM-dd',
'GMT') AND TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT') GROUP BY sid, cid

Based on our investigation we have concluded that the main reason is purely
the number of rows that are being read. I am open to other suggestions

If number of rows is the case
I am wondering if there is a way to either
1. to roll hourly data to daily using views, secondary index or map reduce.
I know map reduce is possible.
2. migrate to a newer schema where cid is not part of pk and is actually a
column family. I was unable to find any kind of documentation on this.

Thanks
Monil

Rolling hourly data

Reply via email to