Speeding Up Group By Queries

Amit Shah Fri, 25 Mar 2016 07:02:28 -0700

Hi,

I am trying to evaluate apache hbase (version 1.0.0) and phoenix (version
4.6) deployed through cloudera for our OLAP workfload. I have a table that
has 10 mil rows. I try to execute the below roll up query and it takes
around 2 mins to return 1,850 rows.


SELECT SUM(UNIT_CNT_SOLD), SUM(TOTAL_SALES) FROM TRANSACTIONS GROUP BY
T_COUNTRY;

I tried applying the "joining with indices" example given on the website
<https://phoenix.apache.org/joins.html> on the TRANSACTIONS table by
creating an index on the grouped by column as below but that doesn't help.

CREATE INDEX TRANSACTIONS_COUNTRY_INDEX ON TRANSACTIONS (T_COUNTRY) INCLUDE
(UNIT_CNT_SOLD, TOTAL_SALES);

This index is not getting used when the query is executed. The query plan
is as below

+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 31-CHUNK PARALLEL 31-WAY FULL SCAN OVER TRANSACTIONS |
|     SERVER AGGREGATE INTO DISTINCT ROWS BY [T_COUNTRY] |
| CLIENT MERGE SORT                        |
+------------------------------------------+

Theoretically can secondary indexes help improve the performance of group
by queries?

Any suggestions on what are different options in phoenix I could try out to
speed up GROUP BY queries?

Thanks,
Amit.

Speeding Up Group By Queries

Reply via email to