Hi all,
I've seen performance problem when selecting rows within date range.
My table schema is:
CREATE TABLE MY_TABLE (
OBJ_ID varchar(20) not null,
CREATE_DT timestamp not null,
KEYWORD varchar(100) not null,
COUNT integer,
CONSTRAINT PK PRIMARY KEY (OBJ_ID,CREATE_DT,KEYWORD)
);
MY_TABLE has almost 5,200,000 rows,
CREATE_DT has about 6 months range.
And executed query:
SELECT KEYWORD, SUM(COUNT)
FROM MY_TABLE
WHERE CREATE_DT > to_timestamp('2016-03-01 00:00')
AND CREATE_DT < to_timestamp('2016-04-01 00:00')
GROUP BY KEYWORD;
It tooks 46 seconds, too slow than expected, cause CREATE_DT is one of row
key.
I created a secondary index on CREATE_DT but there's no improvement.
Query plan looks weird:
CLIENT 2-CHUNK 0 ROWS 0 BYTES PARALLEL 2-WAY FULL SCAN OVER
NEWS_KEYWORD_COUNT
SERVER FILTER BY (CREATE_DT > TIMESTAMP '2017-03-01 00:00:00.000' AND
CREATE_DT < TIMESTAMP '2017-04-01 00:00:00.000')
SERVER AGGREGATE INTO DISTINCT ROWS BY [KEYWORD]
CLIENT MERGE SORT
CLIENT 100 ROW LIMIT
BUT If CREATE_DT comes first of row key, plan says range scan will be done.
Any suggestion? : )
Thanks,
NaHeon