Jonathan, So much helpful explanations! Skip scan and index including any column in a query make a big difference. I'll try row timestamp more and see what happens.
Thanks, NaHeon 2017-02-24 13:25 GMT+09:00 Jonathan Leech <jonat...@gmail.com>: > 1. No, I am not confused. A skip scan would "skip" over entire ranges of > obj_id and all create_dt values for it. This will only be effective if > there are many less distinct values of obj_id than there are total rows. If > there are too many distinct obj_ids then it either wont speed the query up > at all, or not enough, but it's simple to try it and see. > > 2. Your index isnt used because it doesn't contain the other columns used > in the query; e.g your query is isn't "covered". You get the column(s) > defined in the index + anything in the rowkey. You can also use the > "include" keyword to add other columns to the index. Alternatively, you can > look at "local" indexes, or it may be possible to nest the query with a > sub-select to fetch the desired primary key values from the index by > create_dt and the others from the main table. > > 3. No, not all. Phoenix will assign the the internal hbase timestamp of > the row to whatever you set to create_dt. It can also automatically set it > to the current time when you create the row, if you want it to. This has > other implications; e.g if you set a TTL, versions, etc in hbase. It can > speed up queries, especially those that execute on the most recent data > written, but prior to hbase compaction. Advanced stuff and performance is > highly dependent on your specific use case and hbase compaction settings... > > On Feb 23, 2017, at 7:59 PM, NaHeon Kim <honey.and...@gmail.com> wrote: > > Thanks for suggestion. > > Here's further questions: > 1. create_dt (not obj_id, I think you confused) would have large sets of > date, so SKIP_SCAN hint might be not useful. > > 2. I created secondary index on create_dt > create index IDX1_CREATE_DT on MY_TABLE(CREATE_DT; > > However, EXPLAIN still shows query plan of FULL SCAN. > Giving index hint on SELECT doesn't work as well. > > CLIENT 2-CHUNK 0 ROWS 0 BYTES PARALLEL 2-WAY FULL SCAN OVER > NEWS_KEYWORD_COUNT > SERVER FILTER BY (CREATE_DT > TIMESTAMP '2017-03-01 00:00:00.000' AND > CREATE_DT < TIMESTAMP '2017-04-01 00:00:00.000') > SERVER AGGREGATE INTO DISTINCT ROWS BY [KEYWORD] > CLIENT MERGE SORT > CLIENT 100 ROW LIMIT > > 3. ROW_TIMESTAMP is time of current query execution time, right? > Then it's not a right choice. :-( > > > 2017-02-24 1:54 GMT+09:00 Jonathan Leech <jonat...@gmail.com>: > >> If there are not a large number of distinct values of obj_id, try a >> SKIP_SCAN hint. Otherwise, the secondary index should work, make sure it's >> actually used via explain. Finally, you might try the ROW_TIMESTAMP feature >> if it fits your use case. >> >> On Feb 22, 2017, at 11:30 PM, NaHeon Kim <honey.and...@gmail.com> wrote: >> >> Hi all, >> I've seen performance problem when selecting rows within date range. >> >> My table schema is: >> >> CREATE TABLE MY_TABLE ( >> OBJ_ID varchar(20) not null, >> CREATE_DT timestamp not null, >> KEYWORD varchar(100) not null, >> COUNT integer, >> CONSTRAINT PK PRIMARY KEY (OBJ_ID,CREATE_DT,KEYWORD) >> ); >> >> MY_TABLE has almost 5,200,000 rows, >> CREATE_DT has about 6 months range. >> >> And executed query: >> >> SELECT KEYWORD, SUM(COUNT) >> FROM MY_TABLE >> WHERE CREATE_DT > to_timestamp('2016-03-01 00:00') >> AND CREATE_DT < to_timestamp('2016-04-01 00:00') >> GROUP BY KEYWORD; >> >> It tooks 46 seconds, too slow than expected, cause CREATE_DT is one of >> row key. >> I created a secondary index on CREATE_DT but there's no improvement. >> >> Query plan looks weird: >> CLIENT 2-CHUNK 0 ROWS 0 BYTES PARALLEL 2-WAY FULL SCAN OVER >> NEWS_KEYWORD_COUNT >> SERVER FILTER BY (CREATE_DT > TIMESTAMP '2017-03-01 00:00:00.000' >> AND CREATE_DT < TIMESTAMP '2017-04-01 00:00:00.000') >> SERVER AGGREGATE INTO DISTINCT ROWS BY [KEYWORD] >> CLIENT MERGE SORT >> CLIENT 100 ROW LIMIT >> >> BUT If CREATE_DT comes first of row key, plan says range scan will be >> done. >> >> Any suggestion? : ) >> >> Thanks, >> NaHeon >> >> >