I am trying to replace case-insensitive query using JOIN/Subselect with one 
using a query hint. This has worked well for our fixed-case fields. So far I 
have been unable to convince Phoenix to use a functional index on a SELECT * 
query.  Is this a bug, or is it intended for functional indexes to ignore hints?

This is the query syntax we are trying to replace using query hints (Avg time 
0.25s):

EXPLAIN SELECT * FROM LOG INNER JOIN (SELECT TS,F,R FROM LOG WHERE LOWER(RQ) 
LIKE '/jquery%') AS A ON  (LOG.TS = A.TS AND LOG.F=A.F AND LOG.R=A.R);
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     PARALLEL INNER-JOIN TABLE 0          |
|         CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX 
[0,'/jquery'] - [0,'/jquerz'] |
|             SERVER FILTER BY FIRST KEY ONLY |
|     DYNAMIC SERVER FILTER BY (LOG.TS, LOG.F, LOG.R) IN ((A.TS, A.F, A.R)) |
+------------------------------------------+

This is closer to the query we want - Hint on the non-functional index 
generates an expected, but non-optimal due to the ILIKE/function on RQ (avg 
time 0.78s)
EXPLAIN SELECT /*+ INDEX(LOG LOG_REQUEST_IDX) */ * FROM LOG WHERE RQ ILIKE 
'/jquery%';
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SKIP-SCAN-JOIN TABLE 0               |
|         CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG_REQUEST_IDX |
|             SERVER FILTER BY FIRST KEY ONLY AND "RQ" LIKE '/jquery%' |
|     DYNAMIC SERVER FILTER BY ("LOG.TS", "LOG.F", "LOG.R") IN (($707.$709, 
$707.$710, $707.$711)) |
+------------------------------------------+

This is what we REALLY want it to do, but the hint is ignored for the 
functional index (Avg 2.57s)
 EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE 
LOWER(RQ) LIKE '/jquery%';
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SERVER FILTER BY LOWER(RQ) LIKE '/jquery%' |
+------------------------------------------+

Test table has 2.9 million records; production table is many orders of 
magnitude larger. Table is actually much wider than sample schema below (30+ 
cols) so INCLUDE() isn't viable.  Users want all the columns, all the time.
Here's a simplified schema for the table:
CREATE TABLE IF NOT EXISTS LOG
(
    TS VARCHAR NOT NULL,
    f VARCHAR NOT NULL,
    r INTEGER NOT NULL,
    sa VARCHAR,
    da VARCHAR,
    rq VARCHAR
CONSTRAINT pkey PRIMARY KEY (TS, f, r)
) 
TTL='5616000',KEEP_DELETED_CELLS='false',IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=40,MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

CREATE INDEX IF NOT EXISTS LOG_LOWER_REQUEST_IDX  ON LOG(LOWER(rq)) 
TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX IF NOT EXISTS LOG_REQUEST_IDX  ON LOG(rq) 
TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
Bryan G.

Reply via email to