I am trying to replace case-insensitive query using JOIN/Subselect with one using a query hint. This has worked well for our fixed-case fields. So far I have been unable to convince Phoenix to use a functional index on a SELECT * query. Is this a bug, or is it intended for functional indexes to ignore hints?
This is the query syntax we are trying to replace using query hints (Avg time 0.25s): EXPLAIN SELECT * FROM LOG INNER JOIN (SELECT TS,F,R FROM LOG WHERE LOWER(RQ) LIKE '/jquery%') AS A ON (LOG.TS = A.TS AND LOG.F=A.F AND LOG.R=A.R); +------------------------------------------+ | CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG | | PARALLEL INNER-JOIN TABLE 0 | | CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX [0,'/jquery'] - [0,'/jquerz'] | | SERVER FILTER BY FIRST KEY ONLY | | DYNAMIC SERVER FILTER BY (LOG.TS, LOG.F, LOG.R) IN ((A.TS, A.F, A.R)) | +------------------------------------------+ This is closer to the query we want - Hint on the non-functional index generates an expected, but non-optimal due to the ILIKE/function on RQ (avg time 0.78s) EXPLAIN SELECT /*+ INDEX(LOG LOG_REQUEST_IDX) */ * FROM LOG WHERE RQ ILIKE '/jquery%'; +------------------------------------------+ | CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG | | SKIP-SCAN-JOIN TABLE 0 | | CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG_REQUEST_IDX | | SERVER FILTER BY FIRST KEY ONLY AND "RQ" LIKE '/jquery%' | | DYNAMIC SERVER FILTER BY ("LOG.TS", "LOG.F", "LOG.R") IN (($707.$709, $707.$710, $707.$711)) | +------------------------------------------+ This is what we REALLY want it to do, but the hint is ignored for the functional index (Avg 2.57s) EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE LOWER(RQ) LIKE '/jquery%'; +------------------------------------------+ | CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG | | SERVER FILTER BY LOWER(RQ) LIKE '/jquery%' | +------------------------------------------+ Test table has 2.9 million records; production table is many orders of magnitude larger. Table is actually much wider than sample schema below (30+ cols) so INCLUDE() isn't viable. Users want all the columns, all the time. Here's a simplified schema for the table: CREATE TABLE IF NOT EXISTS LOG ( TS VARCHAR NOT NULL, f VARCHAR NOT NULL, r INTEGER NOT NULL, sa VARCHAR, da VARCHAR, rq VARCHAR CONSTRAINT pkey PRIMARY KEY (TS, f, r) ) TTL='5616000',KEEP_DELETED_CELLS='false',IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=40,MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'; CREATE INDEX IF NOT EXISTS LOG_LOWER_REQUEST_IDX ON LOG(LOWER(rq)) TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'; CREATE INDEX IF NOT EXISTS LOG_REQUEST_IDX ON LOG(rq) TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'; Bryan G.