I am trying to replace case-insensitive query using JOIN/Subselect with one
using a query hint. This has worked well for our fixed-case fields. So far I
have been unable to convince Phoenix to use a functional index on a SELECT *
query. Is this a bug, or is it intended for functional indexes to ignore hints?
This is the query syntax we are trying to replace using query hints (Avg time
0.25s):
EXPLAIN SELECT * FROM LOG INNER JOIN (SELECT TS,F,R FROM LOG WHERE LOWER(RQ)
LIKE '/jquery%') AS A ON (LOG.TS = A.TS AND LOG.F=A.F AND LOG.R=A.R);
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
| PARALLEL INNER-JOIN TABLE 0 |
| CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX
[0,'/jquery'] - [0,'/jquerz'] |
| SERVER FILTER BY FIRST KEY ONLY |
| DYNAMIC SERVER FILTER BY (LOG.TS, LOG.F, LOG.R) IN ((A.TS, A.F, A.R)) |
+------------------------------------------+
This is closer to the query we want - Hint on the non-functional index
generates an expected, but non-optimal due to the ILIKE/function on RQ (avg
time 0.78s)
EXPLAIN SELECT /*+ INDEX(LOG LOG_REQUEST_IDX) */ * FROM LOG WHERE RQ ILIKE
'/jquery%';
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
| SKIP-SCAN-JOIN TABLE 0 |
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG_REQUEST_IDX |
| SERVER FILTER BY FIRST KEY ONLY AND "RQ" LIKE '/jquery%' |
| DYNAMIC SERVER FILTER BY ("LOG.TS", "LOG.F", "LOG.R") IN (($707.$709,
$707.$710, $707.$711)) |
+------------------------------------------+
This is what we REALLY want it to do, but the hint is ignored for the
functional index (Avg 2.57s)
EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE
LOWER(RQ) LIKE '/jquery%';
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
| SERVER FILTER BY LOWER(RQ) LIKE '/jquery%' |
+------------------------------------------+
Test table has 2.9 million records; production table is many orders of
magnitude larger. Table is actually much wider than sample schema below (30+
cols) so INCLUDE() isn't viable. Users want all the columns, all the time.
Here's a simplified schema for the table:
CREATE TABLE IF NOT EXISTS LOG
(
TS VARCHAR NOT NULL,
f VARCHAR NOT NULL,
r INTEGER NOT NULL,
sa VARCHAR,
da VARCHAR,
rq VARCHAR
CONSTRAINT pkey PRIMARY KEY (TS, f, r)
)
TTL='5616000',KEEP_DELETED_CELLS='false',IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=40,MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX IF NOT EXISTS LOG_LOWER_REQUEST_IDX ON LOG(LOWER(rq))
TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX IF NOT EXISTS LOG_REQUEST_IDX ON LOG(rq)
TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
Bryan G.