PHOENIX-2094 created.

Covering the queries isn’t a viable option right now, the table is 5TB already 
and we have multiple indexes where we are trying to optimize case-insensitive 
queries. Fortunately we can brute-force the plan with the JOIN & subselect to 
get what we need near-term.

Bryan G.

From: James Taylor [mailto:jamestay...@apache.org]
Sent: Tuesday, June 30, 2015 6:56 PM
To: user
Subject: Re: Query Hints on Functional Index

Thanks for the detail, Bryan. Looks like a bug (as you've concluded) in the 
hinting mechanism wrt functional indexes. Would you mind filing a JIRA? FWIW, I 
tried the following and when only columns contained in the functional index are 
used, the functional index is used as desired (see below).

Might be worth trying to include those 30+ cols in the functional index. Your 
query performance will be much better. Have you tried that?

Thanks,
James

0: jdbc:phoenix:localhost> EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) 
*/ * FROM LOG WHERE LOWER(RQ)='/jquery';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SERVER FILTER BY LOWER(RQ) = '/jquery' |
+------------------------------------------+
2 rows selected (0.022 seconds)
0: jdbc:phoenix:localhost> EXPLAIN SELECT F FROM LOG WHERE LOWER(RQ) LIKE 
'/jquery%';
+------------------------------------------+
|                   PLAN                   |
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX 
[0,'/jquery'] - [0,'/jquerz'] |
|     SERVER FILTER BY FIRST KEY ONLY      |
+------------------------------------------+


On Tue, Jun 30, 2015 at 1:54 PM, Gerber, Bryan W 
<bryan.ger...@pnnl.gov<mailto:bryan.ger...@pnnl.gov>> wrote:
I am trying to replace case-insensitive query using JOIN/Subselect with one 
using a query hint. This has worked well for our fixed-case fields. So far I 
have been unable to convince Phoenix to use a functional index on a SELECT * 
query.  Is this a bug, or is it intended for functional indexes to ignore hints?

This is the query syntax we are trying to replace using query hints (Avg time 
0.25s):

EXPLAIN SELECT * FROM LOG INNER JOIN (SELECT TS,F,R FROM LOG WHERE LOWER(RQ) 
LIKE '/jquery%') AS A ON  (LOG.TS = A.TS AND LOG.F=A.F AND LOG.R=A.R);
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     PARALLEL INNER-JOIN TABLE 0          |
|         CLIENT 40-CHUNK PARALLEL 40-WAY RANGE SCAN OVER LOG_LOWER_REQUEST_IDX 
[0,'/jquery'] - [0,'/jquerz'] |
|             SERVER FILTER BY FIRST KEY ONLY |
|     DYNAMIC SERVER FILTER BY (LOG.TS, LOG.F, LOG.R) IN ((A.TS, A.F, A.R)) |
+------------------------------------------+

This is closer to the query we want - Hint on the non-functional index 
generates an expected, but non-optimal due to the ILIKE/function on RQ (avg 
time 0.78s)
EXPLAIN SELECT /*+ INDEX(LOG LOG_REQUEST_IDX) */ * FROM LOG WHERE RQ ILIKE 
'/jquery%';
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SKIP-SCAN-JOIN TABLE 0               |
|         CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG_REQUEST_IDX |
|             SERVER FILTER BY FIRST KEY ONLY AND "RQ" LIKE '/jquery%' |
|     DYNAMIC SERVER FILTER BY ("LOG.TS", "LOG.F", "LOG.R") IN (($707.$709, 
$707.$710, $707.$711)) |
+------------------------------------------+

This is what we REALLY want it to do, but the hint is ignored for the 
functional index (Avg 2.57s)
 EXPLAIN SELECT /*+ INDEX(LOG LOG_LOWER_REQUEST_IDX) */ * FROM LOG WHERE 
LOWER(RQ) LIKE '/jquery%';
+------------------------------------------+
| CLIENT 40-CHUNK PARALLEL 40-WAY FULL SCAN OVER LOG |
|     SERVER FILTER BY LOWER(RQ) LIKE '/jquery%' |
+------------------------------------------+

Test table has 2.9 million records; production table is many orders of 
magnitude larger. Table is actually much wider than sample schema below (30+ 
cols) so INCLUDE() isn't viable.  Users want all the columns, all the time.
Here’s a simplified schema for the table:
CREATE TABLE IF NOT EXISTS LOG
(
    TS VARCHAR NOT NULL,
    f VARCHAR NOT NULL,
    r INTEGER NOT NULL,
    sa VARCHAR,
    da VARCHAR,
    rq VARCHAR
CONSTRAINT pkey PRIMARY KEY (TS, f, r)
) 
TTL='5616000',KEEP_DELETED_CELLS='false',IMMUTABLE_ROWS=true,COMPRESSION='SNAPPY',SALT_BUCKETS=40,MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

CREATE INDEX IF NOT EXISTS LOG_LOWER_REQUEST_IDX  ON LOG(LOWER(rq)) 
TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX IF NOT EXISTS LOG_REQUEST_IDX  ON LOG(rq) 
TTL='5616000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='10000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
Bryan G.


Reply via email to