I suspect this problem is similar to PHOENIX-4288.
On Mon, Oct 30, 2017 at 11:26 PM James Taylor <[email protected]>
wrote:

> Please file a JIRA and include the explain plan for each of the queries. I
> suspect your index is not being used in the first query due to the
> selection of all the columns. You can try hinting the query to force your
> index to be used. See
> https://phoenix.apache.org/secondary_indexing.html#Index_Usage
>
> Thanks,
> James
>
> On Mon, Oct 30, 2017 at 7:02 AM, Marcin Januszkiewicz <
> [email protected]> wrote:
>
>> We have a wide table with 100M records created with the following DDL:
>>
>> CREATE TABLE traces (
>>   rowkey VARCHAR PRIMARY KEY,
>>   time VARCHAR,
>>   number VARCHAR,
>>   +40 more columns)
>>
>> We want to select a large (~30M records) subset of this data with the
>> query:
>>
>> SELECT *all columns*
>>   FROM traces
>>   WHERE (UPPER(number) LIKE 'PO %')
>>   ORDER BY time DESC, ROWKEY
>>   LIMIT 101;
>>
>> This times out after 15 minutes and puts a huge load on our cluster.
>> We have an alternate way of selecting this data:
>>
>> SELECT t.rowkey, *all columns*
>> FROM TRACES t
>> JOIN (
>>   SELECT rowkey
>>   FROM TRACES
>>   WHERE (UPPER(number) LIKE 'PO %')
>>   ORDER BY time DESC, ROWKEY
>>   LIMIT 101
>> ) ix
>> ON t.ROWKEY = ix.ROWKEY
>> order by t.ROWKEY;
>>
>> Which completes in just under a minute.
>> Is there a better way to construct this query?
>> When is using the self-join a worse choice than the simple select?
>> Given that we have a functional index on UPPER(number), could this
>> potentially be a statistics-based optimizer decision?
>>
>> --
>> Pozdrawiam,
>> Marcin Januszkiewicz
>>
>
>

Reply via email to