Re: querying time for Apache Phoenix

James Taylor Tue, 26 Jul 2016 09:51:10 -0700

Hi Irina,
Answers inline below. Would be helpful too to see your CREATE TABLE and
CREATE INDEX statements. Also, you're likely already doing this, but for
benchmarking, it's always good to run the same query multiple times and
average them (or take the max or 95% percentile). Do you know about our
Pherf, our perf testing tool: https://phoenix.apache.org/pherf.html ?
Thanks,
James


On Tue, Jul 26, 2016 at 9:22 AM, Placinta, Irina (ELS) <
[email protected]> wrote:

> I recreated the table with UPDATE_CACHE_FREQUENCY and used
> PreparedStatements for querying.
>
Make sure to set UPDATE_CACHE_FREQUENCY on the CREATE INDEX call as well.

> - select * from documents where profile_id = ? (multiple rows returned,
> doing a full scan)              --- 5 sec
>
See https://phoenix.apache.org/secondary_indexing.html#Index_Usage for when
an index is used/not used, but by default a global index won't be used if
not all columns are contained in the index (i.e. covered index).

Also, do you know how to generate an explain plan in Phoenix, as this will
tell you in advance if the index is going to be used:
https://phoenix.apache.org/language/index.html#explain

> - select PROFILE_ID, count( * ) from documents_test group by PROFILE_ID
>  -- 1.5 sec
>
Make sure your index on PROFILE_ID is being used. This requires scanning
through all rows, FYI.

> - select profile_id from documents_test order by added desc limit 1
> (index on added) --- 5.5 sec
>
If PROFILE_ID is not in the primary key constraint of the table, make sure
to include it in your index on the ADDED column.

>
> Are there any timelines for the next release?
>
We're in the process of releasing 4.8.0. The first RC failed, but the
second one should be up by the end of the week. We're going to try doing
monthly point releases after that with a minor release every three months.
But given that it's open source, it can be difficult to pin this down
exactly.

>
>
> Thanks,
>
> Irina
> ------------------------------
> *From:* James Taylor <[email protected]>
> *Sent:* Tuesday, July 26, 2016 4:39:46 PM
> *To:* user
> *Subject:* Re: querying time for Apache Phoenix
>
> Hi Irina,
>
> I'd recommend trying the following:
> - set the UPDATE_CACHE_FREQUENCY=60000 property when you create your table
> and index to prevent extra RPCs at query time.
> - if you're querying for a single row, use the serial and small hints like
> this: /*+ SERIAL SMALL */
> - though not strictly necessary, try using the index hint like this:
> /*+ INDEX(my_table my_index) */
> - use PreparedStatement to prevent extra parsing
>
> Have you tried other types of queries too that do aggregation, topN, range
> scans, sorts, etc?
>
> In the next release, we'll work on having better default values for these
> as well as driving them in a cost-based manner.
>
> Thanks,
> James
>
>
>
> On Tue, Jul 26, 2016 at 8:07 AM, Placinta, Irina (ELS) <
> [email protected]> wrote:
>
>> Hi,
>>
>>
>> We are interested in querying performance of Phoenix on small to large
>> datasets. We have Apache Phoenix installed on an EMR with 5 instances.
>>
>>
>> The querying times we get are slow compared to the equivalent query in
>> hbase, for example:
>>
>>
>> Table Documents with primary key UUID and index on profile_id
>>
>>
>>
>> Apache Phoenix Hbase
>> 400k rows dataset: select * from documents where uuid = 10-a
>> 0.25 sec 0.02 sec
>> 400k rows dataset: select profile_id from documents where uuid = 10-a
>> 0.20 sec 0.02 sec
>>
>> Hbase seems 10x faster than Phoenix, is there some tuning we can do to
>> achieve better results?
>> We are querying the DB programatically (scala) & also using the
>> client sqlline.
>>
>> Thank you!
>>
>
>

Re: querying time for Apache Phoenix

Reply via email to