Re: Query optimization

2019-06-21 Thread Vincent Poon
I wasn't able to repro this from sqlline. The query seems to setup the correct scan with two filters: skip-scan and the column value filter. So I don't know why the join without the filter is fast for you, but with the filter it's slow. Anything else special about your tables? e.g. indexes, stats

Re: Query optimization

2019-06-19 Thread Alexander Batyrshin
Is it possible not to full scan table1 for ’table1.col = ?’, but do this check only on subset table1.pk IN (…)? > On 19 Jun 2019, at 23:31, Vincent Poon wrote: > > 'table1.col = ?' will be a full table scan of table1 unless you have a > secondary index on table.col > Check the explain plan to

Re: Query optimization

2019-06-19 Thread Vincent Poon
'table1.col = ?' will be a full table scan of table1 unless you have a secondary index on table.col Check the explain plan to see if it's working as expected On Wed, Jun 19, 2019 at 7:43 AM Alexander Batyrshin <0x62...@gmail.com> wrote: > Hello, > We have 2 tables: > > Table1 - big one (2000M+ r

Query optimization

2019-06-19 Thread Alexander Batyrshin
Hello, We have 2 tables: Table1 - big one (2000M+ rows): CREATE TABLE table1 ( pk varchar PRIMARY KEY, col varchar ); Table2 - small one (300K rows): CREATE TABLE table2 ( pk varchar PRIMARY KEY, other varchar ); Query like this work fast (~ 30sec): SELECT table1.pk, table1.c

Re: Query optimization

2018-01-02 Thread Flavio Pompermaier
Here it is: https://issues.apache.org/jira/browse/PHOENIX-4508 On Thu, Dec 28, 2017 at 9:19 AM, Flavio Pompermaier wrote: > Hi James, > What should be the subject of the JIRA? > Could you open it for me...? I'm on vacation and opening tickets on JIRA > from mobile is not that easy... > Just 2 ob

Re: Query optimization

2017-12-28 Thread Flavio Pompermaier
Hi James, What should be the subject of the JIRA? Could you open it for me...? I'm on vacation and opening tickets on JIRA from mobile is not that easy... Just 2 observations: PEOPLE table is indeed sorted by PEOPLE_ID, MY_TABLE is somewhat of a pivot table so it's MUCH bigger that PEOPLE in terms

Re: Query optimization

2017-12-27 Thread James Taylor
Looks like the second query is sorting the entire PEOPLE table (though it seems like that shouldn’t be necessary as it’s probably already sorted by PEOPLE_ID) while the first one is sorting only part of MY_TABLE (which is likely less data). Might be a bug as the queries look the same. Please log a

Re: Query optimization

2017-12-27 Thread Flavio Pompermaier
Ok. So why the 2nd query requires more memory than the first one (nonetheless USE_SORT_MERGE_JOIN is used) and can't complete? On 28 Dec 2017 00:33, "James Taylor" wrote: A hash join (the default) will be faster but the tables being cached (last or RHS table being joined) must be small enough t

Re: Query optimization

2017-12-27 Thread James Taylor
A hash join (the default) will be faster but the tables being cached (last or RHS table being joined) must be small enough to fit into memory on the region server. If it's too big, you can use the USE_SORT_MERGE_JOIN which would not have this restriction. On Wed, Dec 27, 2017 at 3:16 PM, Flavio Po

Re: Query optimization

2017-12-27 Thread Flavio Pompermaier
Just to summarize things...is the best approach, in terms of required memory, for Apache Phoenix queries to use sort merge join? Should inner queries be avoided? On 22 Dec 2017 22:47, "Flavio Pompermaier" wrote: MYTABLE is definitely much bigger than PEOPLE table, in terms of cardinality. In ter

Re: Query optimization

2017-12-22 Thread Flavio Pompermaier
MYTABLE is definitely much bigger than PEOPLE table, in terms of cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger On 22 Dec 2017 22:36, "Ethan" wrote: > I see. I think client side probably hold on to the iterators from the both > sides and crawling forward to do the merg

Re: Query optimization

2017-12-22 Thread Ethan
I see. I think client side probably hold on to the iterators from the both sides and crawling forward to do the merge sort. in this case should be no much memory footprint either way where the filter is performed.  On December 22, 2017 at 1:04:18 PM, James Taylor (jamestay...@apache.org) wrote:

Re: Query optimization

2017-12-22 Thread James Taylor
There’s no shipping of any tables with a sort merge join. On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang wrote: > I see. Looks like it's possible the rhs (MYTABLE) is too big to ship > around without get filtered first. Just for experiment, if you took out > hint USE_SORT_MERGE_JOIN, what will be th

Re: Query optimization

2017-12-22 Thread Ethan Wang
I see. Looks like it's possible the rhs (MYTABLE) is too big to ship around without get filtered first. Just for experiment, if you took out hint USE_SORT_MERGE_JOIN, what will be the plan? On December 22, 2017 at 12:46:25 PM, James Taylor (jamestay...@apache.org) wrote: For sort merge join, bot

Re: Query optimization

2017-12-22 Thread James Taylor
For sort merge join, both post-filtered table results are sorted on the server side and then a merge sort is done on the client-side. On Fri, Dec 22, 2017 at 12:44 PM, Ethan wrote: > Hello Flavio, > > From the plan looks like to me the second query is doing the filter at > parent table (PEOPLE).

Re: Query optimization

2017-12-22 Thread Ethan
Hello Flavio, From the plan looks like to me the second query is doing the filter at parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE (after filtered) respectively? For sort merge join, anyone knows are the both sides get shipped to client to do the merge sort? Thanks,

Re: Query optimization

2017-12-22 Thread Flavio Pompermaier
Any help here...? On 20 Dec 2017 17:58, "Flavio Pompermaier" wrote: > Hi to all, > I'm trying to find the best query for my use case but I found that one > version work and the other one does not (unless that I don't apply some > tuning to timeouts etc like explained in [1]). > > The 2 queries e

Query optimization

2017-12-20 Thread Flavio Pompermaier
Hi to all, I'm trying to find the best query for my use case but I found that one version work and the other one does not (unless that I don't apply some tuning to timeouts etc like explained in [1]). The 2 queries extract the same data but, while the first query terminates the second does not. *P