I wasn't able to repro this from sqlline. The query seems to set up the
correct scan with two filters: skip-scan and the column value filter.
So I don't know why the join without the filter is fast for you but slow
with the filter.
Anything else special about your tables, e.g. indexes or stats?
Is it possible to avoid a full scan of table1 for 'table1.col = ?' and
instead apply this check only to the subset table1.pk IN (…)?
> On 19 Jun 2019, at 23:31, Vincent Poon wrote:
>
> 'table1.col = ?' will be a full table scan of table1 unless you have a
> secondary index on table1.col
> Check the explain plan to
'table1.col = ?' will be a full table scan of table1 unless you have a
secondary index on table1.col.
Check the explain plan to see if it's working as expected.
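
The effect of a secondary index on a plan can be seen in a small, runnable sketch. It uses Python's built-in sqlite3 module instead of Phoenix (an assumption made purely so the example runs anywhere; Phoenix's EXPLAIN output and index implementation differ), and the index name idx_table1_col is hypothetical.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

# Stand-in for table1 from the thread; SQLite, not Phoenix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (pk varchar PRIMARY KEY, col varchar)")

query = "SELECT pk FROM table1 WHERE col = ?"

# Without an index on col, the predicate forces a scan of the whole table.
plan_without_index = plan_details(conn, query, ("x",))

# Hypothetical secondary index on the filtered column.
conn.execute("CREATE INDEX idx_table1_col ON table1 (col)")
plan_with_index = plan_details(conn, query, ("x",))

print(plan_without_index)
print(plan_with_index)
```

The first plan reports a scan of table1, while the second reports a search through the index. Comparing plans before and after in this way is the same check suggested above for Phoenix.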
On Wed, Jun 19, 2019 at 7:43 AM Alexander Batyrshin <0x62...@gmail.com>
wrote:
> Hello,
> We have 2 tables:
>
> Table1 - big one (2000M+ r
Hello,
We have 2 tables:
Table1 - big one (2000M+ rows):
CREATE TABLE table1 (
pk varchar PRIMARY KEY,
col varchar
);
Table2 - small one (300K rows):
CREATE TABLE table2 (
pk varchar PRIMARY KEY,
other varchar
);
A query like this runs fast (~30 sec):
SELECT table1.pk, table1.c
Here it is: https://issues.apache.org/jira/browse/PHOENIX-4508
On Thu, Dec 28, 2017 at 9:19 AM, Flavio Pompermaier
wrote:
> Hi James,
> What should be the subject of the JIRA?
> Could you open it for me...? I'm on vacation and opening tickets on JIRA
> from mobile is not that easy...
> Just 2 ob
Hi James,
What should be the subject of the JIRA?
Could you open it for me...? I'm on vacation and opening tickets on JIRA
from mobile is not that easy...
Just 2 observations: PEOPLE table is indeed sorted by PEOPLE_ID, MY_TABLE
is somewhat of a pivot table so it's MUCH bigger than PEOPLE in terms
Looks like the second query is sorting the entire PEOPLE table (though it
seems like that shouldn’t be necessary as it’s probably already sorted by
PEOPLE_ID) while the first one is sorting only part of MY_TABLE (which is
likely less data). Might be a bug as the queries look the same.
Please log a
Ok. So why does the 2nd query require more memory than the first one
(even though USE_SORT_MERGE_JOIN is used) and fail to complete?
On 28 Dec 2017 00:33, "James Taylor" wrote:
A hash join (the default) will be faster but the tables being cached (last
or RHS table being joined) must be small enough t
A hash join (the default) will be faster, but the tables being cached (last
or RHS table being joined) must be small enough to fit into memory on the
region server. If it's too big, you can use the USE_SORT_MERGE_JOIN hint,
which does not have this restriction.
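
The memory constraint described above follows from how a hash join works in general. Below is a minimal Python sketch of the technique, not Phoenix's actual implementation; the function name hash_join and the sample rows are made up for illustration.

```python
from collections import defaultdict

def hash_join(lhs_rows, rhs_rows, key):
    """Inner join: build an in-memory hash table on rhs_rows, then stream lhs_rows."""
    build = defaultdict(list)
    for row in rhs_rows:
        # The entire RHS ends up in memory, which is why it must be small.
        build[row[key]].append(row)
    for left in lhs_rows:
        # Each LHS row is probed against the table and then discarded.
        for right in build.get(left[key], []):
            yield {**right, **left}

# Hypothetical sample data: a small RHS (people) and a streamed LHS (events).
people = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
events = [{"id": 2, "evt": "x"}, {"id": 2, "evt": "y"}, {"id": 3, "evt": "z"}]

joined = list(hash_join(events, people, "id"))
print(joined)
```

Only the build side (the RHS) has to fit in memory; the probe side is read once and never buffered.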
On Wed, Dec 27, 2017 at 3:16 PM, Flavio Po
Just to summarize things...is the best approach, in terms of required
memory, for Apache Phoenix queries to use sort merge join? Should inner
queries be avoided?
On 22 Dec 2017 22:47, "Flavio Pompermaier" wrote:
MYTABLE is definitely much bigger than PEOPLE table, in terms of
cardinality. In ter
MYTABLE is definitely much bigger than the PEOPLE table in terms of
cardinality. In terms of cells (rows x columns), PEOPLE is probably bigger.
On 22 Dec 2017 22:36, "Ethan" wrote:
> I see. I think client side probably hold on to the iterators from the both
> sides and crawling forward to do the merg
I see. I think the client side probably holds on to the iterators from both
sides and crawls forward to do the merge sort. In that case there should not be
much of a memory footprint, regardless of where the filter is performed.
On December 22, 2017 at 1:04:18 PM, James Taylor (jamestay...@apache.org) wrote:
There’s no shipping of any tables with a sort merge join.
On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang wrote:
> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
> around without get filtered first. Just for experiment, if you took out
> hint USE_SORT_MERGE_JOIN, what will be th
I see. Looks like it's possible the RHS (MYTABLE) is too big to ship around
without being filtered first. Just as an experiment, if you take out the
USE_SORT_MERGE_JOIN hint, what is the plan?
On December 22, 2017 at 12:46:25 PM, James Taylor (jamestay...@apache.org)
wrote:
For sort merge join, bot
For sort merge join, both post-filtered table results are sorted on the
server side and then a merge sort is done on the client-side.
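
The sort-then-merge pattern described above can be sketched in Python as follows. This shows the general technique only, not Phoenix's code: the sorted() calls stand in for the server-side sort, merge_join stands in for the client-side merge, and all names and sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def merge_join(left_sorted, right_sorted, key):
    """Inner join of two inputs already sorted by `key`, one key group at a time."""
    get = itemgetter(key)
    left_groups = groupby(left_sorted, key=get)
    right_groups = groupby(right_sorted, key=get)
    lk, lg = next(left_groups, (None, None))
    rk, rg = next(right_groups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            # Left key has no partner yet; advance the left cursor.
            lk, lg = next(left_groups, (None, None))
        elif lk > rk:
            # Right key has no partner yet; advance the right cursor.
            rk, rg = next(right_groups, (None, None))
        else:
            # Only the right-hand rows sharing this key are buffered.
            right_rows = list(rg)
            for left in lg:
                for right in right_rows:
                    yield {**left, **right}
            lk, lg = next(left_groups, (None, None))
            rk, rg = next(right_groups, (None, None))

# Hypothetical sample data; sorted() plays the role of the server-side sort.
people = sorted(
    [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    key=itemgetter("id"),
)
events = sorted(
    [{"id": 2, "evt": "x"}, {"id": 4, "evt": "w"},
     {"id": 2, "evt": "y"}, {"id": 3, "evt": "z"}],
    key=itemgetter("id"),
)

merged = list(merge_join(people, events, "id"))
print(merged)
```

Because both inputs arrive sorted, the merge step only moves two cursors forward and never holds either input in full; the expensive part is producing the sorted inputs in the first place.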
On Fri, Dec 22, 2017 at 12:44 PM, Ethan wrote:
> Hello Flavio,
>
> From the plan looks like to me the second query is doing the filter at
> parent table (PEOPLE).
Hello Flavio,
From the plan, it looks to me like the second query is doing the filter at the
parent table (PEOPLE). So what are the sizes of your PEOPLE and MYTABLE (after
filtering), respectively?
For sort merge join, does anyone know whether both sides get shipped to the
client to do the merge sort?
Thanks,
Any help here...?
On 20 Dec 2017 17:58, "Flavio Pompermaier" wrote:
> Hi to all,
> I'm trying to find the best query for my use case but I found that one
> version work and the other one does not (unless that I don't apply some
> tuning to timeouts etc like explained in [1]).
>
> The 2 queries e
Hi to all,
I'm trying to find the best query for my use case, but I found that one
version works and the other one does not (unless I apply some
tuning to timeouts etc., as explained in [1]).
The 2 queries extract the same data but, while the first query terminates,
the second does not.
*P