Hi,
Is there any document which can help me understand explain plan output in
detail? Or, which piece of code should I look at, to get an idea?
Here is explain plan for inner join query below. Can anyone help in explaining
it to me? Like, as per the plan which table is being cached, etc.?Here,
indx_exdocb is index table (on ID) and exDocStoreb is main table with rowkey as
(current_timestamp, ID).
+------------------------------------------+| PLAN
|+------------------------------------------+| CLIENT 36-CHUNK PARALLEL
36-WAY FULL SCAN OVER exDocStoreb || PARALLEL INNER-JOIN TABLE 0 (SKIP
MERGE) || CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER indx_exdocb
[0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] -
[0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bg' ||
SERVER FILTER BY FIRST KEY ONLY || SERVER AGGREGATE INTO
ORDERED DISTINCT ROWS BY ["ID"] || CLIENT MERGE SORT ||
DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID))
|+------------------------------------------+
Also, is there a way to turn ON more verbose explain plan? Like, seeing number
of bytes, rows that each step results in?
Thanks,Sumit
From: Sumit Nigam <[email protected]>
To: Users Mail List Phoenix <[email protected]>
Sent: Tuesday, September 27, 2016 9:17 PM
Subject: Hash join confusion
Hi,
I am using hbase 1.1 with phoenix 4.6.
I have a table with row key as (current_timestamp, id) which is salted and
index on (id). This table has ~3 million records.
I have a query like given below.
SELECT ID, CURRENT_TIMESTAMP, <SOME OTHER COLUMNS> from TBL
as a inner join ( select
max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like
'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group by
ID) as tmp on a.ID=tmp.tid and
a.CURRENT_TIMESTAMP=tmp.mct
The query hangs for long and finally fails with a timeout. I have 12 region
servers each with 5GB heap and also the total records satisfying the above
query is 62K whose CSV dump is ~10MB only.
DoNotRetryIOException: Could not find hash cache for join Id: Ӧ�8�D�. The cache
might have expired and have been removed
and -
Caused by: java.sql.SQLException: Encountered exception in sub plan [0]
execution.
at
org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:175)
at
com.infa.products.ldm.ingestion.server.java.hadoop.impl.FixPhoenixIngestInputFormat.getQueryPlan(FixPhoenixIngestInputFormat.java:94)
... 22 more
and
Caused by: java.sql.SQLException:
java.util.concurrent.TimeoutException at
org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:264)
I can try playing around with parameters such as
phoenix.coprocessor.maxServerCacheTimeToLiveMs and switching to sort_merge_join
actually helped.
But my question is as per Joins | Apache Phoenix in a case such as lhs INNER
JOIN rhs, it is rhs which will be built as hash table in server cache. So, in
the above query I assume this gets cached?
select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like
'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group by
ID) as tmp
Thanks,
Sumit