Re: Hash join confusion

Sumit Nigam Wed, 28 Sep 2016 22:29:45 -0700

Thank you Maryann.
I am not using multi-tenancy for these tables. Increasing 
phoenix.coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache size 
config just delayed the error. 
I have also started seeing some memory problem -
Caused by: org.apache.phoenix.memory.InsufficientMemoryException: Requested 
memory of 22871932 bytes could not be allocated from remaining memory of 
776776654 bytes from global pool of 778469376 bytes after waiting for 10000ms.
        at 
org.apache.phoenix.memory.GlobalMemoryManager.allocateBytes(GlobalMemoryManager.java:78)
        at 
org.apache.phoenix.memory.GlobalMemoryManager.access$300(GlobalMemoryManager.java:30)
        at 
org.apache.phoenix.memory.GlobalMemoryManager$GlobalMemoryChunk.resize(GlobalMemoryManager.java:139)
What I am having trouble with is, that the total size of csv produced by 
sub-query is only ~7Mb. I have 12 region servers with 5GB heap each. So, when 
this result gets sent across to all region servers to perform the server side 
join, not sure why a memory issue should show up (or a time out occur). Any 
insights?

These tables are salted. Not sure if it is 
https://issues.apache.org/jira/browse/PHOENIX-2900 issue. 
Switching to sort merge join helped. But not sure if that is the right solution 
going forward.
Thanks again!Sumit

      From: Maryann Xue <[email protected]>
 To: "[email protected]" <[email protected]>; Sumit Nigam 
<[email protected]> 
 Sent: Wednesday, September 28, 2016 11:36 PM
 Subject: Re: Hash join confusion

Yes, Sumit, the sub-query will get cached in hash join. Are you using 
multi-tenancy for these tables? If yes, you might want to checkout Phoenix 4.7 
or 4.8, since a related bug fix got in the 4.7 release. 
https://issues.apache.org/jira/browse/PHOENIX-2381?jql=project%20%3D%20PHOENIX%20AND%20text%20~%20%22hash%20cache%20id%22
Otherwise I think it's the hash cache timeout issue, in which case changing 
phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.

Thanks,Maryann
On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam <[email protected]> wrote:

Hi,
Is there any document which can help me understand explain plan output in 
detail? Or, which piece of code should I look at, to get an idea?
Here is explain plan for inner join query below. Can anyone help in explaining 
it to me? Like, as per the plan which table is being cached, etc.?Here, 
indx_exdocb is index table (on ID) and exDocStoreb is main table with rowkey as 
(current_timestamp, ID).
+----------------------------- -------------+|                   PLAN           
        |+----------------------------- -------------+| CLIENT 36-CHUNK 
PARALLEL 36-WAY FULL SCAN OVER exDocStoreb ||     PARALLEL INNER-JOIN TABLE 0 
(SKIP MERGE) ||         CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER 
indx_exdocb [0,' 42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 
c3bf'] - [0,' 42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 
c3bg' ||             SERVER FILTER BY FIRST KEY ONLY ||             SERVER 
AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] ||         CLIENT MERGE SORT     
           ||     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN 
((TMP.MCT, TMP.TID)) |+----------------------------- -------------+
Also, is there a way to turn ON more verbose explain plan? Like, seeing number 
of bytes, rows that each step results in?
Thanks,Sumit

      From: Sumit Nigam <[email protected]>
 To: Users Mail List Phoenix <[email protected]> 
 Sent: Tuesday, September 27, 2016 9:17 PM
 Subject: Hash join confusion

Hi,
I am using hbase 1.1 with phoenix 4.6. 
I have a table with row key as (current_timestamp, id) which is salted and 
index on (id). This table has ~3 million records.
I have a query like given below. 
SELECT  ID, CURRENT_TIMESTAMP, <SOME OTHER COLUMNS> from TBL                    
    as a inner join (                                select 
max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like ' 
42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bf%' group by 
ID) as tmp                        on a.ID=tmp.tid and 
a.CURRENT_TIMESTAMP=tmp.mct

The query hangs for long and finally fails with a timeout. I have 12 region 
servers each with 5GB heap and also the total records satisfying the above 
query is 62K whose CSV dump is ~10MB only. 
DoNotRetryIOException: Could not find hash cache for join Id: Ӧ�8�D�. The cache 
might have expired and have been removed

and - 
Caused by: java.sql.SQLException: Encountered exception in sub plan [0] 
execution.
        at org.apache.phoenix.execute. HashJoinPlan.iterator( 
HashJoinPlan.java:175)
        at com.infa.products.ldm. ingestion.server.java.hadoop. impl. 
FixPhoenixIngestInputFormat. getQueryPlan( FixPhoenixIngestInputFormat. java:94)
        ... 22 more
and
Caused by: java.sql.SQLException:
    java.util.concurrent.Timeo utException        at org.apache.phoenix.cache. 
ServerCacheClient. addServerCache( ServerCacheClient.java:264)

I can try playing around with parameters such as phoenix.coprocessor. 
maxServerCacheTimeToLiveMs and switching to sort_merge_join actually helped.
But my question is as per Joins | Apache Phoenix in a case such as lhs INNER 
JOIN rhs, it is rhs which will be built as hash table in server cache. So, in 
the above query I assume this gets cached?
 select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like ' 
42ecf4abd4bd7e7606025dc8eee3de 6a3cc04418cbc2619ddc01f54d88d7 c3bf%' group by 
ID) as tmp 
Thanks,
Sumit

Re: Hash join confusion

Reply via email to