On 7/27/17 4:36 PM, Lew Jackman wrote:
I am joining two tables using only the key fields in both tables.
(If this were straight HBase, I know I would code this with some range scans.)
There are many billions of rows in each table.
I am trying to understand the explain plan, as I am having difficulties with
query failures under various load-testing scenarios.
When run in sqlline, the query for which I am posting the plan can take 6-30
seconds.
Below is my explain plan.
CLIENT 173251-CHUNK 87628582101 ROWS 53384270894992 BYTES PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER CLICKS
    SERVER FILTER BY FIRST KEY ONLY
    PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)
        CLIENT 5181-CHUNK 14470353216 ROWS 1562798147328 BYTES PARALLEL 256-WAY ROUND ROBIN SKIP SCAN ON 25600 KEYS OVER PRODUCT [0,'020lj7'] - [255,'z4l777']
            SERVER FILTER BY FIRST KEY ONLY
    DYNAMIC SERVER FILTER BY CLICKS.SHA_KEY IN (PRODUCT.SHA_KEY)
Question: does "FULL SCAN" truly mean full scan?
I do know that running the HBase MapReduce RowCounter to perform a row count,
which truly is a full scan, can take over 30 minutes. So I am very confused:
when is a FULL SCAN really a full scan?
Thanks for any insight.
Yes and no, I think.
You're joining on SHA_KEY from the CLICKS and PRODUCT tables, but your
"search space" on CLICKS is the entire table -- the query doesn't have
any information it can use to limit the scan to only a portion of
the CLICKS table. At the same time, Phoenix can make intelligent decisions
about filtering data based on that inner join (the DYNAMIC SERVER FILTER in
your plan) which keep the query from enumerating all of the key-value pairs
in CLICKS.
So, yes, the query is doing less work than RowCounter would, but
technically it's still looking at the entire CLICKS table.
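For illustration, a join of roughly this shape would produce a plan like yours
(the table and column names come from your explain plan; the rest is a guessed
sketch, not your actual query):

```sql
-- Hypothetical reconstruction: an equi-join on the shared key column.
-- With no predicate on CLICKS's leading primary-key column(s), the plan
-- reports FULL SCAN OVER CLICKS.
SELECT COUNT(*)
FROM CLICKS c
INNER JOIN PRODUCT p
    ON c.SHA_KEY = p.SHA_KEY;
```

If your real query can add a predicate on the leading part of CLICKS's primary
key, Phoenix can turn that FULL SCAN into a RANGE SCAN (or SKIP SCAN), which is
how you'd narrow the search space up front rather than relying only on the
dynamic join filter.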