Hm... and what is the right to presplit table then? 2016-02-02 18:30 GMT+01:00 Mujtaba Chohan <[email protected]>:
> If your filter matches few rows due to filter on leading part of PK then > your data might only reside in a single block which leads to less > overall disk reads for non-salted case vs need for multiple blocks reads for > salted one. > > > On Tuesday, February 2, 2016, Serega Sheypak <[email protected]> > wrote: > >> > then you would be better off not using salt buckets all together >> rather than having 100 parallel scan and block reads in your case. I >> Didn't understand you correctly. What is difference between salted/not >> salted table in case of "primary key leading-part select"? >> >> 2016-02-02 1:18 GMT+01:00 Mujtaba Chohan <[email protected]>: >> >>> If you are filtering on leading part of row key which is highly >>> selective then you would be better off not using salt buckets all together >>> rather than having 100 parallel scan and block reads in your case. In our >>> test with billion+ row table, non-salted table offer much better >>> performance since it ends up reading fewer blocks from a single region. >>> >>> //mujtaba >>> >>> On Mon, Feb 1, 2016 at 1:16 PM, Serega Sheypak <[email protected] >>> > wrote: >>> >>>> Hi, here is my table DDL: >>>> CREATE TABLE IF NOT EXISTS id_ref >>>> ( >>>> id1 VARCHAR NOT NULL, >>>> value1 VARCHAR, >>>> >>>> id2 VARCHAR NOT NULL, >>>> value2 VARCHAR >>>> CONSTRAINT id_ref_pk PRIMARY KEY (id1, id2) >>>> )IMMUTABLE_ROWS=true,SALT_BUCKETS=100, VERSIONS=1, TTL=691200 >>>> >>>> I'm trying to analyze result of explain: >>>> >>>> explain select id1, value1, id2, value2 from id_ref where id1 = 'xxx' >>>> >>>> . . . . . . . . . . . . . . . . . . . . . . .> ; >>>> >>>> *+------------------------------------------+* >>>> >>>> *| ** PLAN ** |* >>>> >>>> *+------------------------------------------+* >>>> >>>> *| *CLIENT 100-CHUNK PARALLEL 100-WAY RANGE SCAN OVER ID_REF >>>> [0,'1fd5c44a75549162ca1602dda55f6d129cab61a6']* |* >>>> >>>> *| *CLIENT MERGE SORT * |* >>>> >>>> *+------------------------------------------+* >>>> >>>> >>>> What happens? Client spawns 100 parallel scans (because of bucketing) >>>> and waits for 100 responses? >>>> >>>> Is it effective? What is the right way to optimize such query pattern: >>>> "select by first part of primary key"? Reduce the amount of buckets? I get >>>> exeption a while after restarting app: >>>> >>>> >>>> *Task org.apache.phoenix.job.JobManager$JobFutureTask@60a40644 rejected >>>> from org.apache.phoenix.job.JobManager$1@58e3fe9aRunning, pool size = 128, >>>> active threads = 121, queued tasks = 5000, completed tasks = 2629565* >>>> >>>> >>>> >>> >>
