I have a chronological series of data. Each row looks like dt, r1, r2, r3, r4, r5, r6, d1, d2, d3, d4, d5, ...
dt is formatted as 20190829 and increases monotonically: 20190830, 20190831, and so on. A typical query looks like:

    select * from table where dt between 20180620 and 20190829 and r3 = ? and r6 = ?;

dt is always part of the filter; the remaining filters are some random combination of r1 to r6, and the selected columns are always all columns (*). I made dt, r1, r2, ..., r6 a compound primary key. The CREATE TABLE statement is:

    CREATE TABLE app.table (
        dt integer not null,
        r1 integer not null,
        r2 integer not null,
        r3 integer not null,
        r4 integer not null,
        r5 integer not null,
        r6 integer not null,
        d1 decimal(30,6),
        d2 decimal(30,6),
        d3 decimal(30,6),
        d4 decimal(30,6),
        d5 decimal(30,6),
        d6 decimal(30,6),
        CONSTRAINT pk PRIMARY KEY (dt, r1, r2, r3, r4, r5, r6)
    )
    SALT_BUCKETS = 3,
    UPDATE_CACHE_FREQUENCY = 300000,
    COMPRESSION = 'SNAPPY',
    SPLIT_POLICY = 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy',
    MAX_FILESIZE = '5000000000';

I have 3 region servers, so I chose SALT_BUCKETS = 3. But when I do the initial load with the CsvBulkLoad tool (dt ranging from 20180620 to 20190829, about 1 TB of data), the MapReduce job shows only 3 partitions for the reducers, and it always fails because there are so few partitions. I tried to increase SALT_BUCKETS to 512, but the maximum is 256; I set it to 256 and it still does not work.

I know I can SPLIT ON (...) when creating the table, but I don't know how to determine the split points, and writing hundreds of points by hand is daunting. So, is there a way to specify the number of splits or the number of reducers when creating a Phoenix table? I would appreciate any advice for tuning this scenario.
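For what it's worth, below is a rough sketch of the direction I am considering (not tested): dropping SALT_BUCKETS entirely and pre-splitting on dt boundaries instead, so the leading key column lines up with the split points. Only a few monthly points are shown; the real list would have to cover 20180620 to 20190829. I am not sure whether this syntax, or this choice of points, is right, which is exactly what I would like advice on.

    -- Sketch only: unsalted table, pre-split on monthly dt boundaries
    -- (the full point list for 20180620..20190829 would be much longer)
    CREATE TABLE app.table (
        dt integer not null,
        r1 integer not null,
        r2 integer not null,
        r3 integer not null,
        r4 integer not null,
        r5 integer not null,
        r6 integer not null,
        d1 decimal(30,6),
        d2 decimal(30,6),
        d3 decimal(30,6),
        d4 decimal(30,6),
        d5 decimal(30,6),
        d6 decimal(30,6),
        CONSTRAINT pk PRIMARY KEY (dt, r1, r2, r3, r4, r5, r6)
    )
    UPDATE_CACHE_FREQUENCY = 300000,
    COMPRESSION = 'SNAPPY',
    MAX_FILESIZE = '5000000000'
    SPLIT ON (20180701, 20180801, 20180901, 20181001);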
