[jira] [Resolved] (PHOENIX-6677) Parallelism within a batch of mutations

Kadir Ozdemir (Jira) Wed, 08 Jun 2022 13:56:04 -0700


     [ https://issues.apache.org/jira/browse/PHOENIX-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Kadir Ozdemir resolved PHOENIX-6677.
------------------------------------
    Resolution: Not A Problem

> Parallelism within a batch of mutations 
> ----------------------------------------
>
>                 Key: PHOENIX-6677
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6677
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir OZDEMIR
>            Priority: Major
>             Fix For: 4.17.0, 5.2.0
>
>
> Currently, the Phoenix client simply passes the batches of row mutations from
> the application to the HBase client without any parallelism or intelligent
> grouping (except grouping mutations for the same row).
> Assume that the application creates batches of 10000 row mutations for a
> given table. The Phoenix client divides these rows, in arrival order, into
> HBase batches of n (e.g., 100) rows based on the configured batch size, i.e.,
> the number of rows and bytes. Then, Phoenix calls the HBase batch API, one
> batch at a time (i.e., serially). The HBase client further divides a given
> batch of rows into smaller batches based on their regions. This means that a
> large batch created by the application is divided into many tiny batches and
> executed mostly serially. For salted tables, this will result in even smaller
> batches.
> We can improve the current implementation greatly if we group the rows of the
> batch prepared by the application into sub-batches based on table region
> boundaries and then execute these batches in parallel.
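
The grouping proposed in the description could look roughly like the sketch
below. It is only an illustration and not Phoenix or HBase code: the class
RegionBatcher, its methods, and the sendBatch callback are hypothetical names.
Row keys and region start keys are modeled as Strings for brevity (HBase uses
byte[] keys), and the first region is assumed to start at the empty key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of Phoenix or the HBase client.
final class RegionBatcher {

    // Groups row keys by the region that owns them. A region is identified by its
    // start key; the owner of a row is the region with the greatest start key that
    // is less than or equal to the row key. Arrival order is kept within a group.
    static Map<String, List<String>> groupByRegion(List<String> rowKeys,
                                                   NavigableSet<String> regionStartKeys) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String rowKey : rowKeys) {
            String startKey = regionStartKeys.floor(rowKey);
            if (startKey == null) {
                throw new IllegalArgumentException("No region covers row " + rowKey);
            }
            groups.computeIfAbsent(startKey, k -> new ArrayList<>()).add(rowKey);
        }
        return groups;
    }

    // Submits one task per region sub-batch and waits for all of them.
    // sendBatch stands in for whatever call actually ships a batch to the server.
    static void executeInParallel(Map<String, List<String>> groups,
                                  Consumer<List<String>> sendBatch,
                                  int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String> subBatch : groups.values()) {
                pending.add(pool.submit(() -> sendBatch.accept(subBatch)));
            }
            for (Future<?> future : pending) {
                future.get(); // rethrows a failure from the sub-batch, if any
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With region start keys "", "g" and "n", the rows apple, zebra, grape, banana
would be split into three sub-batches (apple and banana; grape; zebra), each
sent by its own task. A real implementation would also have to handle region
splits and moves that happen between grouping and sending, which this sketch
ignores.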



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
