[ 
https://issues.apache.org/jira/browse/SENTRY-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525670#comment-16525670
 ] 

kalyan kumar kalvagadda edited comment on SENTRY-2249 at 6/27/18 10:14 PM:
---------------------------------------------------------------------------

*Here are two options I had in mind.*

*Option-1:* Persist the snapshot entities in batches. This may significantly 
reduce the number of DB operations. Currently there is one DB operation per 
entry in the snapshot, which does not scale.
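
A rough sketch of what Option-1 could look like, not the actual Sentry code. The class `BatchPersistSketch`, the methods `partition` and `persistInBatches`, and the `batchSink` callback are all hypothetical names; the sink stands in for whatever single DB operation writes one batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchPersistSketch {

    /** Splits entries into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> entries, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += batchSize) {
            int end = Math.min(i + batchSize, entries.size());
            batches.add(new ArrayList<>(entries.subList(i, end)));
        }
        return batches;
    }

    /**
     * Hands each batch to the sink in a single call (assumed to map to one
     * DB operation) and returns how many sink calls were made.
     */
    static <T> int persistInBatches(List<T> entries, int batchSize,
                                    Consumer<List<T>> batchSink) {
        int calls = 0;
        for (List<T> batch : partition(entries, batchSize)) {
            batchSink.accept(batch);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> snapshot = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            snapshot.add("path-" + i);
        }
        // In-memory list used as a stand-in for the database.
        List<String> stored = new ArrayList<>();
        int calls = persistInBatches(snapshot, 4, stored::addAll);
        System.out.println(calls + " store calls for " + stored.size() + " entries");
    }
}
```

With 10 entries and a batch size of 4, the sink is called 3 times instead of 10. The batch size is passed as a parameter here so that it could be read from configuration.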

*Option-2:* Break the total snapshot into batches and persist all of them in 
parallel in different transactions. As we use the repeatable_read isolation 
level, we should be able to have parallel writes on the same table. This brings 
up an issue if there is a failure in persisting any of the batches: the approach 
needs additional logic for cleaning up the partially persisted snapshot.
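
A rough sketch of Option-2, again with hypothetical names only (`ParallelSnapshotSketch`, `SnapshotStore`, `persistBatch`, `deleteSnapshot`, `InMemoryStore`). Each `persistBatch` call is assumed to run in its own transaction; the in-memory store just simulates a failing batch so the cleanup path can be seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelSnapshotSketch {

    /** Hypothetical store; one persistBatch call stands for one transaction. */
    interface SnapshotStore {
        void persistBatch(long snapshotId, List<String> batch);
        void deleteSnapshot(long snapshotId); // removes partially persisted rows
    }

    /** In-memory stand-in that fails any batch containing the given entry. */
    static class InMemoryStore implements SnapshotStore {
        final ConcurrentLinkedQueue<String> rows = new ConcurrentLinkedQueue<>();
        final String failOn;

        InMemoryStore(String failOn) { this.failOn = failOn; }

        public void persistBatch(long snapshotId, List<String> batch) {
            if (batch.contains(failOn)) {
                throw new IllegalStateException("simulated failure");
            }
            rows.addAll(batch);
        }

        public void deleteSnapshot(long snapshotId) { rows.clear(); }
    }

    /** Persists batches in parallel; cleans up and returns false if any batch fails. */
    static boolean persistParallel(SnapshotStore store, long snapshotId,
                                   List<String> entries, int batchSize, int threads) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < entries.size(); i += batchSize) {
                int end = Math.min(i + batchSize, entries.size());
                List<String> batch = new ArrayList<>(entries.subList(i, end));
                futures.add(CompletableFuture.runAsync(
                        () -> store.persistBatch(snapshotId, batch), pool));
            }
            boolean ok = true;
            // Wait for every batch before cleaning up, so that no batch
            // commits after the cleanup has already run.
            for (CompletableFuture<Void> f : futures) {
                try {
                    f.join();
                } catch (CompletionException e) {
                    ok = false;
                }
            }
            if (!ok) {
                store.deleteSnapshot(snapshotId);
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("a", "b", "c", "d", "e", "f");
        InMemoryStore good = new InMemoryStore("none");
        System.out.println(persistParallel(good, 1L, entries, 2, 3) + " " + good.rows.size());
        InMemoryStore bad = new InMemoryStore("e");
        System.out.println(persistParallel(bad, 2L, entries, 2, 3) + " " + bad.rows.size());
    }
}
```

The sketch waits for all batches to finish before deciding whether to clean up. Whether the cleanup is a delete by snapshot ID, as assumed here, depends on how the snapshot rows are actually keyed.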

 


was (Author: kkalyan):
*Here are two options I had in mind.*

*Option-1:* Persist the snapshot entities in batches. This may significantly 
reduce the number of DB operations. Currently there is one DB operation per 
entry in the snapshot, which does not scale.

*Option-2:* Break the total snapshot into batches and persist all of them in 
parallel in different transactions. As we use the repeatable_read isolation 
level, we should be able to have parallel writes on the same table. This brings 
up an issue if there is a failure in persisting any of the batches: the approach 
needs additional logic for cleaning up the partially persisted snapshot. I’m 
evaluating this option.

 

> Persist HMS Full Snapshot in batches.
> -------------------------------------
>
>                 Key: SENTRY-2249
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2249
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Sentry
>    Affects Versions: 2.1.0
>            Reporter: kalyan kumar kalvagadda
>            Assignee: kalyan kumar kalvagadda
>            Priority: Major
>         Attachments: SENTRY-2249.001.patch
>
>
> Currently each entry in the full snapshot of HMS is persisted one entry at a 
> time. Instead, it could be optimized by persisting the entries in batches. DB 
> operations are expensive, so reducing the number of database operations should 
> help. This would significantly decrease the time to persist the snapshot into 
> the database.
> The size of the batch could be configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)