[ https://issues.apache.org/jira/browse/SENTRY-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525670#comment-16525670 ]
kalyan kumar kalvagadda edited comment on SENTRY-2249 at 6/27/18 10:14 PM:
---------------------------------------------------------------------------

*Here are two options I had in mind.*

*Option-1:* Persist the snapshot entities in batches. This may significantly reduce the number of DB operations. Currently there is one DB operation per snapshot entry, which does not scale.

*Option-2:* Break the full snapshot into batches and persist all of them in parallel, each in its own transaction. As we use the repeatable_read isolation level, we should be able to have parallel writes on the same table. This raises an issue if persisting any of the batches fails: the approach needs additional logic to clean up the partially persisted snapshot.

was (Author: kkalyan):
*Here are two options I had in mind.*

*Option-1:* Persist the snapshot entities in batches. This may significantly reduce the number of DB operations. Currently there is one DB operation per snapshot entry, which does not scale.

*Option-2:* Break the full snapshot into batches and persist all of them in parallel, each in its own transaction. As we use the repeatable_read isolation level, we should be able to have parallel writes on the same table. This raises an issue if persisting any of the batches fails: the approach needs additional logic to clean up the partially persisted snapshot. I’m evaluating this option.

> Persist HMS Full Snapshot in batches.
> -------------------------------------
>
>                 Key: SENTRY-2249
>                 URL: https://issues.apache.org/jira/browse/SENTRY-2249
>             Project: Sentry
>          Issue Type: Improvement
>          Components: Sentry
>    Affects Versions: 2.1.0
>            Reporter: kalyan kumar kalvagadda
>            Assignee: kalyan kumar kalvagadda
>            Priority: Major
>         Attachments: SENTRY-2249.001.patch
>
>
> Currently each entry in the full snapshot of HMS is persisted one entry at a
> time. Instead, it could be optimized by persisting the entries in batches. DB
> operations are expensive, so reducing the number of database operations should
> help. This would decrease the time to persist the snapshot into the database
> significantly.
> The size of the batch could be configurable.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
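
For illustration only, Option-1 with a configurable batch size might look roughly like the sketch below. It is not taken from the attached patch and does not use Sentry's actual persistence layer: the `snapshot_entry` table, the `Entry` pair, and the `batches` and `persist_snapshot` helpers are all hypothetical, and an in-memory SQLite database stands in for the real store. Each `executemany` call stands in for one batched DB operation.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical shape of one snapshot entry: (authorizable object, path).
Entry = Tuple[str, str]


def batches(entries: Iterable[Entry], batch_size: int) -> Iterator[List[Entry]]:
    """Yield successive lists of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(entries)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def persist_snapshot(conn: sqlite3.Connection,
                     entries: Iterable[Entry],
                     batch_size: int) -> int:
    """Persist all entries in a single transaction, one call per batch.

    Returns the number of batch operations issued.
    """
    operations = 0
    with conn:  # commits on success, rolls back if any batch fails
        for batch in batches(entries, batch_size):
            conn.executemany(
                "INSERT INTO snapshot_entry (obj, path) VALUES (?, ?)", batch
            )
            operations += 1
    return operations


# Stand-in database and a small made-up snapshot of 10 entries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshot_entry (obj TEXT, path TEXT)")
snapshot = [(f"db{i}.tbl", f"/warehouse/db{i}/tbl") for i in range(10)]

ops = persist_snapshot(conn, snapshot, batch_size=4)
rows = conn.execute("SELECT COUNT(*) FROM snapshot_entry").fetchone()[0]
print(ops, rows)  # 3 batch operations for 10 rows
```

Because everything runs in one transaction, a failed batch rolls back the whole snapshot, so no cleanup of partial data is needed. That is the trade-off against Option-2, where separate parallel transactions would need the extra cleanup logic mentioned above.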