[ https://issues.apache.org/jira/browse/BEAM-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chamikara Jayalath reassigned BEAM-5105:
----------------------------------------

    Assignee: Reuven Lax

> Move load job poll to finishBundle() method to better parallelize execution
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-5105
>                 URL: https://issues.apache.org/jira/browse/BEAM-5105
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Chamikara Jayalath
>            Assignee: Reuven Lax
>            Priority: Major
>             Fix For: 2.8.0
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> It appears that when we write to BigQuery using WriteTablesDoFn we start a
> load job and wait for that job to finish.
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L318]
>
> In cases where we are trying to write a PCollection of tables (for example,
> when users use the dynamic destinations feature) this relies on dynamic work
> rebalancing to parallelize execution of load jobs. If the runner does not
> support dynamic work rebalancing, or does not perform it for some reason,
> this could have significant performance drawbacks. For example, the
> scheduling times of the individual load jobs will add up.
>
> A better approach might be to start load jobs in the process() method but
> wait for all of them to finish in the finishBundle() method. This will
> parallelize any overheads as well as job execution (assuming more than one
> job is scheduled by BQ).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
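
For illustration, the approach proposed in the description (start each load job while processing an element, then wait for all of them when the bundle finishes) can be sketched in plain Java. This is a minimal sketch under stated assumptions and not the actual WriteTables change: `DeferredLoadJobFn`, `runLoadJob`, and the thread pool standing in for the BigQuery job service are all hypothetical names, and the methods only mimic the shape of a DoFn lifecycle without using the Beam API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical stand-in for a DoFn-style lifecycle. This is NOT Beam's
// WriteTables code; every name here is an assumption made for illustration.
class DeferredLoadJobFn {
  // Assumption: a thread pool stands in for the remote BigQuery job service.
  private final ExecutorService jobService;
  // Assumption: runs one load job for a table and returns its final status.
  private final Function<String, String> runLoadJob;
  // Jobs started during the current bundle that have not been waited on yet.
  private final List<CompletableFuture<String>> pendingJobs = new ArrayList<>();

  DeferredLoadJobFn(ExecutorService jobService, Function<String, String> runLoadJob) {
    this.jobService = jobService;
    this.runLoadJob = runLoadJob;
  }

  // Resets per-bundle state.
  void startBundle() {
    pendingJobs.clear();
  }

  // Starts the load job for one destination table and returns without waiting.
  void processElement(String table) {
    pendingJobs.add(CompletableFuture.supplyAsync(() -> runLoadJob.apply(table), jobService));
  }

  // Blocks until every job started in this bundle has finished and returns
  // the statuses in submission order.
  List<String> finishBundle() {
    List<String> statuses = new ArrayList<>();
    for (CompletableFuture<String> job : pendingJobs) {
      statuses.add(job.join());
    }
    pendingJobs.clear();
    return statuses;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(3);
    DeferredLoadJobFn fn = new DeferredLoadJobFn(pool, name -> {
      try {
        Thread.sleep(200); // simulated scheduling and execution time of one job
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return name + ":DONE";
    });
    fn.startBundle();
    for (String table : new String[] {"table_a", "table_b", "table_c"}) {
      fn.processElement(table);
    }
    System.out.println(fn.finishBundle()); // prints [table_a:DONE, table_b:DONE, table_c:DONE]
    pool.shutdown();
  }
}
```

In this sketch the three simulated jobs overlap, so the bundle waits roughly as long as the slowest job rather than the sum of all three. Waiting inside `processElement` instead would serialize them, which is the behavior the issue describes.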