On Wed, Aug 7, 2019 at 10:55 PM Yohei Onishi <[email protected]> wrote:
> Hi,
>
> If you are familiar with BigQuery insert retry policies in the Apache Beam
> API (BigQueryIO), please help me understand the following behavior. I am
> using the Dataflow runner.
>
>    - How does a Dataflow job behave if I specify retryTransientErrors?

All errors are considered transient except when BigQuery says the error
reason is one of "invalid", "invalidQuery", or "notImplemented":
https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.java#L44

>    - shouldRetry provides an error from BigQuery and I can decide whether
>    I should retry. Where can I find the expected errors from BigQuery?

You can't, since the errors are not visible to the caller:
https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.java#L36
I'm not sure whether this was done on purpose or whether Apache Beam should
expose the errors so users can write their own retry logic.

> *BigQuery insert retry policies*
>
> https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.html
>
>    - alwaysRetry - Always retry all failures.
>    - neverRetry - Never retry any failures.
>    - retryTransientErrors - Retry all failures except for known
>    persistent errors.
>    - shouldRetry - Return true if this failure should be retried.
>
> *Background*
>
>    - When my Cloud Dataflow job inserted a very old timestamp (more than
>    one year in the past) into BigQuery, I got the following error.
>    - The retries did not stop, so I added retryTransientErrors to the
>    BigQueryIO.Write step, and then the retries stopped.
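For reference, the decision retryTransientErrors makes can be sketched in plain Java. This is a paraphrase of the linked InsertRetryPolicy source, not the actual Beam class; the three reason strings are the persistent-error reasons listed there:

```java
import java.util.Set;

/**
 * Sketch of the retryTransientErrors decision: retry every failed insert
 * except those whose reason BigQuery reports as a known persistent error.
 */
public class RetrySketch {
    // Persistent error reasons listed in InsertRetryPolicy.java (linked above).
    private static final Set<String> PERSISTENT_ERRORS =
        Set.of("invalid", "invalidQuery", "notImplemented");

    public static boolean shouldRetry(String reason) {
        return !PERSISTENT_ERRORS.contains(reason);
    }

    public static void main(String[] args) {
        // The out-of-bounds timestamp in this thread has reason "invalid",
        // so retryTransientErrors gives up instead of retrying forever.
        System.out.println(shouldRetry("invalid"));      // false
        System.out.println(shouldRetry("backendError")); // true
    }
}
```

This also explains the behavior in the Background section: the timestamp error has reason "invalid", so it is classified as persistent and the retries stop.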
>> jsonPayload: {
>> exception: "java.lang.RuntimeException: java.io.IOException: Insert failed:
>> [{"errors":[{"debugInfo":"","location":"","message":"Value 690000000 for
>> field timestamp_scanned of the destination table
>> fr-prd-datalake:rfid_raw.store_epc_transactions_cr_uqjp is outside the
>> allowed bounds. You can only stream to date range within 365 days in the
>> past and 183 days in the future relative to the current
>> date.","reason":"invalid"}],
>>
>> After the first error, Dataflow tried to retry the insert, and it was
>> always rejected by BigQuery with the same error.
>
> I also posted the same question here:
> https://stackoverflow.com/questions/57403980/biqquery-insert-retry-policy-in-apache-beam
>
> Yohei Onishi
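For anyone finding this thread later, attaching the policy to the sink looks roughly like this. This is a sketch, not a complete pipeline: `rows` is assumed to be a PCollection of TableRow, and the table name is a placeholder. Note that the retry policy only applies to streaming inserts:

```java
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;

// Streaming inserts that give up on persistent errors such as
// reason "invalid" instead of retrying forever.
rows.apply("WriteToBQ",
    BigQueryIO.writeTableRows()
        .to("my-project:my_dataset.my_table") // placeholder table
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
        .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
```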
