Thanks for the replies. @Lukasz, that sounds like a good option; it's just that it may be hard to catch and filter out every case that would result in a 4xx error. I just want to avoid the whole pipeline failing when a few elements in the stream are bad.
@Dan that sounds promising, I will keep an eye on BEAM-190. Do you have any idea whether there will be an initial version of this to try out in the next couple of weeks?

On Tue, Apr 11, 2017 at 11:37 PM, Dan Halperin <dhalp...@google.com> wrote:
> I believe this is BEAM-190, which is actually being worked on today.
> However, it will probably not be ready in time for the first stable release.
>
> https://issues.apache.org/jira/browse/BEAM-190
>
> On Tue, Apr 11, 2017 at 7:52 AM, Lukasz Cwik <lc...@google.com> wrote:
>
>> Have you thought of fetching the schema upfront from BigQuery and
>> prefiltering out any records in a preceding DoFn, instead of relying on
>> BigQuery telling you that the schema doesn't match?
>>
>> Otherwise you are correct in believing that you will need to update
>> BigQueryIO to have the retry/error semantics that you want.
>>
>> On Tue, Apr 11, 2017 at 1:12 AM, Josh <jof...@gmail.com> wrote:
>>
>>> What I really want to do is configure BigQueryIO to log an error and
>>> skip the write if it receives a 4xx response from BigQuery (e.g. the
>>> element does not match the table schema). And for other errors (e.g. 5xx)
>>> I want it to retry n times with exponential backoff.
>>>
>>> Is there any way to do this at the moment? Will I need to make some
>>> custom changes to BigQueryIO?
>>>
>>> On Mon, Apr 10, 2017 at 7:11 PM, Josh <jof...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm using BigQueryIO to write the output of an unbounded streaming job
>>>> to BigQuery.
>>>>
>>>> In the case that an element in the stream cannot be written to
>>>> BigQuery, BigQueryIO seems to have some default retry logic which
>>>> retries the write a few times. However, if the write fails repeatedly,
>>>> it seems to cause the whole pipeline to halt.
>>>>
>>>> How can I configure Beam so that if writing an element fails a few
>>>> times, it simply gives up on writing that element and moves on without
>>>> affecting the pipeline?
>>>>
>>>> Thanks for any advice,
>>>> Josh
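For anyone following along, here is a minimal sketch of the prefiltering idea Lukasz described: fetch the table's field names from BigQuery once, then drop (and log) any record whose keys don't match before it ever reaches BigQueryIO. The helper below is plain Java so the core check is clear; the class and method names (`SchemaPrefilter`, `conformsToSchema`) are illustrative, not part of the Beam or BigQuery APIs, and in a real pipeline this logic would live inside a `DoFn` placed just before the BigQueryIO write.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Illustrative prefilter: the check a DoFn would run on each record before
 * handing it to BigQueryIO, using a field-name set fetched once from the
 * table schema. Non-conforming records are filtered out rather than being
 * sent to BigQuery, where they would fail with a 4xx and trigger retries.
 */
public class SchemaPrefilter {

    /** True when every key in the record is a known schema field. */
    public static boolean conformsToSchema(Map<String, Object> record,
                                           Set<String> schemaFields) {
        return schemaFields.containsAll(record.keySet());
    }

    /** Keeps only the records that pass the schema check. */
    public static List<Map<String, Object>> filterConforming(
            List<Map<String, Object>> records, Set<String> schemaFields) {
        return records.stream()
                .filter(r -> conformsToSchema(r, schemaFields))
                .collect(Collectors.toList());
    }
}
```

Note this only guards against schema-mismatch 4xx errors (unknown fields); other bad-request causes, like type mismatches, would need additional per-field validation, which is why it's hard to catch every case this way.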