Thanks for the replies,
@Lukasz that sounds like a good option. It's just that it may be hard to
catch and filter out every case that would result in a 4xx error. I just
want to avoid the whole pipeline failing because a few elements in the
stream are bad.
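
For what it's worth, the prefilter Lukasz suggests could look roughly like
this. This is a plain-Python sketch of just the validation step (the schema
shape, type map, and function names are made up for illustration, not the
actual BigQuery API types); in the real pipeline this logic would live in a
DoFn ahead of the BigQueryIO write:

```python
# Sketch of a schema prefilter: drop records whose fields don't match
# a table schema fetched up front from BigQuery. The schema is modeled
# here as a list of {"name", "type", "mode"} dicts, which is illustrative
# only -- not the real BigQuery client types.

# Map of BigQuery type name -> Python types we accept for it.
TYPE_CHECKS = {
    "STRING": str,
    "INTEGER": int,
    "FLOAT": (int, float),
    "BOOLEAN": bool,
}

def matches_schema(record, schema):
    """Return True if no required field is missing and every field
    present in the record has an acceptable Python type."""
    for field in schema:
        name = field["name"]
        required = field["mode"] == "REQUIRED"
        if name not in record:
            if required:
                return False
            continue
        expected = TYPE_CHECKS.get(field["type"], object)
        if not isinstance(record[name], expected):
            return False
    return True

def prefilter(records, schema):
    """Split records into (good, bad) so bad rows can be logged and
    skipped instead of failing the whole write."""
    good, bad = [], []
    for record in records:
        (good if matches_schema(record, schema) else bad).append(record)
    return good, bad
```

In a Beam pipeline the `bad` side would typically go to a side output (or
just a log) rather than a second list, but the filtering decision is the
same.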

@Dan that sounds promising, I will keep an eye on BEAM-190. Do you have any
idea whether there will be an initial version of this to try out in the
next couple of weeks?
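
As an aside, the behaviour I described earlier in the thread (give up
immediately on a 4xx, retry n times with exponential backoff on a 5xx) is
easy to sketch outside Beam. This is plain Python with made-up exception
and function names, not BigQueryIO's actual retry policy:

```python
import time

class ClientError(Exception):
    """Stands in for a 4xx response: the row itself is bad."""

class ServerError(Exception):
    """Stands in for a 5xx response: a transient server-side failure."""

def backoff_delays(max_retries, base=1.0, cap=60.0):
    """Exponential backoff delays: base * 2**attempt, capped."""
    return [min(base * (2 ** i), cap) for i in range(max_retries)]

def write_with_retries(write_fn, row, max_retries=3, base=1.0):
    """Attempt write_fn(row) up to max_retries times.

    On a ClientError (4xx) give up immediately and return False, so the
    caller can log and skip the row. On a ServerError (5xx) sleep with
    exponential backoff and retry. Returns True on success, False once
    retries are exhausted.
    """
    for delay in backoff_delays(max_retries, base=base):
        try:
            write_fn(row)
            return True
        except ClientError:
            return False  # bad row: skip, don't retry
        except ServerError:
            time.sleep(delay)  # transient: back off and try again
    return False
```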

On Tue, Apr 11, 2017 at 11:37 PM, Dan Halperin <dhalp...@google.com> wrote:

> I believe this is BEAM-190, which is actually being worked on today.
> However, it will probably not be ready in time for the first stable release.
>
> https://issues.apache.org/jira/browse/BEAM-190
>
> On Tue, Apr 11, 2017 at 7:52 AM, Lukasz Cwik <lc...@google.com> wrote:
>
>> Have you thought of fetching the schema upfront from BigQuery and
>> prefiltering out any records in a preceding DoFn instead of relying on
>> BigQuery telling you that the schema doesn't match?
>>
>> Otherwise you are correct in believing that you will need to update
>> BigQueryIO to have the retry/error semantics that you want.
>>
>> On Tue, Apr 11, 2017 at 1:12 AM, Josh <jof...@gmail.com> wrote:
>>
>>> What I really want to do is configure BigQueryIO to log an error and
>>> skip the write if it receives a 4xx response from BigQuery (e.g. element
>>> does not match table schema). And for other errors (e.g. 5xx) I want it to
>>> retry n times with exponential backoff.
>>>
>>> Is there any way to do this at the moment? Will I need to make some
>>> custom changes to BigQueryIO?
>>>
>>>
>>>
>>> On Mon, Apr 10, 2017 at 7:11 PM, Josh <jof...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm using BigQueryIO to write the output of an unbounded streaming job
>>>> to BigQuery.
>>>>
>>>> In the case that an element in the stream cannot be written to
>>>> BigQuery, the BigQueryIO seems to have some default retry logic which
>>>> retries the write a few times. However, if the write fails repeatedly, it
>>>> seems to cause the whole pipeline to halt.
>>>>
>>>> How can I configure beam so that if writing an element fails a few
>>>> times, it simply gives up on writing that element and moves on without
>>>> affecting the pipeline?
>>>>
>>>> Thanks for any advice,
>>>> Josh
>>>>
>>>
>>>
>>
>