Thanks for the response Luke :-)

I did try setting <pcoll>.element_type on each resulting PCollection, using
"apache_beam.typehints.typehints.KV" to describe the elements, and that
passed type checking.  I also ran the full dataset (batch job) with the GBK
in question replaced by a dummy DoFn that asserted every element headed into
the GBK was a 2-tuple, and with --runtime_type_check enabled; all of that
ran successfully as long as the GBK did not follow the TaggedOutput DoFn.
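
For reference, here is a simplified sketch of what I tried (tag names,
types, and the DoFn itself are placeholders rather than my actual pipeline;
the dummy assertion DoFn and the --runtime_type_check flag are omitted):

import apache_beam as beam
from apache_beam import pvalue
from apache_beam.typehints import typehints

class TagByParity(beam.DoFn):
    # Placeholder DoFn: emits differently-typed 2-tuples to two tags.
    def process(self, element):
        if element % 2 == 0:
            # Goes to the 'evens' tag as KV[str, int].
            yield pvalue.TaggedOutput('evens', ('even', element))
        else:
            # Goes to the 'odds' tag as KV[str, str].
            yield pvalue.TaggedOutput('odds', ('odd', str(element)))

with beam.Pipeline() as p:
    outputs = (
        p
        | beam.Create(range(10))
        | beam.ParDo(TagByParity()).with_outputs('evens', 'odds'))

    evens = outputs.evens
    odds = outputs.odds

    # Declare the element types explicitly so that a KV coder should be
    # chosen for each tagged output.
    evens.element_type = typehints.KV[str, int]
    odds.element_type = typehints.KV[str, str]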

Adding the GBK back in also runs end-to-end successfully on the DirectRunner
with the identical dataset.  But as soon as I add the GBK and use the
DataflowRunner (v2), I get errors once the optimized step involving the GBK
enters the "running" status (a continuation of the sketch above, showing the
failing shape, follows this list):

- "Could not start worker docker container"
- "Error syncing pod"
- "Check failed: pair_coder Strings" or "Check failed: kv_coder : expecting
a KV coder, but had Strings"

Is there anything further I can try?  I can also provide Dataflow Job IDs if
that would help (and is safe to share).

Thanks,
Evan

On Wed, Sep 22, 2021 at 1:09 AM Luke Cwik <lc...@google.com> wrote:

> Have you tried setting the element_type[1] explicitly on each output
> PCollection that is returned after applying the multi-output ParDo?
> I believe you'll get a DoOutputsTuple[2] returned after applying the
> multi-output ParDo, which allows access to the underlying PCollection objects.
>
> 1:
> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/sdks/python/apache_beam/pvalue.py#L99
> 2:
> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/sdks/python/apache_beam/pvalue.py#L234
>
> On Tue, Sep 21, 2021 at 10:29 AM Evan Galpin <evan.gal...@gmail.com>
> wrote:
>
>> This is badly plaguing a pipeline I'm currently developing, where the
>> exact same data set and code runs end-to-end on DirectRunner, but fails on
>> DataflowRunner with either "Check failed: kv_coder : expecting a KV
>> coder, but had Strings" or "Check failed: pair_coder Strings" hidden in the
>> harness logs. It seems to be consistently reproducible with any
>> TaggedOutput followed by a GBK.
>>
>> Any advice on how to proceed?
>>
>> Thanks,
>> Evan
>>
>> On Fri, Sep 17, 2021 at 11:20 AM Evan Galpin <evan.gal...@gmail.com>
>> wrote:
>>
>>> The Dataflow error logs only showed 1 error which was:  "The job failed
>>> because a work item has failed 4 times. Look in previous log entries for
>>> the cause of each one of the 4 failures. For more information, see
>>> https://cloud.google.com/dataflow/docs/guides/common-errors. The work
>>> item was attempted on these workers: beamapp-XXXX-XXXXX-kt85-harness-8k2c
>>> Root cause: The worker lost contact with the service."  In "Diagnostics"
>>> there were errors stating "Error syncing pod: Could not start worker docker
>>> container".  The harness logs i.e. "projects/my-project/logs/
>>> dataflow.googleapis.com%2Fharness" finally contained an error that
>>> looked suspect, which was "Check failed: kv_coder : expecting a KV
>>> coder, but had Strings", below[1] is a link to possibly a stacktrace or
>>> extra detail, but is internal to google so I don't have access.
>>>
>>> [1]
>>> https://symbolize.corp.google.com/r/?trace=55a197abcf56,55a197abbe33,55a197abb97e,55a197abd708,55a196d4e22f,55a196d4d8d3,55a196d4da35,55a1967ec247,55a196f62b26,55a1968969b3,55a196886613,55a19696b0e6,55a196969815,55a1969693eb,55a19696916e,55a1969653bc,55a196b0150a,55a196b04e11,55a1979fc8df,7fe7736674e7,7fe7734dc22c&map=13ddc0ac8b57640c29c5016eb26ef88e:55a1956e7000-55a197bd5010,f1c96c67b57b74a4d7050f34aca016eef674f765:7fe773660000-7fe773676dac,76b955c7af655a4c1e53b8d4aaa0255f3721f95f:7fe7734a5000-7fe7736464c4
>>>
>>> On Thu, Sep 9, 2021 at 6:46 PM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> Huh, that's strange. Yes, the exact error on the service would be
>>>> helpful.
>>>>
>>>> On Wed, Sep 8, 2021 at 10:12 AM Evan Galpin <evan.gal...@gmail.com>
>>>> wrote:
>>>> >
>>>> > Thanks for the response. I've created a gist here to demonstrate a
>>>> minimal repro:
>>>> https://gist.github.com/egalpin/2d6ad2210cf9f66108ff48a9c7566ebc
>>>> >
>>>> > It seemed to run fine both on DirectRunner and PortableRunner (embedded
>>>> mode), but Dataflow v2 runner raised an error at runtime seemingly
>>>> associated with the Shuffle service?  I have job IDs and trace links if
>>>> those are helpful as well.
>>>> >
>>>> > Thanks,
>>>> > Evan
>>>> >
>>>> > On Tue, Sep 7, 2021 at 4:35 PM Robert Bradshaw <rober...@google.com>
>>>> wrote:
>>>> >>
>>>> >> This is not yet supported. Using a union for now is the way to go.
>>>> (If
>>>> >> only the last value of the union was used, that sounds like a bug. Do
>>>> >> you have a minimal repro?)
>>>> >>
>>>> >> On Tue, Sep 7, 2021 at 1:23 PM Evan Galpin <evan.gal...@gmail.com>
>>>> wrote:
>>>> >> >
>>>> >> > Hi all,
>>>> >> >
>>>> >> > What is the recommended way to write type hints for a tagged
>>>> output DoFn where the outputs to different tags have different types?
>>>> >> >
>>>> >> > I tried using a Union to describe each of the possible output
>>>> types, but that resulted in mismatched coder errors where only the last
>>>> entry in the Union was used as the assumed type.  Is there a way to
>>>> associate a type hint with a tag, or something like that?
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Evan
>>>>
>>>
