Thanks for the response, Luke :-)

I did try setting <pcoll>.element_type on each resulting PCollection, using "apache_beam.typehints.typehints.KV" to describe the elements, and that passed type checking. I also ran the full dataset (batch job) with the GBK in question replaced by a dummy DoFn that asserts every element that would have gone into the GBK is a 2-tuple, and with --runtime_type_check enabled. All of that ran successfully as long as there was no GBK after the TaggedOutput DoFn.
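
A minimal sketch of that kind of check, for illustration only. This is not the actual pipeline code: the tag names, the routing logic, the element types and the helper names are all made up, and it uses beam.FlatMap / beam.Map over plain functions rather than DoFn subclasses to keep it short.

```python
NUMBERS, WORDS = "numbers", "words"  # hypothetical tag names


def route(element):
    """Pick a tag and build a (key, value) pair for one input element."""
    if isinstance(element, int):
        return NUMBERS, (element % 2, element)
    text = str(element)
    return WORDS, (text[:1], text)


def check_is_kv(element):
    """Stand-in for the GBK: pass the element through, or fail if it is not a 2-tuple."""
    if not (isinstance(element, tuple) and len(element) == 2):
        raise ValueError(f"expected a (key, value) 2-tuple, got {element!r}")
    return element


def build_checked_outputs(pipeline, inputs):
    """Wire up: tagged-output step -> explicit element_type -> 2-tuple check."""
    # Imported here so the two helpers above can be exercised without Beam.
    import apache_beam as beam
    from apache_beam import pvalue
    from apache_beam.typehints import typehints

    def to_tagged(element):
        tag, kv = route(element)
        yield pvalue.TaggedOutput(tag, kv)

    # with_outputs() returns a DoOutputsTuple; each tag is one PCollection.
    outputs = (
        pipeline
        | "Create" >> beam.Create(inputs)
        | "Route" >> beam.FlatMap(to_tagged).with_outputs(NUMBERS, WORDS)
    )

    # Set the element type explicitly on every tagged output (assumed types).
    numbers, words = outputs[NUMBERS], outputs[WORDS]
    numbers.element_type = typehints.KV[int, int]
    words.element_type = typehints.KV[str, str]

    # The check sits where the GBK would otherwise go.
    return {
        NUMBERS: numbers | "CheckNumbers" >> beam.Map(check_is_kv),
        WORDS: words | "CheckWords" >> beam.Map(check_is_kv),
    }
```

Calling build_checked_outputs(p, [1, 2, "a", "bb"]) inside "with beam.Pipeline() as p:" should run this on the DirectRunner by default; replacing the two check steps with beam.GroupByKey() gives the TaggedOutput + GBK shape discussed in this thread.
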
Adding back the GBK also runs end-to-end successfully on the DirectRunner using the identical dataset. But as soon as I add the GBK and use the DataflowRunner (v2), I get errors as soon as the optimized step involving the GBK is in the "running" status:

- "Could not start worker docker container"
- "Error syncing pod"
- "Check failed: pair_coder Strings" or "Check failed: kv_coder : expecting a KV coder, but had Strings"

Anything further to try? I can also provide Job IDs from Dataflow if helpful (and safe to share).

Thanks,
Evan

On Wed, Sep 22, 2021 at 1:09 AM Luke Cwik <[email protected]> wrote:

> Have you tried setting the element_type[1] explicitly on each output
> PCollection that is returned after applying the multi-output ParDo?
> I believe you'll get a DoOutputsTuple[2] returned after applying the
> mult-output ParDo which allows access to the underlying PCollection objects.
>
> 1:
> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/sdks/python/apache_beam/pvalue.py#L99
> 2:
> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/sdks/python/apache_beam/pvalue.py#L234
>
> On Tue, Sep 21, 2021 at 10:29 AM Evan Galpin <[email protected]>
> wrote:
>
>> This is badly plaguing a pipeline I'm currently developing, where the
>> exact same data set and code runs end-to-end on DirectRunner, but fails on
>> DataflowRunner with either "Check failed: kv_coder : expecting a KV
>> coder, but had Strings" or "Check failed: pair_coder Strings" hidden in the
>> harness logs. It seems to be consistently repeatable with any TaggedOutput
>> + GBK afterwards.
>>
>> Any advice on how to proceed?
>>
>> Thanks,
>> Evan
>>
>> On Fri, Sep 17, 2021 at 11:20 AM Evan Galpin <[email protected]>
>> wrote:
>>
>>> The Dataflow error logs only showed 1 error which was: "The job failed
>>> because a work item has failed 4 times. Look in previous log entries for
>>> the cause of each one of the 4 failures. For more information, see
>>> https://cloud.google.com/dataflow/docs/guides/common-errors. The work
>>> item was attempted on these workers: beamapp-XXXX-XXXXX-kt85-harness-8k2c
>>> Root cause: The worker lost contact with the service." In "Diagnostics"
>>> there were errors stating "Error syncing pod: Could not start worker docker
>>> container". The harness logs i.e. "projects/my-project/logs/
>>> dataflow.googleapis.com%2Fharness" finally contained an error that
>>> looked suspect, which was "Check failed: kv_coder : expecting a KV
>>> coder, but had Strings", below[1] is a link to possibly a stacktrace or
>>> extra detail, but is internal to google so I don't have access.
>>>
>>> [1]
>>> https://symbolize.corp.google.com/r/?trace=55a197abcf56,55a197abbe33,55a197abb97e,55a197abd708,55a196d4e22f,55a196d4d8d3,55a196d4da35,55a1967ec247,55a196f62b26,55a1968969b3,55a196886613,55a19696b0e6,55a196969815,55a1969693eb,55a19696916e,55a1969653bc,55a196b0150a,55a196b04e11,55a1979fc8df,7fe7736674e7,7fe7734dc22c&map=13ddc0ac8b57640c29c5016eb26ef88e:55a1956e7000-55a197bd5010,f1c96c67b57b74a4d7050f34aca016eef674f765:7fe773660000-7fe773676dac,76b955c7af655a4c1e53b8d4aaa0255f3721f95f:7fe7734a5000-7fe7736464c4
>>>
>>> On Thu, Sep 9, 2021 at 6:46 PM Robert Bradshaw <[email protected]>
>>> wrote:
>>>
>>>> Huh, that's strange. Yes, the exact error on the service would be
>>>> helpful.
>>>>
>>>> On Wed, Sep 8, 2021 at 10:12 AM Evan Galpin <[email protected]>
>>>> wrote:
>>>> >
>>>> > Thanks for the response. I've created a gist here to demonstrate a
>>>> > minimal repro:
>>>> > https://gist.github.com/egalpin/2d6ad2210cf9f66108ff48a9c7566ebc
>>>> >
>>>> > It seemed to run fine both on DirectRunner and PortableRunner (embed
>>>> > mode), but Dataflow v2 runner raised an error at runtime seemingly
>>>> > associated with the Shuffle service? I have job IDs and trace links if
>>>> > those are helpful as well.
>>>> >
>>>> > Thanks,
>>>> > Evan
>>>> >
>>>> > On Tue, Sep 7, 2021 at 4:35 PM Robert Bradshaw <[email protected]>
>>>> > wrote:
>>>> >>
>>>> >> This is not yet supported. Using a union for now is the way to go.
>>>> >> (If only the last value of the union was used, that sounds like a
>>>> >> bug. Do you have a minimal repro?)
>>>> >>
>>>> >> On Tue, Sep 7, 2021 at 1:23 PM Evan Galpin <[email protected]>
>>>> >> wrote:
>>>> >> >
>>>> >> > Hi all,
>>>> >> >
>>>> >> > What is the recommended way to write type hints for a tagged
>>>> >> > output DoFn where the outputs to different tags have different types?
>>>> >> >
>>>> >> > I tried using a Union to describe each of the possible output
>>>> >> > types, but that resulted in mismatched coder errors where only the last
>>>> >> > entry in the Union was used as the assumed type. Is there a way to
>>>> >> > associate a type hint to a tag or something like that?
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Evan
