Thanks Andrew, I think this is a valid advice. I will update the
documentation~

Regards,
Dian

,

On Fri, Feb 10, 2023 at 10:08 PM Andrew Otto <o...@wikimedia.org> wrote:

> Question about side outputs and OutputTags in pyflink.  The docs
> <https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/side_output/>
> say we are supposed to
>
> yield output_tag, value
>
> Docs then say:
> > For retrieving the side output stream you use getSideOutput(OutputTag) on
> the result of the DataStream operation.
>
> From this, I'd expect that calling datastream.get_side_output would be
> optional.   However, it seems that if you do not call
> datastream.get_side_output, then the main datastream will have the record
> destined to the output tag still in it, as a Tuple(output_tag, value).
> This caused me great confusion for a while, as my downstream tasks would
> break because of the unexpected Tuple type of the record.
>
> Here's an example of the failure using side output and ProcessFunction in
> the word count example.
> <https://gist.github.com/ottomata/001df5df72eb1224c01c9827399fcbd7#file-pyflink_sideout_fail_word_count_example-py-L86-L100>
>
> I'd expect that just yielding an output_tag would make those records be in
> a different datastream, but apparently this is not the case unless you call
> get_side_output.
>
> If this is the expected behavior, perhaps the docs should be updated to
> say so?
>
> -Andrew Otto
>  Wikimedia Foundation
>
>
>
>
>

Reply via email to