Thank you!

On Mon, Feb 13, 2023 at 5:55 AM Dian Fu <dian0511...@gmail.com> wrote:

> Thanks Andrew, I think this is a valid advice. I will update the
> documentation~
>
> Regards,
> Dian
>
> ,
>
> On Fri, Feb 10, 2023 at 10:08 PM Andrew Otto <o...@wikimedia.org> wrote:
>
>> Question about side outputs and OutputTags in pyflink.  The docs
>> <https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/side_output/>
>> say we are supposed to
>>
>> yield output_tag, value
>>
>> Docs then say:
>> > For retrieving the side output stream you use getSideOutput(OutputTag) on
>> the result of the DataStream operation.
>>
>> From this, I'd expect that calling datastream.get_side_output would be
>> optional.   However, it seems that if you do not call
>> datastream.get_side_output, then the main datastream will have the record
>> destined to the output tag still in it, as a Tuple(output_tag, value).
>> This caused me great confusion for a while, as my downstream tasks would
>> break because of the unexpected Tuple type of the record.
>>
>> Here's an example of the failure using side output and ProcessFunction
>> in the word count example.
>> <https://gist.github.com/ottomata/001df5df72eb1224c01c9827399fcbd7#file-pyflink_sideout_fail_word_count_example-py-L86-L100>
>>
>> I'd expect that just yielding an output_tag would make those records be
>> in a different datastream, but apparently this is not the case unless you
>> call get_side_output.
>>
>> If this is the expected behavior, perhaps the docs should be updated to
>> say so?
>>
>> -Andrew Otto
>>  Wikimedia Foundation
>>
>>
>>
>>
>>

Reply via email to