Re: Debugging External Transforms on Dataflow (Python)

Alex Koay Tue, 15 Jun 2021 20:25:13 -0700

1. I'm building a streaming pipeline.
2. For the pure Java pipeline, I believe the transform got substituted with
a Dataflow-native Solace transform (it isn't using use_runner_v2, as I think
Java doesn't support that publicly yet). I used the default Java flags with
the DataflowRunner.
3. I believe it's the source reader that is being created en masse.


I just tested the Python pipeline (with the Java Solace transform) on the
DirectRunner without bounds, and the issue seems to manifest in the same
way. I'm trying to debug it this way for now.

On Wed, Jun 16, 2021 at 9:01 AM Boyuan Zhang <[email protected]> wrote:

> In terms of the odd case you are experiencing, it seems like you are
> comparing a pure Java pipeline with a cross-language pipeline, right? I
> would like to learn more details about this case:
>
>    - Is this a batch pipeline or a streaming pipeline?
>    - For your pure Java pipeline, do you run the pipeline with
>    'use_runner_v2', 'beam_fn_api', or 'use_unified_worker'?
>    - By "a large number of consumers", do you mean Dataflow workers or
>    source readers?
>
> If you can share the implementation of the source and the pipeline, that
> would be really helpful.
>
> +Lukasz Cwik <[email protected]> for awareness.
>
> On Tue, Jun 15, 2021 at 9:50 AM Chamikara Jayalath <[email protected]>
> wrote:
>
>>
>>
>> On Tue, Jun 15, 2021 at 3:20 AM Alex Koay <[email protected]> wrote:
>>
>>> Several questions:
>>>
>>> 1. Is there any way to set the log level for the Java workers via a
>>> Python Dataflow pipeline?
>>>
>>
>>> 2. What is the easiest way to debug an external transform in Java? My
>>> main pipeline code is in Python.
>>>
>>
>> In general, debugging such a job should be similar to debugging any other
>> Dataflow job [1]. However, some of the SDK options available to the main SDK
>> environment are currently not available to external SDK environments. This
>> includes changing the log level. So I suggest adding INFO logs instead of
>> changing the log level, if possible.
>>
>> [1]
>> https://cloud.google.com/dataflow/docs/guides/troubleshooting-your-pipeline
>>
>>
>>>
>>> 3. Are there any edge cases with the UnboundedSourceWrapperFn SDF that I
>>> should be wary of? I'm currently encountering an odd case (in Dataflow)
>>> where a Java pipeline runs with only one worker and reads Solace messages
>>> all the way through, but with an external transform in Python, it
>>> generates a large number of consumers and stops reading messages
>>> altogether about 90% of the way through.
>>>
>>
>> +Boyuan Zhang <[email protected]> might be able to help.
>>
>>
>>> Thanks!
>>>
>>> Cheers
>>> Alex
>>>
>>
