Regarding the odd case you are experiencing, it seems like you are
comparing a pure Java pipeline with a cross-language pipeline, right? I
would like to learn more details about this case:

   - Is this a batch pipeline or a streaming pipeline?
   - For your pure Java pipeline, do you run it with 'use_runner_v2',
   'beam_fn_api', or 'use_unified_worker'?
   - By "a large number of consumers", do you mean Dataflow workers or
   source readers?
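
For reference, experiment flags like these are normally supplied as
--experiments pipeline options when the job is launched; a minimal sketch of
the launcher-side argv (the project/region values are placeholders, not from
your setup):

```python
# Sketch of the argv a launcher script would hand to PipelineOptions;
# only the --experiments entry matters for the question above.
pipeline_args = [
    "--runner=DataflowRunner",
    "--project=my-project",         # placeholder
    "--region=us-central1",         # placeholder
    "--experiments=use_runner_v2",  # or beam_fn_api / use_unified_worker
]
# These would be passed to PipelineOptions(pipeline_args) when building
# the pipeline.
```

Knowing which of these, if any, you pass for the pure Java run would help
narrow things down.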

If you can share the implementation of the source and the pipeline, that
would be really helpful.

+Lukasz Cwik <[email protected]> for awareness.

On Tue, Jun 15, 2021 at 9:50 AM Chamikara Jayalath <[email protected]>
wrote:

>
>
> On Tue, Jun 15, 2021 at 3:20 AM Alex Koay <[email protected]> wrote:
>
>> Several questions:
>>
>> 1. Is there any way to set the log level for the Java workers via a
>> Python Dataflow pipeline?
>>
>
>> 2. What is the easiest way to debug an external transform in Java? My
>> main pipeline code is in Python.
>>
>
> In general, debugging a job should be similar to any other Dataflow job
> [1]. But some of the SDK options available to the main SDK environment are
> currently not available to external SDK environments. This includes
> changing the debug level. So I suggest adding INFO logs instead of changing
> the debug level if possible.
>
> [1]
> https://cloud.google.com/dataflow/docs/guides/troubleshooting-your-pipeline
>
>
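
To make the INFO-logging suggestion above concrete, here is a hedged
Python-side sketch of the pattern (the logger name and message are
illustrative; on the Java side the equivalent would be a logger inside the
DoFn):

```python
import logging

# Minimal sketch: emit INFO-level logs from inside a transform so they
# appear without changing the worker log level. The DoFn scaffolding is
# omitted; only the logging pattern is shown.
log = logging.getLogger("my.transform")  # name is illustrative

def process(element):
    # INFO logs surface in the worker logs at the default level.
    log.info("processing element: %s", element)
    return [element]
```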
>>
>> 3. Are there any edge cases with the UnboundedSourceWrapperFn SDF that I
>> should be wary of? I'm currently encountering an odd case (in Dataflow)
>> where a pure Java pipeline reads Solace messages all the way through with
>> only one worker, but with an external transform in Python, it generates a
>> large number of consumers and stops reading messages altogether about 90%
>> of the way through.
>>
>
> +Boyuan Zhang <[email protected]> might be able to help.
>
>
>> Thanks!
>>
>> Cheers
>> Alex
>>
>
