Regarding the odd case you are experiencing, it sounds like you are comparing a pure Java pipeline with a cross-language pipeline, right? I'd like to learn more details about this case:
- Is this a batch pipeline or a streaming pipeline?
- For your pure Java pipeline, do you run it with 'use_runner_v2', 'beam_fn_api', or 'use_unified_worker'? (A minimal sketch of setting these experiments is included below, after the quoted thread.)
- By "a large number of consumers", do you mean Dataflow workers or source readers?

If you can share the implementation of the source and the pipeline, that would be really helpful.

+Lukasz Cwik <[email protected]> for awareness.

On Tue, Jun 15, 2021 at 9:50 AM Chamikara Jayalath <[email protected]> wrote:

>
> On Tue, Jun 15, 2021 at 3:20 AM Alex Koay <[email protected]> wrote:
>
>> Several questions:
>>
>> 1. Is there any way to set the log level for the Java workers via a
>> Python Dataflow pipeline?
>>
>> 2. What is the easiest way to debug an external transform in Java? My
>> main pipeline code is in Python.
>
> In general, debugging a job should be similar to any other Dataflow job
> [1]. But some of the SDK options available to the main SDK environment are
> currently not available to external SDK environments. This includes
> changing the debug level. So I suggest adding INFO logs instead of
> changing the debug level if possible.
>
> [1] https://cloud.google.com/dataflow/docs/guides/troubleshooting-your-pipeline
>
>> 3. Are there any edge cases with the UnboundedSourceWrapperFn SDF that I
>> should be wary of? I'm currently encountering an odd case (in Dataflow)
>> where a Java pipeline runs fine with only one worker while reading Solace
>> messages, but with an external transform in Python, it generates a large
>> number of consumers and stops reading messages altogether about 90% of
>> the way through.
>
> +Boyuan Zhang <[email protected]> might be able to help.
>
>> Thanks!
>>
>> Cheers
>> Alex
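
For the second question above, here is a minimal sketch of what I mean by running with those experiments in Java (the class name is hypothetical; passing --experiments=use_runner_v2 on the command line is equivalent):

import java.util.Arrays;

import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class ExperimentsExample {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

    // Equivalent to passing --experiments=use_runner_v2 on the command line;
    // swap in "beam_fn_api" or "use_unified_worker" to compare behaviors.
    options.setExperiments(Arrays.asList("use_runner_v2"));

    Pipeline pipeline = Pipeline.create(options);
    // ... build the pipeline here ...
    pipeline.run();
  }
}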
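
And regarding Cham's suggestion above to add INFO logs instead of changing the debug level: in the Java SDK this is usually done with SLF4J, whose output shows up in the Dataflow worker logs, including for the external (Java) environment. A minimal sketch, assuming a hypothetical DoFn inside your external transform:

import org.apache.beam.sdk.transforms.DoFn;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical DoFn showing SLF4J INFO logging; these messages appear in
// the Dataflow worker logs without needing to change the log level.
public class LoggingDoFn extends DoFn<String, String> {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingDoFn.class);

  @ProcessElement
  public void processElement(@Element String element, OutputReceiver<String> out) {
    LOG.info("Processing element: {}", element);
    out.output(element);
  }
}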
