Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-02 Thread Jaehyeon Kim
Hello, I am building a pipeline using two SDFs that are chained. The first function (DirectoryWatchFn) checks a folder continuously and grabs if a new file is added. The second one (ProcessFilesFn) processes a file while splitting each line - the processing simply prints the file name and line num

Re: Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-04 Thread XQ Hu via user
I played with your example. Indeed, create_tracker in your ProcessFilesFn is never called, which is quite strange. I could not find any example that shows the chained SDFs, which makes me wonder whether the chained SDFs work. @Chamikara Jayalath Any thoughts? On Fri, May 3, 2024 at 2:45 AM Jaehy

Re: Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-05 Thread Jaehyeon Kim
Hi XQ Thanks for checking it out. SDFs chaining seems to work as I created my pipeline while converting a pipeline that is built in the Java SDK. The source of the Java pipeline can be found in https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam/blob/main/chapter7/src/m

Re: Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-05 Thread XQ Hu via user
Have you tried to use other runners? I think this might be caused by some gaps in Python DirectRunner to support the streaming cases or SDFs, On Sun, May 5, 2024 at 5:10 AM Jaehyeon Kim wrote: > Hi XQ > > Thanks for checking it out. SDFs chaining seems to work as I created my > pipeline while co

Re: Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-05 Thread Jaehyeon Kim
Hi XQ Yes, it works with the FlinkRunner. Thank you so much! Cheers, Jaehyeon [image: image.png] On Mon, 6 May 2024 at 02:49, XQ Hu via user wrote: > Have you tried to use other runners? I think this might be caused by some > gaps in Python DirectRunner to support the streaming cases or SDFs,

Re: Pipeline gets stuck when chaining two SDFs (Python SDK)

2024-05-05 Thread XQ Hu via user
I added this issue here https://github.com/apache/beam/issues/24528#issuecomment-2095026324 But we do not plan to fix this for Python DirectRunner since Prism will become the default local runner when it is ready. On Sun, May 5, 2024 at 2:41 PM Jaehyeon Kim wrote: > Hi XQ > > Yes, it works with