Nothing problematic stands out to me in those logs. A job service and an artifact staging service are spun up so that the job (and its artifacts) can be submitted, and then they are shut down. What are the actual errors that you are seeing?
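If the failure happens in the Python process that submits the job, raising the Python log level may surface more detail. Below is a minimal sketch that uses only the standard `logging` module. It assumes the pipeline is launched from a Python script, and it relies on most Beam Python modules creating their loggers with `logging.getLogger(__name__)`, which places them under the `apache_beam` logger hierarchy. It does not change the log level of the Java job server or the Flink cluster, because those are configured separately.

```python
import logging

# Send log records to stderr with timestamps and logger names.
# Note: basicConfig() does nothing if the root logger already has handlers.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# Beam Python modules typically log under "apache_beam.*", so setting the
# level here makes DEBUG records from those child loggers visible as well.
beam_logger = logging.getLogger("apache_beam")
beam_logger.setLevel(logging.DEBUG)

# ... construct and run the pipeline as usual after this point ...
```

Setting the level on the `apache_beam` logger, rather than only on the root logger, keeps Beam's messages at DEBUG even if another library later raises the root level.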
On Wed, Jan 3, 2024 at 7:39 AM Lydian <[email protected]> wrote:
>
> Hi,
>
> We are running Beam 2.41.0 with the portable Flink runner using the Python SDK.
> However, we suddenly noticed that all our jobs are now failing with errors
> like this:
> ```
> 2024-01-03 15:35:30,067 INFO org.apache.beam.runners.jobsubmission.JobServerDriver [] - ArtifactStagingService started on localhost:41047
> 2024-01-03 15:35:31,640 INFO org.apache.beam.runners.jobsubmission.JobServerDriver [] - Java ExpansionService started on localhost:35299
> 2024-01-03 15:35:31,676 INFO org.apache.beam.runners.jobsubmission.JobServerDriver [] - JobService started on localhost:42519
> 2024-01-03 15:35:31,677 INFO org.apache.beam.runners.jobsubmission.JobServerDriver [] - Job server now running, terminate with Ctrl+C
> 2024-01-03 15:35:31,996 INFO org.apache.beam.runners.flink.FlinkPortableClientEntryPoint [] - Started driver program
> 2024-01-03 15:35:43,899 INFO org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService [] - Staging artifacts for job_12e792dc-6e6f-417f-aad3-0da89df2b6d8.
> 2024-01-03 15:35:43,899 INFO org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService [] - Resolving artifacts for job_12e792dc-6e6f-417f-aad3-0da89df2b6d8.ref_Environment_default_environment_2.
> 2024-01-03 15:35:43,902 INFO org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService [] - Getting 0 artifacts for job_12e792dc-6e6f-417f-aad3-0da89df2b6d8.external_1beam:env:process:v1.
> 2024-01-03 15:35:43,902 INFO org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService [] - Resolving artifacts for job_12e792dc-6e6f-417f-aad3-0da89df2b6d8.external_1beam:env:process:v1.
> 2024-01-03 15:35:43,903 INFO org.apache.beam.runners.fnexecution.artifact.ArtifactStagingService [] - Getting 1 artifacts for job_12e792dc-6e6f-417f-aad3-0da89df2b6d8.null.
> 2024-01-03 15:36:02,047 INFO org.apache.beam.runners.flink.FlinkPortableClientEntryPoint [] - Stopping job service
> 2024-01-03 15:36:02,050 INFO org.apache.beam.runners.jobsubmission.JobServerDriver [] - JobServer stopped on localhost:42519
> ```
> It seems like the error is related to the ArtifactStagingService, but I am
> having trouble identifying the root cause. I am wondering if someone could
> help me figure out how to get more informative debug logging to fix this
> issue. Thanks!
>
> Sincerely,
> Lydian Lee
>
