Hi Danny,

Thank you for your reply. I don't use a splittable DoFn explicitly. The problem appears when I attach a write step at the end of my pipeline. After ~10 hours of running the job with 30 workers, I first start getting a lot of errors like this from Dataflow:

Error syncing pod 3cfc1e3bc4c70cd9c668944ce9769a42 ("df-go-job-1-1648642256936168-03300511-upar-harness-nn4g_default(3cfc1e3bc4c70cd9c668944ce9769a42)"), skipping: failed to "StartContainer" for "sdk-0-0" with CrashLoopBackOff: "back-off 40s restarting failed container=sdk-0-0 pod=df-go-job-1-1648642256936168-03300511-upar-harness-nn4g_default(3cfc1e3bc4c70cd9c668944ce9769a42)

Then the only two errors which appeared on the workers are:

unable to split process_bundle-2455586051109759173-4329: failed to split DataSource (at index: 1) at requested splits: {[0 1]}

Just after that the pipeline was marked as failed in Dataflow, which makes me think the failed split is what brought the pipeline down. I am pretty sure this happened in the last step, which is simply a textio.Write. Maybe the size of the output file is the problem: I'm expecting a single file of between 1 TB and 2 TB to be written to GCS.
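For reference, the tail of the pipeline looks roughly like this (a sketch only; the bucket paths and the upstream read are placeholders, not my actual code):

    package main

    import (
        "context"
        "flag"
        "log"

        "github.com/apache/beam/sdks/v2/go/pkg/beam"
        "github.com/apache/beam/sdks/v2/go/pkg/beam/io/textio"
        "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
    )

    func main() {
        flag.Parse()
        beam.Init()

        p := beam.NewPipeline()
        s := p.Root()

        // Placeholder for the upstream transforms that produce a PCollection of strings.
        lines := textio.Read(s, "gs://my-bucket/input/*")

        // The last step that seems to be failing: a single textio.Write that
        // emits the whole PCollection as one text object (1-2 TB) in GCS.
        textio.Write(s, "gs://my-bucket/output/result.txt", lines)

        if err := beamx.Run(context.Background(), p); err != nil {
            log.Fatalf("pipeline failed: %v", err)
        }
    }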
-- Paweł
On 30 March 2022 at 14:47, Danny McCormick <[email protected]> wrote:
Hey Pawel, are you able to provide any more information about the splittable DoFn that is failing and what its SplitRestriction function looks like? I'm not sure that we have enough information to fully diagnose the issue with the error alone.
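To be concrete, the kind of thing I'm asking about is the SplitRestriction method on your DoFn. A minimal sketch (hypothetical names, assuming an offset-range restriction) would look something like:

    package example

    import (
        "github.com/apache/beam/sdks/v2/go/pkg/beam/core/sdf"
        "github.com/apache/beam/sdks/v2/go/pkg/beam/io/rtrackers/offsetrange"
    )

    // myFn is a hypothetical splittable DoFn over an offset range.
    type myFn struct{}

    func (fn *myFn) CreateInitialRestriction(elem string) offsetrange.Restriction {
        // Placeholder: the full range of work for this element.
        return offsetrange.Restriction{Start: 0, End: 1000}
    }

    // SplitRestriction pre-splits the initial restriction so the runner has
    // more than one point at which it can split later.
    func (fn *myFn) SplitRestriction(elem string, rest offsetrange.Restriction) []offsetrange.Restriction {
        return rest.EvenSplits(10)
    }

    func (fn *myFn) RestrictionSize(elem string, rest offsetrange.Restriction) float64 {
        return rest.Size()
    }

    func (fn *myFn) CreateTracker(rest offsetrange.Restriction) *sdf.LockRTracker {
        return sdf.NewLockRTracker(offsetrange.NewTracker(rest))
    }

    func (fn *myFn) ProcessElement(rt *sdf.LockRTracker, elem string, emit func(string)) {
        rest := rt.GetRestriction().(offsetrange.Restriction)
        for i := rest.Start; rt.TryClaim(i); i++ {
            emit(elem) // placeholder: real per-offset work goes here
        }
    }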
The error itself is coming from here:
https://github.com/apache/beam/blob/ec62f3d9fa38e67279a70df28f2be8b17f344e27/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L538
and is pretty much what it sounds like: the runner is trying to initiate a split, but it fails because the SDK can't find a split point. I am kinda
surprised that this causes the whole pipeline to fail though - was there any
additional logging around the job crash? I wouldn't expect a failed split
to be enough to bring the whole job down. @Daniel Oliveira probably has more
context here; Daniel, do you have anything to add?

Thanks,
Danny

On Wed, Mar 30, 2022 at 1:01 AM Pawel <[email protected]> wrote:

Hi, I have very
limited knowledge about Apache Beam and I'm facing a strange problem when
starting a job in Dataflow using the Go SDK. First I get this debug message (I see more of such splits):

PB Split: instruction_id:"process_bundle-2455586051109757402-4328" desired_splits:{key:"ptransform-11" value:{fraction_of_remainder:0.5 allowed_split_points:0 allowed_split_points:1 estimated_input_elements:1}}

And just after this debug message there is an error:

github.com/apache/beam/sdks/[email protected]/go/pkg/beam/core/runtime/harness/harness.go:432" message: "unable to split process_bundle-2455586051109757402-4328: failed to split DataSource (at index: 1) at requested splits: {[0 1]}"

After this error the whole job crashes. Is this likely a problem in my pipeline or a bug in Dataflow and/or Beam?

Thanks in advance,
-- Paweł