Hi Beam community,

We have a batch pipeline that does not run on a regular schedule.
We recently upgraded to Beam 2.36, and the upgrade broke our
FileIO.writeDynamic() step.

We are using the Dataflow runner. When there are multiple workers,
the error looks like this:

Error message from worker: java.lang.NoClassDefFoundError: Could not
initialize class
org.apache.beam.runners.dataflow.worker.ApplianceShuffleWriter
org.apache.beam.runners.dataflow.worker.ShuffleSink.writer(ShuffleSink.java:348)
org.apache.beam.runners.dataflow.worker.SizeReportingSinkWrapper.writer(SizeReportingSinkWrapper.java:46)
org.apache.beam.runners.dataflow.worker.util.common.worker.WriteOperation.initializeWriter(WriteOperation.java:71)
org.apache.beam.runners.dataflow.worker.util.common.worker.WriteOperation.start(WriteOperation.java:78)
org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:92)
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:420)
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:389)
org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:314)
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

However, when there is only a single worker, the error looks like this:

The job failed because a work item has failed 4 times. Look in
previous log entries for the cause of each one of the 4 failures. For
more information, see
https://cloud.google.com/dataflow/docs/guides/common-errors. The work
item was attempted on these workers: xxx Root cause: The worker lost
contact with the service.

The error guidance suggested upgrading the machine type.
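
For reference, this is roughly how we would apply that suggestion in
code. It is only a minimal sketch: the machine type and worker count
below are illustrative values, not our actual settings.

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class OptionsSketch {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    // Bump the worker machine type, as the error guidance suggests.
    options.setWorkerMachineType("n1-highmem-4");
    // Keep the worker count fixed so runs stay comparable across SDK versions.
    options.setNumWorkers(1);
  }
}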

These errors happen with SDK 2.34 and later. When I switched back to
SDK 2.33, everything worked without any issues. I tried SDKs 2.34,
2.35, and 2.36, and all of them hit the same problem.

Context: the pipeline simply reads a fixed BigQuery table of 4,034
records, applies some transforms, and writes the output to GCS with
FileIO.writeDynamic(). All tests were performed with the same machine
type and the same number of workers. A sketch of the pipeline's shape
follows below.
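
Here is a minimal sketch of the shape of the pipeline, not our real
code: the project/dataset/table, the field names "category" and
"payload", the bucket path, and the class name are all placeholders.

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class WriteDynamicRepro {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(
        PipelineOptionsFactory.fromArgs(args).withValidation().create());

    p.apply("ReadFromBigQuery",
            BigQueryIO.readTableRows().from("my-project:my_dataset.my_table"))
        // Placeholder transform: format each row as "category,payload".
        .apply("FormatRows",
            MapElements.into(TypeDescriptors.strings())
                .via((TableRow row) ->
                    row.get("category") + "," + row.get("payload")))
        .apply("WriteDynamic",
            FileIO.<String, String>writeDynamic()
                // Route each line to a destination keyed by the text
                // before the first comma.
                .by((String line) -> line.substring(0, line.indexOf(',')))
                .withDestinationCoder(StringUtf8Coder.of())
                .via(TextIO.sink())
                .to("gs://my-bucket/output")
                .withNaming(key -> FileIO.Write.defaultNaming(key, ".txt")));

    p.run();
  }
}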

Does anyone know whether there are any breaking changes in these SDK
versions or in the Dataflow runner that could explain this?

Thanks so much!
Siyu
