You could clone the 2.X.0 branch of the Beam repo and patch in the commit I
referenced above. Then build the Gradle target
:runners:google-cloud-dataflow-java:worker:legacy-worker:shadowJar
which should produce a jar in the build/libs/ directory. Pass the
additional flag --dataflowWorkerJar=path/to/worker/shadow.jar when running
your pipeline with 2.X.

The code in the patched file hasn't changed in a while, so you might be
fine building it against 2.11.0, but if that doesn't work, try one of the
newer releases.
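The steps above might look roughly like the following. This is a sketch, not a
tested recipe: the branch name, the cherry-picked commit hash (taken from the
fix linked below), the main class, and the exact jar filename are all
illustrative, and the cherry-pick may not apply cleanly to an older branch.

```shell
# Clone the Beam repo and check out the release branch (branch name illustrative)
git clone https://github.com/apache/beam.git
cd beam
git checkout release-2.11.0

# Patch in the fix commit referenced in this thread (may need manual conflict resolution)
git cherry-pick 5d9bb4595c763025a369a959e18c6dd288e72314

# Build the legacy Dataflow worker shadow jar
./gradlew :runners:google-cloud-dataflow-java:worker:legacy-worker:shadowJar

# Run the pipeline with the patched worker jar
# (main class and jar path/name are illustrative; check build/libs/ for the actual file)
java -cp my-pipeline.jar com.example.MyPipeline \
  --runner=DataflowRunner \
  --dataflowWorkerJar=runners/google-cloud-dataflow-java/worker/legacy-worker/build/libs/beam-runners-google-cloud-dataflow-java-legacy-worker.jar
```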


On Tue, Aug 27, 2019 at 1:12 PM Zhiheng Huang <[email protected]> wrote:

> Thanks! Glad that this will be fixed in future release.
>
> Is there any way that I can avoid hitting this problem before 2.16 is
> released?
>
> On Tue, Aug 27, 2019 at 12:57 PM Lukasz Cwik <[email protected]> wrote:
>
>> This is a known issue and was fixed with
>> https://github.com/apache/beam/commit/5d9bb4595c763025a369a959e18c6dd288e72314#diff-f149847d2c06f56ea591cab8d862c960
>>
>> It is meant to be released as part of 2.16.0
>>
>> On Tue, Aug 27, 2019 at 11:41 AM Zhiheng Huang <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> Looking for help to understand an internal error from beam dataflow
>>> runner.
>>>
>>> I have a streaming pipeline that is on google dataflow runner(beam
>>> version 2.11, java). Recently I added a SplittableDoFn to my pipeline to
>>> continuously generate a sequence. However, after the job runs for a few
>>> hours, I start to see the following exception:
>>>
>>> java.util.NoSuchElementException
>>>         org.apache.beam.vendor.guava.v20_0.com.google.common.collect.MultitransformedIterator.next(MultitransformedIterator.java:63)
>>>         org.apache.beam.vendor.guava.v20_0.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
>>>         org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterators.getOnlyElement(Iterators.java:308)
>>>         org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables.getOnlyElement(Iterables.java:294)
>>>         org.apache.beam.runners.dataflow.worker.DataflowProcessFnRunner.getUnderlyingWindow(DataflowProcessFnRunner.java:98)
>>>         org.apache.beam.runners.dataflow.worker.DataflowProcessFnRunner.placeIntoElementWindow(DataflowProcessFnRunner.java:72)
>>>         org.apache.beam.runners.dataflow.worker.DataflowProcessFnRunner.processElement(DataflowProcessFnRunner.java:62)
>>>         org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:325)
>>>         org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
>>>         org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49)
>>>         org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
>>>         org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>>>         org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>>>         org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1269)
>>>         org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:146)
>>>         org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1008)
>>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>         java.lang.Thread.run(Thread.java:745)
>>>
>>> This is 100% reproducible, and I am clueless about how this stacktrace
>>> can be debugged, since it contains no pointer to user code or to the
>>> failing PTransform. This happens only if I add the SplittableDoFn, whose
>>> rough structure is like this
>>> <https://gist.github.com/sylvon/cbcccdcb64aeb15002721977398dc308>.
>>>
>>> Appreciate any pointers on what might be going wrong or how this
>>> exception can be debugged.
>>>
>>> Thanks a lot!
>>>
>>>
>
> --
> Sylvon Huang
>
