Benjamin. boot command also failed when process is started from yarn (aws emr flink). Below is the log from TaskManager on worker node.
===================================================================================-- 2019-09-19 06:40:01,244 INFO org.apache.flink.runtime.taskmanager.Task - MapPartition (MapPartition at [3]{Create, ParDo(EsOutputFn)}) (1/1) (5ab2e91944b4e39079fad1bbc95db630) switched from DEPLOYING to RUNNING. 2019-09-19 06:40:03,348 INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - GetManifest for /tmp/artifactskjsmgv8p/job_ea4673de-71d0-4dd9-b683-fe8c52464666/MANIFEST 2019-09-19 06:40:03,349 INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - Loading manifest for retrieval token /tmp/artifactskjsmgv8p/job_ea4673de-71d0-4dd9-b683-fe8c52464666/MANIFEST 2019-09-19 06:40:03,356 INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService - GetManifest for /tmp/artifactskjsmgv8p/job_ea4673de-71d0-4dd9-b683-fe8c52464666/MANIFEST failed java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/artifactskjsmgv8p/job_ea4673de-71d0-4dd9-b683-fe8c52464666/MANIFEST (No such file or directory) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:83) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:196) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2312) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958) at org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService.getManifest(BeamFileSystemArtifactRetrievalService.java:80) at org.apache.beam.model.jobmanagement.v1.ArtifactRetrievalServiceGrpc$MethodHandlers.invoke(ArtifactRetrievalServiceGrpc.java:308) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:322) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:762) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.FileNotFoundException: /tmp/artifactskjsmgv8p/job_ea4673de-71d0-4dd9-b683-fe8c52464666/MANIFEST (No such file or directory) at java.io.FileInputStream.open0(Native Method) at java.io.FileInputStream.open(FileInputStream.java:195) at java.io.FileInputStream.<init>(FileInputStream.java:138) at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:114) at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:81) at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:252) at org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService.loadManifest(BeamFileSystemArtifactRetrievalService.java:185) at org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService.loadManifest(BeamFileSystemArtifactRetrievalService.java:180) at org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService$1.load(BeamFileSystemArtifactRetrievalService.java:171) at org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService$1.load(BeamFileSystemArtifactRetrievalService.java:168) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277) ... 19 more 2019-09-19 06:42:03,100 INFO org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory - Still waiting for startup of environment '/opt/apache/beam/boot' for worker id 1 2019-09-19 06:42:03,101 ERROR org.apache.flink.runtime.operators.BatchTask - Error in task code: MapPartition (MapPartition at [3]{Create, ParDo(EsOutputFn)}) (1/1) java.lang.Exception: The user defined 'open()' method caused an exception: java.lang.IllegalStateException: Process died with exit code 1 at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498) at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: Process died with exit code 1 at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2050) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185) at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49) at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203) at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129) at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36) at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494) ... 3 more Caused by: java.lang.IllegalStateException: Process died with exit code 1 at org.apache.beam.runners.fnexecution.environment.ProcessManager$RunningProcess.isAliveOrThrow(ProcessManager.java:76) at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:125) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:178) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:162) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044) ... 15 more 2019-09-19 06:42:03,103 INFO org.apache.flink.runtime.taskmanager.Task - MapPartition (MapPartition at [3]{Create, ParDo(EsOutputFn)}) (1/1) (5ab2e91944b4e39079fad1bbc95db630) switched from RUNNING to FAILED. java.lang.Exception: The user defined 'open()' method caused an exception: java.lang.IllegalStateException: Process died with exit code 1 at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498) at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: Process died with exit code 1 at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2050) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185) at org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49) at org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203) at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129) at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36) at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494) ... 3 more Caused by: java.lang.IllegalStateException: Process died with exit code 1 at org.apache.beam.runners.fnexecution.environment.ProcessManager$RunningProcess.isAliveOrThrow(ProcessManager.java:76) at org.apache.beam.runners.fnexecution.environment.ProcessEnvironmentFactory.createEnvironment(ProcessEnvironmentFactory.java:125) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:178) at org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:162) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154) at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044) ... 15 more 2019-09-19 06:42:03,103 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for MapPartition (MapPartition at [3]{Create, ParDo(EsOutputFn)}) (1/1) (5ab2e91944b4e39079fad1bbc95db630). ===================================================================================-- Pipeline Option is ===================================================================================-- options = PipelineOptions([ "--runner=FlinkRunner", "--flink_version=1.8", "--flink_master_url=ip-172-31-1-84.ap-northeast-1.compute.internal:45437", '--environment_config={"command": "/opt/apache/beam/boot"}', "--environment_type=PROCESS", "--experiments=beam_fn_api", "--retain_docker_container" ]) ===================================================================================-- I have copied "boot" file to all corresponding flink nodes ===================================================================================-- FROM: beam-release-2.15.0/sdks/python/container/build/target/launcher/linux_amd64/boot TO: Master and Data nodes (/opt/apache/beam) ===================================================================================-- I have set permission of all boot file as ===================================================================================-- -rwxrwxrwx 1 yarn ec2-user 16543786 Sep 19 04:39 boot ===================================================================================-- Looks like the PROCESS command was blocked because of this error. ===================================================================================-- java.util.concurrent.ExecutionException: java.io.FileNotFoundException: /tmp/artifactskjsmgv8p/job_ea4673de-71d0-4dd9-b683-fe8c52464666/MANIFEST (No such file or directory) ===================================================================================-- Thanks, Yu Watanabe On Wed, Sep 18, 2019 at 11:37 PM Benjamin Tan <benjamintanwei...@gmail.com> wrote: > Try this as part of PipelineOptions: > > --environment_config={\"command\":\"/opt/apache/beam/boot\"} > > On 2019/09/18 10:40:42, Yu Watanabe <yu.w.ten...@gmail.com> wrote: > > Hello. > > > > I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit > job > > to AWS EMR (5.26.0). > > > > However, I get below error when I run the pipeline and fail. > > > > ========================================================- > > Caused by: java.lang.Exception: The user defined 'open()' method caused > an > > exception: java.io.IOException: Cannot run program "docker": error=2, No > > such file or directory > > at > > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498) > > at > > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) > > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) > > ... 1 more > > Caused by: > > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: > > java.io.IOException: Cannot run program "docker": error=2, No such file > or > > directory > > at > > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966) > > at > > > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211) > > at > > > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202) > > at > > > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185) > > at > > > org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49) > > at > > > org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203) > > at > > > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129) > > at > > > org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36) > > at > > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494) > > ... 3 more > > Caused by: java.io.IOException: Cannot run program "docker": error=2, No > > such file or directory > > at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048) > > at > > > org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:141) > > at > > > org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:92) > > at > > > org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:152) > > at > > > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:178) > > at > > > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:162) > > at > > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528) > > at > > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277) > > at > > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154) > > at > > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044) > > at > > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952) > > at > > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974) > > at > > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958) > > at > > > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964) > > ... 11 more > > Caused by: java.io.IOException: error=2, No such file or directory > > at java.lang.UNIXProcess.forkAndExec(Native Method) > > at java.lang.UNIXProcess.<init>(UNIXProcess.java:247) > > at java.lang.ProcessImpl.start(ProcessImpl.java:134) > > at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) > > ... 24 more > > ========================================================- > > > > Pipeline options are below. > > ========================================================- > > options = PipelineOptions([ > > "--runner=FlinkRunner", > > "--flink_version=1.8", > > > > > "--flink_master_url=ip-172-31-1-84.ap-northeast-1.compute.internal:43581", > > "--environment_config= > > asia.gcr.io/PROJECTNAME/beam/python3", > > "--experiments=beam_fn_api" > > ]) > > > > > > p = beam.Pipeline(options=options) > > ========================================================- > > > > I am able to run docker info ec2-user on the server where script is > > running.. > > > > ========================================================- > > (python) [ec2-user@ip-172-31-2-121 ~]$ docker info > > Containers: 0 > > Running: 0 > > Paused: 0 > > Stopped: 0 > > ... > > ========================================================- > > > > I used "debian-stretch" . > > > > ========================================================- > > > debian-stretch-hvm-x86_64-gp2-2019-09-08-17994-572488bb-fc09-4638-8628-e1e1d26436f4-ami-0ed2d2283aa1466df.4 > > (ami-06f16171199d98c63) > > ========================================================- > > > > This seems to not happen when flink runs locally. > > > > ========================================================- > > admin@ip-172-31-9-89:/opt/flink$ sudo ss -atunp | grep 8081 > > tcp LISTEN 0 128 :::8081 :::* > > users:(("java",pid=18420,fd=82)) > > admin@ip-172-31-9-89:/opt/flink$ sudo ps -ef | grep java | head -1 > > admin 17698 1 0 08:59 ? 00:00:12 java -jar > > > /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar > > --flink-master-url ip-172-31-1-84.ap-northeast-1.compute.internal:43581 > > --artifacts-dir /tmp/artifactskj47j8yn --job-port 48205 --artifact-port 0 > > --expansion-port 0 > > admin@ip-172-31-9-89:/opt/flink$ > > ========================================================- > > > > Would there be any other setting I need to look for when running on EC2 > > instance ? > > > > Thanks, > > Yu Watanabe > > > > -- > > Yu Watanabe > > Weekend Freelancer who loves to challenge building data platform > > yu.w.ten...@gmail.com > > [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> > [image: > > Twitter icon] <https://twitter.com/yuwtennis> > > > -- Yu Watanabe Weekend Freelancer who loves to challenge building data platform yu.w.ten...@gmail.com [image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image: Twitter icon] <https://twitter.com/yuwtennis>