Seems like some file is missing. Why not manually docker pull the image 
locally first (and remember to adjust the container location to omit the 
registry)? At least you can eliminate another source of trouble. 
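For example, something like this on each task manager host, ideally as the 
same user the task managers invoke docker as (yarn, by the look of your log). 
The gcloud credential-helper step is just one way of authenticating to GCR, 
and the local tag name is only an example:

    gcloud auth configure-docker
    docker pull asia.gcr.io/PROJECTNAME/beam/python3:latest
    docker tag asia.gcr.io/PROJECTNAME/beam/python3:latest beam/python3:latest

If the pull works, you can point --environment_config at the local 
beam/python3:latest tag, so the workers never have to talk to the registry.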

Also, depending on your pipeline, if you are running distributed you must make 
sure your staged files are accessible from every worker. Local paths, like the 
/tmp artifacts directory in your log, won’t work at all. 
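The MANIFEST your task manager is looking for was staged under /tmp on the 
machine running the job server, which the task managers on EMR cannot see. 
A rough sketch of starting the job server with a shared artifacts dir instead 
(the hdfs path is only a placeholder; anything both the job server and the 
task managers can read, e.g. s3 or hdfs, should do):

    java -jar /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar \
        --flink-master-url ip-172-31-1-84.ap-northeast-1.compute.internal:43581 \
        --artifacts-dir hdfs:///tmp/beam-artifacts \
        --job-port 48205 --artifact-port 0 --expansion-port 0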

So maybe you can force a single worker first too (setting --parallelism=1 in 
your pipeline options should do it). 

Sent from my iPhone

> On 20 Sep 2019, at 5:11 PM, Yu Watanabe <[email protected]> wrote:
> 
> Ankur
> 
> Thank you for the advice. 
> You're right. Looking at the task manager's log, it looks like "docker 
> pull" fails first as the yarn user, and then a couple of errors follow.
> As a result, "docker run" seems to fail.
> I have been working on this the whole week and still have not managed to 
> get the yarn session authenticated against Google Container Registry...
> 
> ==============================================================================================
> 2019-09-19 06:47:38,196 INFO  org.apache.flink.runtime.taskmanager.Task       
>               - MapPartition (MapPartition at [3]{Create, ParDo(EsOutputFn)}) 
> (1/1) (d2f0d79e4614c3b0cb5a8cbd38de37da) switched from DEPLOYING to RUNNING.
> 2019-09-19 06:47:41,181 WARN  
> org.apache.beam.runners.fnexecution.environment.DockerCommand  - Unable to 
> pull docker image asia.gcr.io/PROJECTNAME/beam/python3:latest, cause: 
> Received exit code 1 for command 'docker pull 
> asia.gcr.io/creationline001/beam/python3:latest'. stderr: Error response from 
> daemon: unauthorized: You don't have the needed permissions to perform this 
> operation, and you may have invalid credentials. To authenticate your 
> request, follow the steps in: 
> https://cloud.google.com/container-registry/docs/advanced-authentication
> 2019-09-19 06:47:44,035 INFO  
> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>   - GetManifest for 
> /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST
> 2019-09-19 06:47:44,037 INFO  
> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>   - Loading manifest for retrieval token 
> /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST
> 2019-09-19 06:47:44,046 INFO  
> org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactRetrievalService
>   - GetManifest for 
> /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST 
> failed
> java.util.concurrent.ExecutionException: java.io.FileNotFoundException: 
> /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST (No 
> such file or directory)
>         at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:531)
>         at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:492)
> ...
> Caused by: java.io.FileNotFoundException: 
> /tmp/artifactsknnvmjj8/job_fc5fff58-4408-4e0d-833b-675215218234/MANIFEST (No 
> such file or directory)
>         at java.io.FileInputStream.open0(Native Method)
>         at java.io.FileInputStream.open(FileInputStream.java:195)
>         at java.io.FileInputStream.<init>(FileInputStream.java:138)
>         at 
> org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:114)
> ...
> 2019-09-19 06:48:43,952 ERROR org.apache.flink.runtime.operators.BatchTask    
>               - Error in task code:  MapPartition (MapPartition at 
> [3]{Create, ParDo(EsOutputFn)}) (1/1)
> java.lang.Exception: The user defined 'open()' method caused an exception: 
> java.io.IOException: Received exit code 1 for command 'docker inspect -f 
> {{.State.Running}} 
> 7afbdcfd241629d24872ba1c74ef10f3d07c854c9cc675a65d4d16b9fdbde752'. stderr: 
> Error: No such object: 
> 7afbdcfd241629d24872ba1c74ef10f3d07c854c9cc675a65d4d16b9fdbde752
>         at 
> org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
>         at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>         at java.lang.Thread.run(Thread.java:748)
> ...
> ==============================================================================================
> 
> Thanks,
> Yu Watanabe
> 
>> On Thu, Sep 19, 2019 at 3:47 AM Ankur Goenka <[email protected]> wrote:
>> Adding to the previous suggestions.
>> You can also add "--retain_docker_container" to your pipeline option and 
>> later login to the machine to check the docker container log.
>> 
>> Also, in my experience running on yarn, the yarn user sometimes does not 
>> have access to use docker. I would suggest checking whether the yarn user 
>> on the TaskManagers has permission to use docker.
>> 
>> 
>>> On Wed, Sep 18, 2019 at 11:23 AM Kyle Weaver <[email protected]> wrote:
>>> > Per your suggestion, I read the design sheet and it states that the 
>>> > harness container is a mandatory setting for all TaskManagers.
>>> 
>>> That doc is out of date. As Benjamin said, it's not strictly required any 
>>> more to use Docker. However, it is still recommended, as Docker makes 
>>> managing dependencies a lot easier, whereas PROCESS mode involves managing 
>>> dependencies via shell scripts.
>>> 
>>> > Caused by: java.lang.Exception: The user defined 'open()' method caused 
>>> > an exception: java.io.IOException: Received exit code 1 for command 
>>> > 'docker inspect -f {{.State.Running}} 
>>> > 3e7e0d2d9d8362995d4b566ce1611834d56c5ca550ae89ae698a279271e4c33b'. 
>>> > stderr: Error: No such object: 
>>> > 3e7e0d2d9d8362995d4b566ce1611834d56c5ca550ae89ae698a279271e4c33b
>>> 
>>> This means your Docker container is failing to start up for some reason. I 
>>> recommend either a) running the container manually and inspecting the logs, 
>>> or b) using the master or Beam 2.16 branches, which have better Docker 
>>> logging (https://github.com/apache/beam/pull/9389).
>>> 
>>> Kyle Weaver | Software Engineer | github.com/ibzib | [email protected]
>>> 
>>> 
>>>> On Wed, Sep 18, 2019 at 8:04 AM Yu Watanabe <[email protected]> wrote:
>>>> Thank you for the reply.
>>>> 
>>>> I see "boot" files under the directories below, 
>>>> but these seem to be used for containers.
>>>> 
>>>>   (python) admin@ip-172-31-9-89:~/beam-release-2.15.0$ find ./ -name 
>>>> "boot" -exec ls -l {} \;
>>>> lrwxrwxrwx 1 admin admin 23 Sep 16 23:43 
>>>> ./sdks/python/container/.gogradle/project_gopath/src/github.com/apache/beam/sdks/python/boot
>>>>  -> ../../../../../../../..
>>>> -rwxr-xr-x 1 admin admin 16543786 Sep 16 23:48 
>>>> ./sdks/python/container/build/target/launcher/linux_amd64/boot
>>>> -rwxr-xr-x 1 admin admin 16358928 Sep 16 23:48 
>>>> ./sdks/python/container/build/target/launcher/darwin_amd64/boot
>>>> -rwxr-xr-x 1 admin admin 16543786 Sep 16 23:48 
>>>> ./sdks/python/container/py3/build/docker/target/linux_amd64/boot
>>>> -rwxr-xr-x 1 admin admin 16358928 Sep 16 23:48 
>>>> ./sdks/python/container/py3/build/docker/target/darwin_amd64/boot
>>>> -rwxr-xr-x 1 admin admin 16543786 Sep 16 23:48 
>>>> ./sdks/python/container/py3/build/target/linux_amd64/boot
>>>> -rwxr-xr-x 1 admin admin 16358928 Sep 16 23:48 
>>>> ./sdks/python/container/py3/build/target/darwin_amd64/boot  
>>>> 
>>>>> On Wed, Sep 18, 2019 at 11:37 PM Benjamin Tan 
>>>>> <[email protected]> wrote:
>>>> 
>>>>> Try this as part of PipelineOptions:
>>>>> 
>>>>> --environment_config={\"command\":\"/opt/apache/beam/boot\"}
>>>>> 
>>>>> On 2019/09/18 10:40:42, Yu Watanabe <[email protected]> wrote: 
>>>>> > Hello.
>>>>> > 
>>>>> > I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit 
>>>>> > job
>>>>> > to AWS EMR (5.26.0).
>>>>> > 
>>>>> > However, I get below error when I run the pipeline and fail.
>>>>> > 
>>>>> > ========================================================-
>>>>> > Caused by: java.lang.Exception: The user defined 'open()' method caused 
>>>>> > an
>>>>> > exception: java.io.IOException: Cannot run program "docker": error=2, No
>>>>> > such file or directory
>>>>> >         at
>>>>> > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498)
>>>>> >         at
>>>>> > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368)
>>>>> >         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>>>>> >         ... 1 more
>>>>> > Caused by:
>>>>> > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:
>>>>> > java.io.IOException: Cannot run program "docker": error=2, No such file 
>>>>> > or
>>>>> > directory
>>>>> >         at
>>>>> > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966)
>>>>> >         at
>>>>> > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:211)
>>>>> >         at
>>>>> > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.<init>(DefaultJobBundleFactory.java:202)
>>>>> >         at
>>>>> > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185)
>>>>> >         at
>>>>> > org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49)
>>>>> >         at
>>>>> > org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203)
>>>>> >         at
>>>>> > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129)
>>>>> >         at
>>>>> > org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
>>>>> >         at
>>>>> > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494)
>>>>> >         ... 3 more
>>>>> > Caused by: java.io.IOException: Cannot run program "docker": error=2, No
>>>>> > such file or directory
>>>>> >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
>>>>> >         at
>>>>> > org.apache.beam.runners.fnexecution.environment.DockerCommand.runShortCommand(DockerCommand.java:141)
>>>>> >         at
>>>>> > org.apache.beam.runners.fnexecution.environment.DockerCommand.runImage(DockerCommand.java:92)
>>>>> >         at
>>>>> > org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory.createEnvironment(DockerEnvironmentFactory.java:152)
>>>>> >         at
>>>>> > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:178)
>>>>> >         at
>>>>> > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$1.load(DefaultJobBundleFactory.java:162)
>>>>> >         at
>>>>> > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
>>>>> >         at
>>>>> > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
>>>>> >         at
>>>>> > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
>>>>> >         at
>>>>> > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
>>>>> >         at
>>>>> > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
>>>>> >         at
>>>>> > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
>>>>> >         at
>>>>> > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
>>>>> >         at
>>>>> > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
>>>>> >         ... 11 more
>>>>> > Caused by: java.io.IOException: error=2, No such file or directory
>>>>> >         at java.lang.UNIXProcess.forkAndExec(Native Method)
>>>>> >         at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
>>>>> >         at java.lang.ProcessImpl.start(ProcessImpl.java:134)
>>>>> >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>>>>> >         ... 24 more
>>>>> >  ========================================================-
>>>>> > 
>>>>> > Pipeline options are below.
>>>>> >  ========================================================-
>>>>> >         options = PipelineOptions([
>>>>> >                       "--runner=FlinkRunner",
>>>>> >                       "--flink_version=1.8",
>>>>> > 
>>>>> > "--flink_master_url=ip-172-31-1-84.ap-northeast-1.compute.internal:43581",
>>>>> >                       "--environment_config=
>>>>> > asia.gcr.io/PROJECTNAME/beam/python3",
>>>>> >                       "--experiments=beam_fn_api"
>>>>> >                   ])
>>>>> > 
>>>>> > 
>>>>> >         p = beam.Pipeline(options=options)
>>>>> >  ========================================================-
>>>>> > 
>>>>> > I am able to run docker info as ec2-user on the server where the 
>>>>> > script is running.
>>>>> > 
>>>>> >  ========================================================-
>>>>> > (python) [ec2-user@ip-172-31-2-121 ~]$ docker info
>>>>> > Containers: 0
>>>>> >  Running: 0
>>>>> >  Paused: 0
>>>>> >  Stopped: 0
>>>>> > ...
>>>>> >  ========================================================-
>>>>> > 
>>>>> > I used "debian-stretch".
>>>>> > 
>>>>> > ========================================================-
>>>>> > debian-stretch-hvm-x86_64-gp2-2019-09-08-17994-572488bb-fc09-4638-8628-e1e1d26436f4-ami-0ed2d2283aa1466df.4
>>>>> > (ami-06f16171199d98c63)
>>>>> > ========================================================-
>>>>> > 
>>>>> > This seems to not happen when flink runs locally.
>>>>> > 
>>>>> > ========================================================-
>>>>> > admin@ip-172-31-9-89:/opt/flink$ sudo ss -atunp | grep 8081
>>>>> > tcp    LISTEN     0      128      :::8081                 :::*
>>>>> >       users:(("java",pid=18420,fd=82))
>>>>> > admin@ip-172-31-9-89:/opt/flink$ sudo ps -ef | grep java | head -1
>>>>> > admin    17698     1  0 08:59 ?        00:00:12 java -jar
>>>>> > /home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar
>>>>> > --flink-master-url ip-172-31-1-84.ap-northeast-1.compute.internal:43581
>>>>> > --artifacts-dir /tmp/artifactskj47j8yn --job-port 48205 --artifact-port 0
>>>>> > --expansion-port 0
>>>>> > admin@ip-172-31-9-89:/opt/flink$
>>>>> > ========================================================-
>>>>> > 
>>>>> > Would there be any other setting I need to look for when running on an
>>>>> > EC2 instance?
>>>>> > 
>>>>> > Thanks,
>>>>> > Yu Watanabe
>>>>> > 
>>>>> > -- 
>>>>> > Yu Watanabe
>>>>> > Weekend Freelancer who loves to challenge building data platform
>>>>> > [email protected]
>>>>> > LinkedIn: https://www.linkedin.com/in/yuwatanabe1
>>>>> > Twitter: https://twitter.com/yuwtennis
>>>>> > 
>>>> 
>>>> 
>>>> -- 
>>>> Yu Watanabe
>>>> Weekend Freelancer who loves to challenge building data platform
>>>> [email protected]
>>>>     
> 
> 
> -- 
> Yu Watanabe
> Weekend Freelancer who loves to challenge building data platform
> [email protected]
>     
