Hi Javier,
Based on the JobManager logs you've provided, I don't see anything that is
misconfigured. Have you been able to deploy other applications to this Mesos
cluster? Do the Mesos master logs reveal anything? The variable resolution
on the TaskManager side that Roman raised is a valid concern, since it's
easy to run into such an issue. But the JobManager logs indicate that the
JobManager itself is not able to contact the Mesos master. Hence, I'd assume
that the missing TaskManagers are not caused by the variable resolution.
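
In case it helps to narrow this down, here is a minimal sketch of the two
checks discussed in this thread: whether the name passed as
jobmanager.rpc.address resolves, and whether a TCP connection to the Mesos
master can be opened. The function names are hypothetical and not part of
Flink or Mesos, and the sketch assumes a Linux environment where bash,
getent and timeout are available.

```shell
#!/usr/bin/env bash
# Diagnostic sketch with hypothetical helper names; not part of Flink or Mesos.

# Succeeds if the given name resolves via the system resolver.
# Assumption: getent is installed (true for most glibc- and musl-based images).
resolves() {
    getent hosts "$1" > /dev/null 2>&1
}

# Succeeds if a TCP connection to host:port can be opened within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.
can_connect() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" > /dev/null 2>&1
}

# Prints one line per check.
# Arguments: RPC address, Mesos master host, Mesos master port.
report() {
    if resolves "$1"; then
        echo "rpc address $1: resolvable"
    else
        echo "rpc address $1: NOT resolvable"
    fi
    if can_connect "$2" "$3"; then
        echo "master $2:$3: reachable"
    else
        echo "master $2:$3: NOT reachable"
    fi
}
```

Inside the JobManager container this could be called as
report "$HOSTNAME" 10.0.18.246 5050. Note that this only covers the
JobManager side. To check Roman's concern, the name that $HOSTNAME expanded
to in the JobManager container would have to be passed to resolves on a
Mesos agent where TaskManagers would be started.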

Best,
Matthias

On Tue, Sep 28, 2021 at 2:45 PM Roman Khachatryan <ro...@apache.org> wrote:

> Hi,
>
> No additional ports need to be open as far as I know.
>
> Perhaps $HOSTNAME is substituted with something that is not resolvable on the TMs?
>
> Please also make sure that the following gets executed before
> mesos-appmaster.sh:
> export HADOOP_CLASSPATH=$(hadoop classpath)
> export MESOS_NATIVE_JAVA_LIBRARY=/path/to/lib/libmesos.so
> (as per the documentation you linked)
>
> Regards,
> Roman
>
> On Mon, Sep 27, 2021 at 7:38 PM Javier Vegas <jve...@strava.com> wrote:
> >
> > I am trying to start Flink 1.13.2 on Mesos following the instructions in
> > https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/mesos/
> > and using Marathon to deploy a Docker image with both the Flink binaries
> > and my own.
> >
> > My entrypoint for the Docker image is:
> >
> >
> > /opt/flink/bin/mesos-appmaster.sh \
> >       -Djobmanager.rpc.address=$HOSTNAME \
> >       -Dmesos.resourcemanager.framework.user=flink \
> >       -Dmesos.master=10.0.18.246:5050 \
> >       -Dmesos.resourcemanager.tasks.cpus=6
> >
> > When mesos-appmaster.sh starts, in the stderr I see this:
> >
> >
> > I0927 16:50:32.306691 801308 exec.cpp:164] Version: 1.7.3
> > I0927 16:50:32.310277 801345 exec.cpp:238] Executor registered on agent f671d9ee-57f6-4f92-b1b2-3137676f6cdf-S6090
> > I0927 16:50:32.311120 801355 executor.cpp:130] Registered docker executor on 10.0.20.177
> > I0927 16:50:32.311394 801345 executor.cpp:186] Starting task tl_flink_prod.fb215c64-1fb2-11ec-9ce6-aaa2e9cb6ba0
> > WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
> > WARNING: An illegal reflective access operation has occurred
> > WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar) to method sun.security.krb5.Config.getInstance()
> > WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
> > WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> > WARNING: All illegal access operations will be denied in a future release
> > I0927 16:50:43.622053   237 sched.cpp:232] Version: 1.7.3
> > I0927 16:50:43.624439   328 sched.cpp:336] New master detected at master@10.0.18.246:5050
> > I0927 16:50:43.624779   328 sched.cpp:356] No credentials provided. Attempting to register without authentication
> >
> > where the "New master detected" line is promising.
> >
> > However, on the Flink UI I see only the jobmanager started, and there
> > are no task managers.  Getting into the Docker container, I see this in
> > the log:
> >
> > WARN  org.apache.flink.mesos.scheduler.ConnectionMonitor  - Unable to connect to Mesos; still trying...
> >
> >
> > I have verified that from the container I can access the Mesos master at
> > 10.0.18.246:5050
> >
> >
> > Does any other port besides the web UI port 5050 need to be open for
> mesos-appmaster to connect with the Mesos master?
> >
> >
> > In the appmaster log (attached) I see one exception, and I don't know
> > whether it is related to the Mesos connection problem:
> >
> >
> > java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
> >         at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:448)
> >         at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:419)
> >         at org.apache.hadoop.util.Shell.<clinit>(Shell.java:496)
> >         at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
> >         at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1555)
> >         at org.apache.hadoop.security.SecurityUtil.getLogSlowLookupsEnabled(SecurityUtil.java:497)
> >         at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:90)
> >         at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:289)
> >         at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:277)
> >         at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:833)
> >         at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:803)
> >         at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:676)
> >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> >         at java.base/java.lang.reflect.Method.invoke(Unknown Source)
> >         at org.apache.flink.runtime.util.EnvironmentInformation.getHadoopUser(EnvironmentInformation.java:215)
> >         at org.apache.flink.runtime.util.EnvironmentInformation.logEnvironmentInfo(EnvironmentInformation.java:432)
> >         at org.apache.flink.mesos.entrypoint.MesosSessionClusterEntrypoint.main(MesosSessionClusterEntrypoint.java:95)
> >
> > I am not trying (yet) to run in high availability mode, so I am not sure
> > if I need to have HADOOP_HOME set or not, but I don't see anything about
> > HADOOP_HOME in the Flink docs.
> >
> >
> >
> > Any tips on how I can fix my Docker+Marathon+Mesos environment so Flink
> can connect to my Mesos master?
> >
> >
> > Thanks,
> >
> >
> > Javier Vegas
> >
> >
