I am trying to start Flink 1.13.2 on Mesos, following the instructions in
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/mesos/
and using Marathon to deploy a Docker image that contains both Flink and my
binaries.

My entrypoint for the Docker image is:


/opt/flink/bin/mesos-appmaster.sh \
      -Djobmanager.rpc.address=$HOSTNAME \
      -Dmesos.resourcemanager.framework.user=flink \
      -Dmesos.master=10.0.18.246:5050 \
      -Dmesos.resourcemanager.tasks.cpus=6



When mesos-appmaster.sh starts, I see this on stderr:


I0927 16:50:32.306691 801308 exec.cpp:164] Version: 1.7.3
I0927 16:50:32.310277 801345 exec.cpp:238] Executor registered on agent f671d9ee-57f6-4f92-b1b2-3137676f6cdf-S6090
I0927 16:50:32.311120 801355 executor.cpp:130] Registered docker executor on 10.0.20.177
I0927 16:50:32.311394 801345 executor.cpp:186] Starting task tl_flink_prod.fb215c64-1fb2-11ec-9ce6-aaa2e9cb6ba0
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/opt/flink/lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
I0927 16:50:43.622053   237 sched.cpp:232] Version: 1.7.3
I0927 16:50:43.624439   328 sched.cpp:336] New master detected at master@10.0.18.246:5050
I0927 16:50:43.624779   328 sched.cpp:356] No credentials provided. Attempting to register without authentication


The "New master detected" line looks promising.

However, the Flink UI shows only the JobManager running, with no
TaskManagers. Inside the Docker container, I see this in the log:

WARN  org.apache.flink.mesos.scheduler.ConnectionMonitor  - Unable to connect to Mesos; still trying...


I have verified that from inside the container I can reach the Mesos master
at 10.0.18.246:5050.
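
A reachability check of this kind can be sketched as follows. This is only an
illustration, not necessarily the exact command used above: it assumes bash
(for its /dev/tcp pseudo-device) and the coreutils timeout command are
available in the image, and check_tcp is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch: report whether a TCP connection to host:port can be opened.
# check_tcp is a hypothetical helper; the 3 second timeout is arbitrary.
check_tcp() {
  local host="$1" port="$2"
  # bash opens a TCP socket when redirecting to /dev/tcp/<host>/<port>;
  # timeout bounds the wait if packets are silently dropped.
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# Example, using the Mesos master address from the entrypoint:
# check_tcp 10.0.18.246 5050
```

Note that this only shows the container can open a connection to the master;
it says nothing about connections in the opposite direction.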


Does any other port besides the web UI port 5050 need to be open for
mesos-appmaster to connect with the Mesos master?


In the appmaster log (attached) I see one exception, and I don't know
whether it is related to the Mesos connection problem:


java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
        at org.apache.hadoop.util.Shell.checkHadoopHomeInner(Shell.java:448)
        at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:419)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:496)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
        at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1555)
        at org.apache.hadoop.security.SecurityUtil.getLogSlowLookupsEnabled(SecurityUtil.java:497)
        at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:90)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:289)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:277)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:833)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:803)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:676)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.flink.runtime.util.EnvironmentInformation.getHadoopUser(EnvironmentInformation.java:215)
        at org.apache.flink.runtime.util.EnvironmentInformation.logEnvironmentInfo(EnvironmentInformation.java:432)
        at org.apache.flink.mesos.entrypoint.MesosSessionClusterEntrypoint.main(MesosSessionClusterEntrypoint.java:95)




I am not trying (yet) to run in high availability mode, so I am not sure
whether I need to have HADOOP_HOME set, and I don't see anything about
HADOOP_HOME in the Flink docs.



Any tips on how I can fix my Docker+Marathon+Mesos environment so Flink can
connect to my Mesos master?


Thanks,


Javier Vegas

Attachment: flink--mesos-appmaster-6c49aa87e1d4.log