Hello all,

This is Bo. I ran into some problems when I tried to use Flink in my Mesos cluster (1 master, 2 slaves, each with 32 CPU cores). I started mesos-appmaster.sh in Marathon, and the JobManager starts without problems.
The command I run is:

    mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=32

My problem is that the TaskManagers all end up on one single slave.

1. (log1) The number of initial tasks in "/usr/local/flink/conf/flink-conf.yaml" is set with "mesos.initial-tasks: 2". I also set "mesos.constraints.hard.hostattribute: rack:ak09-27", which is the master node of the Mesos cluster.

2. (log2) I tried many ways to distribute the tasks across all the available slaves, without any success. So I decided to try adding a GROUP_BY operator, which I took from https://mesosphere.github.io/marathon/docs/constraints.html:

       "mesos.constraints.hard.hostattribute: rack:ak09-27,GROUP_BY:2"

   According to the log, Flink keeps waiting for more offers and the tasks are never launched.

Sorry, I am a newbie to Flink, and to Mesos as well. Please reply if my problem is not clear. I would appreciate any hint about how to distribute tasks evenly across the available resources.

Thank you in advance.

Best regards,
Bo
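Flink's configuration reference describes `mesos.constraints.hard.hostattribute` as a comma-separated list of `key:value` pairs matched against the attributes each Mesos agent advertises; as far as I know it does not understand Marathon operators such as `GROUP_BY`. The Python sketch below is a simplified model under that assumption (the `Offer` class and both functions are hypothetical, not Flink APIs): every entry becomes an exact attribute-equality requirement, so `GROUP_BY:2` means "the agent must have an attribute `GROUP_BY` with value `2`". If no agent advertises that, no offer ever qualifies, which would be consistent with log2, where the LaunchCoordinator keeps reporting "Waiting for more offers". Which agent actually carries `rack:ak09-27` is not visible in the logs, so the attribute assignment in the example is also hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Offer:
    """Hypothetical, simplified stand-in for a Mesos resource offer."""
    host: str
    mem_mb: float
    cpus: float
    attributes: Dict[str, str] = field(default_factory=dict)


def parse_hard_constraints(value: str) -> List[Tuple[str, str]]:
    """Split "k1:v1,k2:v2" into [(k1, v1), (k2, v2)] (assumed key:value semantics)."""
    constraints = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        key, sep, val = entry.partition(":")
        if not sep:
            raise ValueError(f"malformed constraint entry: {entry!r}")
        constraints.append((key.strip(), val.strip()))
    return constraints


def eligible_hosts(offers: List[Offer], constraint_value: str) -> List[str]:
    """Return hosts whose attributes satisfy every key:value constraint (exact match)."""
    constraints = parse_hard_constraints(constraint_value)
    return [
        o.host
        for o in offers
        if all(o.attributes.get(k) == v for k, v in constraints)
    ]


# Resource figures copied from log2; the attribute assignment is HYPOTHETICAL.
offers = [
    Offer("xxxxxxxxx101.tail_of_hostname", 126351.0, 30.5, {"rack": "ak09-27"}),
    Offer("xxxxxxxxx103.tail_of_hostname", 126607.0, 30.75, {}),
]

print(eligible_hosts(offers, "rack:ak09-27"))             # -> ['xxxxxxxxx101.tail_of_hostname']
print(eligible_hosts(offers, "rack:ak09-27,GROUP_BY:2"))  # -> [] (no agent has GROUP_BY=2)
```

Under this model a hard attribute constraint can only narrow the set of eligible agents; it has no notion of spreading tasks across them. If only one agent matches `rack:ak09-27`, every TaskManager would land there, which is one possible (unconfirmed) reading of log1.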
2017-10-02 10:51:28,023 INFO org.apache.flink.mesos.runtime.clusterframework.MesosJobManager - JobManager akka.tcp://flink@xxxxxxxxx103.tail_of_hostname:6123/user/jobmanager was granted leadership with leader session ID Some(661d24c7-xxxx-4d15-8583-efd2ee3c2e3b).
2017-10-02 10:51:28,032 INFO org.apache.flink.mesos.runtime.clusterframework.MesosJobManager - Delaying recovery of all jobs by 10000 milliseconds.
2017-10-02 10:51:28,041 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@xxxxxxxxx103.tail_of_hostname:6123/user/jobmanager:661d24c7-xxxx-4d15-8583-efd2ee3c2e3b.
2017-10-02 10:51:28,044 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Trying to associate with JobManager leader akka.tcp://flink@xxxxxxxxx103.tail_of_hostname:6123/user/jobmanager
2017-10-02 10:51:28,051 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#923003410] - leader session 661d24c7-xxxx-4d15-8583-efd2ee3c2e3b
2017-10-02 10:51:28,176 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Scheduling Mesos task taskmanager-00001 with (1024.0 MB, 1.0 cpus).
2017-10-02 10:51:28,193 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Scheduling Mesos task taskmanager-00002 with (1024.0 MB, 1.0 cpus).
2017-10-02 10:51:28,195 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Now gathering offers for at least 2 task(s).
2017-10-02 10:51:28,207 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Received offer(s) of 252958.0 MB, 61.25 cpus:
2017-10-02 10:51:28,208 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - 18f3d9d0-7982-491a-8397-0e799272fd2d-O1246 from xxxxxxxxx101.tail_of_hostname of 127375.0 MB, 31.5 cpus
2017-10-02 10:51:28,208 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - 18f3d9d0-7982-491a-8397-0e799272fd2d-O1247 from xxxxxxxxx103.tail_of_hostname of 125583.0 MB, 29.75 cpus
2017-10-02 10:51:29,225 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 2 new offer(s) plus outstanding offers.
2017-10-02 10:51:29,241 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:51:29,243 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 125583.0 MB, 29.75 cpus
2017-10-02 10:51:29,243 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 127375.0 MB, 31.5 cpus
2017-10-02 10:51:29,413 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Launched 2 task(s) on xxxxxxxxx101.tail_of_hostname using 1 offer(s):
2017-10-02 10:51:29,413 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - 18f3d9d0-7982-491a-8397-0e799272fd2d-O1246
2017-10-02 10:51:29,414 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - No longer gathering offers; all requests fulfilled.
2017-10-02 10:51:29,414 INFO com.netflix.fenzo.TaskScheduler - Expiring all leases
2017-10-02 10:51:29,415 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Declined offer 18f3d9d0-7982-491a-8397-0e799272fd2d-O1247 from xxxxxxxxx103.tail_of_hostname of 125583.0 MB, 29.75 cpus.
2017-10-02 10:51:29,518 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Launching Mesos task taskmanager-00002 on host xxxxxxxxx101.tail_of_hostname.
2017-10-02 10:51:29,571 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Launching Mesos task taskmanager-00001 on host xxxxxxxxx101.tail_of_hostname.
2017-10-02 10:51:32,487 INFO org.apache.flink.mesos.scheduler.TaskMonitor - Mesos task taskmanager-00002 is running.
2017-10-02 10:51:32,821 INFO org.apache.flink.mesos.scheduler.TaskMonitor - Mesos task taskmanager-00001 is running.
2017-10-02 10:51:35,118 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - TaskManager taskmanager-00002 has started.
2017-10-02 10:51:35,123 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at xxxxxxxxx101 (akka.tcp://flink@xxxxxxxxx101.tail_of_hostname:31002/user/taskmanager) as a8c1e687923c9649c27145797d1796c5. Current number of registered hosts is 1. Current number of alive task slots is 32.
2017-10-02 10:51:35,362 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - TaskManager taskmanager-00001 has started.
2017-10-02 10:51:35,362 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at xxxxxxxxx101 (akka.tcp://flink@xxxxxxxxx101.tail_of_hostname:31000/user/taskmanager) as 50111f20b91ef30e6bba43bf2058b6d4. Current number of registered hosts is 2. Current number of alive task slots is 64.
2017-10-02 10:51:38,045 INFO org.apache.flink.mesos.runtime.clusterframework.MesosJobManager - Attempting to recover all jobs.
2017-10-02 10:51:38,051 INFO org.apache.flink.mesos.runtime.clusterframework.MesosJobManager - There are no jobs to recover.
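To check where TaskManagers actually land without reading the logs by eye, a small helper can tally the ResourceManager's "Launching Mesos task ... on host ..." lines. This is a hypothetical script (the function name and regex are my own, written against the log format shown above); it is not part of Flink, and other Flink versions may word these lines differently.

```python
import re
from collections import Counter

# Matches lines such as:
# "... MesosFlinkResourceManager - Launching Mesos task taskmanager-00002 on host xxxxxxxxx101.tail_of_hostname."
# The host is captured without the sentence-final period.
LAUNCH_RE = re.compile(r"Launching Mesos task (\S+) on host (\S+?)\.(?=\s|$)")


def taskmanagers_per_host(log_text: str) -> Counter:
    """Count launched TaskManager tasks per host, keeping the last host seen per task ID."""
    host_by_task = {}
    for match in LAUNCH_RE.finditer(log_text):
        task_id, host = match.group(1), match.group(2)
        host_by_task[task_id] = host
    return Counter(host_by_task.values())


# Two lines copied from the log above.
sample = (
    "2017-10-02 10:51:29,518 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager"
    " - Launching Mesos task taskmanager-00002 on host xxxxxxxxx101.tail_of_hostname.\n"
    "2017-10-02 10:51:29,571 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager"
    " - Launching Mesos task taskmanager-00001 on host xxxxxxxxx101.tail_of_hostname.\n"
)
print(taskmanagers_per_host(sample))  # Counter({'xxxxxxxxx101.tail_of_hostname': 2})
```

Run over log1 this reports both TaskManagers on xxxxxxxxx101; run over log2 it reports nothing, since no task was ever launched there.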
2017-10-02 10:33:02,006 INFO org.apache.flink.mesos.runtime.clusterframework.MesosJobManager - JobManager akka.tcp://flink@xxxxxxxxx101.tail_of_hostname:6123/user/jobmanager was granted leadership with leader session ID Some(d862154d-xxxx-48e7-876d-90c709a34f9c).
2017-10-02 10:33:02,015 INFO org.apache.flink.mesos.runtime.clusterframework.MesosJobManager - Delaying recovery of all jobs by 10000 milliseconds.
2017-10-02 10:33:02,018 INFO org.apache.flink.runtime.webmonitor.JobManagerRetriever - New leader reachable under akka.tcp://flink@xxxxxxxxx101.tail_of_hostname:6123/user/jobmanager:d862154d-xxxx-48e7-876d-90c709a34f9c.
2017-10-02 10:33:02,029 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Trying to associate with JobManager leader akka.tcp://flink@xxxxxxxxx101.tail_of_hostname:6123/user/jobmanager
2017-10-02 10:33:02,034 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#-698489119] - leader session d862154d-xxxx-48e7-876d-90c709a34f9c
2017-10-02 10:33:02,102 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Scheduling Mesos task taskmanager-00001 with (1024.0 MB, 1.0 cpus).
2017-10-02 10:33:02,119 INFO org.apache.flink.mesos.runtime.clusterframework.MesosFlinkResourceManager - Scheduling Mesos task taskmanager-00002 with (1024.0 MB, 1.0 cpus).
2017-10-02 10:33:02,121 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Now gathering offers for at least 2 task(s).
2017-10-02 10:33:02,133 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Received offer(s) of 252958.0 MB, 61.25 cpus:
2017-10-02 10:33:02,134 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - 18f3d9d0-7982-491a-8397-0e799272fd2d-O1224 from xxxxxxxxx101.tail_of_hostname of 126351.0 MB, 30.5 cpus
2017-10-02 10:33:02,134 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - 18f3d9d0-7982-491a-8397-0e799272fd2d-O1225 from xxxxxxxxx103.tail_of_hostname of 126607.0 MB, 30.75 cpus
2017-10-02 10:33:03,149 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 2 new offer(s) plus outstanding offers.
2017-10-02 10:33:03,311 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:03,314 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:03,314 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:03,317 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:08,328 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:08,331 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:08,331 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:08,331 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:08,332 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:12,030 INFO org.apache.flink.mesos.runtime.clusterframework.MesosJobManager - Attempting to recover all jobs.
2017-10-02 10:33:12,033 INFO org.apache.flink.mesos.runtime.clusterframework.MesosJobManager - There are no jobs to recover.
2017-10-02 10:33:13,348 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:13,350 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:13,351 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:13,351 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:13,351 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:18,368 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:18,370 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:18,370 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:18,371 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:18,371 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:23,388 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:23,390 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:23,390 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:23,391 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:23,391 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:28,408 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:28,410 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:28,410 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:28,411 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:28,411 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:33,428 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:33,430 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:33,430 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:33,431 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:33,432 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:38,448 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:38,450 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:38,451 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:38,451 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:38,452 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:43,468 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:43,470 INFO com.netflix.fenzo.TaskScheduler - Purging inactive VMs
2017-10-02 10:33:43,471 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:43,471 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx-slave103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:43,471 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx-slave101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:43,471 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:48,488 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:48,491 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:48,491 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:48,491 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:48,492 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:53,508 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:53,511 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:53,511 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:53,511 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:53,512 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:33:58,528 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:33:58,530 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:33:58,531 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:33:58,531 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:33:58,531 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:34:03,548 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:34:03,550 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:34:03,551 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:34:03,551 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:34:03,552 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.
2017-10-02 10:34:08,568 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Processing 2 task(s) against 0 new offer(s) plus outstanding offers.
2017-10-02 10:34:08,570 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Resources considered: (note: expired offers not deducted from below)
2017-10-02 10:34:08,570 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx103.tail_of_hostname has 126607.0 MB, 30.75 cpus
2017-10-02 10:34:08,571 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - xxxxxxxxx101.tail_of_hostname has 126351.0 MB, 30.5 cpus
2017-10-02 10:34:08,571 INFO org.apache.flink.mesos.scheduler.LaunchCoordinator - Waiting for more offers; 2 task(s) are not yet launched.