Following 
https://github.com/apache/flink/tree/release-1.7/flink-container/docker 
<https://github.com/apache/flink/tree/release-1.7/flink-container/docker>
I have created an entry point, which looks like follows:
#!/bin/sh

################################################################################
#   from 
https://github.com/apache/flink/blob/release-1.7/flink-container/docker/docker-entrypoint.sh
#   and 
https://github.com/docker-flink/docker-flink/blob/63b19a904fa8bfd1322f1d59fdb226c82b9186c7/1.7/scala_2.11-alpine/docker-entrypoint.sh
################################################################################

# If unspecified, the hostname of the container is taken as the JobManager 
address
JOB_MANAGER_RPC_ADDRESS=${JOB_MANAGER_RPC_ADDRESS:-$(hostname -f)}

drop_privs_cmd() {
    if [ $(id -u) != 0 ]; then
        # Don't need to drop privs if EUID != 0
        return
    elif [ -x /sbin/su-exec ]; then
        # Alpine
        echo su-exec flink
    else
        # Others
        echo gosu flink
    fi
}

JOB_MANAGER="jobmanager"
TASK_MANAGER="taskmanager"

CMD="$1"
shift

if [ "${CMD}" = "help" ]; then
    echo "Usage: $(basename $0) (${JOB_MANAGER}|${TASK_MANAGER}|help)"
    exit 0
elif [ "${CMD}" = "${JOB_MANAGER}" -o "${CMD}" = "${TASK_MANAGER}" ]; then
    if [ "${CMD}" = "${TASK_MANAGER}" ]; then
        
TASK_MANAGER_NUMBER_OF_TASK_SLOTS=${TASK_MANAGER_NUMBER_OF_TASK_SLOTS:-$(grep 
-c ^processor /proc/cpuinfo)}

        sed -i -e "s/jobmanager.rpc.address: localhost/jobmanager.rpc.address: 
${JOB_MANAGER_RPC_ADDRESS}/g" "$FLINK_HOME/conf/flink-conf.yaml"
        sed -i -e "s/taskmanager.numberOfTaskSlots: 
1/taskmanager.numberOfTaskSlots: $TASK_MANAGER_NUMBER_OF_TASK_SLOTS/g" 
"$FLINK_HOME/conf/flink-conf.yaml"
        echo "blob.server.port: 6124" >> "$FLINK_HOME/conf/flink-conf.yaml"
        echo "query.server.port: 6125" >> "$FLINK_HOME/conf/flink-conf.yaml"

        echo "Starting Task Manager"
        echo "config file: " && grep '^[^\n#]' 
"$FLINK_HOME/conf/flink-conf.yaml"
        exec $(drop_privs_cmd) "$FLINK_HOME/bin/taskmanager.sh" start-foreground
    else
        sed -i -e "s/jobmanager.rpc.address: localhost/jobmanager.rpc.address: 
${JOB_MANAGER_RPC_ADDRESS}/g" "$FLINK_HOME/conf/flink-conf.yaml"
        echo "blob.server.port: 6124" >> "$FLINK_HOME/conf/flink-conf.yaml"
        echo "query.server.port: 6125" >> "$FLINK_HOME/conf/flink-conf.yaml"
        echo "config file: " && grep '^[^\n#]' 
"$FLINK_HOME/conf/flink-conf.yaml"

        if [ -z "$1" ]; then
           exec $(drop_privs_cmd) "$FLINK_HOME/bin/jobmanager.sh" 
start-foreground "$@"
        else
            exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"
        fi
    fi
fi

exec "$@"
It does work for all the cases, except running standalone job.
The problem, the way I understand it, is a racing condition.
In kubernetes it takes several attempts for establish connection between Job 
and Task manager, while standalone-job.sh
 tries to start a job immediately once the cluster is created (before 
connection is established).
Is there a better option to implement it starting a job on container startup?
 

Reply via email to