Re: EXT :Re: Jar Uploads in High Availability (Flink 1.7.2)

Till Rohrmann Fri, 18 Oct 2019 06:43:57 -0700

Hi Martin,

Flink's web UI based job submission is not well suited to run behind a
load balancer at the moment. The problem is that web-based job submission
is actually a two-phase operation: uploading the jars and then starting
the job. Since Flink's RestServer stores the uploaded files locally, the
job must be started through the same RestServer that received the upload.
Note, however, that the CLI client's job submission is not affected by
this, since the job graph upload and the submission are a single request.
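
As a rough sketch of the two phases (client-side illustration only, not
Flink code): both requests are built from one RestServer address, never
from the load balancer address. The function names and the injected
`upload` / `post_json` callables are hypothetical. The sketch assumes the
upload response carries a `filename` whose base name is the jar ID, and
that the run response carries a `jobid`.

```python
import posixpath
from typing import Callable


def jar_id_from_upload(response: dict) -> str:
    # Assumption: "filename" is a server-local path and its base name
    # is the jar ID expected by the /jars/:jarid/run endpoint.
    return posixpath.basename(response["filename"])


def submit_pinned(
    base_url: str,
    jar_path: str,
    upload: Callable[[str, str], dict],
    post_json: Callable[[str, dict], dict],
) -> str:
    """Upload and run a jar against ONE RestServer address."""
    base = base_url.rstrip("/")
    # Phase 1: the jar ends up on this RestServer's local disk.
    uploaded = upload(f"{base}/jars/upload", jar_path)
    jar_id = jar_id_from_upload(uploaded)
    # Phase 2: must reach the same RestServer, otherwise the jar ID is unknown.
    started = post_json(f"{base}/jars/{jar_id}/run", {})
    return started["jobid"]
```

Here `upload` would perform the multipart POST and `post_json` a plain JSON
POST; they are passed in so that the pinning logic stays independent of
the HTTP library.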


A workaround to make the uploads accessible to all RestServers is to
configure a DFS for the `web.upload.dir` as Ravi suggested or to use
Flink's CLI to submit jobs instead.
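
For the first option, a minimal `flink-conf.yaml` fragment might look like
the following. The path is a hypothetical example, and this assumes the
shared file system is mounted at the same location on every JobManager
host.

```yaml
# Hypothetical shared mount; must resolve to the same directory on every JobManager.
web.upload.dir: /mnt/flink-shared/web-upload
```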

A quick note about the old behaviour with the redirects: the redirects
actually defeated the purpose of a load balancer because all requests were
redirected to a single RestServer instance. Hence, running with or without
a load balancer should not have made a big difference.

Cheers,
Till

On Wed, Oct 16, 2019 at 5:58 PM Martin, Nick J [US] (IS) <
nick.mar...@ngc.com> wrote:

> Yeah, I’ll do that if I have to. I’m hoping there’s a ‘right’ way to do it
> that’s easier. If I have to implement the ZooKeeper lookups in my load
> balancer myself, that feels like a definite step backwards from the pre-1.5
> days, when the cluster would give 307 redirects to the current leader.
>
>
>
> *From:* Ravi Bhushan Ratnakar [mailto:ravibhushanratna...@gmail.com]
> *Sent:* Tuesday, October 15, 2019 10:35 PM
> *To:* Martin, Nick J [US] (IS) <nick.mar...@ngc.com>
> *Cc:* user <user@flink.apache.org>
> *Subject:* EXT :Re: Jar Uploads in High Availability (Flink 1.7.2)
>
>
>
> Hi,
>
>
>
> I was also experiencing similar behavior. I adopted the following
> approach:
>
>    - Used a distributed file system (in my case AWS EFS) and set the
>    attribute "web.upload.dir", so that both JobManagers use the same
>    location.
>    - On the load balancer side (AWS ELB), I used a "readiness probe" based
>    on the ZooKeeper entry for the active JobManager address. This way the
>    ELB always points to the active JobManager, and if the active JobManager
>    changes it automatically points to the new one. Since both use the same
>    location via the distributed file system, the new active JobManager is
>    able to find the same jar.
>
>
>
> Regards,
>
> Ravi
>
>
>
> On Wed, Oct 16, 2019 at 1:15 AM Martin, Nick J [US] (IS) <
> nick.mar...@ngc.com> wrote:
>
> When I upload a jar through the REST API, it looks like
> only the Jobmanager that received the upload request is aware of the newly
> uploaded jar. That worked fine for me in older versions where all clients
> were redirected to connect to the leader, but now that each Jobmanager
> accepts requests, if I send a jar upload request, it could end up on any
> one (and only one) of the Jobmanagers, not necessarily the leader. Further,
> each Jobmanager responds to a GET request on the /jars endpoint with its
> own local list of jars. If I try and use one of the Jar IDs from that
> request, my next request may not go to the same Jobmanager (requests are
> going through Docker and being load-balanced), and so the Jar ID isn’t
> found on the new Jobmanager handling that request.
