Re: EXT :Re: Jar Uploads in High Availability (Flink 1.7.2)

Zili Chen Mon, 21 Oct 2019 01:04:41 -0700

FYI there is already a corresponding issue
https://issues.apache.org/jira/browse/FLINK-13660


Best,
tison.


Till Rohrmann <trohrm...@apache.org> wrote on Fri, Oct 18, 2019 at 9:42 PM:

> Hi Martin,
>
> Flink's web UI based job submission is not well suited to run behind a
> load balancer at the moment. The problem is that web-based job
> submission is actually a two-phase operation: uploading the jars and then
> starting the job. Since Flink's RestServer stores the uploaded files
> locally, the job must be started through the same RestServer that
> received the upload. Note, however, that CLI client job submission is
> not affected by this, since the job graph upload and submission happen
> in one request.
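
To illustrate the two-phase flow described above, here is a minimal Python sketch. It assumes the upload response carries a "filename" path whose last segment is the jar ID, and that the run request goes to /jars/<jarid>/run. The helper names and the host jm1:8081 are hypothetical. The point is only that the run URL has to be built from the same base URL that received the upload.

```python
import posixpath
from urllib.parse import quote


def jar_id_from_upload(response: dict) -> str:
    """Extract the jar ID from an upload response (assumed shape:
    {"filename": "<server-side path>", "status": "success"})."""
    if response.get("status") != "success":
        raise ValueError(f"upload did not succeed: {response!r}")
    # Assumption: the jar ID is the last path segment of "filename".
    return posixpath.basename(response["filename"])


def run_url(base_url: str, jar_id: str) -> str:
    """Build the run URL on the SAME RestServer that took the upload."""
    return f"{base_url.rstrip('/')}/jars/{quote(jar_id, safe='')}/run"


# Hypothetical host and response, for illustration only.
base = "http://jm1:8081/"
upload_response = {"filename": "/tmp/flink-web-upload/abc_job.jar",
                   "status": "success"}
jar_id = jar_id_from_upload(upload_response)
print(run_url(base, jar_id))
```

If a load balancer sends the second request to a different RestServer, that server will not find the jar ID unless the upload directory is shared.
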
>
> A workaround to make the uploads accessible to all RestServers is to
> configure a DFS for the `web.upload.dir` as Ravi suggested or to use
> Flink's CLI to submit jobs instead.
>
> A quick note about the old behaviour with the redirects. The redirects
> actually defeated the purpose of load balancers because all requests were
> redirected to a single RestServer instance. Hence, running it with or
> without a load balancer should not have made a big difference.
>
> Cheers,
> Till
>
> On Wed, Oct 16, 2019 at 5:58 PM Martin, Nick J [US] (IS) <
> nick.mar...@ngc.com> wrote:
>
>> Yeah, I’ll do that if I have to. I’m hoping there’s a ‘right’ way to do
>> it that’s easier. If I have to implement the ZooKeeper lookups in my load
>> balancer myself, that feels like a definite step backwards from the pre-1.5
>> days when the cluster would give 307 redirects to the current leader.
>>
>>
>>
>> *From:* Ravi Bhushan Ratnakar [mailto:ravibhushanratna...@gmail.com]
>> *Sent:* Tuesday, October 15, 2019 10:35 PM
>> *To:* Martin, Nick J [US] (IS) <nick.mar...@ngc.com>
>> *Cc:* user <user@flink.apache.org>
>> *Subject:* EXT :Re: Jar Uploads in High Availability (Flink 1.7.2)
>>
>>
>>
>> Hi,
>>
>>
>>
>> I was also seeing similar behavior. I adopted the following
>> approach:
>>
>>    - I used a distributed file system (in my case AWS EFS) and set the
>>    attribute "web.upload.dir" to it, so that both JobManagers share the
>>    same upload location.
>>    - On the load balancer side (AWS ELB), I used a "readiness probe" based
>>    on the ZooKeeper entry for the active JobManager address, so that the
>>    ELB always points to the active JobManager. If the active JobManager
>>    changes, the ELB automatically points to the new one, and because both
>>    use the same location on the distributed file system, the new active
>>    JobManager can find the same jar.
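
A rough Python sketch of the decision behind such a readiness probe. It assumes the active JobManager address has already been read from ZooKeeper by some other means; that lookup is not shown. The function names and addresses are hypothetical.

```python
def _normalize(address: str) -> str:
    """Make addresses comparable: trim, lowercase, drop trailing slash."""
    return address.strip().lower().rstrip("/")


def readiness_status(leader_address: str, local_address: str) -> int:
    """Return the HTTP status a probe endpoint could answer with:
    200 if this JobManager is the current leader, 503 otherwise."""
    if _normalize(leader_address) == _normalize(local_address):
        return 200
    return 503


# Hypothetical addresses: the load balancer would keep only the node
# that answers 200 in rotation.
print(readiness_status("http://jm1:8081", "http://jm1:8081/"))
print(readiness_status("http://jm1:8081", "http://jm2:8081"))
```
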
>>
>>
>>
>> Regards,
>>
>> Ravi
>>
>>
>>
>> On Wed, Oct 16, 2019 at 1:15 AM Martin, Nick J [US] (IS) <
>> nick.mar...@ngc.com> wrote:
>>
>> I’m seeing that when I upload a jar through the REST API, it looks like
>> only the Jobmanager that received the upload request is aware of the newly
>> uploaded jar. That worked fine for me in older versions where all clients
>> were redirected to connect to the leader, but now that each Jobmanager
>> accepts requests, if I send a jar upload request, it could end up on any
>> one (and only one) of the Jobmanagers, not necessarily the leader. Further,
>> each Jobmanager responds to a GET request on the /jars endpoint with its
>> own local list of jars. If I try and use one of the Jar IDs from that
>> request, my next request may not go to the same Jobmanager (requests are
>> going through Docker and being load-balanced), and so the Jar ID isn’t
>> found on the new Jobmanager handling that request.
