Hi Martin,

Flink's web-UI-based job submission is currently not well suited to running behind a load balancer. The problem is that web-based job submission is actually a two-phase operation: uploading the jars and then starting the job. Since Flink's RestServer stores the uploaded files locally, the submission has to be executed on the same RestServer to which you uploaded the files. Note, however, that job submission via the CLI client is not affected by this, since the job graph upload and the submission happen in a single request.
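To make the two-phase problem concrete, here is a toy Python model. It is only a sketch and not Flink code: the `RestServer` and `RoundRobinBalancer` classes, the server names, and the jar ID format are all made up for illustration, and a plain dict stands in for the upload directory. It shows an upload landing on one server and the run request landing on another, first with per-server storage and then with storage shared by both servers.

```python
import itertools
import uuid


class RestServer:
    """Toy stand-in for one RestServer (hypothetical, not Flink code)."""

    def __init__(self, name, upload_dir):
        self.name = name
        # A dict plays the role of the upload directory. Passing the same
        # dict to several servers models a shared (distributed) directory.
        self.upload_dir = upload_dir

    def upload_jar(self, filename):
        # Phase 1: store the jar and hand back an ID for later requests.
        jar_id = f"{uuid.uuid4()}_{filename}"  # ID format is an assumption
        self.upload_dir[jar_id] = filename
        return jar_id

    def run_jar(self, jar_id):
        # Phase 2: start the job, which only works if this server sees the jar.
        if jar_id not in self.upload_dir:
            return f"{self.name}: jar not found"
        return f"{self.name}: job started"


class RoundRobinBalancer:
    """Sends each request to the next server in turn."""

    def __init__(self, servers):
        self._servers = itertools.cycle(servers)

    def next_server(self):
        return next(self._servers)


def submit_via_balancer(balancer, filename):
    # Two separate requests, so the balancer may pick two different servers.
    jar_id = balancer.next_server().upload_jar(filename)
    return balancer.next_server().run_jar(jar_id)


# Case 1: every server keeps its uploads in its own local directory.
local = RoundRobinBalancer([RestServer("jm-1", {}), RestServer("jm-2", {})])
local_result = submit_via_balancer(local, "job.jar")

# Case 2: both servers point at the same shared directory.
shared_dir = {}
shared = RoundRobinBalancer(
    [RestServer("jm-1", shared_dir), RestServer("jm-2", shared_dir)]
)
shared_result = submit_via_balancer(shared, "job.jar")

print(local_result)   # jm-2: jar not found
print(shared_result)  # jm-2: job started
```

In the first case the run request reaches a server that never received the upload, which matches the behaviour you are seeing. In the second case any server can serve the second phase because the uploaded jar is visible to all of them.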
A workaround to make the uploads accessible to all RestServers is to configure a DFS for the `web.upload.dir`, as Ravi suggested, or to use Flink's CLI to submit jobs instead.

A quick note about the old behaviour with the redirects: the redirects actually defeated the purpose of load balancers, because all requests were redirected to a single RestServer instance. Hence, running it with or without a load balancer should not have made a big difference.

Cheers,
Till

On Wed, Oct 16, 2019 at 5:58 PM Martin, Nick J [US] (IS) <nick.mar...@ngc.com> wrote:

> Yeah, I’ll do that if I have to. I’m hoping there’s a ‘right’ way to do it
> that’s easier. If I have to implement the ZooKeeper lookups in my load
> balancer myself, that feels like a definite step backwards from the pre-1.5
> days when the cluster would give 307 redirects to the current leader.
>
> *From:* Ravi Bhushan Ratnakar [mailto:ravibhushanratna...@gmail.com]
> *Sent:* Tuesday, October 15, 2019 10:35 PM
> *To:* Martin, Nick J [US] (IS) <nick.mar...@ngc.com>
> *Cc:* user <user@flink.apache.org>
> *Subject:* EXT :Re: Jar Uploads in High Availability (Flink 1.7.2)
>
> Hi,
>
> I was also experiencing similar behavior. I adopted the following
> approach:
>
> - Used a distributed file system (in my case AWS EFS) and set the
>   attribute "web.upload.dir"; this way both JobManagers have the same
>   location.
> - On the load balancer side (AWS ELB), I used a "readiness probe" based
>   on the ZooKeeper entry for the active JobManager address. This way the
>   ELB always points to the active JobManager, and if the active
>   JobManager changes, it automatically points to the new one. As both
>   use the same location via the distributed file system, the new active
>   JobManager is able to find the same jar.
>
> Regards,
> Ravi
>
> On Wed, Oct 16, 2019 at 1:15 AM Martin, Nick J [US] (IS) <nick.mar...@ngc.com> wrote:
>
> I’m seeing that when I upload a jar through the REST API, it looks like
> only the JobManager that received the upload request is aware of the newly
> uploaded jar. That worked fine for me in older versions where all clients
> were redirected to connect to the leader, but now that each JobManager
> accepts requests, if I send a jar upload request, it could end up on any
> one (and only one) of the JobManagers, not necessarily the leader. Further,
> each JobManager responds to a GET request on the /jars endpoint with its
> own local list of jars. If I try to use one of the jar IDs from that
> response, my next request may not go to the same JobManager (requests are
> going through Docker and being load-balanced), and so the jar ID isn’t
> found on the new JobManager handling that request.