Thank you, answers my questions.

--
Regards,
Juha

On Wed, Aug 18, 2021 at 2:28 PM Chesnay Schepler <ches...@apache.org> wrote:

> You've pretty much answered the question yourself. *thumbs up*
>
> For the vast majority of cases you can call any JobManager.
> The exceptions are jar operations (because they are persisted in the
> JM-local filesystem, and other JMs don't know about them) and triggering
> savepoints (because metadata for on-going savepoint operations (i.e., the
> information returned when querying the savepoint operation status) is also
> kept locally in the JM).
>
> This does indeed imply that on JM failover all this information is lost.
>
> There are ideas to solve this, but no concrete timeline. See
> https://issues.apache.org/jira/browse/FLINK-18312
>
> On 18/08/2021 11:54, Juha Mynttinen wrote:
>
> I have questions related to REST API in the case of ZooKeeper HA and a
> standalone cluster. But I think the questions apply to other setups too
> such as YARN.
>
> Let's assume a standalone cluster with multiple JobManagers. The
> JobManagers elect the leader among themselves and register that to
> ZooKeeper. When using the Flink command line, AFAIK the code will go to
> ZooKeeper to find the host and port of the leading JobManager and send HTTP
> requests there.
>
> My question is: when accessing the REST API directly (e.g. curl) does one
> need to call the leading JobManager or will any up and running JobManager
> do? And if the leader needs to be called, why is it so?
>
> Behind the scenes the REST API will connect to the leading "JobManager"
> over RPC, making it irrelevant which JobManager receives the HTTP request.
>
> By experimenting, I found the Web UI works fine if all the JobManagers are
> behind a load balancer and leading and standby JobManagers are called. The
> only issue I found was that when a jar is submitted (/jars/upload), it is
> stored on the local disk of the JobManager that happens to handle that
> request. As a consequence, creating a job from that jar only succeeds if
> the HTTP request hits the JobManager that has the file. There might be a
> "hack" to overcome this limitation, set web.upload.dir to be in S3 / GCS or
> elsewhere accessible by all JobManagers. I didn't try this. Or in the case
> of uploading jars and creating jobs, ensure the same JobManager is called
> (bypass loadbalancer).
>
> But I wonder if there's something else why the leading JM should be called.
>
> A follow-up question arises. If the jars are stored only on the leading
> JobManager, doesn't that mean that if the leader changes, the new leader is
> not aware of the jars uploaded to the old leader? From the REST
> API's perspective this means that even in the JobManager HA setup and when
> always calling the leader, a simple "upload a jar and deploy a job"-cycle
> is not guaranteed to work if the leader happens to change between the
> requests. Did I miss something?
>
> --
> Regards,
> Juha
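
A minimal sketch of the "ensure the same JobManager is called" workaround
from the quoted message: both the /jars/upload request and the follow-up
/jars/<jarid>/run request go to one JobManager address instead of the load
balancer. The function names, the injectable post callable, and the
assumption that an empty JSON body is enough to start the job are mine, not
something confirmed in this thread. It does not help if that JobManager
fails between the two requests, since the uploaded jar is local to it.

```python
import json
import posixpath
import urllib.request
import uuid
from pathlib import Path


def jar_id_from_upload(response: dict) -> str:
    """Extract the jar id from a /jars/upload response.

    The response carries the server-side path in "filename"; its last path
    component is the id expected by /jars/<jarid>/run.
    """
    return posixpath.basename(response["filename"])


def multipart_jar_body(jar_name: str, jar_bytes: bytes) -> tuple:
    """Build a multipart/form-data body with a single "jarfile" part.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="jarfile"; filename="{jar_name}"\r\n'
        "Content-Type: application/x-java-archive\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + jar_bytes + tail, f"multipart/form-data; boundary={boundary}"


def http_post(url: str, body: bytes, content_type: str) -> dict:
    """POST a body and decode the JSON response (plain urllib, no retries)."""
    req = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def upload_and_run(jm_url: str, jar_path: str, post=http_post) -> str:
    """Upload a jar and start a job, sending both requests to one JobManager.

    jm_url must address a single JobManager directly, not the load balancer.
    Returns the job id reported by /jars/<jarid>/run.
    """
    base = jm_url.rstrip("/")
    path = Path(jar_path)
    body, content_type = multipart_jar_body(path.name, path.read_bytes())
    uploaded = post(f"{base}/jars/upload", body, content_type)
    jar_id = jar_id_from_upload(uploaded)
    # Assumption: an empty JSON object is accepted as the run request body.
    started = post(f"{base}/jars/{jar_id}/run", b"{}", "application/json")
    return started["jobid"]
```

Usage would look like upload_and_run("http://jm-1:8081", "my-job.jar"),
where jm-1:8081 is a hypothetical address of one specific JobManager. The
post parameter only exists so the request flow can be exercised without a
running cluster.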