Thank you, answers my questions.


On Wed, Aug 18, 2021 at 2:28 PM Chesnay Schepler <> wrote:

> You've pretty much answered the question yourself. *thumbs up*
> For the vast majority of cases you can call any JobManager.
> The exceptions are jar operations (because they are persisted in the
> JM-local filesystem, and other JMs don't know about them) and triggering
> savepoints (because the metadata for ongoing savepoint operations, i.e., the
> information returned when querying the savepoint operation status, is also
> kept locally in the JM).
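To make the exceptions above concrete for anyone putting a load balancer in front of several JobManagers, here is a minimal sketch of a routing predicate that decides which requests must be pinned to one JobManager. It is hypothetical: the function name is made up, and the paths are assumed from Flink's REST API naming (`/jars/...`, `/jobs/<jobid>/savepoints[/<triggerid>]`, `/jobs/<jobid>/stop` for stop-with-savepoint, optional `/v1/` version prefix), so check them against the REST reference for your Flink version.

```python
import re

# Hypothetical helper for a load balancer in front of several JobManagers.
# Path shapes are assumed from Flink's REST API; verify for your version.

# Jar operations: an uploaded jar lives on one JobManager's local disk.
_JAR_PATHS = re.compile(r"^/(v1/)?jars(/.*)?$")

# Savepoint trigger/status (and stop-with-savepoint, assumed to be polled via
# the same savepoint status endpoint): the trigger bookkeeping is JM-local.
_SAVEPOINT_PATHS = re.compile(r"^/(v1/)?jobs/[^/]+/(savepoints(/[^/]+)?|stop)$")


def needs_sticky_routing(path: str) -> bool:
    """Return True if the request must always reach the same JobManager."""
    path = path.split("?", 1)[0]  # ignore any query string
    return bool(_JAR_PATHS.match(path) or _SAVEPOINT_PATHS.match(path))


if __name__ == "__main__":
    for p in ("/jars/upload", "/jobs/abc/savepoints", "/jobs/overview"):
        print(p, needs_sticky_routing(p))
```

Everything else (job overviews, metrics, cancellation) can then be spread across all JobManagers, since, per the reply above, any JobManager can serve it.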
> This does indeed imply that on JM failover all this information is lost.
> There are ideas to solve this, but no concrete timeline. See
> On 18/08/2021 11:54, Juha Mynttinen wrote:
> I have questions related to REST API in the case of ZooKeeper HA and a
> standalone cluster. But I think the questions apply to other setups too,
> such as YARN.
> Let's assume a standalone cluster with multiple JobManagers. The
> JobManagers elect the leader among themselves and register that to
> ZooKeeper. When using the Flink command line, AFAIK the code will go to
> ZooKeeper to find the host and port of the leading JobManager and send HTTP
> requests there.
> My question is: when accessing the REST API directly (e.g. curl) does one
> need to call the leading JobManager or will any up and running JobManager
> do? And if the leader needs to be called, why is it so?
> Behind the scenes the REST API will connect to the leading "JobManager"
> over RPC, making it irrelevant which JobManager receives the HTTP request.
> By experimenting, I found the Web UI works fine if all the JobManagers are
> behind a load balancer, so that both leading and standby JobManagers are called.
> only issue I found was that when a jar is submitted (/jars/upload), it is
> stored on the local disk of the JobManager that happens to handle that
> request. As a consequence, creating a job from that jar only succeeds if
> the HTTP request hits the JobManager that has the file. There might be a
> "hack" to overcome this limitation: set web.upload.dir to a location in
> S3 / GCS or elsewhere accessible by all JobManagers. I didn't try this.
> Alternatively, when uploading jars and creating jobs, ensure the same
> JobManager is called (bypassing the load balancer).
> But I wonder whether there is some other reason why the leading JM should be called.
> A follow-up question arises. If the jars are stored only on the leading
> JobManager, doesn't that mean that if the leader changes, the new leader is
> not aware of the jars uploaded to the old leader? From the REST
> API's perspective this means that even in the JobManager HA setup and when
> always calling the leader, a simple "upload a jar and deploy a job" cycle
> is not guaranteed to work if the leader happens to change between the
> requests. Did I miss something?
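One client-side way to live with this (not something proposed in the thread, just an assumption about a workable mitigation) is to treat "upload, then run" as a single unit and, if the run step reports that the jar is unknown, upload again instead of retrying only the run. The sketch below is hypothetical: `upload`, `run` and `JarNotFound` stand in for whatever HTTP client and error mapping you use against `/jars/upload` and `/jars/<jarid>/run`.

```python
from typing import Callable, Optional


class JarNotFound(Exception):
    """Hypothetical error raised by `run` when the answering JM lacks the jar."""


def upload_and_run(upload: Callable[[], str],
                   run: Callable[[str], str],
                   max_attempts: int = 3) -> str:
    """Retry the whole upload+run cycle, since a jar id is only valid on the
    JobManager that stored it (e.g. lost after a leader change)."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        jar_id = upload()          # e.g. POST /jars/upload, returns a jar id
        try:
            return run(jar_id)     # e.g. POST /jars/<jar_id>/run, returns a job id
        except JarNotFound as err:
            last_error = err       # jar unknown on this JM: start the cycle over
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

This does not make the cycle atomic; it only narrows the window, and a job may still need de-duplication if a run succeeded but its response was lost.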
> --
> Regards,
> Juha
