Thank you, that answers my questions.

--
Regards,
Juha

On Wed, Aug 18, 2021 at 2:28 PM Chesnay Schepler <ches...@apache.org> wrote:

> You've pretty much answered the question yourself. *thumbs up*
>
> For the vast majority of cases you can call any JobManager.
> The exceptions are jar operations (because they are persisted in the
> JM-local filesystem, and other JMs don't know about them) and triggering
> savepoints (because metadata for ongoing savepoint operations (i.e., the
> information returned when querying the savepoint operation status) is also
> kept locally in the JM).
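To make this split concrete for a client behind a load balancer, here is a minimal Python sketch that decides which REST calls should be pinned to one JobManager. The function name `needs_same_jobmanager` and the exact path patterns are my own assumptions, derived only from the two exceptions named above (jar operations under `/jars/...` and savepoint calls under `/jobs/<jobid>/savepoints/...`, optionally with a `/v1` prefix). Other triggered operations may behave the same way, so check the REST API reference for your Flink version.

```python
import re

# Endpoints that, per the explanation above, must reach the JobManager that
# holds the state: jar operations (jar files live on that JM's local disk) and
# savepoint trigger/status calls (trigger metadata is kept in that JM).
# The optional "/v1" prefix covers versioned REST URLs.
_PINNED = [
    re.compile(r"^/(v\d+/)?jars(/|$)"),
    re.compile(r"^/(v\d+/)?jobs/[^/]+/savepoints(/|$)"),
]


def needs_same_jobmanager(path: str) -> bool:
    """True if the request must go to one fixed JobManager, not any JM."""
    path = path.split("?", 1)[0]  # ignore query parameters such as ?parallelism=2
    return any(p.match(path) for p in _PINNED)
```

Anything for which this returns False can go through the load balancer; anything returning True should be sent to one fixed JobManager address.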
>
> This does indeed imply that on JM failover all this information is lost.
>
> There are ideas to solve this, but no concrete timeline. See
> https://issues.apache.org/jira/browse/FLINK-18312
>
> On 18/08/2021 11:54, Juha Mynttinen wrote:
>
> I have questions related to the REST API in the case of ZooKeeper HA and a
> standalone cluster, but I think the questions also apply to other setups,
> such as YARN.
>
> Let's assume a standalone cluster with multiple JobManagers. The
> JobManagers elect a leader among themselves and register it in
> ZooKeeper. When using the Flink command line, AFAIK the code will go to
> ZooKeeper to find the host and port of the leading JobManager and send HTTP
> requests there.
>
> My question is: when accessing the REST API directly (e.g. curl) does one
> need to call the leading JobManager or will any up and running JobManager
> do? And if the leader needs to be called, why is it so?
>
> Behind the scenes the REST API will connect to the leading "JobManager"
> over RPC, making it irrelevant which JobManager receives the HTTP request.
>
> By experimenting, I found that the Web UI works fine when all the
> JobManagers are behind a load balancer, so that requests reach both the
> leading and the standby JobManagers. The only issue I found was that when
> a jar is submitted (/jars/upload), it is stored on the local disk of the
> JobManager that happens to handle that request. As a consequence, creating
> a job from that jar only succeeds if the HTTP request hits the JobManager
> that has the file. There might be a "hack" to overcome this limitation:
> set web.upload.dir to a location in S3 / GCS or elsewhere accessible to all
> JobManagers. I didn't try this. Alternatively, when uploading jars and
> creating jobs, ensure the same JobManager is called (bypassing the load
> balancer).
>
> But I wonder whether there is some other reason why the leading JM should
> be called.
>
> A follow-up question arises. If the jars are stored only on the leading
> JobManager, doesn't that mean that if the leader changes, the new leader is
> not aware of the jars uploaded to the old leader? From the REST
> API's perspective, this means that even in a JobManager HA setup, and when
> always calling the leader, a simple "upload a jar and deploy a job" cycle
> is not guaranteed to work if the leader happens to change between the
> requests. Did I miss something?
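One client-side mitigation for this failover case, sketched in Python, is to treat upload-then-run as a single unit and restart from the upload whenever the run step fails, instead of reusing a jar id that may no longer exist on the new leader. The `upload` and `run` callables are hypothetical and supplied by the caller (for example, wrappers around `/jars/upload` and `/jars/<jarid>/run`); nothing here is a Flink API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_cycle_with_retry(
    upload: Callable[[], str],
    run: Callable[[str], T],
    attempts: int = 3,
) -> T:
    """Run upload -> run as one unit; on any failure, start over from the upload."""
    last_error = None
    for _ in range(attempts):
        try:
            jar_id = upload()  # re-upload every time: an old jar id may be gone
            return run(jar_id)
        except Exception as err:  # a real client would narrow this to HTTP/IO errors
            last_error = err
    raise RuntimeError(f"upload/run cycle failed after {attempts} attempts") from last_error
```

Blindly retrying can submit the job twice if the run request succeeded but its response was lost, so a real client would check the list of running jobs before retrying.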
>
> --
> Regards,
> Juha
>
>
>
