Re: Rethinking the Flink Runner modes

2019-10-31 Thread Robert Bradshaw
Yes. If someone starts up the job server manually, they would have to manually specify LOOPBACK if they want it. Python's FlinkRunner does not use a pre-configured job server; it starts one up itself (making the default scenario simple). On Thu, Oct 31, 2019 at 10:19 AM Thomas Weise wrote:

Re: Rethinking the Flink Runner modes

2019-10-31 Thread Thomas Weise
True, but it would probably be a good tradeoff to make the default scenario simple (just for FlinkRunner, not for PortableRunner). If someone configures the job server with options, they probably also know how to control the environment? On Thu, Oct 31, 2019 at 9:09 AM Maximilian Michels wrote:

Re: Rethinking the Flink Runner modes

2019-10-31 Thread Maximilian Michels
When the FlinkRunner (python client) sees flink_master as [auto] or [local], then it could set the default environment to LOOPBACK before the pipeline is constructed and provide the loopback environment. Isn't that fully client side controlled? There is the case of a pre-configured job
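A minimal sketch of the client-side choice described above, assuming it is made before the pipeline is constructed. The function name default_environment_type and its parameters are hypothetical and not part of Beam's API. It only illustrates picking LOOPBACK when flink_master is [auto] or [local] and the user has not chosen an environment; returning None stands for "leave the normal default in place".

```python
from typing import Optional

# Values of flink_master for which the client could default to LOOPBACK
# (taken from the message above).
LOOPBACK_MASTERS = ("[auto]", "[local]")


def default_environment_type(flink_master: str,
                             user_environment_type: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: decide the environment type on the client side."""
    # An explicit user choice always wins.
    if user_environment_type is not None:
        return user_environment_type
    # No explicit choice: default to LOOPBACK for [auto] and [local].
    if flink_master in LOOPBACK_MASTERS:
        return "LOOPBACK"
    # Otherwise do not override anything.
    return None
```

This does not cover the pre-configured job server case mentioned in the message; there the user would have to specify the environment themselves.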

Re: Rethinking the Flink Runner modes

2019-10-31 Thread Thomas Weise
On Thu, Oct 31, 2019 at 3:55 AM Maximilian Michels wrote:
> > Thanks for clarifying. So when I run "./flink my_pipeline.jar" or upload the jar via the REST API (and its main method invoked on the master) then [auto] reads the config and does the right thing, but if I do java

Re: Rethinking the Flink Runner modes

2019-10-31 Thread Maximilian Michels
Thanks for clarifying. So when I run "./flink my_pipeline.jar" or upload the jar via the REST API (and its main method invoked on the master) then [auto] reads the config and does the right thing, but if I do java my_pipeline.jar it'll run locally. Correct. Python needs to know even whether

Re: Rethinking the Flink Runner modes

2019-10-30 Thread Robert Bradshaw
On Wed, Oct 30, 2019 at 3:34 PM Maximilian Michels wrote:
> > One thing I don't understand is what it means for "CLI or REST API context [to be] present." Where does this context come from? A config file in a standard location on the user's machine? Or is this something that is only

Re: Rethinking the Flink Runner modes

2019-10-30 Thread Maximilian Michels
One thing I don't understand is what it means for "CLI or REST API context [to be] present." Where does this context come from? A config file in a standard location on the user's machine? Or is this something that is only present when a user uploads a jar and then Flink runs it in a specific

Re: Rethinking the Flink Runner modes

2019-10-30 Thread Robert Bradshaw
One more question: https://issues.apache.org/jira/browse/BEAM-8396 still seems valuable, but with [auto] as the default, how should we detect whether LOOPBACK is safe to enable from Python?
On Wed, Oct 30, 2019 at 11:53 AM Robert Bradshaw wrote:
> Sounds good to me.
>
> One thing I don't

Re: Rethinking the Flink Runner modes

2019-10-30 Thread Robert Bradshaw
Sounds good to me. One thing I don't understand is what it means for "CLI or REST API context [to be] present." Where does this context come from? A config file in a standard location on the user's machine? Or is this something that is only present when a user uploads a jar and then Flink runs it

Re: Rethinking the Flink Runner modes

2019-10-29 Thread Maximilian Michels
tl;dr:
- I see consensus for inferring "http://" in Python to align it with the Java behavior which currently requires leaving out the protocol scheme. Optionally, Java could also accept a scheme which gets removed as required by the Flink Java REST client.
- We won't support "https://" in
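A rough sketch of the address handling in the first point, assuming flink_master is either a special value such as [auto] or a host:port address. Both function names are hypothetical, not Beam or Flink APIs. The first mirrors what the Python side could do (infer "http://" when the scheme is left out); the second mirrors the optional Java-side behavior (accept a scheme and remove it), written in Python here only for illustration.

```python
def infer_http_scheme(flink_master: str) -> str:
    """Hypothetical: return an address with "http://" inferred if missing."""
    # Special values such as [auto] or [local] are not addresses; leave them.
    if flink_master.startswith("["):
        return flink_master
    # A scheme was given explicitly; do not touch it.
    if "://" in flink_master:
        return flink_master
    return "http://" + flink_master


def strip_http_scheme(flink_master: str) -> str:
    """Hypothetical: drop a leading "http://" so only host:port remains."""
    prefix = "http://"
    if flink_master.startswith(prefix):
        return flink_master[len(prefix):]
    return flink_master


print(infer_http_scheme("localhost:8081"))
print(strip_http_scheme("http://localhost:8081"))
```

With this, "localhost:8081" and "http://localhost:8081" would be treated as the same master on either side.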

Re: Rethinking the Flink Runner modes

2019-10-29 Thread Jan Lukavský
Hi, +1 for empty string being interpreted as [auto] and anything else having explicit notation. One more reason that has not been part of this discussion yet: in [1] there was a discussion about LocalEnvironment (the one that is responsible for spawning an in-process Flink cluster) not

Re: Rethinking the Flink Runner modes

2019-10-28 Thread Thomas Weise
The current semantics of flink_master are tied to the Flink Java API. The Flink client / Java API isn't a "REST API". It now uses the REST API somewhere deep in RemoteEnvironment when the flink_master value is host:port, but it does a lot of other things as well, such as parsing config files and

Re: Rethinking the Flink Runner modes

2019-10-28 Thread Kyle Weaver
Filed https://issues.apache.org/jira/browse/BEAM-8507 for the issue I mentioned.
On Mon, Oct 28, 2019 at 4:12 PM Kyle Weaver wrote:
> > I'd like to see this issue resolved before 2.17 as changing the public API once it's released will be harder.
>
> +1. In particular, I misunderstood that

Re: Rethinking the Flink Runner modes

2019-10-28 Thread Kyle Weaver
> I'd like to see this issue resolved before 2.17 as changing the public API once it's released will be harder. +1. In particular, I misunderstood that [auto] is not supported by `FlinkUberJarJobServer`. Since [auto] is now the default, it's broken for Python 3.6+.

Re: Rethinking the Flink Runner modes

2019-10-28 Thread Robert Bradshaw
Thanks for bringing this to the list. Some comments below, though it would be good to get additional feedback beyond those that have been participating on the PR, if any. I'd like to see this issue resolved before 2.17 as changing the public API once it's released will be harder. On Mon, Oct 28,

Rethinking the Flink Runner modes

2019-10-28 Thread Maximilian Michels
Hi, Robert and Kyle have been doing great work to simplify submitting portable pipelines with the Flink Runner. Part of this is having a Python "FlinkRunner" which handles bringing up a Beam job server and submitting the pipeline directly via the Flink REST API. One building block is the