Hello Emanuele, thanks for posting your example. Congratulations, you just had an amazing idea: first, to help people get started with Beam, and second, to provide an easy way to run and test Beam pipelines on the Flink runner. The second part is really useful for me because I am running and testing ideas on all the runners, and this is a perfect way to do it.
I started working on making the Docker image yours is based on (the one at https://github.com/apache/flink/tree/master/flink-contrib/docker-flink) smaller. I just created FLINK-4118 and a PR that reduces the default image by 460 MB: https://github.com/apache/flink/pull/2176. I hope the Flink folks accept the changes. For anyone interested, the resulting Flink image is also available from my Docker account:

docker pull iemejia/flink

I also started a project to contribute the integration of this smaller Flink image with Beam into Apache Beam; this probably goes along the same lines as Max's previous email. I took the liberty of rebasing Emanuele's changes into one big commit and started working from there: https://github.com/iemejia/incubator-beam/tree/docker-flink. I hope we can share our work there, of course with the people interested (e.g. Emanuele and Maximilian).

Max, I have two questions:

1. My current approach is based on Emanuele's idea of creating an uber jar with the Beam SDK + the Flink runner + all the Beam IOs and their dependencies (I exclude all org.apache.flink classes because those are provided by Flink). I put this big jar in $FLINK_HOME/lib and start Flink. However, when I submit a small Beam example jar to Flink, I run into classpath issues. Do you have any suggestions? Is there a better way to do this? I suppose my approach is far from the best, but I don't know how Flink deals with these 'extension' cases.

2. I only found a way to run Flink's JobManager and TaskManager in daemon mode. Is there an easy way to run both as normal (foreground) processes? I ask because the current Docker image uses supervisor to keep the processes alive; if we can get rid of supervisor, the image will shrink by another 40 MB and be really minimalistic. Any ideas?

Regards,
Ismael

PS.
Amit and JB, if you want, I can prepare a Docker image for the Spark runner, probably using the spark-job-server image as a base. I still have to check how viable this is, but I think it is feasible.

On Tue, Jun 28, 2016 at 1:39 PM, Maximilian Michels wrote:
> Thanks for sharing, Emanuele! Looking forward to providing built-in
> Docker support in Beam.
>
> On Fri, Jun 24, 2016 at 9:30 AM, Amit Sela wrote:
> > You're right about standalone; I know many (small-to-medium) companies
> > that prefer spawning standalone clusters per use case. I'm currently
> > biased towards large clusters because of my current workplace ;) which
> > relates better to my previous comment.
> >
> > On Fri, Jun 24, 2016, 03:42 Emanuele Cesena wrote:
> >>
> >> Thanks Amit!
> >>
> >> I chose Flink because of its current capability support and its nicer
> >> front-end UI, but I have nothing against Spark. Actually, I'm using Spark
> >> in my daily job, and chances are that if we use Beam, it will be on Spark
> >> first.
> >>
> >> I can also tell you that I know of 2 instances (MemSQL, which distributes
> >> its own Spark, and our parent company SK Planet in Korea) that prefer Spark
> >> standalone, mostly for performance and ease of setup. So I can see a lot of
> >> potential even in production environments.
> >>
> >> Best,
> >>
> >> > On Jun 23, 2016, at 3:42 PM, Amit Sela wrote:
> >> >
> >> > Thanks for sharing, Emanuele! I will definitely look into trying
> >> > something like that with Spark as well :)
> >> > While production clusters (usually) use YARN/Mesos to manage resources,
> >> > this could be really great for developers to use in a virtual environment.
> >> > Really interesting!
> >> >
> >> > On Thu, Jun 23, 2016 at 7:21 PM Emanuele Cesena wrote:
> >> > Thank you Aljoscha!
> >> > > On Jun 23, 2016, at 1:19 AM, Aljoscha Krettek wrote:
> >> > >
> >> > > It's a very nice write-up indeed! Thanks for sharing. :-)
> >> > >
> >> > > On Thu, 23 Jun 2016 at 07:35 Jean-Baptiste Onofré wrote:
> >> > > Hi Emanuele,
> >> > >
> >> > > this is a great example!
> >> > >
> >> > > It shows Beam with Flink. Maybe we can enhance it a bit by showing how
> >> > > the same pipeline can result in different Docker images depending on
> >> > > the backend.
> >> > >
> >> > > I'm working on new "concrete" Beam samples showing that:
> >> > >
> >> > > https://github.com/jbonofre/beam-samples
> >> > >
> >> > > Great work anyway!
> >> > >
> >> > > Regards
> >> > > JB
> >> > >
> >> > > On 06/22/2016 10:18 PM, Emanuele Cesena wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I just published a "quick start" with Beam and wanted to share:
> >> > > >
> >> > > > https://medium.com/@ecesena/a-quick-demo-of-apache-beam-with-docker-da98b99a502a
> >> > > >
> >> > > > Related repos:
> >> > > > https://github.com/ecesena/docker-beam-flink
> >> > > > https://github.com/ecesena/beam-starter
> >> > > >
> >> > > > Any feedback is more than welcome!
> >> > > >
> >> > > > Best,
> >> > > > E.
> >> > >
> >> > > --
> >> > > Jean-Baptiste Onofré
> >> > > http://blog.nanthrax.net
> >> > > Talend - http://www.talend.com
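On question 1 in Ismael's message: at the jar level, "an uber jar with everything except org.apache.flink" amounts to merging the entries of several jars while skipping the classes Flink already ships in $FLINK_HOME/lib. The Python sketch below illustrates only that merge step, under stated assumptions: duplicate entries are resolved first-wins, signature files from signed jars are dropped, and the function name `build_uber_jar` is hypothetical. In a Maven build this job is normally done by the Maven Shade Plugin rather than by a script like this.

```python
import zipfile

# Classes under this prefix are already in the Flink distribution
# ($FLINK_HOME/lib), so they are left out of the uber jar.
PROVIDED_PREFIXES = ("org/apache/flink/",)

# Signature files of signed dependency jars no longer match once their
# classes are repackaged, so they are dropped as well.
SIGNATURE_SUFFIXES = (".SF", ".DSA", ".RSA")


def is_signature_file(name):
    upper = name.upper()
    return upper.startswith("META-INF/") and upper.endswith(SIGNATURE_SUFFIXES)


def build_uber_jar(input_jars, output_jar, provided_prefixes=PROVIDED_PREFIXES):
    """Copy the entries of input_jars into output_jar, skipping provided classes.

    Hypothetical helper for illustration. Duplicate entries are resolved
    first-wins (the earlier jar in input_jars takes precedence).
    Returns the sorted list of entry names written.
    """
    prefixes = tuple(provided_prefixes)
    written = set()
    with zipfile.ZipFile(output_jar, "w", zipfile.ZIP_DEFLATED) as out:
        for path in input_jars:
            with zipfile.ZipFile(path) as src:
                for name in src.namelist():
                    if (
                        name in written
                        or name.startswith(prefixes)
                        or is_signature_file(name)
                    ):
                        continue
                    out.writestr(name, src.read(name))
                    written.add(name)
    return sorted(written)
```

For example, `build_uber_jar(["beam-sdk.jar", "beam-flink-runner.jar"], "beam-uber.jar")` would write `beam-uber.jar` and return its entry names (the file names are hypothetical). One thing a first-wins merge like this gets wrong is META-INF/services files: several jars may each contribute one, and keeping only the first silently drops the other jars' ServiceLoader registrations. The Maven Shade Plugin's ServicesResourceTransformer concatenates those files instead, so this may be worth ruling out when chasing classpath issues.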

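On question 2: if the JobManager and TaskManager can each be started as a normal foreground process (whether the Flink scripts of the version in the image allow this is exactly the open question), supervisor could be replaced by a small entrypoint that starts both, waits until either one exits, stops the other, and returns that exit code, leaving restarts to the container runtime (for example a Docker restart policy). The Python sketch below assumes such foreground commands exist; `run_until_first_exit` is a hypothetical name, and the sketch does not forward signals such as SIGTERM to the children, which a production entrypoint should do.

```python
import subprocess
import time


def run_until_first_exit(commands, poll_interval=0.5, stop_timeout=10.0):
    """Start each command as a foreground child process, wait until any of
    them exits, stop the rest, and return the exit code of the first one to
    exit (hypothetical helper for illustration).

    Signals sent to this launcher (e.g. SIGTERM from `docker stop`) are NOT
    handled or forwarded to the children in this sketch.
    """
    if not commands:
        raise ValueError("need at least one command")
    procs = []
    try:
        for cmd in commands:
            procs.append(subprocess.Popen(cmd))
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Ask the remaining processes to stop, then force-kill stragglers.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            try:
                proc.wait(timeout=stop_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
```

In the image, the call would look like `run_until_first_exit([jobmanager_cmd, taskmanager_cmd])`, where both commands are hypothetical placeholders for whatever starts each Flink process without daemonizing. Python is used here only for illustration: supervisor is itself a Python program, so a Python launcher would keep Python in the image, whereas the same logic as a short shell script would not.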