On Mon, May 6, 2019 at 12:09 PM Juan Carlos Garcia <jcgarc...@gmail.com> wrote:
> As everyone has pointed out, there will be a small overhead added by
> the abstraction, but in my own experience it is totally worth it.
>
> Almost two years ago we decided to jump on the Beam wagon, first
> deploying to an on-premises Hadoop cluster with the Spark engine (just
> because Spark was already available and we didn't want to introduce a
> new stack in our Hadoop cluster). Then we moved to a Flink cluster
> (for other reasons), and a few months later we moved 90% of our
> streaming processing to Dataflow (in order to migrate the on-premises
> cluster to the cloud). All of that wouldn't have been possible without
> the Beam abstraction.
>
> In conclusion, the Beam abstraction rocks. It's not perfect, but it's
> really good.
>
> Just my 2 cents.
>
> Matt Casters <mattcast...@gmail.com> schrieb am Mo., 6. Mai 2019, 15:33:
>
>> I've dealt with responses like this for a number of decades. With
>> Kettle Beam I could say: "here, in 20 minutes of visual programming
>> you have your pipeline up and running". It's easy to set up,
>> maintain, debug, unit test, version control... the whole thing. And
>> then someone would say: "Naah, if I don't code it myself I don't
>> trust it." Usually it's worded differently, but that's what it comes
>> down to.
>> Some people think in terms of impossibilities instead of
>> possibilities and will always find some reason why they fall into
>> that 0.1% of cases.
>>
>> > Let's say Beam came up with the abstractions long before other
>> > runners, but mapping things to runners is going to take time
>> > (that's where things are today), so it's always a moving target.
>>
>> Any scalable data processing problem you might have that can't be
>> solved by Spark, Flink or Dataflow is pretty obscure, don't you think?
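The migration story above hinges on one property: the pipeline definition is decoupled from the engine that executes it. The sketch below illustrates that idea in plain Python as a toy analogy only; it does not use the actual Beam API, and the `DirectRunner`/`BatchedRunner` classes here are hypothetical stand-ins, not Beam's classes of the same name.

```python
# Toy illustration: the same "pipeline" (a list of transforms) can be
# handed to interchangeable runners, mimicking how a Beam pipeline moved
# from Spark to Flink to Dataflow without rewriting the transforms.

class DirectRunner:
    """Executes the transforms in-process, one element at a time."""
    def run(self, pipeline, data):
        for transform in pipeline:
            data = [transform(x) for x in data]
        return data

class BatchedRunner:
    """Same semantics, different execution strategy (map over the whole list)."""
    def run(self, pipeline, data):
        for transform in pipeline:
            data = list(map(transform, data))
        return data

# The pipeline is just the transforms - it knows nothing about runners.
pipeline = [lambda x: x * 2, lambda x: x + 1]

for runner in (DirectRunner(), BatchedRunner()):
    assert runner.run(pipeline, [1, 2, 3]) == [3, 5, 7]
```

In real Beam the same separation shows up as pipeline options: the transforms stay identical and only the runner selection changes between deployments.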
>>
>> Great discussion :-)
>>
>> Cheers,
>> Matt
>> ---
>> Matt Casters <mattcast...@gmail.com>
>> Senior Solution Architect, Kettle Project Founder
>>
>> Op zo 5 mei 2019 om 00:18 schreef kant kodali <kanth...@gmail.com>:
>>
>>> I believe this comes down more to abstractions vs. execution
>>> engines, and I am sure people can take either side. I think both are
>>> important. However, it is worth noting that the execution frameworks
>>> themselves have a lot of abstractions, but sure, more generic ones
>>> can be built on top. Are abstractions always good?! I will just
>>> point to this book
>>> <https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201/ref=sr_1_1?keywords=john+ousterhout+book&qid=1557008185&s=gateway&sr=8-1>
>>>
>>> I tend to lean more toward the execution-engine side because I can
>>> build something on top. I am also not sure Beam was the first to
>>> come up with these ideas, since frameworks like Cascading existed
>>> long before.
>>>
>>> Let's say Beam came up with the abstractions long before other
>>> runners, but mapping things to runners is going to take time (that's
>>> where things are today), so it's always a moving target.
>>>
>>> On Tue, Apr 30, 2019 at 3:15 PM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> It is worth noting that Beam isn't solely a portability layer that
>>>> exposes underlying API features, but a feature-rich layer in its
>>>> own right, with carefully coherent abstractions. For example, quite
>>>> early on the SparkRunner supported streaming aspects of the Beam
>>>> model - watermarks, windowing, triggers - that were not really
>>>> available any other way. Beam's various features sometimes require
>>>> just a pass-through API and sometimes require clever new
>>>> implementation. And everything is moving constantly.
>>>> I don't see Beam as following the features of any engine, but
>>>> rather coming up with new needed data processing abstractions and
>>>> figuring out how to efficiently implement them on top of various
>>>> architectures.
>>>>
>>>> Kenn
>>>>
>>>> On Tue, Apr 30, 2019 at 8:37 AM kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>>> Staying behind doesn't imply one is better than the other, and I
>>>>> didn't mean that in any way, but I fail to see how an abstraction
>>>>> framework like Beam can stay ahead of the underlying execution
>>>>> engines.
>>>>>
>>>>> For example, if a new feature is added to an underlying execution
>>>>> engine that doesn't fit Beam's interface, or breaks it, then I
>>>>> would think the interface would need to change. Another example:
>>>>> say the underlying execution engines take different kinds of
>>>>> parameters for the same feature; then it isn't so straightforward
>>>>> to come up with an interface, since there might be very little in
>>>>> common in the first place. In that sense, I fail to see how Beam
>>>>> can stay ahead.
>>>>>
>>>>> "Of course the API itself is Spark-specific, but it borrows
>>>>> heavily (among other things) on ideas that Beam itself pioneered
>>>>> long before Spark 2.0" Good to know.
>>>>>
>>>>> "one of the things Beam has focused on was a language portability
>>>>> framework" Sure, but how important is this for a typical user? Do
>>>>> people stop using a particular tool because it is in language X? I
>>>>> personally would put features first over language portability, and
>>>>> it's completely fine that this may not be in line with Beam's
>>>>> priorities. All that said, I can agree Beam's focus on language
>>>>> portability is great.
>>>>>
>>>>> On Tue, Apr 30, 2019 at 2:48 AM Maximilian Michels <m...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> > I wouldn't say one is, or will always be, in front of or behind
>>>>>> > another.
>>>>>>
>>>>>> That's a great way to phrase it.
>>>>>> I think it is very common to jump to the conclusion that one
>>>>>> system is better than the other. In reality it's often much more
>>>>>> complicated.
>>>>>>
>>>>>> For example, one of the things Beam has focused on is a language
>>>>>> portability framework. Do I get this with Flink? No. Does that
>>>>>> mean Beam is better than Flink? No. Maybe a better question would
>>>>>> be: do I want to be able to run Python pipelines?
>>>>>>
>>>>>> This is just an example; there are many more factors to consider.
>>>>>>
>>>>>> Cheers,
>>>>>> Max
>>>>>>
>>>>>> On 30.04.19 10:59, Robert Bradshaw wrote:
>>>>>> > Though we all certainly have our biases, I think it's fair to
>>>>>> > say that all of these systems are constantly innovating,
>>>>>> > borrowing ideas from one another, and have their strengths and
>>>>>> > weaknesses. I wouldn't say one is, or will always be, in front
>>>>>> > of or behind another.
>>>>>> >
>>>>>> > Take, as the given example, Spark Structured Streaming. Of
>>>>>> > course the API itself is Spark-specific, but it borrows heavily
>>>>>> > (among other things) on ideas that Beam itself pioneered long
>>>>>> > before Spark 2.0, specifically the unification of batch and
>>>>>> > streaming processing into a single API, and the event-time
>>>>>> > based windowing (triggering) model for consistently and
>>>>>> > correctly handling distributed, out-of-order data streams.
>>>>>> >
>>>>>> > Of course there are also operational differences. Spark, for
>>>>>> > example, is very tied to the micro-batch style of execution,
>>>>>> > whereas Flink is fundamentally very continuous, and Beam
>>>>>> > delegates to the underlying runner.
>>>>>> >
>>>>>> > It is certainly Beam's goal to keep overhead minimal, and one
>>>>>> > of the primary selling points is the flexibility of portability
>>>>>> > (of both the execution runtime and the SDK) as your needs change.
>>>>>> >
>>>>>> > - Robert
>>>>>> >
>>>>>> > On Tue, Apr 30, 2019 at 5:29 AM <kanth...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Of course! I suspect Beam will always be a step or two behind
>>>>>> >> the newest functionality that is available or yet to come.
>>>>>> >>
>>>>>> >> For example: Spark Structured Streaming is still not
>>>>>> >> available, no CEP APIs yet, and much more.
>>>>>> >>
>>>>>> >> Sent from my iPhone
>>>>>> >>
>>>>>> >> On Apr 30, 2019, at 12:11 AM, Pankaj Chand
>>>>>> >> <pankajchanda...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Will Beam add any overhead or lack certain APIs/functions
>>>>>> >> available in Spark/Flink?
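The event-time windowing model mentioned in the thread is the key to handling out-of-order streams correctly: elements are grouped by when they happened, not by when they arrived. The following is a minimal sketch of fixed (tumbling) event-time windows in plain Python; it is a conceptual illustration only, not Beam's API, and the 60-second window size and the sample events are made up.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; width of each fixed (tumbling) window

def window_start(event_time):
    """Map a timestamp to the start of the fixed window containing it."""
    return event_time - (event_time % WINDOW_SIZE)

# (event_time, value) pairs arriving out of order, as in a real stream
events = [(130, "c"), (10, "a"), (65, "b"), (50, "d"), (119, "e")]

windows = defaultdict(list)
for t, v in events:
    windows[window_start(t)].append(v)

# Grouping reflects when events happened, not when they arrived:
# window starting at 0   -> ["a", "d"]
# window starting at 60  -> ["b", "e"]
# window starting at 120 -> ["c"]
```

What this sketch leaves out is exactly what Beam's model adds on top: watermarks to decide when a window's input is believed complete, and triggers to control when (and how often) a window's results are emitted.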