On Mon, May 6, 2019 at 12:09 PM Juan Carlos Garcia <jcgarc...@gmail.com>
wrote:

> As everyone has pointed out, there will be a small overhead added by the
> abstraction, but in my own experience it's totally worth it.
>
> Almost two years ago we decided to jump on the Beam wagon, first by
> deploying into an on-premises Hadoop cluster with the Spark engine (just
> because Spark was already available and we didn't want to introduce a new
> stack in our Hadoop cluster). Then we moved to a Flink cluster (due to
> other reasons), and a few months later we moved 90% of our streaming
> processing to Dataflow (in order to migrate the on-premises cluster to the
> cloud). None of that would have been possible without the Beam abstraction.
>
> In conclusion, the Beam abstraction rocks. It's not perfect, but it's really
> good.
>
> Just my 2 cents.
>
> Matt Casters <mattcast...@gmail.com> schrieb am Mo., 6. Mai 2019, 15:33:
>
>> I've dealt with responses like this for a number of decades.  With Kettle
>> Beam I could say: "here, in 20 minutes of visual programming you have your
>> pipeline up and running".  It's easy to set up, maintain, debug, unit test,
>> version control... the whole thing. And then someone would say: Naaah, if I
>> don't code it myself I don't trust it.  Usually it's worded differently but
>> that's what it comes down to.
>> Some people think in terms of impossibilities instead of possibilities
>> and will always find some reason why they fall in that 0.1% of the cases.
>>
>> > Lets say Beam came up with the abstractions long before other runners
>> but to map things to runners it is going to take time (that's where things
>> are today). so its always a moving target.
>>
>> Any scalable data processing problem you might have that can't be solved
>> by Spark, Flink or Dataflow is pretty obscure, don't you think?
>>
>> Great discussion :-)
>>
>> Cheers,
>> Matt
>> ---
>> Matt Casters <mattcast...@gmail.com>
>> Senior Solution Architect, Kettle Project Founder
>>
>>
>>
>> Op zo 5 mei 2019 om 00:18 schreef kant kodali <kanth...@gmail.com>:
>>
>>> I believe this comes down to abstractions vs. execution engines,
>>> and I am sure people can take either side. I think both are important;
>>> however, it is worth noting that the execution frameworks themselves have a
>>> lot of abstractions, though sure, more generic ones can be built on top. Are
>>> abstractions always good?! I will just point to this book
>>> <https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201/ref=sr_1_1?keywords=john+ousterhout+book&qid=1557008185&s=gateway&sr=8-1>
>>>
>>> I tend to lean more toward the execution-engine side because I can build
>>> something on top. I am also not sure Beam was the first to come up
>>> with these ideas, since frameworks like Cascading existed long before.
>>>
>>> Let's say Beam came up with the abstractions long before other runners,
>>> but mapping things to runners takes time (that's where things
>>> are today), so it's always a moving target.
>>>
>>> On Tue, Apr 30, 2019 at 3:15 PM Kenneth Knowles <k...@apache.org> wrote:
>>>
>>>> It is worth noting that Beam isn't solely a portability layer that
>>>> exposes underlying API features, but a feature-rich layer in its own right,
>>>> with carefully coherent abstractions. For example, quite early on the
>>>> SparkRunner supported streaming aspects of the Beam model - watermarks,
>>>> windowing, triggers - that were not really available any other way. Beam's
>>>> various features sometimes require just a pass-through API and sometimes
>>>> require clever new implementation. And everything is moving constantly. I
>>>> don't see Beam as following the features of any engine, but rather coming
>>>> up with new needed data processing abstractions and figuring out how to
>>>> efficiently implement them on top of various architectures.
>>>>
>>>> Kenn
>>>>
>>>> On Tue, Apr 30, 2019 at 8:37 AM kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>>> Staying behind doesn't imply one is better than the other, and I didn't
>>>>> mean it that way, but I fail to see how an abstraction framework like
>>>>> Beam can stay ahead of the underlying execution engines.
>>>>>
>>>>> For example, if a new feature added to the underlying execution
>>>>> engine doesn't fit Beam's interface, or breaks it, then I would think
>>>>> the interface would need to change. Another example: say the
>>>>> underlying execution engines take different kinds of parameters for the
>>>>> same feature; then it isn't so straightforward to come up with an
>>>>> interface, since there might be very little in common in the first
>>>>> place. So, in that sense, I fail to see how Beam can stay ahead.
>>>>>
>>>>> "Of course the API itself is Spark-specific, but it borrows heavily
>>>>> (among other things) on ideas that Beam itself pioneered long before Spark
>>>>> 2.0" Good to know.
>>>>>
>>>>> "one of the things Beam has focused on was a language portability
>>>>> framework"  Sure, but how important is this for a typical user? Do people
>>>>> stop using a particular tool because it is in language X? I personally
>>>>> would put features first over language portability, and it's completely
>>>>> fine that this may not be in line with Beam's priorities. All said, I
>>>>> agree Beam's focus on language portability is great.
>>>>>
>>>>> On Tue, Apr 30, 2019 at 2:48 AM Maximilian Michels <m...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> > I wouldn't say one is, or will always be, in front of or behind
>>>>>> another.
>>>>>>
>>>>>> That's a great way to phrase it. I think it is very common to jump to
>>>>>> the conclusion that one system is better than the other. In reality
>>>>>> it's
>>>>>> often much more complicated.
>>>>>>
>>>>>> For example, one of the things Beam has focused on was a language
>>>>>> portability framework. Do I get this with Flink? No. Does that mean
>>>>>> Beam
>>>>>> is better than Flink? No. Maybe a better question would be, do I want
>>>>>> to
>>>>>> be able to run Python pipelines?
>>>>>>
>>>>>> This is just an example, there are many more factors to consider.
>>>>>>
>>>>>> Cheers,
>>>>>> Max
>>>>>>
>>>>>> On 30.04.19 10:59, Robert Bradshaw wrote:
>>>>>> > Though we all certainly have our biases, I think it's fair to say
>>>>>> that
>>>>>> > all of these systems are constantly innovating, borrowing ideas from
>>>>>> > one another, and have their strengths and weaknesses. I wouldn't say
>>>>>> > one is, or will always be, in front of or behind another.
>>>>>> >
>>>>>> > Take, as the given example Spark Structured Streaming. Of course the
>>>>>> > API itself is Spark-specific, but it borrows heavily (among other
>>>>>> > things) on ideas that Beam itself pioneered long before Spark 2.0,
>>>>>> > specifically the unification of batch and streaming processing into
>>>>>> a
>>>>>> > single API, and the event-time based windowing (triggering) model
>>>>>> for
>>>>>> > consistently and correctly handling distributed, out-of-order data
>>>>>> > streams.
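
[Editor's note: the event-time windowing idea mentioned above can be
illustrated with a toy sketch in plain Python. This is not the actual Beam
API; `assign_fixed_windows` is a hypothetical helper that buckets elements by
their event timestamp rather than their arrival order, which is the core of
why out-of-order streams can still be handled correctly.]

```python
from collections import defaultdict

def assign_fixed_windows(events, window_size):
    """Assign each (timestamp, value) event to a fixed event-time window.

    Elements are bucketed by event timestamp, not arrival order, so a
    late-arriving element still lands in the window it belongs to.
    """
    windows = defaultdict(list)
    for ts, value in events:
        # The window is determined solely by the event's own timestamp.
        window_start = (ts // window_size) * window_size
        windows[window_start].append(value)
    return dict(windows)

# Out-of-order arrival: the element with timestamp 3 arrives last but is
# still grouped into the [0, 10) window alongside timestamp 5.
events = [(12, "a"), (5, "b"), (17, "c"), (3, "d")]
print(assign_fixed_windows(events, 10))
# → {10: ['a', 'c'], 0: ['b', 'd']}
```

A real runner additionally tracks watermarks to decide when a window is
complete, and triggers to control when (possibly speculative or late) results
are emitted; this sketch only shows the window-assignment step.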
>>>>>> >
>>>>>> > Of course there are also operational differences. Spark, for
>>>>>> example,
>>>>>> > is very tied to the micro-batch style of execution whereas Flink is
>>>>>> > fundamentally very continuous, and Beam delegates to the underlying
>>>>>> > runner.
>>>>>> >
>>>>>> > It is certainly Beam's goal to keep overhead minimal, and one of the
>>>>>> > primary selling points is the flexibility of portability (of both
>>>>>> the
>>>>>> > execution runtime and the SDK) as your needs change.
>>>>>> >
>>>>>> > - Robert
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Apr 30, 2019 at 5:29 AM <kanth...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Of course! I suspect Beam will always be one or two steps behind
>>>>>> the new functionality that is available or yet to come.
>>>>>> >>
>>>>>> >> For example: Spark Structured Streaming is still not available, no
>>>>>> CEP APIs yet, and much more.
>>>>>> >>
>>>>>> >> Sent from my iPhone
>>>>>> >>
>>>>>> >> On Apr 30, 2019, at 12:11 AM, Pankaj Chand <
>>>>>> pankajchanda...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Will Beam add any overhead or lack certain API/functions available
>>>>>> in Spark/Flink?
>>>>>>
>>>>>
