Hi,

I don't want to hijack the thread regarding the why, but to keep it short:
we experienced a lot of problems with Spark (streaming pipelines) and
checkpoints, to the point that restarting a pipeline was a gamble; Spark
would go nuts while restoring from a checkpoint, resulting in data loss
(Spark bugs). There were pipelines under heavy development which required
redeploying them multiple times a day.

We found in Flink the stability and features we needed while we planned a
migration to a managed environment (luckily Dataflow, which at that time
was not yet approved), and in our case, as you mentioned, we were lucky to
be able to switch across runners without major problems.

Thanks

kant kodali <kanth...@gmail.com> schrieb am Mo., 6. Mai 2019, 21:34:

> 1) It would be good to know the reasons why you guys moved from one
> execution engine to another.
> 2) You are lucky that your problem fits all three execution engines and
> is supported by Beam at the same time. This is certainly not the case
> for me, since some runners that Beam supports are still a work in
> progress, while the execution engines have had this support for at least
> two years.
>
>
>
> On Mon, May 6, 2019 at 12:24 PM kant kodali <kanth...@gmail.com> wrote:
>
>>
>>
>> On Mon, May 6, 2019 at 12:09 PM Juan Carlos Garcia <jcgarc...@gmail.com>
>> wrote:
>>
>>> As everyone has pointed out, there will be a small overhead added by the
>>> abstraction, but in my own experience it's totally worth it.
>>>
>>> Almost two years ago we decided to jump on the Beam wagon, first
>>> deploying into an on-premises Hadoop cluster with the Spark engine (just
>>> because Spark was already available and we didn't want to introduce a new
>>> stack in our Hadoop cluster). Then we moved to a Flink cluster (for other
>>> reasons), and a few months later we moved 90% of our streaming processing
>>> to Dataflow (in order to migrate the on-premises cluster to the cloud).
>>> None of that would have been possible without the Beam abstraction.
>>>
>>> In conclusion, the Beam abstraction rocks. It's not perfect, but it's
>>> really good.
>>>
>>> Just my 2 cents.
>>>
>>> Matt Casters <mattcast...@gmail.com> schrieb am Mo., 6. Mai 2019, 15:33:
>>>
>>>> I've dealt with responses like this for a number of decades.  With
>>>> Kettle Beam I could say: "here, in 20 minutes of visual programming you
>>>> have your pipeline up and running".  It's easy to set up, maintain, debug,
>>>> unit test, version control... the whole thing. And then someone would say:
>>>> "Naaah, if I don't code it myself I don't trust it."  Usually it's worded
>>>> differently, but that's what it comes down to.
>>>> Some people think in terms of impossibilities instead of possibilities
>>>> and will always find some reason why they fall into that 0.1% of cases.
>>>>
>>>> > Let's say Beam came up with the abstractions long before other
>>>> > runners, but mapping things to runners is going to take time (that's
>>>> > where things are today), so it's always a moving target.
>>>>
>>>> Any scalable data processing problem you might have that can't be
>>>> solved by Spark, Flink or Dataflow is pretty obscure, don't you think?
>>>>
>>>> Great discussion :-)
>>>>
>>>> Cheers,
>>>> Matt
>>>> ---
>>>> Matt Casters <mattcast...@gmail.com>
>>>> Senior Solution Architect, Kettle Project Founder
>>>>
>>>>
>>>>
>>>> Op zo 5 mei 2019 om 00:18 schreef kant kodali <kanth...@gmail.com>:
>>>>
>>>>> I believe this comes down more to abstractions vs. execution engines,
>>>>> and I am sure people can take either side. I think both are important;
>>>>> however, it is worth noting that the execution frameworks themselves
>>>>> have a lot of abstractions, though sure, more generic ones can be built
>>>>> on top. Are abstractions always good?! I will just point to this book
>>>>> <https://www.amazon.com/Philosophy-Software-Design-John-Ousterhout/dp/1732102201/ref=sr_1_1?keywords=john+ousterhout+book&qid=1557008185&s=gateway&sr=8-1>
>>>>>
>>>>> I tend to lean more toward the execution engines side because I can
>>>>> build something on top. I am also not sure Beam is the first one to come
>>>>> up with these ideas, since frameworks like Cascading existed long before.
>>>>>
>>>>> Let's say Beam came up with the abstractions long before other runners,
>>>>> but mapping things to runners is going to take time (that's where things
>>>>> are today), so it's always a moving target.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 30, 2019 at 3:15 PM Kenneth Knowles <k...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> It is worth noting that Beam isn't solely a portability layer that
>>>>>> exposes underlying API features, but a feature-rich layer in its own
>>>>>> right, with carefully coherent abstractions. For example, quite early
>>>>>> on the SparkRunner supported streaming aspects of the Beam model
>>>>>> (watermarks, windowing, triggers) that were not really available any
>>>>>> other way. Beam's various features sometimes require just a
>>>>>> pass-through API and sometimes require a clever new implementation.
>>>>>> And everything is moving constantly. I don't see Beam as following the
>>>>>> features of any engine, but rather coming up with new needed data
>>>>>> processing abstractions and figuring out how to efficiently implement
>>>>>> them on top of various architectures.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>> On Tue, Apr 30, 2019 at 8:37 AM kant kodali <kanth...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Staying behind doesn't imply one is better than the other, and I
>>>>>>> didn't mean that in any way, but I fail to see how an abstraction
>>>>>>> framework like Beam can stay ahead of the underlying execution
>>>>>>> engines.
>>>>>>>
>>>>>>> For example, if a new feature is added to the underlying execution
>>>>>>> engine that doesn't fit Beam's interface, or breaks it, then I would
>>>>>>> think the interface would need to be changed. Another example: say the
>>>>>>> underlying execution engines take different kinds of parameters for
>>>>>>> the same feature; then it isn't so straightforward to come up with an
>>>>>>> interface, since there might be very little in common in the first
>>>>>>> place. So, in that sense, I fail to see how Beam can stay ahead.
>>>>>>>
>>>>>>> "Of course the API itself is Spark-specific, but it borrows heavily
>>>>>>> (among other things) from ideas that Beam itself pioneered long
>>>>>>> before Spark 2.0." Good to know.
>>>>>>>
>>>>>>> "One of the things Beam has focused on was a language portability
>>>>>>> framework." Sure, but how important is this for a typical user? Do
>>>>>>> people stop using a particular tool because it is written in language
>>>>>>> X? I personally would put features first over language portability,
>>>>>>> and it's completely fine that this may not be in line with Beam's
>>>>>>> priorities. All said, I can agree that Beam's focus on language
>>>>>>> portability is great.
>>>>>>>
>>>>>>> On Tue, Apr 30, 2019 at 2:48 AM Maximilian Michels <m...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> > I wouldn't say one is, or will always be, in front of or behind
>>>>>>>> another.
>>>>>>>>
>>>>>>>> That's a great way to phrase it. I think it is very common to jump
>>>>>>>> to the conclusion that one system is better than the other. In
>>>>>>>> reality it's often much more complicated.
>>>>>>>>
>>>>>>>> For example, one of the things Beam has focused on was a language
>>>>>>>> portability framework. Do I get this with Flink? No. Does that mean
>>>>>>>> Beam is better than Flink? No. Maybe a better question would be, do
>>>>>>>> I want to be able to run Python pipelines?
>>>>>>>>
>>>>>>>> This is just an example, there are many more factors to consider.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Max
>>>>>>>>
>>>>>>>> On 30.04.19 10:59, Robert Bradshaw wrote:
>>>>>>>> > Though we all certainly have our biases, I think it's fair to say
>>>>>>>> > that all of these systems are constantly innovating, borrowing
>>>>>>>> > ideas from one another, and have their strengths and weaknesses. I
>>>>>>>> > wouldn't say one is, or will always be, in front of or behind
>>>>>>>> > another.
>>>>>>>> >
>>>>>>>> > Take, as the given example, Spark Structured Streaming. Of course
>>>>>>>> > the API itself is Spark-specific, but it borrows heavily (among
>>>>>>>> > other things) from ideas that Beam itself pioneered long before
>>>>>>>> > Spark 2.0, specifically the unification of batch and streaming
>>>>>>>> > processing into a single API, and the event-time based windowing
>>>>>>>> > (triggering) model for consistently and correctly handling
>>>>>>>> > distributed, out-of-order data streams.
>>>>>>>> >
>>>>>>>> > Of course there are also operational differences. Spark, for
>>>>>>>> > example, is very tied to the micro-batch style of execution,
>>>>>>>> > whereas Flink is fundamentally very continuous, and Beam delegates
>>>>>>>> > to the underlying runner.
>>>>>>>> >
>>>>>>>> > It is certainly Beam's goal to keep overhead minimal, and one of
>>>>>>>> > the primary selling points is the flexibility of portability (of
>>>>>>>> > both the execution runtime and the SDK) as your needs change.
>>>>>>>> >
>>>>>>>> > - Robert
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Tue, Apr 30, 2019 at 5:29 AM <kanth...@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >> Of course! I suspect Beam will always be one or two steps behind
>>>>>>>> >> the new functionality that is available or yet to come.
>>>>>>>> >>
>>>>>>>> >> For example: Spark Structured Streaming is still not available,
>>>>>>>> >> no CEP APIs yet, and much more.
>>>>>>>> >>
>>>>>>>> >> Sent from my iPhone
>>>>>>>> >>
>>>>>>>> >> On Apr 30, 2019, at 12:11 AM, Pankaj Chand <
>>>>>>>> pankajchanda...@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >> Will Beam add any overhead or lack certain APIs/functions
>>>>>>>> >> available in Spark/Flink?
>>>>>>>>
>>>>>>>
