Add "Release 1.4 and 1.5 Timeline" post
Project: http://git-wip-us.apache.org/repos/asf/flink-web/repo Commit: http://git-wip-us.apache.org/repos/asf/flink-web/commit/b0d7c034 Tree: http://git-wip-us.apache.org/repos/asf/flink-web/tree/b0d7c034 Diff: http://git-wip-us.apache.org/repos/asf/flink-web/diff/b0d7c034 Branch: refs/heads/asf-site Commit: b0d7c0340af642f9c979bfcedd00ff0f030a4651 Parents: e014fa9 Author: Aljoscha Krettek <aljoscha.kret...@gmail.com> Authored: Mon Nov 20 15:24:07 2017 +0100 Committer: Aljoscha Krettek <aljoscha.kret...@gmail.com> Committed: Wed Nov 22 10:51:46 2017 +0100 ---------------------------------------------------------------------- .../2017-11-21-release-1.4-and-1.5-timeline.md | 137 +++++++++++++++++++ 1 file changed, 137 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/flink-web/blob/b0d7c034/_posts/2017-11-21-release-1.4-and-1.5-timeline.md ---------------------------------------------------------------------- diff --git a/_posts/2017-11-21-release-1.4-and-1.5-timeline.md b/_posts/2017-11-21-release-1.4-and-1.5-timeline.md new file mode 100644 index 0000000..776cc07 --- /dev/null +++ b/_posts/2017-11-21-release-1.4-and-1.5-timeline.md @@ -0,0 +1,137 @@ +--- +layout: post +title: "Looking Ahead to Apache Flink 1.4.0 and 1.5.0" +date: 2017-11-22 10:00:00 +categories: news +authors: +- stephan: + name: "Stephan Ewen" + twitter: "StephanEwen" +- aljoscha: + name: "Aljoscha Krettek" + twitter: "aljoscha" +- mike: + name: "Mike Winters" + twitter: "wints" +--- + +The Apache Flink 1.4.0 release is on track to happen in the next couple of weeks, and for all of the +readers out there who havenât been following the release discussion on [Flinkâs developer mailing +list](http://flink.apache.org/community.html#mailing-lists), weâd like to provide some details on +whatâs coming in Flink 1.4.0 as well as a preview of what the Flink community will save for 1.5.0. + +Both releases include ambitious features that we believe will move Flink to an entirely new level in +terms of the types of problems it can solve and applications it can support. The community deserves +lots of credit for its hard work over the past few months, and weâre excited to see these features +in the hands of users. + +This post will describe how the community plans to get there and the rationale behind the approach. + +## Coming soon: Major Changes to the Flinkâs Runtime + +There are 3 significant improvements to the Apache Flink engine that the community has nearly +completed and that will have a meaningful impact on Flinkâs operability and performance. + +1. Rework of the deployment model and distributed process +2. Transition from configurable, fixed-interval network I/O to event-driven network I/O and application-level flow control for better backpressure handling +3. Faster recovery from failure + +Next, weâll go through each of these improvements in more detail. + +## Reworking Flinkâs Deployment Model and Distributed Process + +[FLIP-6](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077) is an initiative +thatâs been in the works for more than a year and represents a major refactor of Flinkâs deployment +model and distributed process. The underlying motivation for FLIP-6 was the fact that Flink is being +adopted by a wider range of developer communities--both developers coming from the big data and +analytics space as well as developers coming from the event-driven applications space. + +Modern, stateful stream processing has served as a convergence for these two developer communities. +Despite a significant overlap of the core concepts in the applications being built, each group of +developers has its own set of common tools, deployment models, and expected behaviors when working +with a stream processing framework like Flink. + +FLIP-6 will ensure that Flink fits naturally in both of these contexts, behaving as though itâs +native to each ecosystem and operating seamlessly within a broader technology stack. A few of the +specific changes in FLIP-6 that will have such an impact: + + - Leveraging cluster management frameworks to support full resource elasticity + - First-class support for containerized environments such as Kubernetes and Docker + - REST-based client-cluster communication to ease operations and 3rd party integrations + +FLIP-6, along with already-introduced features like +[rescalable state](https://data-artisans.com/blog/apache-flink-at-mediamath-rescaling-stateful-applications), +lays the groundwork for dynamic scaling in Flink, meaning that Flink programs will be able to scale up or down +automatically based on required resources--a huge step forward in terms of ease of operability and +the efficiency of Flink applications. + +## Lower Latency via Improvements to the Apache Flink Network Stack + +Speed will always be a key consideration for users who build stream processing applications, and +Flink 1.5 will include a rework of the network stack that will even further improve Flink's latency. +At the heart of this work is a transition from configurable, fixed-interval network I/O to event- +driven network I/O and application-level flow control, ensuring that Flink will use all available +network capacity, as well as credit-based flow control which offers more fine-grained backpressuring +for improved checkpoint alignments. + +In our testing ([see slide 26 here](https://www.slideshare.net/FlinkForward/flink-forward-berlin-2017-nico-kruber-building-a-network-stack-for-optimal-throughput-lowlatency-tradeoffs)), +weâve seen a substantial improvement in latency using event-driven network I/O, and the community +is also doing work to make sure weâre able to provide this increase in speed without a measurable +throughput tradeoff. + +## Faster Recovery from Failures + +Flink 1.3.0 introduced incremental checkpoints, making it possible to take a checkpoint of state +updates since the last successfully-completed checkpoint only rather than the previous behavior of +only taking checkpoints of the entire state of the application. This has led to significant +performance improvements for users with large state. + +Flink 1.5 will introduce task-local recovery, which means that Flink will store a second copy of the +most recent checkpoint on the local disk (or even in main memory) of a task manager. The primary +copy still goes to durable storage so that itâs resilient to machine failures. + +In case of failover, the scheduler will try to reschedule tasks to their previous task manager (in +other words, to the same machine again) if this is possible. The task can then recover from the +locally-kept state. This makes it possible to avoid reading all state from the distributed file +system (which is remote over the network). Especially in applications with very large state, not +having to read many gigabytes over the network and instead from local disk will result in +significant performance gains in recovery. + +## The Proposed Timeline for Flink 1.4 and Flink 1.5 + +The good news is that all 3 of the features described above are well underway, and in fact, much of +the work is already covered by open pull requests. + +But given these featuresâ importance and the complexity of the work involved, the community expected +that the QA and testing required would be extensive and would delay the release of the otherwise- +ready features also on the list for the next release. + +And so the community decided to withhold the 3 features above (deployment model rework, improvements +to the network stack, and faster recovery) to be included a separate Flink 1.5 release that will +come shortly after the Flink 1.4 release. Flink 1.5 is estimated to come just a couple of months +after 1.4 rather than the typical 4-month cycle in between major releases. + +The soon-to-be-released Flink 1.4 represents the current state of Flink without merging those 3 +features. And Flink 1.4 is a substantial release in its own right, including, but not limited to, +the following: + +- **A significantly improved dependency structure**, removing many of Flinkâs dependencies and subtle runtime conflicts. This increases overall stability and removes friction when embedding Flink or calling Flink "library style". +- **Reversed class loading for dynamically-loaded user code**, allowing for different dependencies than those included in the core framework. +- **An Apache Kafka 0.11 exactly-once producer**, making it possible to build end-to-end exactly once applications with Flink and Kafka. +- **Streaming SQL JOIN based on processing time and event time**, which gives users the full advantage of Flinkâs time handling while using a SQL JOIN. +- **Table API / Streaming SQL Source and Sink Additions**, including a Kafka 0.11 source and JDBC sink. +- **Hadoop-free Flink**, meaning that users who donât rely on any Hadoop components (such as YARN or HDFS) in their Flink applications can use Flink without Hadoop for the first time. +- **Improvements to queryable state**, including a more container-friendly architecture, a more user-friendly API that hides configuration parameters, and the groundwork to be able to expose window state (the state of an in-flight window) in the future. +- **Connector improvements and fixes** for a range of connectors including Kafka, Apache Cassandra, Amazon Kinesis, and more. +- **Improved RPC performance** for faster recovery from failure + +The community decided it was best to get these features into a stable version of Flink as soon as +possible, and the separation of what could have been a single (and very substantial) Flink 1.4 +release into 1.4 and 1.5 serves that purpose. + +Weâre excited by what each of these represents for Apache Flink, and weâd like to extend our thanks +to the Flink community for all of their hard work. + +If youâd like to follow along with release discussions, [please subscribe to the dev@ mailing +list](http://flink.apache.org/community.html#mailing-lists). +