Hi Stephan,

Thanks for summarizing the work and discussions into a roadmap. It really
helps users understand where Flink is heading. The entire outline looks
good to me. If appropriate, I would recommend adding two more attractive
categories to the roadmap.

*Flink ML Enhancement*
  - Refactor ML pipeline on TableAPI
  - Python support for TableAPI
  - Support streaming training & inference.
  - Seamless integration of DL engines (Tensorflow, PyTorch etc)
  - ML platform with a group of AI tooling
Some of this work has already been discussed on the dev mailing list.
Related JIRA (FLINK-11095) and discussions:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Python-and-Non-JVM-Language-Support-in-Flink-td25905.html


*Flink-Runtime-Web Improvement*
  - Much of this comes via Blink
  - Refactor the entire module to use latest Angular (7.x)
  - Add resource information at three levels: Cluster, TaskManager,
and Job
  - Add operator-level topology and data flow tracing
  - Add new metrics to track back pressure, filter, and data skew
  - Add log association to Job, Vertex and SubTasks
Related JIRA (FLINK-10705) and discussion:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-underlying-Frontend-Architecture-for-Flink-Web-Dashboard-td24902.html


What do you think?

Regards,
Shaoxuan



On Wed, Feb 13, 2019 at 7:21 PM Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> Recently several contributors, committers, and users asked about making it
> more visible in which way the project is currently going.
>
> Users and developers can track the direction by following the discussion
> threads and JIRA, but due to the mass of discussions and open issues, it is
> very hard to get a good overall picture.
> Especially for new users and contributors, it is very hard to get a quick
> overview of the project direction.
>
> To fix this, I suggest to add a brief roadmap summary to the homepage. It
> is a bit of a commitment to keep that roadmap up to date, but I think the
> benefit for users justifies that.
> The Apache Beam project has added such a roadmap [1], which was received
> very well by the community. I would suggest following a similar structure
> here.
>
> If the community is in favor of this, I would volunteer to write a first
> version of such a roadmap. The points I would include are below.
>
> Best,
> Stephan
>
> [1] https://beam.apache.org/roadmap/
>
> ========================================================
>
> Disclaimer: Apache Flink is not governed or steered by any one single
> entity, but by its community and Project Management Committee (PMC). This
> is not an authoritative roadmap in the sense of a plan with a specific
> timeline. Instead, we share our vision for the future and major initiatives
> that are receiving attention, and give users and contributors an
> understanding of what they can look forward to.
>
> *Future Role of Table API and DataStream API*
>   - Table API becomes first class citizen
>   - Table API becomes primary API for analytics use cases
>       * Declarative, automatic optimizations
>       * No manual control over state and timers
>   - DataStream API becomes primary API for applications and data pipeline
> use cases
>       * Physical, user controls data types, no magic or optimizer
>       * Explicit control over state and time
>
> *Batch Streaming Unification*
>   - Table API unification (environments) (FLIP-32)
>   - New unified source interface (FLIP-27)
>   - Runtime operator unification & code reuse between DataStream / Table
>   - Extending Table API to make it convenient API for all analytical use
> cases (easier mix in of UDFs)
>   - Same join operators on bounded/unbounded Table API and DataStream API
>
> *Faster Batch (Bounded Streams)*
>   - Much of this comes via Blink contribution/merging
>   - Fine-grained Fault Tolerance on bounded data (Table API)
>   - Batch Scheduling on bounded data (Table API)
>   - External Shuffle Services Support on bounded streams
>   - Caching of intermediate results on bounded data (Table API)
>   - Extending DataStream API to explicitly model bounded streams (API
> breaking)
>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>
> *Streaming State Evolution*
>   - Let all built-in serializers support stable evolution
>   - First class support for other evolvable formats (Protobuf, Thrift)
>   - Savepoint input/output format to modify / adjust savepoints
>
> *Simpler Event Time Handling*
>   - Event Time Alignment in Sources
>   - Simpler out-of-the-box support in sources
>
> *Checkpointing*
>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
> coordinator)
>
> *Automatic scaling (adjusting parallelism)*
>   - Reactive scaling
>   - Active scaling policies
>
> *Kubernetes Integration*
>   - Active Kubernetes Integration (Flink actively manages containers)
>
> *SQL Ecosystem*
>   - Extended Metadata Stores / Catalog / Schema Registries support
>   - DDL support
>   - Integration with Hive Ecosystem
>
> *Simpler Handling of Dependencies*
>   - Scala in the APIs, but not in the core (hide in separate class loader)
>   - Hadoop-free by default
>
>
