This is an automated email from the ASF dual-hosted git repository. hangxiang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/flink.git
The following commit(s) were added to refs/heads/master by this push:
     new 2ec8f8157f9 [FLINK-34119][doc] Improve description about changelog in document
2ec8f8157f9 is described below

commit 2ec8f8157f95a79ee94d609657f9b08f8f0b6a26
Author: Hangxiang Yu <master...@gmail.com>
AuthorDate: Sat Jan 13 14:50:36 2024 +0800

    [FLINK-34119][doc] Improve description about changelog in document
---
 docs/content.zh/docs/deployment/config.md        |  3 +--
 docs/content.zh/docs/ops/state/state_backends.md | 13 +++++++------
 docs/content/docs/deployment/config.md           |  3 +--
 docs/content/docs/ops/state/state_backends.md    | 15 ++++++++-------
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/docs/content.zh/docs/deployment/config.md b/docs/content.zh/docs/deployment/config.md
index 34d04f733e5..cf0740bf8de 100644
--- a/docs/content.zh/docs/deployment/config.md
+++ b/docs/content.zh/docs/deployment/config.md
@@ -370,8 +370,7 @@ Advanced options to tune RocksDB and RocksDB checkpoints.
 ### State Changelog Options

 Please refer to [State Backends]({{< ref "docs/ops/state/state_backends#enabling-changelog" >}}) for information on
-using State Changelog. {{< hint warning >}} The feature is in experimental status. {{< /hint >}} {{<
-generated/state_backend_changelog_section >}}
+using State Changelog.

 #### FileSystem-based Changelog options

diff --git a/docs/content.zh/docs/ops/state/state_backends.md b/docs/content.zh/docs/ops/state/state_backends.md
index eda37dada7e..5d7d4f92b1c 100644
--- a/docs/content.zh/docs/ops/state/state_backends.md
+++ b/docs/content.zh/docs/ops/state/state_backends.md
@@ -349,10 +349,6 @@ Python API 中尚不支持该特性。

 ## 开启 Changelog

-{{< hint warning >}} 该功能处于实验状态。 {{< /hint >}}
-
-{{< hint warning >}} 开启 Changelog 可能会给您的应用带来性能损失。(见下文) {{< /hint >}}
-
 <a name="introduction"></a>

 ### 介绍
@@ -372,16 +368,21 @@ Changelog 是一项旨在减少 checkpointing 时间的功能,因此也可以

 开启 Changelog 功能之后,Flink 会不断上传状态变更并形成 changelog。创建 checkpoint 时,只有 changelog 中的相关部分需要上传。而配置的状态后端则会定期在后台进行快照,快照成功上传后,相关的changelog 将会被截断。

-基于此,异步阶段的持续时间减少(另外因为不需要将数据刷新到磁盘,同步阶段持续时间也减少了),特别是长尾延迟得到了改善。
+基于此,异步阶段的持续时间减少(另外因为不需要将数据刷新到磁盘,同步阶段持续时间也减少了),特别是长尾延迟得到了改善。同时,还可以获得一些其他好处:
+1. 更稳定、更低的端到端时延。
+2. Failover 后数据重放更少。
+3. 资源利用更加稳定。

 但是,资源使用会变得更高:

 - 将会在 DFS 上创建更多文件
-- 将可能在 DFS 上残留更多文件(这将在 FLINK-25511 和 FLINK-25512 之后的新版本中被解决)
 - 将使用更多的 IO 带宽用来上传状态变更
 - 将使用更多 CPU 资源来序列化状态变更
 - Task Managers 将会使用更多内存来缓存状态变更

+值得注意的是虽然 Changelog 增加了少量的日常 CPU 和网络带宽资源使用,
+但会降低峰值的 CPU 和网络带宽使用量。
+
 另一项需要考虑的事情是恢复时间。取决于 `state.backend.changelog.periodic-materialize.interval` 的设置,changelog 可能会变得冗长,因此重放会花费更多时间。即使这样,恢复时间加上 checkpoint 持续时间仍然可能低于不开启 changelog 功能的时间,从而在故障恢复的情况下也能提供更低的端到端延迟。当然,取决于上述时间的实际比例,有效恢复时间也有可能会增加。

 有关更多详细信息,请参阅 [FLIP-158](https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints)。
diff --git a/docs/content/docs/deployment/config.md b/docs/content/docs/deployment/config.md
index cbdc4f25a77..c4e70ba7235 100644
--- a/docs/content/docs/deployment/config.md
+++ b/docs/content/docs/deployment/config.md
@@ -372,8 +372,7 @@ Advanced options to tune RocksDB and RocksDB checkpoints.
 ### State Changelog Options

 Please refer to [State Backends]({{< ref "docs/ops/state/state_backends#enabling-changelog" >}}) for information on
-using State Changelog. {{< hint warning >}} The feature is in experimental status. {{< /hint >}} {{<
-generated/state_backend_changelog_section >}}
+using State Changelog.

 #### FileSystem-based Changelog options

diff --git a/docs/content/docs/ops/state/state_backends.md b/docs/content/docs/ops/state/state_backends.md
index b645eefcd8b..bd04491977f 100644
--- a/docs/content/docs/ops/state/state_backends.md
+++ b/docs/content/docs/ops/state/state_backends.md
@@ -346,10 +346,6 @@ Still not supported in Python API.

 ## Enabling Changelog

-{{< hint warning >}} This feature is in experimental status. {{< /hint >}}
-
-{{< hint warning >}} Enabling Changelog may have a negative performance impact on your application (see below). {{< /hint >}}
-
 ### Introduction

 Changelog is a feature that aims to decrease checkpointing time and, therefore, end-to-end latency in exactly-once mode.
@@ -361,7 +357,7 @@ Most commonly, checkpoint duration is affected by:
 and [Buffer debloating]({{< ref "docs/ops/state/checkpointing_under_backpressure#buffer-debloating" >}})
 2. Snapshot creation time (so-called synchronous phase), addressed by asynchronous snapshots (mentioned [above]({{< ref "#the-embeddedrocksdbstatebackend">}}))
-4. Snapshot upload time (asynchronous phase)
+3. Snapshot upload time (asynchronous phase)

 Upload time can be decreased by [incremental checkpoints]({{< ref "#incremental-checkpoints" >}}).
 However, most incremental state backends perform some form of compaction periodically, which results in re-uploading the
@@ -373,16 +369,21 @@ part of this changelog needs to be uploaded. The configured state backend is sna
 background periodically. Upon successful upload, the changelog is truncated.

 As a result, asynchronous phase duration is reduced, as well as synchronous phase - because no data needs to be flushed
-to disk. In particular, long-tail latency is improved.
+to disk. In particular, long-tail latency is improved. At the same time, some other benefits could be got:
+1. More Stable and Lower End-to-end Latency.
+2. Less Data Replay after Failover.
+3. More Stable Utilization of Resources.

 However, resource usage is higher:

 - more files are created on DFS
-- more files can be left undeleted DFS (this will be addressed in the future versions in FLINK-25511 and FLINK-25512)
 - more IO bandwidth is used to upload state changes
 - more CPU used to serialize state changes
 - more memory used by Task Managers to buffer state changes

+It is worth noting that changelog adds a small amount of daily CPU and network bandwidth resources,
+but reduces peak CPU and network bandwidth usage.
+
 Recovery time is another thing to consider. Depending on the `state.backend.changelog.periodic-materialize.interval` setting, the changelog can become lengthy and replaying it may take more time. However, recovery time combined with checkpoint duration will likely still be lower than in non-changelog setups, providing lower end-to-end latency even in