This sounds like it could be FLINK-17359 [1]. What version of Flink are you using?
Another likely explanation arises from the fact that only the checkpoint data files (the ones created and written by the task managers) have the _entropy_ replaced. The job manager does not inject entropy into the path of the checkpoint metadata, so that file remains at a predictable URI. Since Flink only writes keyed state larger than state.storage.fs.memory-threshold into the checkpoint data files, and only those files have entropy injected into their paths, if all of your state is small it will all end up in the metadata file and you won't see any entropy injection happening. See the comments on [2] for more on this.

FWIW, I would urge you to use presto instead of hadoop for checkpointing on S3. The performance of the hadoop "filesystem" is problematic when it's used for checkpointing.

Regards,
David

[1] https://issues.apache.org/jira/browse/FLINK-17359
[2] https://issues.apache.org/jira/browse/FLINK-24878

On Wed, May 18, 2022 at 7:48 PM Aeden Jameson <aeden.jame...@gmail.com> wrote:
> I have checkpoints setup against s3 using the hadoop plugin. (I'll
> migrate to presto at some point) I've setup entropy injection per the
> documentation with
>
> state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints
> s3.entropy.key: _entropy_
>
> I'm seeing some behavior that I don't quite understand.
>
> 1. The folder s3://my-bucket/_entropy_/my-job/checkpoints/...
> literally exists. Meaning that "_entropy_" has not been replaced. At
> the same time there are also a bunch of folders where "_entropy_" has
> been replaced. Is that to be expected? If so, would someone elaborate
> on why this is happening?
>
> 2. Should the paths in the checkpoints history tab in the FlinkUI
> display the path the key? With the current setup it is not.
>
> Thanks,
> Aeden
>
> GitHub: https://github.com/aedenj
> Linked In: http://www.linkedin.com/in/aedenjameson
>
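P.S. To illustrate the memory-threshold point above: a config along these lines should push even small state out of the metadata file and into entropy-injected data files. This is just a sketch, not a recommendation; the `s3.entropy.length` option and the default threshold value (around 20kb, if I remember correctly) are worth double-checking against the docs for your Flink version:

```yaml
# flink-conf.yaml (sketch only -- verify option names for your Flink version)
state.checkpoints.dir: s3://my-bucket/_entropy_/my-job/checkpoints
s3.entropy.key: _entropy_      # substring replaced by random characters
s3.entropy.length: 4           # number of random characters injected

# State at or below this threshold is inlined into the checkpoint
# metadata file, whose path gets no entropy. Lowering it forces state
# into the per-task data files, which do get entropy-injected paths.
state.storage.fs.memory-threshold: 1kb
```

Keep in mind that lowering the threshold means more (and smaller) files per checkpoint, which is exactly the S3 request-rate problem entropy injection is meant to spread out, so tune it with that trade-off in mind.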