Hi, i need to limit the rate of processing in a Flink stream application. Specifically, the number of items processed in a .map() operation has to stay under a certain maximum per second.
At the moment, I have another .map() operation before the actual processing, which just sleeps for a certain time (e.g., 250ms for a limit of 4 requests / sec) and returns the item unchanged: … public T map(final T value) throws Exception { Thread.sleep(delay); return value; } … This works as expected, but is a rather crude approach. Checkpointing the job takes a very long time: minutes for a state of a few kB, which for other jobs is done in a few milliseconds. I assume that letting the whole thread sleep for most of the time interferes with the checkpointing - not good! Would using a different synchronization mechanism (e.g., https://google.github.io/guava/releases/19.0/api/docs/index.html?com/google/common/util/concurrent/RateLimiter.html) help to make checkpointing work better? Or, preferably, is there a mechanism inside Flink that I can use to accomplish the desired rate limiting? I haven’t found anything in the docs. Cheers, Florian