Hi,

i need to limit the rate of processing in a Flink stream application. 
Specifically, the number of items processed in a .map() operation has to stay 
under a certain maximum per second.

At the moment, I have another .map() operation before the actual processing, 
which just sleeps for a certain time (e.g., 250ms for a limit of 4 requests / 
sec) and returns the item unchanged:

…

public T map(final T value) throws Exception {
        Thread.sleep(delay);
        return value;
}

…

This works as expected, but is a rather crude approach. Checkpointing the job 
takes a very long time: minutes for a state of a few kB, which for other jobs 
is done in a few milliseconds. I assume that letting the whole thread sleep for 
most of the time interferes with the checkpointing - not good!

Would using a different synchronization mechanism (e.g., 
https://google.github.io/guava/releases/19.0/api/docs/index.html?com/google/common/util/concurrent/RateLimiter.html)
 help to make checkpointing work better?

Or, preferably, is there a mechanism inside Flink that I can use to accomplish 
the desired rate limiting? I haven’t found anything in the docs.

Cheers,
Florian

Reply via email to