Hey Daniel, I think you're right that the docs are misleading in this case – anything that extends SourceFunction will always execute at parallelism 1, set parallelism is ignored. Explicitly setting parallelism in the example in the docs is unnecessary and confusing. I personally have only used this with exactly-once semantics (thus always configured parallelism 1), so not sure what performance limitations there may be with high volume streams. It might be worth a try to build your own on RichParallelSourceFunction if you're up for it – I'm sure other people would be interested in the response.
Best, Austin On Fri, Feb 18, 2022 at 5:09 PM Daniel Hristov <dhris...@dragos.com> wrote: > Hello, > > > > I’ve noticed that > https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/rabbitmq/ > suggests that the RabbitMQ Source can be used with parallelization bigger > than 1, but one can’t get exactly-once delivery guarantee there. However, > the inheritance chain seems to show RMQSource -> > MultipleIdsMessageAcknowledgingSourceBase -> MessageAcknowledgingSourceBase > -> RichSourceFunction (rather than RichParallelSourceFunction) > > I imagine either the documentation or the implementation has to be > corrected here. Any thoughts on what imposes that parallelism on 1 here? > The only place I found it to be checked up the hierarchy seems to be > MessageAcknowledgingSourceBase::initializeState > > > > Have you found this to impose a limitation on the performance of pulling > messages from Rabbit (assuming that heavier enrichments down the chain are > properly parallelized)? > > > > Best, Daniel >