Hi Rajan,

Thanks for sharing the link! I will follow the PR.


Best Regards,

Nikolai


On 2023/12/02 01:21:04 Rajan Dhabalia wrote:
> Hi Nikolai,
>
> I completely understand your concern and pain. But let's understand
> ordering semantics for exclusive/fail-over subscription. This subscription
> makes sure to dispatch new messages which the broker is reading using
> readPosition must be in order but that doesn't mean the broker will never
> handle redelivery of unack message and broker will not stop dispatching by
> considering the end of the world. However, key-Shared sub thinks it's the
> end of the world. key-shared sub extends the shared sub behavior with
> additional speciality of consumer dispatch affinity based on message key.
> Broker tries its best to perform this responsibility but it's still not
> guaranteed as it will be really costly to give such guarantee and if one
> requires it then we can use a failover sub. In shared or even in other
> subscriptions, brokers don't restrict or stop dispatching of redelivery
> messages.
> However, in key-shared sub, broker stops forever dispatching new and
> redelivery messages in certain conditions and additionally broker goes into
> an infinite iterative sequence of reading duplicate cold messages and
> discarding them which causes IO and CPU impact both on broker and bookie.
> Pulsar must not have such broken and incorrect semantic behavior in the
> system. I have created an issue:
> https://github.com/apache/pulsar/issues/21656 which has a simple unit test
> case to reproduce this behavior and that's what is happening in multiple
> production Pulsar systems and users are not happy with it.
>
> Therefore, we should not stop dispatching and restricting redeliver
> messages which will automatically address almost all stuck issues and the
> broker can still follow the ordering guarantee for new messages which it
> promises for other subs as well.
>
> Thanks,
> Rajan
>
> On Fri, Dec 1, 2023 at 12:03 AM Nikolai <ta...@gmail.com> wrote:
>
> > Hi Rajan,
> >
> > Thanks for the review!
> >
> > It looks like the root cause of "stuck" keyword is the nature of
> > Key_Shared subscription which must guarantee messages ordering. I
> > understand it is really hard to achieve such guarantees without having
> > separate physical partitions but this feature is very essential for many
> > use cases. I also understand that the solution I've provided is more a
> > hack than a solid solution. I was trying to touch as little code as I
> > can and, unfortunately, I see no other options to avoid stuck dispatches
> > in the current implementation approach.
> >
> > I also want to highlight that the current implementation leads entire
> > system to hang due to a single message corruption in case consumers
> > disconnects. This breaks one of a big Pulsar advantage: independent keys
> > processing.
> >
> > Do you have plans for Key_Shared dispatcher rework? I would be happy to
> > contribute to such a project if I can help somehow.
> >
> >
> > Best Regards,
> >
> > Nikolai
> >
> >
> >
> > On 2023/12/01 06:34:59 Rajan Dhabalia wrote:
> > > Hi Nikolai,
> > >
> > > Thanks for bringing this to the attention. I have seen multiple PIP
> > and bug
> > > fixes for the Key-Shared subscription and I feel Key_Shared subscription
> > > dispatcher is developed with multiple fundamental issues which
> > required to
> > > introduce scheduled-task to unblock stuck reads, introducing stuck
> > delivery
> > > flag, and now we are seeing at least 2-3 PIP to add few more hacks to
> > > control such stuck dispatch. After reviewing code, it seems
> > >
> >
> > PersistentStickyKeyDispatcherMultipleConsumers::getRestrictedMaxEntriesForConsumer(..)
> > > function can still return 0 dispatch messages with multiple different
> > edge
> > > cases and fundamental dispatch is not clean at all. This is the only
> > > dispatcher introduced after open sourcing Pulsar and unfortunately it
> > was
> > > not developed with a cleaner approach. I understand the pain of such
> > stuck
> > > dispatch and even the current release is having a bugs where dispatch is > > > getting stuck and many production systems are struggling with it and you
> > > must be suggesting this PIP to get work around for this stuck issue.
> > But I
> > > don't think this is the right approach to follow.
> > > I think the correct approach is to get rid of this "stuck" keyword
> > from the
> > > dispatcher because having "stuck" concept in dispatcher means it has a
> > > known flow and bug which we are not solving but we are adding more
> > debt to
> > > perform the work around. We never had any such concept in other
> > dispatchers
> > > which was introduced when Pulsar was introduced originally and it just
> > > works with clear semantics and permit flow mechanism , and we should
> > follow
> > > the same practice for this subscription.
> > >
> > > Thanks,
> > > Rajan
> > >
> > > On Thu, Nov 30, 2023 at 9:18 AM Nikolai <ta...@gmail.com> wrote:
> > >
> > > > Hello!
> > > >
> > > > I submitted a new PIP to add a configuration option which allows to
> > skip
> > > > blocking recently joined consumers for Key_Shared subscriptions
> > > >
> > > > It introduces additional memory consumption, so it will be disabled by
> > > > default. It would fix issues like
> > > > https://github.com/apache/pulsar/issues/21199
> > > >
> > > > Link to the PIP: https://github.com/apache/pulsar/pull/21615
> > > >
> > > > PR with the PIP implementation:
> > > > https://github.com/apache/pulsar/pull/21579
> > > >
> > > >
> > > > Best Regards,
> > > >
> > > > Nikolai
> > > >
> > > >
> > >
> >
>

Reply via email to