Hi! Thanks for the quick reply!
I actually need a shared subscription, so that multiple consumer instances can consume the same topic. I don't think I explained my issue well, so let me try again. The flow is like this:

1. Producer - publishes events from anywhere in the system (consumers can publish events, and producers can publish directly to pulsar or pulsar-proxy) to a topic or topics (this is the question).
2. Service (multiple consumers that scale out/scale down) - creates a shared subscription that needs to listen to the events of multiple tenants (the list of tenants can change dynamically) OR to all events.

Now, I am not sure how to implement the event routing, and I don't want to waste traffic. Let me elaborate on that. Given that all producers together publish events at 30mb/s, I don't want a service that listens to two tenants (let's say 10% of the traffic) to consume the full 30mb/s and filter on the client side.

It looks like my solution will come down to a function that does the routing, so the implementation will be something like this:

1. Producer - publishes all events to a topic named "events"
2. Pulsar function - processes all those events and routes them to service topics
3. Service - creates a shared subscription to its own topic

Producer -> topic "events" -> Pulsar function routes events to "service-a" -> Service A listens to the "service-a" topic.

Does that make sense?

If so, a question about the function runtimes: does the "thread" runtime run inside the pulsar broker, OR does it run in another process dedicated to functions (a different pod in a k8s deployment)?

On Mon, 9 Mar 2020 at 9:02 Sijie Guo <[email protected]> wrote:

> Thank you, Yosi! The mailing list is a great place to ask questions since
> the emails are indexed and searchable.
>
> If most of the time, a consumer only listens to a "tenant" topic, you can
> use a master topic and a key_shared subscription to distribute your list of
> tenants.
> So each of the consumers of the master topic will be receiving a
> subset of the tenants. Then it can subscribe to those "tenant" topics.
> So you don't need all consumers to subscribe to all topics.
>
> Other comments inline.
>
>
> On Sun, Mar 8, 2020 at 5:02 AM Yosi Attias <[email protected]> wrote:
>
>> Hi!
>>
>> *I posted this to google groups and then the message somehow disappeared,
>> I will send it again here. Sorry for the duplication.*
>>
>> I am checking out pulsar for using it as our events bus, and it's awesome!
>>
>> Our services (written in nodejs) need to listen to multiple tenants (or
>> all tenants - we have 10k tenants, and it's growing), and the list of
>> tenants can change dynamically at runtime (changes are not that
>> frequent, we can have 200/300 changes max a day).
>> Pulsar sounds like an excellent fit for this because I can create a topic
>> per tenant, like "tenant:XX:events" (XX = tenant id) and use a shared
>> subscription for consumer groups.
>>
>> As I said, all consumers in a group get a message with the list of
>> tenants that need to be subscribed (it's broadcast via Redis pub/sub).
>>
>> I am not sure what the best solution to implement this is. I see I have
>> two options:
>>
>> - Client-side: a consumer receives a tenant it needs to listen to, and
>> it adds the topic to the shared subscription - sounds like the right
>> solution, but:
>>
>>
>> - Since all consumers will add the same topic at the same time - are
>> there any issues with this? Or do I need to make sure it happens once,
>> so only one consumer mutates the shared subscription?
>>
>> It sounds like you need to use an exclusive subscription for this case.
>
>
>>
>> - There are consumers (a small fraction, but important ones) that need
>> to listen to all events - this makes the subscription consume all
>> topics - does it make sense in terms of performance? Attaching a
>> subscription to 10k+ topics?
>>
>>
> It is okay to subscribe to 10k+ topics. However, you need to pay
> attention to allocating memory for your client.
>
> But I would recommend thinking of architecting your service in a different
> way to avoid this if possible.
>
>
>>
>> - Functions: I thought about creating a function that will have a
>> list of application subscriptions (not pulsar subscriptions) and will
>> listen to the main topic called "events" (or to all tenant topics? not
>> sure how to implement this with a function) and will route the events,
>> based on subscriptions, to service topics. For example, a service named
>> "users" will have a "users-service" topic and the function will route
>> all its events to the "users-service" topic. This sounds like a good
>> solution as well, but:
>> - I am not sure where functions are running. If they are running
>> as a separate container we will have massive traffic waste. I see
>> there is a threaded option to run the function - does the function run
>> inside pulsar? So I don't have traffic waste?
>>
>>
> Functions have different runtimes - thread, process, and Kubernetes. It is
> pretty flexible.
>
> > So I don't have traffic waste?
>
> I am not sure what "traffic waste" means. If you are referring to
> messages that will be read and written multiple times, that's true. If your
> "service" topics (like users-service) will be used by different
> subscriptions, I would recommend going with the function approach.
>
>
>
>
>>
>> - Is this overkill for functions?
>> - Storing of application subscriptions - I can save them inside
>> our database, and I see I can store them inside pulsar state tables -
>> which is preferred here?
>> - Once I want to listen to more topics - should I notify the
>> function somehow to reload the list of subscriptions (since I will
>> cache it) OR do I need to implement some refresh timer?
>>
>>
>> Hopefully, this makes sense! If you have any questions and want me to
>> elaborate, please let me know!
>>
>> If you want me to ask in other places (like Slack) or somewhere else, let
>> me know and I will ask there instead.
>>
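
To make the routing idea from the top of this message more concrete, here is a rough sketch of the lookup such a function might perform. It is only an illustration under assumptions that are not confirmed anywhere in this thread: events are assumed to be JSON strings carrying a hypothetical "tenant_id" field, and TenantRouter and route are made-up names. The publish argument is any callable taking (topic, message); in a Python Pulsar function it could plausibly be context.publish, but the sketch itself does not depend on pulsar.

```python
import json
from collections import defaultdict


class TenantRouter:
    """Hypothetical in-memory table of application subscriptions.

    Maps a tenant id to the service topics interested in it, plus a set
    of service topics that want every event regardless of tenant.
    """

    def __init__(self):
        self._by_tenant = defaultdict(set)
        self._all_events = set()

    def subscribe(self, service_topic, tenant_ids=None):
        # tenant_ids=None means "this service wants all events".
        if tenant_ids is None:
            self._all_events.add(service_topic)
            return
        for tenant_id in tenant_ids:
            self._by_tenant[tenant_id].add(service_topic)

    def unsubscribe(self, service_topic, tenant_id):
        # Remove a single tenant from a service's subscription list.
        self._by_tenant[tenant_id].discard(service_topic)

    def targets(self, tenant_id):
        # Sorted only to make the result deterministic.
        return sorted(self._all_events | self._by_tenant.get(tenant_id, set()))


def route(router, raw_event, publish):
    """Forward one raw JSON event to every interested service topic.

    Assumes the event has a "tenant_id" field (an assumption of this
    sketch). Returns the list of topics the event was published to.
    """
    event = json.loads(raw_event)
    topics = router.targets(event["tenant_id"])
    for topic in topics:
        publish(topic, raw_event)
    return topics
```

With this shape, a service that listens to two tenants only ever reads its own "service-a" topic, while the function is the single reader of "events". How the table gets refreshed when the tenant list changes (notification vs. refresh timer) is left open here, as in the questions above.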
