2018-12-01 15:47:40 UTC - Olivier Chicha: I did a deep dive into the Java client implementation of the consumer and producer, and I am very concerned about scalability. If I consume a list of 100,000 topics via a pattern consumer, it will create 100,000 ConsumerImpl instances in memory (as far as I can see), and ConsumerImpl is far from being a small object. The same goes for producers: if you want to publish to a big list of topics, you either need a lot of memory or have to recreate the producers each time you need them (which of course hurts performance). Looking at the Kafka code, I have the feeling that the Kafka client is far less affected by the number of topics you publish to or consume from. (I haven't run any tests so far.)
Is there something I am missing here?
----
2018-12-01 16:18:25 UTC - Matteo Merli: @Olivier Chicha First, I seriously doubt you can have a Kafka consumer consuming from 100K topics. The Kafka people just announced with great emphasis that they can now reach 200K partitions on a whole Kafka cluster, with a max of 4K partitions per broker. On Pulsar, we have production deployments with 2.8M topics per cluster, with ~100K per broker. There are several users who are doing precisely that, 100s of K producers/consumers from a single client instance, with no problems. To your question: yes, you’re right that Consumer is not a lightweight object. It was not designed to be one, but rather to be something that opens a “consuming session” with a broker and allows for high-throughput delivery with minimal overhead once established. Each Consumer instance takes ~1530 bytes of heap memory (counting all retained objects), which would be ~146 MB for 100K consumers, plus direct memory for the pre-fetched messages. I don’t think 150 MB of memory is a big problem nowadays. Certainly, a different approach would lead to a smaller memory footprint, though that would come with different tradeoffs (e.g. reduced throughput, or more wire-protocol overhead). I think the most important thing to configure when consuming from many topics is the receiver queue size. That’s where the bulk of the memory will be used, so I would recommend tuning it down to 10 or lower (from the default of 1000), so that you control the max amount of direct memory that can be used.
----
2018-12-01 21:43:20 UTC - Olivier Chicha: Thanks for your very precise answer. I am modifying the data layer of our product to use an event bus. Right now I support both Kafka and Pulsar, but I need to decide which one we are going to use and what the best way to use it is.
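As a rough check of the figures in Matteo Merli's reply, the estimate can be written out as a small Java sketch. The ~1530 bytes per consumer and the receiver queue sizes of 1000 and 10 come from the discussion; the class name `ConsumerMemoryEstimate`, the 1 KiB average message size, and the "every queue is full" worst case are assumptions made only for illustration. In the Java client the queue size itself is set with `ConsumerBuilder.receiverQueueSize(int)`.

```java
// Back-of-the-envelope memory estimate for many consumers in one client.
// Hypothetical helper, not part of the Pulsar client.
final class ConsumerMemoryEstimate {

    // Heap retained by the consumer objects themselves.
    static long heapBytes(long consumers, long bytesPerConsumer) {
        return consumers * bytesPerConsumer;
    }

    // Worst case for pre-fetched messages: every receiver queue is full.
    // avgMessageBytes is an assumed figure, not taken from the discussion.
    static long prefetchBytes(long consumers, long receiverQueueSize, long avgMessageBytes) {
        return consumers * receiverQueueSize * avgMessageBytes;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long consumers = 100_000;
        long avgMessageBytes = 1024; // assumption: 1 KiB per message

        System.out.printf("heap: %.1f MiB%n",
                toMiB(heapBytes(consumers, 1530)));
        System.out.printf("prefetch, queue=1000: %.1f MiB%n",
                toMiB(prefetchBytes(consumers, 1000, avgMessageBytes)));
        System.out.printf("prefetch, queue=10: %.1f MiB%n",
                toMiB(prefetchBytes(consumers, 10, avgMessageBytes)));
    }
}
```

Under these assumptions the consumer objects account for roughly 146 MiB, while the pre-fetch buffers could in the worst case grow a hundred times larger with the default queue size than with a queue size of 10, which is why the receiver queue size is the setting to tune first.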
To make it simple: in our product we have about 100 tables, and a deployment can usually cover up to 1000 domains (enterprises). Some of the services are more table-centric, while others are more domain-centric.
=> My initial design was to create 2 types of topics: table topics and domain topics
=> each event is sent twice: once to the table topic and once to the domain topic
The issues with this are:
- I send each event twice
- a service receives more events than it really needs
=> I was thinking of creating a topic for each (table, domain) pair -> potentially 100,000 topics
But I don't like the idea of potentially having a lot of ConsumerImpl or ProducerImpl instances in my services (we are trying to migrate our architecture to a kind of microservice architecture => potentially a lot of services => service size matters)
=> Another idea is to use a Pulsar function to generate topics adapted to the specific needs of the services
- I would send the event just once, with a limited number of producers (100)
- the service would receive only the events it needs, with a limited number of consumers
- the number of topics would probably be around 10,000
- the risk is that I may have to store an event several times, and I don't know the performance impact of dispatching events into multiple topics within the broker
I hope that what I am saying makes sense. Would you have any recommendation on which of the 3 solutions I should choose (or another)? Regards
----
2018-12-02 03:18:41 UTC - Masakazu Kitajo: @Sijie Guo @Matteo Merli I don’t see the event on our event page. I guess we have two calendars?
----
2018-12-02 03:36:17 UTC - Masakazu Kitajo: Seems like we have two. The one in @Sijie Guo’s recent commit is not the one we used to use on the event page.
----
2018-12-02 03:37:46 UTC - Matteo Merli: @Masakazu Kitajo We checked yesterday and the one on the webpage had the wrong address for some reason. Once the website gets built, the calendar should be fixed.
----
2018-12-02 03:37:55 UTC - Matteo Merli: You should also have access now
----
2018-12-02 03:39:19 UTC - Masakazu Kitajo: @Matteo Merli You mean this fix? <https://github.com/apache/pulsar/pull/3099>
----
2018-12-02 03:40:22 UTC - Matteo Merli: Yes, we verified it was the correct link
----
2018-12-02 03:41:18 UTC - Masakazu Kitajo: This uses another calendar, and I guess it doesn’t work because the ampersands are escaped.
----
2018-12-02 03:43:45 UTC - Matteo Merli: The unescaped address would be: <https://calendar.google.com/calendar/embed?src=apache.pulsar.slack%40gmail.com&ctz=America%2FLos_Angeles>
----
2018-12-02 03:45:36 UTC - Masakazu Kitajo: Yeah, but the one in the JSON file is escaped. I don’t think it doesn’t need to be escaped.
----
2018-12-02 03:46:02 UTC - Masakazu Kitajo: I think it doesn’t need to be escaped.
----
2018-12-02 03:49:55 UTC - Masakazu Kitajo: The original URL (the longer one) is the one I created, and that URL worked on the previous version of our site.
----
2018-12-02 03:50:38 UTC - Masakazu Kitajo: But after we moved to our new site (/site2), the calendar isn’t shown because of the ampersands.
----
2018-12-02 03:51:24 UTC - Masakazu Kitajo: The escaped URL I posted here works, and the calendar has an event we held in Japan.
----
2018-12-02 03:53:49 UTC - Matteo Merli: Let’s see. I just kicked off the website build <https://builds.apache.org/job/pulsar-website-build/482/> It was broken due to an unrelated issue; it should update the website now
----
2018-12-02 03:54:49 UTC - Masakazu Kitajo: OK, so we are going to drop the history and move to the new calendar, right?
----
2018-12-02 03:55:22 UTC - Matteo Merli: At least we should have one that works and that multiple people can update :slightly_smiling_face:
----
2018-12-02 03:56:45 UTC - Masakazu Kitajo: Then I’ll remove the old one.
----
