2020-04-05 12:17:55 UTC - Franck Schmidlin: I want to deploy on AWS as well, but remain elastic. Can the bookies be deployed separately from the brokers? As far as I can tell, the bookies are the only components that require long-term storage, so they could be on EC2 with EBS. Also, long-term retention is not a requirement, so could I have a small number of bookkeepers for a larger number of brokers? This way I could use Fargate for additional brokers if/when required? Has anyone drawn a deployment diagram anywhere? ----
2020-04-05 12:27:07 UTC - yujun: @yujun has joined the channel ----
2020-04-05 16:34:44 UTC - Jesus Ramirez: @Jesus Ramirez has joined the channel ----
2020-04-05 16:35:12 UTC - Jesus Ramirez: Hi guys! ----
2020-04-05 16:36:35 UTC - Jesus Ramirez: I've been checking on different solutions like Kafka, Pulsar and KubeMQ. ----
2020-04-05 16:38:04 UTC - Jesus Ramirez: I've noticed that Kafka has a limitation on consumers: you need at least as many partitions as consumers. Does Pulsar have the same limitation? ----
2020-04-05 16:39:03 UTC - Jesus Ramirez: I've been checking the documentation and I don't find anything about this ----
2020-04-05 16:40:06 UTC - Chris Bartholomew: You can connect many consumers to a single topic partition. ----
2020-04-05 16:43:06 UTC - Jesus Ramirez: thank you! ----
2020-04-05 16:58:31 UTC - Matteo Merli: There's a limit on the size of the range-set that is stored. By default it will store up to 50K disjoint ranges. After that, delivery to consumers is also stalled, until the "holes" in the ack sequence are filled. ----
2020-04-05 17:11:47 UTC - Shangpeng Sun: Cool! Another question related to this: is the cursor ledger updated per ack, or periodically in batches? I suppose it's the latter, because the broker needs to accumulate some acks to calculate the range-sets. However, will this cause a consistency problem? If the broker crashes before updating the cursor ledger, the recent acks will be lost ----
2020-04-05 17:18:50 UTC - David Kjerrumgaard: @Franck Schmidlin I am not that familiar with AWS Fargate, but based on what I have seen/read it looks like it would be possible to set up a Pulsar cluster to run on that service. However, it looks like there would be a lot of steps required to set up each of these services, i.e. ZK, BookKeeper, and the brokers. Then you would also need to initialize the cluster metadata, etc. IMHO it would be easier to just set up the Pulsar cluster on EKS using the Helm chart included with the Pulsar distribution, which performs all these steps for you. Also, I am not sure how Fargate handles pod failures, but with EKS you can define StatefulSets which ensure that a minimum number of pods of a given type are running at all times.
+1 : Franck Schmidlin ----
2020-04-05 17:37:16 UTC - Franck Schmidlin: Which bits of Pulsar are elastic? Can I have a fixed set of ZK and BK instances and varying numbers of brokers to meet demand? Or is that silly, and there is a fixed n-to-1 ratio between brokers and BK? Thx ----
2020-04-05 17:39:24 UTC - Matteo Merli: There are 2 ways the acks are batched:
1. The client library by default groups them for 100 millis (this can be set to 0)
2. The broker only persists to the cursor ledger every 1 sec by default
In case of failures, there will be a limited amount of duplicates. Turning the delays to 0 will reduce the amount of dups, though it will never be guaranteed to have no dups (with the Consumer API) ----
2020-04-05 17:42:12 UTC - Shangpeng Sun: Nice, this makes a lot of sense, thanks for the help! ----
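A minimal Java sketch tying together two points from the exchange above: a Shared subscription lets any number of consumers attach to the same topic or partition, and the client-side ack grouping delay can be set to 0 to shrink the duplicate window described by Matteo. The service URL, topic, and subscription names are placeholders.
```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class SharedConsumerExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder service URL
                .build();

        // With a Shared subscription, any number of consumers can attach to the
        // same topic (or partition); messages are distributed among them.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("my-shared-sub")
                .subscriptionType(SubscriptionType.Shared)
                // Acks are grouped for ~100 ms by default; 0 sends each ack
                // immediately, reducing (but not eliminating) possible duplicates
                // after a broker failure.
                .acknowledgmentGroupTime(0, TimeUnit.MILLISECONDS)
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        consumer.acknowledge(msg);

        consumer.close();
        client.close();
    }
}
```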
2020-04-05 17:42:27 UTC - Matteo Merli: ZK is fixed (and having a big ZK cluster doesn't increase the write throughput).
Brokers and bookies are elastic and they can be scaled independently. There's no pre-fixed ratio between the 2. As a high-level general rule:
• Increase brokers to increase serving capacity (generally limited by CPU/network bandwidth)
• Increase bookies to increase disk IO and storage capacity ----
2020-04-05 17:42:44 UTC - Matteo Merli: You're welcome ----
2020-04-05 17:44:50 UTC - Franck Schmidlin: Thx. So containerised brokers on something like Fargate, scaling from 0 to x based on load, is not an entirely stupid idea? ----
2020-04-05 17:47:06 UTC - Matteo Merli: No
+1 : Franck Schmidlin ----
2020-04-05 17:48:05 UTC - Matteo Merli: For bookies, it's a bit more complex:
• It's easy to scale up
• Scale down has to be done 1 by 1, letting data get re-replicated first
+1 : Franck Schmidlin, Pierre Zemb ----
2020-04-05 17:50:17 UTC - Franck Schmidlin: But I could scale up to deal with a seasonal spike and slowly scale down as messages expire (past retention). Which would work for me as well, I think. Thx ----
2020-04-05 19:19:16 UTC - Vladimir Shchur: @Matteo Merli can you please add a few words about service discovery for such an elastic solution? My GCP Pulsar-on-k8s attempt failed because brokers could no longer discover bookies after the bookies' IPs changed. How is this supposed to be handled? ----
2020-04-05 19:56:25 UTC - Matteo Merli: The bookies need to expose a "stable" identifier. That is used by the client to establish a relationship between the data and its location. The stable identifier can be either the IP or the hostname of the bookie.
+1 : Franck Schmidlin ----
2020-04-05 20:07:46 UTC - steven meadows: Do you know whether the Pulsar schema registry supports integration with Kafka? ----
2020-04-05 20:56:21 UTC - Franck Schmidlin: Just read about tiered storage. Even better, I probably don't need to worry about scaling bookies. ----
2020-04-06 02:01:42 UTC - Prashanth Tirupachur Vasanthakrishnan: @Prashanth Tirupachur Vasanthakrishnan has joined the channel ----
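On the "stable identifier" point above: in Kubernetes, bookies are typically run as a StatefulSet behind a headless service so each pod keeps a stable DNS hostname, and the bookie can be told to register under that hostname rather than its pod IP via the useHostNameAsBookieID option in bookkeeper.conf. Below is a minimal sketch of the same setting through BookKeeper's ServerConfiguration API; the setter name is assumed from the config key and should be checked against your BookKeeper version.
```java
import org.apache.bookkeeper.conf.ServerConfiguration;

public class StableBookieId {
    public static void main(String[] args) {
        ServerConfiguration conf = new ServerConfiguration();
        // Register the bookie under its (stable) hostname instead of its pod IP,
        // so brokers can still locate the data after the IP changes.
        // Equivalent bookkeeper.conf entry: useHostNameAsBookieID=true
        conf.setUseHostNameAsBookieID(true); // assumed setter for the documented config key
        System.out.println("useHostNameAsBookieID=" + conf.getUseHostNameAsBookieID());
    }
}
```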
