2020-06-01 10:59:41 UTC - Oleg Toubenshlak: @Oleg Toubenshlak has joined the channel
----
2020-06-01 11:00:56 UTC - Shalom Tuby: @Shalom Tuby has joined the channel
----
2020-06-01 12:11:30 UTC - Tony Free: @Tony Free has joined the channel
----
2020-06-01 13:29:11 UTC - Miguel Martins: Pulsar is probably not fit for an event store. I believe it doesn't have a way to perform inserts with an optimistic concurrency check and, as you mention, no way to retrieve all events by aggregate id without creating a topic for each one. IMO, it would be better to use a dedicated event store and use it as a source for Pulsar, e.g. Postgres as the event store, publishing to Pulsar using Debezium.
----
2020-06-01 13:29:41 UTC - Ankush: @Ankush has joined the channel
----
2020-06-01 13:55:37 UTC - Oleg Toubenshlak: Hi everyone,
I have a question regarding custom Pulsar connector (sink) deployment with Kubernetes (and the latest Helm version). I need to access some files from an external volume in it. Is there a way to get the Pulsar broker to mount the external volume for the automatically generated connector, which runs as a separate pod created by the broker? The connector is deployed via pulsar-admin as “functions worker with brokers”. Thanks!
----
2020-06-01 14:34:54 UTC - Hugo Smitter: @Hugo Smitter has joined the channel
----
2020-06-01 15:00:52 UTC - Oleg Toubenshlak: Hi again, another question regarding Pulsar connectors. Is there a way to run a connector worker with a Java version higher than Java 8? For example via the pulsarDockerImageName configuration with a Java 13 Docker image?
----
2020-06-01 15:55:38 UTC - Gary Fredericks: Is there any way to know whether a particular version of the Pulsar client is compatible with a particular version of the server?
----
2020-06-01 15:55:54 UTC - Gary Fredericks: I'm wondering about using client 2.5.2 with server 2.4.1 in particular
----
2020-06-01 16:01:36 UTC - Ebere Abanonu: You look at the protocol version: 2.5.2 is protocol version 15, while 2.4.1 is protocol version 14.
----
2020-06-01 16:02:11 UTC - Gary Fredericks: Okay, thanks
----
2020-06-01 16:02:28 UTC - Ebere Abanonu: The server is compatible with any client as long as the protocol version is indicated when connecting.
+1 : Frank Kelly
----
2020-06-01 16:04:05 UTC - Tester T: Hi, folks! We plan to use Pulsar as the event journal in an event-sourced system, so we are planning the following topic design:
1. A topic per DDD aggregate. These won't have direct subscriptions, only occasional use of the reader API. Messages must be stored indefinitely (via tiered storage).
2. A couple of aggregated streams for integration with other services, event processing, and job queuing. Planned to be implemented via a Pulsar function that listens on a topic pattern or a full namespace. Messages could be deleted after some retention period.
E.g. 1. event-journaled topics: ‘orders_shop234’, ‘orders_shop124’. 2. integration topics: ‘paid-orders’, ‘placed-orders’, which will be hydrated by Pulsar functions from the event-journaled topics.
So, the question is: what are the possible limitations of this design? Would there be any scaling issues, e.g. for 50k aggregates (event-journaled topics)? Thanks for answers!
----
2020-06-01 16:04:40 UTC - Gary Fredericks: so I shouldn't _have_ to pay attention to this, ideally?
----
2020-06-01 16:06:35 UTC - Ebere Abanonu: That will be the job of the client library. Just pay attention to the features you will be getting.
----
2020-06-01 16:06:51 UTC - Gary Fredericks: cool, thanks
----
2020-06-01 16:08:42 UTC - Ebere Abanonu: How do you intend to replay the events? That will determine your limitations.
----
2020-06-01 16:11:46 UTC - Tester T: Why not use the reader API for this purpose? Event rewind is required only in the event-journaled topics.
----
2020-06-01 16:14:29 UTC - Ebere Abanonu: The reader API is the best fit for that.
----
2020-06-01 16:14:33 UTC - David Kjerrumgaard: I am not aware of any limitations beyond what the hardware can handle :smiley:
+1 : Kirill Kosenko
----
2020-06-01 16:15:07 UTC - Ebere Abanonu: I'm actually working on something on event sourcing with Pulsar
----
2020-06-01 16:19:43 UTC - Tester T: I am a little bit worried about function subscriptions that will potentially listen to ~50-70k topics.
> I'm actually working on something on event sourcing with Pulsar
Yeah, I saw your contribution to the Pulsar <http://akka.net|akka.net> persistence plugin, keep up the good work!
----
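(For reference, a minimal sketch of the replay approach discussed above: the Reader API rewinds one event-journaled topic from the beginning without creating a subscription. The service URL and the per-aggregate topic name are placeholders.)
```
import org.apache.pulsar.client.api.*;

public class ReplayAggregate {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build();

        // Read one aggregate's journal topic from the start, without creating a subscription
        Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/orders_shop234") // hypothetical per-aggregate topic
                .startMessageId(MessageId.earliest)
                .create();

        while (reader.hasMessageAvailable()) {
            Message<byte[]> event = reader.readNext();
            // apply(event): rebuild the aggregate state from its events
        }

        reader.close();
        client.close();
    }
}
```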
2020-06-01 16:23:39 UTC - Ebere Abanonu: I think the best way to know is to take it for that sort of rough ride. But I don't think that should be an issue, should it @Sijie Guo?
----
2020-06-01 16:23:49 UTC - Ebere Abanonu: Thanks
----
2020-06-01 16:24:44 UTC - Addison Higham: Pulsar doesn't have a limit on the number of subscriptions, but there is some cost in the client SDK for keeping track of all of them (which requires a fair bit of memory). More limiting is that it just takes a while for Pulsar to iterate through and create all the subscriptions, which can cause some timeouts. In a Flink job we had, it involved changing a couple of timeouts; not sure if that is exposed in functions though. I would suggest testing that first.
----
2020-06-01 16:35:15 UTC - Tester T: I am not going to iterate through topics and subscribe individually. Instead there will be a couple (from 1 to 20) of function subscriptions that will listen to the full namespace and process events. I'll definitely give it a try! Thanks for the answers!
----
2020-06-01 17:54:41 UTC - Kirill Merkushev: Disagree here, we have used Pulsar as an event source for quite a while now. We have 20M events and use infinite retention, and in case we need to restream something we just read from the beginning - it now takes something like 30 min to fully recreate a database from the topic (once per quarter it's fine for DB migrations, especially with no downtime). We use 32 partitions on the topic and the user ID as the key. True, Pulsar lacks some optimistic checks, but we rely on a write-through DB here, with a status update later on the consumer thread. Maintaining Debezium with Postgres and Pulsar could be quite heavy. I would advise checking whether you need to store any personal or deletable data in Pulsar, as it can be quite hard to get rid of selected aggregates (we use an event gateway with a custom offloader plugin which stores personal data in the DB and keeps only a reference in Pulsar, to make this data removable). Also check the latest experimental per-key subscription, which should solve your per-aggregate consume scenario.
----
2020-06-01 17:55:01 UTC - Kirill Merkushev: We use this event gateway: <https://github.com/bsideup/liiklus>
----
2020-06-01 17:57:45 UTC - Kirill Merkushev: Pulsar's regexp subscription internally creates a consumer per topic, so you would have thousands of consumers sharing the connection pool internally; I don't think that's scalable.
----
2020-06-01 20:05:15 UTC - Kirill Kosenko: Thank you guys for your replies
----
2020-06-01 20:27:03 UTC - lujop: Thank you very much for your response, penghui. After processing your responses I have more doubts, but they are mainly derived or new ones, and I will start a different thread for each.
+1 : Penghui Li
----
2020-06-01 20:38:16 UTC - lujop: I made a little test with delayed delivery and it seems that when it is applied, message order is ignored, isn't it? For example, for a subscriber with key_shared, if I send a message with key=1 and a 30-second delay and then another with the same key and a 5-second delay, the second one is consumed first. Is this how it works?
----
2020-06-01 20:43:17 UTC - Vladimir Shchur: Topic pattern does exactly that - it creates a separate consumer for each topic involved. So I would not call it a good idea.
----
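(As a reference for the delayed-delivery question above, a minimal sketch of producing the two messages with PIP-26's deliverAfter; the topic name and service URL are placeholders. The delay is attached per message, which matches the observed behavior that the 5-second message surfaces before the 30-second one.)
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class DelayedSend {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("delayed-demo") // hypothetical topic
                .create();

        // m1: visible to the consumer no earlier than 30 seconds from now
        producer.newMessage()
                .key("1")
                .value("m1")
                .deliverAfter(30, TimeUnit.SECONDS)
                .send();

        // m2: same key, but only a 5-second delay, so it is delivered before m1
        producer.newMessage()
                .key("1")
                .value("m2")
                .deliverAfter(5, TimeUnit.SECONDS)
                .send();

        producer.close();
        client.close();
    }
}
```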
2020-06-01 20:54:16 UTC - lujop: I'm evaluating Pulsar for a classic message-queue use case for integrations, without realtime needs and not a huge number of messages. Some of the features are a very good match for me, like multiple subscriptions, cheap topic creation, the topic-per-entity pattern, Pulsar SQL, and the flexibility to move to more advanced streaming use cases if needed later. One of my concerns is that the documentation about production requirements says a single-machine instance is only for development purposes, but if there are no realtime use cases and not a lot of load is expected, is it realistic to have only one instance, as long as messages can be regenerated some way in case of disaster? And about disaster recovery, if it were needed, are backups of Pulsar possible, or is the way to rely on cluster replication?
----
2020-06-01 20:59:04 UTC - Greg Methvin: Isn't this what you would expect? You asked for the second message to be delivered in 5 seconds and the first one to be delivered in 30 seconds.
----
2020-06-01 21:04:30 UTC - lujop: No, I expected message order to take precedence over the delay: the first one delivered after 30 seconds, and the second one immediately after the first. But I understand that when you use delays, order is not important?
----
2020-06-01 21:12:12 UTC - Addison Higham: I can't speak to whether there is something about Pulsar standalone that would make it a no-go in production (besides the obvious SPOF and no ability to scale it out). What you describe seems somewhat reasonable to me, but there may be details of how standalone is run/configured that cause more problems. One other thing I wanted to mention though: under standalone, you can put all the data on a single disk/volume. That would make it much, much easier to snapshot the disk and have it be consistent. The replication factor of BookKeeper/ZooKeeper is what handles most of that in a clustered scenario (as multiple disks/services are much more difficult to do traditional backups on), but with a single disk in standalone, it should be relatively safe to just do DR with a disk backup.
----
2020-06-01 21:35:50 UTC - Greg Methvin: I'm guessing the documentation is probably lacking here, but as I understand it the shared and key_shared subscription types don't guarantee message ordering. What behavior were you expecting in your example?
----
2020-06-01 21:36:42 UTC - lujop: I have some doubts about how <https://github.com/apache/pulsar/wiki/PIP-26:-Delayed-Message-Delivery|Delayed Delivery> and the new <https://github.com/apache/pulsar/wiki/PIP-58-%3A-Support-Consumers--Set-Custom-Retry-Delay|Consumers Set Custom Retry Delay feature> behave with a big number of retries and exponential backoff. For example, for a queue used for an external integration, where the first retry starts after 1 minute but the last one happens after 2 days due to exponential backoff, can that have a huge impact on the append-only Pulsar structures? Because although there is only one old message that is not processed, the log cannot be discarded for newer entries until the older one is processed, can it? Can this be a problem, or is it internally optimized using in-memory caching?
Also, to my understanding, custom retry delays will use another topic, so if I expect strict order m1, m2, m3 and need to retry m1 and don't want m2 and m3 to be processed, would I need to manage that myself with some precondition checks and manually force m2 and m3 retries as well?
----
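(A minimal sketch of the PIP-58 retry flow linked above, assuming a client/broker version that ships the feature; the topic, subscription name, and retry counts are hypothetical. With enableRetry, reconsumeLater republishes the failed message to a separate retry topic, so later messages on the original topic are not blocked - which is why the strict-ordering m1/m2/m3 case still needs the manual precondition checks described above.)
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class RetryConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build();

        // enableRetry(true) also subscribes the consumer to its retry topic
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("external-integration") // hypothetical topic
                .subscriptionName("integration-sub")
                .subscriptionType(SubscriptionType.Shared)
                .enableRetry(true)
                .deadLetterPolicy(DeadLetterPolicy.builder()
                        .maxRedeliverCount(5) // after 5 retries the message goes to the dead-letter topic
                        .build())
                .subscribe();

        Message<String> msg = consumer.receive();
        try {
            // process(msg) ...
            consumer.acknowledge(msg);
        } catch (Exception e) {
            // Ask for redelivery in 60 seconds via the retry topic instead of blocking the original topic
            consumer.reconsumeLater(msg, 60, TimeUnit.SECONDS);
        }

        consumer.close();
        client.close();
    }
}
```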
2020-06-01 21:44:36 UTC - Alexander Ursu: Was wondering if anyone has used Pulsar SQL (Presto) successfully and securely in Kubernetes. I'm not quite fond of it and unsure how to configure authentication for it. Would like to hear any success stories.
----
2020-06-01 21:46:27 UTC - lujop: For key_shared I initially expected messages with the same key to be processed in order. That is:
T0 -> m1 queued with key=K1 and delay 30s
T0+1s -> m2 queued with key=K1 and delay 5s
T0+30s -> m1 is processed only after its delay passes. m2 hasn't been processed yet because, although its delay has passed, order is preserved
T0+30s -> just after m1 is processed, m2 is processed as well
----
2020-06-01 22:05:31 UTC - Sijie Guo: <https://github.com/streamnative/charts/blob/master/charts/pulsar/templates/presto/presto-coordinator-configmap.yaml#L170> You can check this configmap to see how to configure authentication for Presto.
----
2020-06-01 22:07:03 UTC - Raphael Enns: I was looking at <https://pulsar.apache.org/docs/en/deploy-bare-metal/>. We don't need any data redundancy; the data we're sending doesn't need to last long. We're also not pushing through a large amount or frequency of data. What would you recommend for a simple, stable production setup? Would 1 ZooKeeper process, 1 BookKeeper process and 1 Pulsar broker process all running on the same machine work?
----
2020-06-02 04:43:18 UTC - Alexander Ursu: Ah, thank you. I was also wondering more along the lines of how external clients connect to the Presto cluster securely, specifically traffic external to the k8s cluster. Can that be done as part of this Helm chart too?
----
2020-06-02 08:02:18 UTC - xue: There are three brokers in a Pulsar cluster. Using the synchronous sending interface of the Pulsar producer, we only get about 100 TPS. Code:
Producer<String> stringProducer = client.newProducer(Schema.STRING)
        .topic("my-topic")
        .create();
stringProducer.send("My message");
----
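(On the throughput question above: synchronous send() waits one broker round trip per message, so a single producer is latency-bound. A minimal sketch, with a placeholder service URL and illustrative settings, of using sendAsync() plus batching to raise the publish rate.)
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class AsyncProducerDemo {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build();

        Producer<String> producer = client.newProducer(Schema.STRING)
                .topic("my-topic")
                .enableBatching(true)                              // group small messages into one broker request
                .batchingMaxPublishDelay(5, TimeUnit.MILLISECONDS) // flush a batch at least every 5 ms
                .blockIfQueueFull(true)                            // apply backpressure instead of failing fast
                .create();

        // sendAsync() pipelines messages instead of waiting for each acknowledgement
        for (int i = 0; i < 10_000; i++) {
            producer.sendAsync("My message " + i)
                    .exceptionally(ex -> { ex.printStackTrace(); return null; });
        }

        producer.flush(); // wait for everything queued so far to be sent
        producer.close();
        client.close();
    }
}
```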
