2020-05-29 10:01:42 UTC - Maria: @Maria has joined the channel ---- 2020-05-29 10:51:58 UTC - lujop: Hi, I'm evaluating Pulsar for a project. I'm reading the documentation to check whether it can handle our use cases and to find the best way to implement them. I'm writing to describe the use cases and get opinions on whether Pulsar is a good match, and whether the approach I have in mind is the best one or there are better alternatives.
***Use cases*** All the use cases are mainly for managing integrations with external applications/services. ***1*** Consumer exponential backoff retries, with final retry delays of up to several days. Approach: I haven't seen native support for this. Is there any? I suppose I can use some manual logic and enqueue another message with delayed redelivery? ***2*** Searchable message history for pending messages, processed ones, and ones discarded with errors, searchable by time range and some message properties. Approach: Is using Pulsar SQL with retention of processed messages for 1 to 5 years a supported use case? Or could I run into problems? ***3*** Strict ordering between messages for the same queue and property. For example, a queue that manages status changes for invoices, where we need to ensure that all status changes for the same invoice are processed in order. If a message is not delivered due to errors, the other messages for the same invoice must wait until it is processed, while messages for other invoices continue to be processed. Approach: Using a topic for each entity with multi-topic subscriptions could do the job very well. Is creating a topic a cheap operation, and can we have a lot of them without needing a lot of resources? ***4*** Is it possible to build a cluster with only two machines? I know the documentation suggests 6, but if there isn't much workload, is it possible to use just two? Or are three mandatory for consensus? Thanks a lot in advance, ---- 2020-05-29 11:10:56 UTC - Penghui Li: 1. Custom retry delay will be released in 2.6.0. You can implement exponential backoff on top of this feature; here is the PR: <https://github.com/apache/pulsar/pull/6449> 2. Yes, you can use Pulsar SQL to retrieve messages, and publish time can be used as a query parameter. 3. Pulsar has multiple subscription modes. You can use Exclusive, Failover or Key_Shared. All of these subscription modes are designed for ordering. 
You can find details at <http://pulsar.apache.org/docs/en/concepts-messaging/#subscriptions> . Key_Shared is a new subscription mode, and the current version has some ordering problems when consumers change. This PR <https://github.com/apache/pulsar/pull/6977> tries to fix this issue. 4. If you have an external ZooKeeper cluster, you can use 2 machines to install the Pulsar broker and BookKeeper. If you want to install ZooKeeper along with the broker and bookie, it’s better to use 3 machines. ---- 2020-05-29 11:27:03 UTC - lujop: Thank you very much @Penghui Li One question about 3: is creating and maintaining a lot of topics cheap, for example one topic per invoice in the use case I described? I need strict ordering per invoice, but without blocking messages for other invoices ---- 2020-05-29 11:45:53 UTC - Penghui Li: Yes, you can create many topics. You can also use the invoice as the message key; messages with the same key are written to the same partition, so you don’t need to create a topic for each invoice. ---- 2020-05-29 12:31:43 UTC - Aaron Verachtert: I'm looking to implement a readinessProbe for my Pulsar instance deployed on a Kubernetes cluster. Is there any documentation on health/readiness endpoints for Pulsar? ---- 2020-05-29 13:04:18 UTC - lujop: To use that feature, I understand that I have to configure a partitioned topic, don't I? And if, for example, I have 4 partitions and four different invoices each have a blocked message, one in each partition, will they block all processing or not? ---- 2020-05-29 13:42:04 UTC - Lawal Azeez: @Sijie Guo I'm still looking forward to hearing from you ---- 2020-05-29 14:09:17 UTC - Penghui Li: Yes, you need a partitioned topic. If you send invoices with the same key to a partition, that partition has only one active consumer, so you can process them in order. ---- 2020-05-29 14:35:49 UTC - Addison Higham: If you look at master, the latest helm charts now have probes ---- 2020-05-29 14:36:12 UTC - Aaron Verachtert: Yes, I noticed. 
However, the endpoints in the helm charts are not working on 2.4.0 and I was wondering if they are documented somewhere ---- 2020-05-29 14:39:11 UTC - Olivier Chicha: Is there a place where we can see which client versions work with which server versions? Specifically, I'd like to know whether client 2.5.2 is compatible with server 2.5.1 ---- 2020-05-29 15:27:45 UTC - Addison Higham: ah, if you look at the startup scripts for the broker you will notice that it drops that ready file on boot ---- 2020-05-29 15:29:00 UTC - Addison Higham: patch versions should not contain protocol changes and most protocol changes are compatible across minor versions (not as sure if that is policy though) ---- 2020-05-29 16:26:26 UTC - Raphael Enns: Hi, we're currently running an instance of Pulsar v2.4.1 on a production machine which is having some issues. After running for a week or so, we start getting "Not enough non-faulty bookies available" errors, and eventually rocksdb gives "Failed to checkpoint db" errors. After restarting Pulsar, we get "Failed to restore rocksdb" with a "java.io.FileNotFoundException". To get Pulsar running again, I need to clear the data directory. We are currently running Pulsar in standalone mode with default settings other than setting defaultRetentionTimeInMinutes to 60. We are managing the Pulsar process via supervisor and it is running on Ubuntu 18.04. At this point, I'm less concerned about the errors, unless there is an obvious reason for them, but what I'd like to know is your recommendation on how to run Pulsar stably in production. We're using it right now as a means of passing messages between applications and it currently has low usage, though that will increase over time. Also, supervisor uses the TERM signal to stop processes by default. Is that fine? We have used rocksdb in other use cases and have noticed that it can have its database corrupted. Do you have any suggestions on how to prevent that? 
Thanks in advance. ---- 2020-05-29 16:54:07 UTC - Miles Tan: @Miles Tan has joined the channel ---- 2020-05-29 17:12:40 UTC - Olivier Chicha: ok thanks ---- 2020-05-29 18:08:28 UTC - Sijie Guo: • standalone itself is more for development. It includes developer preview features. If you want to use standalone for production, consider disabling function worker and state if you are not using them. `bin/pulsar standalone -nfw -nss`. ---- 2020-05-29 18:09:03 UTC - Sijie Guo: Sorry it was midnight in my timezone. Didn’t get the message. will take a look. ---- 2020-05-29 18:56:17 UTC - Lawal Azeez: thank you ---- 2020-05-29 19:08:26 UTC - Babatunde Famakinwa: @Babatunde Famakinwa has joined the channel ---- 2020-05-29 20:39:01 UTC - Raphael Enns: Thanks. At this time, we only need a single node, so standalone gets us where we need to be. Though we could look into setting up zookeeper/bookkeeper/broker on a single machine. ---- 2020-05-29 20:39:19 UTC - Raphael Enns: I will try using -nfw and -nss ---- 2020-05-29 20:40:04 UTC - Raphael Enns: I couldn't find any information on -nss though other than it is to disable stream storage. Could you point me to something that describes what it is and why I would want to use it or not use it? ---- 2020-05-29 21:08:05 UTC - Sijie Guo: stream storage is the state storage used by pulsar functions. If you don’t use pulsar functions and its state api, you don’t need it. ---- 2020-05-29 21:08:27 UTC - Raphael Enns: thanks ----
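For the exponential backoff question raised earlier in this thread (use case 1, retries with final delays of up to several days), the delay calculation itself can be sketched independently of Pulsar. This is only a rough sketch: the function names `backoff_delay` and `backoff_schedule`, the 30-second base, the factor of 2, and the 3-day cap are all illustrative assumptions, not anything from Pulsar or from the PR linked above. The resulting delay would be handed to whatever delayed-delivery or retry mechanism is actually used.

```python
from datetime import timedelta


def backoff_delay(attempt, base=timedelta(seconds=30),
                  cap=timedelta(days=3), factor=2.0):
    """Return the wait before retry number `attempt` (0-based).

    The delay grows as base * factor**attempt and never exceeds `cap`.
    All defaults are hypothetical values chosen for illustration.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    cap_s = cap.total_seconds()
    delay_s = base.total_seconds()
    # Multiply step by step and stop as soon as the cap is reached,
    # so very large attempt numbers cannot overflow.
    for _ in range(attempt):
        delay_s *= factor
        if delay_s >= cap_s:
            return cap
    return timedelta(seconds=min(delay_s, cap_s))


def backoff_schedule(max_attempts, **kwargs):
    """List the delays for attempts 0 .. max_attempts - 1."""
    return [backoff_delay(n, **kwargs) for n in range(max_attempts)]


if __name__ == "__main__":
    for n, delay in enumerate(backoff_schedule(16)):
        print(f"retry {n}: wait {delay}")
```

With these assumed defaults the delay doubles from 30 seconds and stays at the 3-day cap from the fifteenth retry onward. The consumer would need to carry the attempt number along with the message (for example in a message property) so the next delay can be computed.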
