Slack digest for #general - 2020-05-30

Apache Pulsar Slack Sat, 30 May 2020 02:11:56 -0700

2020-05-29 10:01:42 UTC - Maria: @Maria has joined the channel
----
2020-05-29 10:51:58 UTC - lujop: Hi, I'm evaluating Pulsar for a project. I'm 
reading the documentation to check whether it can handle our use cases and to 
find the best way to implement them.
I'm writing to describe those use cases and to ask whether Pulsar is a good 
match, and whether the approach I have in mind is the best one or there are 
better alternatives.


***Use cases***
All the use cases are mainly for managing integrations with external 
applications/services.

***1***
Consumer retries with exponential backoff, with final retry delays of up to a 
few days.
Approach: I haven't seen native support for this. Is there any? I suppose I 
can use some manual logic and enqueue another message with delayed redelivery?

***2***
Searchable message history for pending messages, processed ones, and ones 
discarded with errors, searchable by time range and by some message properties.
Approach: Is using Pulsar SQL with 1 to 5 years of retention for processed 
messages a supported use case? Or could I run into problems?

***3***
Strict ordering between messages for the same queue and property. For 
example, a queue that manages status changes for invoices, where we need to 
ensure that all status changes for the same invoice are processed in order. If 
a message cannot be delivered due to errors, later messages for the same 
invoice must wait until it is processed, while messages for other invoices 
continue to be processed.
Approach:
Using a topic per entity with multi-topic subscriptions could do the job very 
well. Is creating a topic a cheap operation, and can we have a lot of them 
without needing a lot of resources?

***4***
Is it possible to build a cluster with only two machines?
I know the documentation suggests 6, but if the workload is light, is it 
possible to use just two? Or are three mandatory for consensus?

Many thanks in advance,
----
2020-05-29 11:10:56 UTC - Penghui Li: 1. Custom retry delay will be released in 
2.6.0. You can implement exponential backoff on top of this feature. Here is 
the PR for it: <https://github.com/apache/pulsar/pull/6449>
2. Yes, you can use Pulsar SQL to retrieve messages, and publish time can be 
used as a query parameter. 
3. Pulsar has multiple subscription modes. You can use Exclusive, Failover or 
Key_Shared. All three of these subscription modes are designed for ordering. 
You can find details at 
<http://pulsar.apache.org/docs/en/concepts-messaging/#subscriptions> . 
Key_Shared is a new subscription mode, and the current version has some 
ordering problems when consumers change. This PR 
<https://github.com/apache/pulsar/pull/6977> tries to fix this issue.
4. If you have an external ZooKeeper cluster, you can use 2 machines to install 
the Pulsar broker and BookKeeper. If you want to install ZooKeeper along with 
the broker and bookie, it’s better to use 3 machines.
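
For point 1, the "manual logic" mentioned in the question could compute a capped exponential delay per retry attempt and then re-publish the message with that delay (for example with the Java client's `deliverAfter`, where available). The sketch below covers only the delay calculation, in plain Java with no Pulsar dependency. The class name `BackoffSketch`, the method `delayFor`, and the 30-second base and 3-day cap are illustrative assumptions, not part of Pulsar.

```java
import java.time.Duration;

/** Hypothetical helper (not a Pulsar API): capped exponential backoff. */
final class BackoffSketch {
    private BackoffSketch() {}

    /**
     * Returns base * 2^attempt, never exceeding max.
     *
     * @param attempt zero-based retry attempt
     * @param base    delay used for attempt 0
     * @param max     upper bound for any delay
     */
    static Duration delayFor(int attempt, Duration base, Duration max) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        long maxMs = max.toMillis();
        long delayMs = base.toMillis();
        // Double step by step and stop at the cap, so the value never overflows.
        for (int i = 0; i < attempt && delayMs < maxMs; i++) {
            delayMs = delayMs > maxMs / 2 ? maxMs : delayMs * 2;
        }
        return Duration.ofMillis(Math.min(delayMs, maxMs));
    }

    public static void main(String[] args) {
        Duration base = Duration.ofSeconds(30); // assumed first retry delay
        Duration max = Duration.ofDays(3);      // assumed cap ("up to a few days")
        for (int attempt = 0; attempt <= 15; attempt++) {
            System.out.println("attempt " + attempt + " -> " + delayFor(attempt, base, max));
        }
    }
}
```

The attempt counter would have to travel with the message, for instance as a message property, so that each redelivery knows which delay to apply next.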
----
2020-05-29 11:27:03 UTC - lujop: Thank you very much @Penghui Li
One question about 3: is creating and maintaining a lot of topics cheap, 
for example one topic per invoice in the use case I described?
I ask because I need strict ordering per invoice without blocking messages for 
other invoices.
----
2020-05-29 11:45:53 UTC - Penghui Li: Yes, you can create many topics. You can 
also use the invoice as the message key: messages with the same key are 
written to the same partition, so you don’t need to create a topic for each 
invoice.
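
The key-to-partition idea can be sketched in plain Java as hashing the key and taking it modulo the partition count. This is only an illustration of the principle: the class `KeyRoutingSketch`, the method `partitionFor`, and the use of `String.hashCode()` are assumptions, and the hash Pulsar actually applies depends on the producer's configuration.

```java
/** Hypothetical illustration (not Pulsar code): key -> partition routing. */
final class KeyRoutingSketch {
    private KeyRoutingSketch() {}

    /** Maps a message key to a partition index in [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Clear the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4; // assumed partition count
        String[] invoiceIds = {"invoice-1", "invoice-2", "invoice-3", "invoice-4", "invoice-5"};
        for (String id : invoiceIds) {
            // The same id always prints the same partition.
            System.out.println(id + " -> partition " + partitionFor(id, numPartitions));
        }
    }
}
```

Because the mapping is deterministic, every status change for one invoice lands in the same partition. Different invoices can also share a partition, since there are usually far more keys than partitions. With the Java client, the key would be set on the message builder via `key(...)` before sending.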
----
2020-05-29 12:31:43 UTC - Aaron Verachtert: I'm looking to implement a 
readinessProbe for my Pulsar-instance deployed on a Kubernetes cluster. Is 
there any documentation on health/readiness endpoints for Pulsar?
----
2020-05-29 13:04:18 UTC - lujop: To get that behavior, I understand that I have 
to configure a partitioned topic, don't I?
And if, for example, I have 4 partitions and four different invoices each have 
a blocked message, one in each partition, will they block all processing or not?
----
2020-05-29 13:42:04 UTC - Lawal Azeez: @Sijie Guo I'm still looking forward to 
hearing from you
----
2020-05-29 14:09:17 UTC - Penghui Li: Yes, you need a partitioned topic. If you 
send invoices with the same key to a partition, that partition has only one 
active consumer, so you can process them in order.
----
2020-05-29 14:35:49 UTC - Addison Higham: If you look at master, the latest 
helm charts now have probes
----
2020-05-29 14:36:12 UTC - Aaron Verachtert: Yes, I noticed. However, the 
endpoints in the helm charts are not working on 2.4.0 and I was wondering if 
they are documented somewhere
----
2020-05-29 14:39:11 UTC - Olivier Chicha: Is there a place where we can find out 
which client versions work with which server versions?
Here I am specifically interested in knowing whether 
client 2.5.2 is compatible with server 2.5.1
----
2020-05-29 15:27:45 UTC - Addison Higham: ah, if you look at the startup 
scripts for the broker you will notice that it drops that ready file on boot
----
2020-05-29 15:29:00 UTC - Addison Higham: patch versions should not contain 
protocol changes, and most protocol changes are compatible across minor 
versions (not as sure whether that is policy though)
----
2020-05-29 16:26:26 UTC - Raphael Enns: Hi, we're currently running an instance 
of Pulsar v2.4.1 on a production machine which is having some issues. After 
running for a week or so, we start getting errors of "Not enough non-faulty 
bookies available" and eventually rocksdb gives "Failed to checkpoint db" 
errors. After restarting Pulsar, we get "Failed to restore rocksdb" with a 
"java.io.FileNotFoundException". To get Pulsar running again, 
I need to clear the data directory.

We are currently running Pulsar in standalone mode with default settings other 
than setting defaultRetentionTimeInMinutes to 60. We are managing the Pulsar 
process via supervisor and it is running on Ubuntu 18.04.

At this point, I'm less concerned about the errors, unless there is an obvious 
reason for them, but what I'd like to know is your recommendation on how to run 
Pulsar stably in production.

We're using it right now as a means of passing messages between applications 
and it currently has low usage, though that will increase over time. Also, 
supervisor uses the TERM signal to stop processes by default. Is that fine? We 
have used rocksdb in other use cases and have noticed that it can have its 
database corrupted. Do you have any suggestions on how to prevent that?

Thanks in advance.
----
2020-05-29 16:54:07 UTC - Miles Tan: @Miles Tan has joined the channel
----
2020-05-29 17:12:40 UTC - Olivier Chicha: ok thanks
----
2020-05-29 18:08:28 UTC - Sijie Guo: Standalone itself is more for 
development. It includes developer preview features. If you want to use 
standalone for production, consider disabling the function worker and state 
storage if you are not using them: `bin/pulsar standalone -nfw -nss`.
----
2020-05-29 18:09:03 UTC - Sijie Guo: Sorry it was midnight in my timezone. 
Didn’t get the message. will take a look.
----
2020-05-29 18:56:17 UTC - Lawal Azeez: thank you
----
2020-05-29 19:08:26 UTC - Babatunde Famakinwa: @Babatunde Famakinwa has joined 
the channel
----
2020-05-29 20:39:01 UTC - Raphael Enns: Thanks. At this time, we only need a 
single node, so standalone gets us where we need to be. Though we could look 
into setting up zookeeper/bookkeeper/broker on a single machine.
----
2020-05-29 20:39:19 UTC - Raphael Enns: I will try using -nfw and -nss
----
2020-05-29 20:40:04 UTC - Raphael Enns: I couldn't find any information on -nss 
though other than it is to disable stream storage. Could you point me to 
something that describes what it is and why I would want to use it or not use 
it?
----
2020-05-29 21:08:05 UTC - Sijie Guo: stream storage is the state storage used 
by pulsar functions. If you don’t use pulsar functions and its state api, you 
don’t need it.
----
2020-05-29 21:08:27 UTC - Raphael Enns: thanks
----
