2020-06-01 10:59:41 UTC - Oleg Toubenshlak: @Oleg Toubenshlak has joined the 
channel
----
2020-06-01 11:00:56 UTC - Shalom Tuby: @Shalom Tuby has joined the channel
----
2020-06-01 12:11:30 UTC - Tony Free: @Tony Free has joined the channel
----
2020-06-01 13:29:11 UTC - Miguel Martins: Pulsar is probably not a good fit for an event store.
I believe it doesn't have a way to perform inserts with an optimistic concurrency check and, as you mention, no way to retrieve all events by aggregate id without creating a topic for each one.
IMO, it would be better to use a dedicated event store and use it as a source for Pulsar.
E.g. Postgres as the event store, publishing to Pulsar using Debezium.
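As a rough sketch of what such an optimistic concurrency check could look like on the Postgres side (the table and column names here are hypothetical, assuming a unique constraint on (aggregate_id, version)):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Appends one event for an aggregate; a concurrent writer that already took
// this version number hits the unique constraint and the insert fails.
public class EventStoreAppend {
    public static void append(Connection conn, String aggregateId,
                              long expectedVersion, String eventJson) throws SQLException {
        String sql = "INSERT INTO events (aggregate_id, version, payload) VALUES (?, ?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, aggregateId);
            stmt.setLong(2, expectedVersion + 1); // the version this writer expects to create
            stmt.setString(3, eventJson);
            stmt.executeUpdate(); // throws on a unique-constraint violation -> reject or retry
        }
    }
}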
----
2020-06-01 13:29:41 UTC - Ankush: @Ankush has joined the channel
----
2020-06-01 13:55:37 UTC - Oleg Toubenshlak: Hi everyone,

I have a question regarding custom Pulsar connector (sink) deployment with Kubernetes (and the latest Helm version).
I need the connector to access some files from an externally mounted volume.

Is there a way to make the Pulsar broker mount the external volume into the automatically generated connector, which runs as a separate pod created by the broker?

The connector is deployed via pulsar-admin as “Functions-worker with brokers”.

Thanks!
----
2020-06-01 14:34:54 UTC - Hugo Smitter: @Hugo Smitter has joined the channel
----
2020-06-01 15:00:52 UTC - Oleg Toubenshlak: Hi again, another question regarding the Pulsar connector. Is there a way to run a connector as a worker with a Java version higher than Java 8? For example, via the pulsarDockerImageName configuration with a Java 13 Docker image?
----
2020-06-01 15:55:38 UTC - Gary Fredericks: is there any way to know whether a 
particular version of the pulsar client is compatible with a particular version 
of the server?
----
2020-06-01 15:55:54 UTC - Gary Fredericks: I'm wondering about using client 
2.5.2 with server 2.4.1 in particular
----
2020-06-01 16:01:36 UTC - Ebere Abanonu: You can look at the protocol version: 2.5.2 uses protocol version 15, while 2.4.1 uses protocol version 14.
----
2020-06-01 16:02:11 UTC - Gary Fredericks: Okay, thanks
----
2020-06-01 16:02:28 UTC - Ebere Abanonu: The server is compatible with any 
client as long as the protocol version is indicated when connecting
+1 : Frank Kelly
----
2020-06-01 16:04:05 UTC - Tester T: Hi, folks!
We plan to use Pulsar as the event journal in an event-sourced system, so we are planning the following topic design:
1. A topic per DDD aggregate. These won't have direct subscriptions, only occasional usage of the reader API. Messages must be stored indefinitely (via tiered storage).
2. A couple of aggregated streams for integration with other services, event processing, and job queuing. Planned to be implemented via a Pulsar function which will listen to a topic pattern or a full namespace. Messages can be deleted after some retention period.
e.g.
1. Event-journaled topics: ‘orders_shop234’, ‘orders_shop124’.
2. Integration topics: ‘paid-orders’, ‘placed-orders’, which will be hydrated by Pulsar functions from the event-journaled topics.

So, the question is: what are the possible limitations of this design? Would there be any scaling issues, e.g. for 50k aggregates (event-journaled topics)?

Thanks for answers!
----
2020-06-01 16:04:40 UTC - Gary Fredericks: so I shouldn't _have_ to pay 
attention to this, ideally?
----
2020-06-01 16:06:35 UTC - Ebere Abanonu: That will be the job of the client 
library. Just pay attention to the features you will be getting
----
2020-06-01 16:06:51 UTC - Gary Fredericks: cool, thanks
----
2020-06-01 16:08:42 UTC - Ebere Abanonu: How do you intend to replay the 
events? That will determine your limitations
----
2020-06-01 16:11:46 UTC - Tester T: Why not use the reader API for this purpose?
Event rewind is required only in the event-journaled topics.
----
2020-06-01 16:14:29 UTC - Ebere Abanonu: the reader api is the best fit for that
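For example, a minimal sketch of replaying a single aggregate's topic with the reader API (the topic name is just an illustrative example from the design above):

import java.io.IOException;
import org.apache.pulsar.client.api.*;

// Reads one event-journaled topic from the beginning and applies each event.
public class AggregateReplay {
    public static void replay(PulsarClient client) throws IOException {
        try (Reader<byte[]> reader = client.newReader()
                .topic("persistent://public/default/orders_shop234")
                .startMessageId(MessageId.earliest) // rewind to the first retained event
                .create()) {
            while (reader.hasMessageAvailable()) {
                Message<byte[]> event = reader.readNext();
                // apply the event to rebuild the aggregate state
            }
        }
    }
}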
----
2020-06-01 16:14:33 UTC - David Kjerrumgaard: I am not aware of any limitations 
beyond what the hardware can handle :smiley:
+1 : Kirill Kosenko
----
2020-06-01 16:15:07 UTC - Ebere Abanonu: I'm actually working on something for event sourcing on Pulsar
----
2020-06-01 16:19:43 UTC - Tester T: I am a little bit worried about function subscriptions that will potentially listen to ~50-70k topics.
&gt; I'm actually working on something for event sourcing on Pulsar
Yeah, I saw your contribution to the Pulsar <http://akka.net|akka.net> persistence plugin, keep up the good work!
----
2020-06-01 16:23:39 UTC - Ebere Abanonu: I think the best way to know is to take it for that sort of rough ride. But I don't think that should be an issue, should it @Sijie Guo?
----
2020-06-01 16:23:49 UTC - Ebere Abanonu: Thanks
----
2020-06-01 16:24:44 UTC - Addison Higham: Pulsar doesn't have a limit on the number of subscriptions, but there is some cost in the client SDK for keeping track of all of them (which requires a fair bit of memory). More limiting is that it just takes a while for Pulsar to iterate through and create all the subscriptions, which can cause some timeouts. In a Flink job we had, it involved changing a couple of timeouts; I'm not sure whether those are exposed in the functions, though. I would suggest testing that first.
----
2020-06-01 16:35:15 UTC - Tester T: I am not going to iterate through topics and subscribe to them individually.
Instead there will be a couple (from 1 to 20) of function subscriptions that will listen to the full namespace and process events.

I'll definitely give it a try!

Thanks for the answers!
----
2020-06-01 17:54:41 UTC - Kirill Merkushev: Disagree here, we have been using Pulsar as an event source for quite a while now. We have 20M events, use infinite retention, and in case we need to restream something we just read from the beginning - it currently takes something like 30 min to fully recreate a database from the topic (once per quarter that's fine for db migrations, especially with no downtime). We use 32 partitions on the topic and the user ID as the key. True, Pulsar lacks some optimistic checks, but we rely on the write-through db here, with a status update later on the consumer thread. Maintaining Debezium with pg and Pulsar could be quite a heavy thing. I would advise checking whether you need to store any personal or deletable data in Pulsar, as it can be quite hard to get rid of selected aggregates (we use an event gw with a custom offloader plugin which stores personal data in the db and keeps only a reference in Pulsar, to make this data removable). Also check the latest experimental per-key subscription, which should solve your per-aggregate consume scenario.
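A minimal sketch of that kind of keyed write (names are illustrative):

import org.apache.pulsar.client.api.*;

// All events for one user carry the same key, so they land on the same
// partition and keep their relative order.
public class KeyedAppend {
    public static void append(Producer<String> producer, String userId, String eventJson)
            throws PulsarClientException {
        producer.newMessage()
                .key(userId)      // used for partition routing
                .value(eventJson)
                .send();
    }
}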
----
2020-06-01 17:55:01 UTC - Kirill Merkushev: We use this event gw 
<https://github.com/bsideup/liiklus>
----
2020-06-01 17:57:45 UTC - Kirill Merkushev: Pulsar's regexp subscription internally creates a consumer per topic, so you would have thousands of consumers sharing the connection pool internally; I don't think that's scalable.
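For reference, a minimal sketch of such a regexp subscription (illustrative names); this is where the one-consumer-per-matching-topic cost shows up:

import java.util.regex.Pattern;
import org.apache.pulsar.client.api.*;

// Subscribes to every topic in the namespace matching the pattern; the client
// internally creates one consumer per matching topic.
public class NamespaceSubscription {
    public static Consumer<byte[]> subscribe(PulsarClient client) throws PulsarClientException {
        return client.newConsumer()
                .topicsPattern(Pattern.compile("persistent://public/default/orders_.*"))
                .subscriptionName("integration-orders")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();
    }
}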
----
2020-06-01 20:05:15 UTC - Kirill Kosenko: Thank you guys for your replies
----
2020-06-01 20:27:03 UTC - lujop: Thank you very much for your response, penghui.
After going through your responses I have more doubts, but they are mainly derived or new ones, so I will start a different thread for each.
+1 : Penghui Li
----
2020-06-01 20:38:16 UTC - lujop: I made a little test with delayed delivery and it seems that when it is applied, message order is ignored, isn't it?
For example, for a subscriber with key_shared, if I send a message with key=1 and a 30-second delay and then another with the same key and a 5-second delay, the second one is consumed first. Is this how it works?
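Roughly the test described above, as a sketch (illustrative names):

import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

// Two messages with the same key but different delays; the one with the
// shorter delay is dispatched first, regardless of publish order.
public class DelayedSendTest {
    public static void run(Producer<String> producer) throws PulsarClientException {
        producer.newMessage().key("1").value("m1")
                .deliverAfter(30, TimeUnit.SECONDS).send();
        producer.newMessage().key("1").value("m2")
                .deliverAfter(5, TimeUnit.SECONDS).send();  // consumed before m1
    }
}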
----
2020-06-01 20:43:17 UTC - Vladimir Shchur: topic pattern does exactly that - it 
creates a separate consumer for each topic involved. So I would not call it a 
good idea.
----
2020-06-01 20:54:16 UTC - lujop: I'm evaluating Pulsar for a classic message-queue use case for integrations, with no realtime needs and not a huge number of messages.
Some of the features are a very good match for me, like multiple subscriptions, cheap topic creation, being able to follow a topic-per-entity pattern, Pulsar SQL, and having the flexibility to move to more advanced streaming use cases later if needed.
One of my concerns is that, reading the documentation about production requirements, it says that a single-machine instance is only for development purposes. But if no realtime use cases are needed and not much load is expected, is it realistic to run only one instance, given that in case of disaster the messages can be regenerated some way?
And about disaster recovery, if it were needed: are backups of Pulsar possible, or is the way to go to rely on cluster replication?
----
2020-06-01 20:59:04 UTC - Greg Methvin: isn’t this what you would expect? you 
asked for the second message to be delivered in 5 seconds and the first one to 
be delivered in 30 seconds
----
2020-06-01 21:04:30 UTC - lujop: No, I expected the message order to take precedence over the delay, so that the first one was delivered after 30 seconds and the second one immediately after the first.
But I understand that when you use delays, order is not preserved?
----
2020-06-01 21:12:12 UTC - Addison Higham: I can't speak to whether there is something about Pulsar standalone that would make it a no-go in production (besides the obvious SPOF and no ability to scale it out). What you describe seems somewhat reasonable to me, but there may be details of how standalone is run/configured that could cause more problems.

One other thing I wanted to mention, though: under standalone you can put all the data on a single disk/volume. That would make it much, much easier to snapshot the disk and have it be consistent. The replication factor of bookkeeper/zookeeper is what handles most of that in a clustered scenario (as multiple disks/services are much more difficult to do traditional backups on), but with a single disk in standalone it should be relatively safe to just do DR with a disk backup.
----
2020-06-01 21:35:50 UTC - Greg Methvin: I’m guessing the documentation is 
probably lacking here, but as I understand it the shared and key_shared 
subscription types don’t guarantee message ordering. what behavior were you 
expecting in your example?
----
2020-06-01 21:36:42 UTC - lujop: I have some doubts about how <https://github.com/apache/pulsar/wiki/PIP-26:-Delayed-Message-Delivery|Delayed Delivery> and the new <https://github.com/apache/pulsar/wiki/PIP-58-%3A-Support-Consumers--Set-Custom-Retry-Delay|Consumers Set Custom Retry Delay feature> behave with a big number of retries and exponential backoff.
For example, for a queue used for an external integration, where the first retry starts after 1 minute but the last one happens after 2 days due to exponential backoff: can that have a huge impact on the append-only Pulsar structures? Because although there is only one old message that is not processed, the log cannot be discarded for the newer entries until the older one is processed, can it?
Can this be a problem, or is it internally optimized using in-memory caching?
Also, to my understanding, custom retry delays use another topic, so if I expect strict order m1, m2, m3 and need to retry m1 and don't want m2 and m3 to be processed, I would need to manage that myself, doing some precondition checks and manually forcing retries of m2 and m3 as well?
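For context, a rough sketch of what the PIP-58 retry flow looks like on the consumer side (the topic and subscription names are illustrative, not from the thread):

import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

// The consumer opts into retries; a failed message is pushed back with a
// custom delay (via a retry topic) instead of being acknowledged.
public class RetryConsumerSketch {
    public static void consumeOnce(PulsarClient client) throws PulsarClientException {
        Consumer<String> consumer = client.newConsumer(Schema.STRING)
                .topic("external-integration")
                .subscriptionName("integration-sub")
                .subscriptionType(SubscriptionType.Shared)
                .enableRetry(true)                      // route retries through a retry topic
                .deadLetterPolicy(DeadLetterPolicy.builder()
                        .maxRedeliverCount(16)          // give up after 16 attempts
                        .build())
                .subscribe();

        Message<String> msg = consumer.receive();
        try {
            // call the external system here ...
            consumer.acknowledge(msg);
        } catch (Exception e) {
            // schedule a redelivery with a custom delay; a growing delay gives backoff
            consumer.reconsumeLater(msg, 1, TimeUnit.MINUTES);
        }
    }
}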
----
2020-06-01 21:44:36 UTC - Alexander Ursu: Was wondering if anyone has used Pulsar SQL (Presto) successfully and securely in Kubernetes. I'm not quite fond of it and am unsure how to configure authentication for it. Would like to hear about any success stories.
----
2020-06-01 21:46:27 UTC - lujop: For key_shared I initially expected that messages with the same key would be processed in order. That is:
T0 -> m1 queued with key=K1 and delay 30s
T0+1s -> m2 queued with key=K1 and delay 5s
T0+30s -> m1 is processed only after its delay time passes. m2 hasn't been processed yet because, although its delay time has passed, order is preserved.
T0+30s -> just after m1 is processed, m2 is processed as well
----
2020-06-01 22:05:31 UTC - Sijie Guo: 
<https://github.com/streamnative/charts/blob/master/charts/pulsar/templates/presto/presto-coordinator-configmap.yaml#L170>

You can check this configmap to see how to configure authentication for presto.
----
2020-06-01 22:07:03 UTC - Raphael Enns: I was looking at <https://pulsar.apache.org/docs/en/deploy-bare-metal/>. We don't need any data redundancy; the data we're sending doesn't need to last long. We're also not pushing through a large amount or frequency of data. What would you recommend for a simple, stable production setup? Would 1 zookeeper process, 1 bookkeeper process and 1 pulsar broker process all running on the same machine work?
----
2020-06-02 04:43:18 UTC - Alexander Ursu: Ah thank you. I was also wondering 
more along the lines of how external clients connect to the Presto cluster 
securely, specifically traffic external to the k8s cluster. Can that be done as 
a part of this helm chart too?
----
2020-06-02 08:02:18 UTC - xue: There are three brokers in a Pulsar cluster. Using the synchronous send interface of the Pulsar producer, we only get about 100 TPS.
code:
Producer<String> stringProducer = client.newProducer(Schema.STRING)  // client is an existing PulsarClient
        .topic("my-topic")
        .create();
stringProducer.send("My message");  // blocks until the broker acknowledges the message
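For comparison, a synchronous send() waits for the broker acknowledgement of every message, so throughput is roughly bounded by the round-trip latency. A minimal sketch (not from the thread) of asynchronous sending, which pipelines messages instead:

import java.util.concurrent.CompletableFuture;
import org.apache.pulsar.client.api.*;

// Pipelines sends instead of blocking on each acknowledgement.
public class AsyncSend {
    public static void sendMany(Producer<String> producer, int count) {
        for (int i = 0; i < count; i++) {
            CompletableFuture<MessageId> future = producer.sendAsync("message-" + i);
            future.exceptionally(ex -> {
                // handle a failed send here (log, retry, ...)
                return null;
            });
        }
        producer.flushAsync(); // or flush() to block until everything is acknowledged
    }
}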
----
