2020-05-06 10:27:41 UTC - Franck Schmidlin: Follow-up question: what differences 
are there between IO #sinks and #functions?

I want to post topic messages to an http endpoint, à la pulsar beam. 

Functions feel more versatile than sinks, but would I be missing a trick? Is 
there any difference in processing, isolation, etc?
----
2020-05-06 10:36:36 UTC - alex kurtser: Hello, I would like to better understand 
the managedLedgerDefaultEnsembleSize parameter.

If I want to scale out my BookKeeper instances from 3 to 6, do I need to set 
managedLedgerDefaultEnsembleSize to 6 as well in order to use all 6 bookies?
----
2020-05-06 12:08:10 UTC - Pierre Zemb: Hi all :wave:
I have a question: why are so many parameters like geo-replication, tiered 
storage, retention and others set at the namespace level and not the topic level?
----
2020-05-06 12:16:01 UTC - Alexandre DUVAL: 
<https://github.com/apache/pulsar/wiki/PIP-51%3A-Tenant-policy-support>

I think it has been implemented that way because of previous needs. Now, more 
globally, I think it's a "todo".
----
2020-05-06 12:17:12 UTC - Alexandre DUVAL: I'm open to contribute with you if 
you go for it :wink:.
----
2020-05-06 12:18:05 UTC - Pierre Zemb: thanks @Alexandre DUVAL! Found PIP 39: 
<https://github.com/apache/pulsar/wiki/PIP-39:-Namespace-Change-Events>
----
2020-05-06 12:19:24 UTC - Pierre Zemb: I might work on that part indeed, I will 
keep you posted :slightly_smiling_face:
----
2020-05-06 12:19:44 UTC - Alexandre DUVAL: PIP39 is really interesting.
----
2020-05-06 12:21:37 UTC - Alexandre DUVAL: About this work, I think 
<https://github.com/apache/pulsar/pull/6428> will be interesting (currently 
only working for namespaces).
----
2020-05-06 12:26:53 UTC - Pierre Zemb: thanks a lot @Alexandre DUVAL for the 
links, will dive into those
----
2020-05-06 12:39:53 UTC - Damien Roualen: Hello,
I have a question regarding Presto.
Is it better to keep the Presto bundled with Pulsar (for instance 2.5.0 with a 
custom Presto version, e.g. 0.206, added to the pom file), or to deploy Presto 
from the official website (<https://prestodb.io/>) and add the Pulsar connector 
plugin?
Context: we have an existing Pulsar cluster, and we would like to deploy Presto 
and connect to the cluster.
----
2020-05-06 12:45:16 UTC - rani: @Sijie Guo, any clues here^?
----
2020-05-06 13:38:04 UTC - Ming: Sink refers to outbound data from Pulsar to an 
external system. If we speak of data flow, Pulsar Functions in most cases keep 
the data within Pulsar (i.e. sending to another topic). If you have external 
data I/O, `sink` or `source` connectors are the right approach. Speaking of the 
underlying implementation, both connectors and Pulsar Functions are very 
similar; they just serve different purposes. If you want to post data to HTTP 
endpoints, a sink is more applicable. However, Pulsar Beam is neither; it was 
developed as a standalone component to be more versatile and pluggable.
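If it helps, here is a rough sketch of what a custom HTTP sink could look like 
using the Pulsar IO `Sink` interface (the `endpointUrl` config key and the error 
handling below are only illustrative, not an official connector):
```
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;

import org.apache.pulsar.functions.api.Record;
import org.apache.pulsar.io.core.Sink;
import org.apache.pulsar.io.core.SinkContext;

public class HttpPostSink implements Sink<byte[]> {
    private String endpoint;   // hypothetical config key "endpointUrl"

    @Override
    public void open(Map<String, Object> config, SinkContext context) {
        endpoint = (String) config.get("endpointUrl");
    }

    @Override
    public void write(Record<byte[]> record) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.getOutputStream().write(record.getValue());
        if (conn.getResponseCode() < 300) {
            record.ack();    // acknowledge only after the endpoint accepted the message
        } else {
            record.fail();   // let Pulsar redeliver on failure
        }
        conn.disconnect();
    }

    @Override
    public void close() {}
}
```
You would package something like this as a NAR and deploy it with 
`pulsar-admin sinks create`, the same way as the built-in connectors.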
+1 : Franck Schmidlin
----
2020-05-06 13:44:37 UTC - Ming: @Kirill Merkushev you can use the admin API, 
admin CLI or REST API to rewind the cursor once the function subscription is 
created. An example could be 
example could be 
<https://pulsar.apache.org/admin-rest-api/?version=2.5.1#operation/resetCursor>
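With the Java admin client, the equivalent call could look roughly like this 
(the topic and subscription names are made up, and the service URL assumes a 
local broker):
```
import org.apache.pulsar.client.admin.PulsarAdmin;

public class ResetCursorExample {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // assumed admin endpoint
                .build()) {
            // Rewind the function's subscription to a point in time,
            // here one hour back from now.
            admin.topics().resetCursor(
                    "persistent://public/default/my-input-topic",
                    "my-function-subscription",
                    System.currentTimeMillis() - 3600_000);
        }
    }
}
```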
+1 : Kirill Merkushev
----
2020-05-06 14:09:14 UTC - hugues DESLANDES: Hi,
We are testing the pulsar-flink connector (but not using the schema registry, 
<https://github.com/streamnative/pulsar-flink>). From Flink we would like to 
sink some empty messages into Pulsar (to use compaction on a Pulsar topic). 
According to my understanding of the connector, I have not found any way to do 
this: we provide a message and a way to find the key from the message, so how 
could we make the message empty?
Any tip or workaround would be helpful. Thanks
----
2020-05-06 14:37:12 UTC - Penghui Li: I have created two issues to track the 
documentation for Proxy metrics and Presto worker metrics. 
<https://github.com/apache/pulsar/issues/6896>
<https://github.com/apache/pulsar/issues/6897>
And I marked them help-wanted. If you are interested in fixing them, you are 
welcome to.
----
2020-05-06 14:40:56 UTC - Penghui Li: No, new bookies will be selected when the 
managed ledger rolls over. For more details you can read 
<https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works>
+1 : alex kurtser
----
2020-05-06 14:44:40 UTC - Kirill Merkushev: can I pre-create a subscription the 
same way and then create a function?
----
2020-05-06 15:26:30 UTC - Allen ONeill: Does anyone know of a hosted/managed 
version of Pulsar, same as I can get for e.g. Cassandra/Kafka etc.?
----
2020-05-06 15:31:28 UTC - Chris Bartholomew: We do at <https://kafkaesque.io/>. 
If you have any questions about it, let me know.
----
2020-05-06 15:36:14 UTC - Chris Hansen: Really? They seem to work for me but I 
only tested `@JsonCreator` and `@JsonProperty`. Without those, I was getting an 
exception.
----
2020-05-06 15:48:48 UTC - Ming: @Kirill Merkushev You not only pre-create a 
subscription, you also have to create the input topic. Although I have not tried 
it, it should work since the default subscription type is shared.
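A rough sketch of both steps with the Java admin client, assuming a local broker 
and made-up topic/subscription names:
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.MessageId;

public class PreCreateSubscription {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // assumed admin endpoint
                .build()) {
            String topic = "persistent://public/default/my-input-topic";  // hypothetical

            // Create the input topic first (non-partitioned here).
            admin.topics().createNonPartitionedTopic(topic);

            // Pre-create the subscription the function will use, starting
            // from the earliest available message.
            admin.topics().createSubscription(topic, "my-function-subscription",
                    MessageId.earliest);
        }
    }
}
```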
----
2020-05-06 15:57:41 UTC - Ricardo Ferreira: @Ricardo Ferreira has joined the 
channel
----
2020-05-06 16:02:28 UTC - Alex: @Alex has joined the channel
----
2020-05-06 16:56:13 UTC - Manjunath Ghargi: Hi All,
I'm looking for a performance test tool through which we can benchmark all the 
performance metrics, similar to JMeter or Gatling, which are standard tools for 
performance benchmarking. Can someone kindly share more details on whether any 
of these external tools support Pulsar and whether we can make use of them for 
performance testing? Or any other info related to performance testing of a 
Pulsar server scaling up to ~30k to 50k TPS?
+1 : Franck Schmidlin
----
2020-05-06 17:18:55 UTC - Addison Higham: 
<http://openmessaging.cloud/docs/benchmarks/> at one point, you had to build it 
yourself as the published docker images were broken, but perhaps they work again 
now
tada : Shivji Kumar Jha
+1 : Franck Schmidlin
----
2020-05-06 17:21:02 UTC - Sijie Guo: @rani for Python functions, there was one 
protobuf-related change that was missing from the cherry-picks in the 2.5.1 
release. <https://github.com/apache/pulsar/issues/6858>
----
2020-05-06 17:27:10 UTC - Sijie Guo: @Kirill Merkushev

• If you already have a function running, you can use reset-cursor (i.e. the 
admin CLI or REST API) to reset the cursor for the subscription created by the 
function.
• You can pre-create a subscription at the position you would like to start 
from before submitting a function.
----
2020-05-06 17:34:58 UTC - Sijie Guo: So this is related to shading problems. 
The Jackson-related libraries are shaded. You can use `pulsar-client-original` 
to get around this issue.

Can you create an issue for us to improve this behavior?
----
2020-05-06 17:36:11 UTC - Shivji Kumar Jha: Hi @Manjunath Ghargi  We used 
<https://locust.io/>. A pretty good tool for a dev. We wrote 
<https://docs.locust.io/en/stable/writing-a-locustfile.html#declaring-tasks|locust tasks> 
which use the Python Pulsar client to send/receive Pulsar messages.
This task was then baked into a docker container and we could just launch more 
and more instances of this container to increase throughput on Pulsar.

By default, the perf results are ephemeral, so you could write them to your 
favourite graphing tool (statsd for us) and follow along there...

Not out of the box, but very flexible!
----
2020-05-06 17:36:18 UTC - Sijie Guo: The proxy exposes metrics. But I don't 
think Presto exposes Prometheus metrics.
----
2020-05-06 17:37:24 UTC - Sijie Guo: Along with PIP-39, we will introduce 
topic-level policy. /cc @Penghui Li
----
2020-05-06 17:39:30 UTC - Sijie Guo: @Damien Roualen :

I would recommend getting started with the one bundled with Pulsar, because you 
don't need to worry about compatibility issues with different Presto versions.

But deploying Presto from the official distribution has its advantages - you can 
always pull in the latest changes from Presto.
----
2020-05-06 17:40:44 UTC - Sijie Guo: Hi @hugues DESLANDES - I don't think the 
current connector implementation supports this. Can you create an issue for it?
----
2020-05-06 17:43:56 UTC - Manjunath Ghargi: @Shivji Kumar Jha: Can you please 
share sample code for a locust task that you have written, or any open Git repo 
that I can refer to?
----
2020-05-06 17:45:33 UTC - Manjunath Ghargi: Thanks, I'll look into this 
framework.
----
2020-05-06 17:50:28 UTC - Sijie Guo: @Allen ONeill - Please checkout 
<https://streamnative.io/support/managed-pulsar-service/> built by the original 
developers of Pulsar/BookKeeper.
+1 : Shivji Kumar Jha
----
2020-05-06 17:51:27 UTC - Shivji Kumar Jha: @Manjunath Ghargi Here is a quick 
<https://gist.github.com/shiv4289/fba1f68542b2fd4505e72d91de91b9f2|gist> that 
you could refer.
----
2020-05-06 17:56:23 UTC - Manjunath Ghargi: Thanks Shiv.
----
2020-05-06 17:57:38 UTC - Kirill Merkushev: is it safe to create the 
subscription via a consumer (to change the subscription type to 
exclusive/failover)? As this option is missing in the API @Sijie Guo
----
2020-05-06 17:59:51 UTC - Sijie Guo: It is safe to create the subscription. 
Functions don't support exclusive, so don't use the exclusive type.

You can't change the subscription type after a subscription is created.
----
2020-05-06 18:24:08 UTC - Prasanth Lemati: @Prasanth Lemati has joined the 
channel
----
2020-05-06 18:52:15 UTC - Addison Higham: hrm, a team at my company is curious 
about creating lots of namespaces (like a couple hundred). I would imagine that 
would create additional load on the metadata store with load balancing and 
policy management, but should that be much of a concern?
+1 : Franck Schmidlin, 高天赐
----
2020-05-06 18:53:33 UTC - Addison Higham: (still figuring out if the use case 
makes sense, but curious conceptually if that would place undue stress anywhere)
----
2020-05-06 19:09:31 UTC - Gary Fredericks: I'm trying to figure out how to 
reconcile these two things

A) tiered storage lets you store an arbitrarily long history for a topic, and 
pulsar lets you read that history
B) the backlog quota feature prevents a subscriber from consuming messages from 
too far behind the newest message in a topic
----
2020-05-06 19:09:52 UTC - Gary Fredericks: (this is me trying to work out what 
to do about <https://github.com/streamnative/pulsar/issues/931>)
----
2020-05-06 19:12:54 UTC - Gary Fredericks: my suspicion is that the backlog 
quota shouldn't apply in certain circumstances, like retention situations where 
the data isn't going to be deleted anyhow
----
2020-05-06 19:13:18 UTC - Gary Fredericks: but I don't understand well enough 
how backlogs work to be sure of that
----
2020-05-06 19:22:18 UTC - Alexandre DUVAL: What could be the reason for
```[pulsar-io-23-1] WARN  org.apache.pulsar.broker.service.ServerCnx - 
[/192.168.10.13:43134] java.lang.NoSuchMethodError: 
java.nio.ByteBuffer.rewind()Ljava/nio/ByteBuffer; with role proxy-to-broker```
----
2020-05-06 19:31:43 UTC - Addison Higham: @Gary Fredericks are you using 
`consumer_backlog_eviction`? I have been curious about that as well. Since 
backlogs are per namespace though, you should be able to remove the backlog 
quota and see if that fixes the issue.
----
2020-05-06 19:35:49 UTC - Gary Fredericks: @Addison Higham I am, that was the 
key thing I didn't know when I filed that issue

the problem is I don't yet know the implications of changing that policy; is 
running with an unlimited backlog quota a safe thing to do, in namespaces with 
infinite retention?
----
2020-05-06 19:38:33 UTC - Addison Higham: the only thing I can think of is 
whether a subscription retaining a message prevents either offloading from 
happening OR the broker message cache from being cleared out. Otherwise, I 
don't see why it would be problematic
----
2020-05-06 19:38:45 UTC - Addison Higham: and I don't know the answer to that 
question (but am curious to know as well :slightly_smiling_face: )
----
2020-05-06 19:39:37 UTC - Addison Higham: I have sort of assumed that reader 
subscriptions, since they are somewhat different as non-durable cursors, may 
not even have backlog quota logic applied, but that might not be true
----
2020-05-06 19:40:13 UTC - Gary Fredericks: well they do, is what I found, but I 
was wondering if maybe they shouldn't
----
2020-05-06 19:41:53 UTC - Pierre Zemb: for all :fr: readers, @Steven Le Roux, 
@Quentin ADAM and myself recorded a podcast about Pulsar and KoP, enjoy: 
<https://bigdatahebdo.com/podcast/episode-99-apache-pulsar-et-kafka-on-pulsar/>
+1 : Florentin Dubois, Gilles Barbier, Alexandre DUVAL, Sijie Guo, Pierre Zemb
fr : Pierre Zemb
clap : Karthik Ramasamy
----
2020-05-06 19:45:23 UTC - Chris Hansen: sure thing
----
2020-05-06 20:25:44 UTC - Franck Schmidlin: I'm looking at the AWS deployment 
instructions and the default cluster sizing seems quite large/expensive to my 
untrained eye.

<https://pulsar.apache.org/docs/v2.0.1-incubating/deployment/aws-cluster/|https://pulsar.apache.org/docs/v2.0.1-incubating/deployment/aws-cluster/>

Is there any minimal but functional size for a cluster? I want a realistic 
infrastructure for my PoC but I won't be hammering it.
In fact, even in production I don't have the kind of volumes that seem to be 
the standard use case for Pulsar.
----
2020-05-06 20:28:09 UTC - Alexandre DUVAL: @Sijie Guo any idea? (latest master 
proxy/broker, 2.5.1 bookkeeper/client)
----
2020-05-06 20:29:01 UTC - Alexandre DUVAL: I don't see major changes on the 
bookies between both versions so I didn't update the bookies, but maybe I must
----
2020-05-06 20:30:30 UTC - Alexandre DUVAL: the global behavior is that 
everything connects fine but no messages are forwarded
----
2020-05-06 20:39:06 UTC - Sijie Guo: NoSuchMethodError means there is a 
dependency conflict that causes Netty not to be loaded properly
----
2020-05-06 20:44:32 UTC - Alexandre DUVAL: on the broker itself?
----
2020-05-06 21:18:16 UTC - Alexandre DUVAL: is that related to the bump of netty
```    <netty.version>4.1.48.Final</netty.version>
    <netty-tc-native.version>2.0.30.Final</netty-tc-native.version>```
and did it conflict with the Pulsar usages on proxy <-> broker connections?
----
2020-05-06 21:53:11 UTC - Greg Methvin: I’m wondering about the same thing 
actually. We basically want to have a namespace per customer, of which we might 
have 1000 or so.
----
2020-05-06 21:53:47 UTC - Greg Methvin: it’s not totally necessary but it seems 
useful.
----
2020-05-06 21:58:01 UTC - Addison Higham: @Franck Schmidlin pulsar scales down 
pretty well. I run across 8 regions which are very imbalanced, in my smallest 
regions I run brokers with as little as 1 GB of memory. Bookies tend to be a 
bit more memory hungry (I have had issues with ensuring it doesn't OOM on heap) 
but I can still run it at about 4 GB of memory. Zookeeper can be quite small as 
well, 1 GB of heap is fine for it.
----
2020-05-06 21:59:53 UTC - Addison Higham: that size cluster can still be 
capable of pushing reasonable throughputs,  10k msgs/sec or more. The real 
important bit is just fast disk for bookie journals. I use provisioned IOPS 
volumes
----
2020-05-06 22:38:18 UTC - Ming: @Gary Fredericks We were just discussing this 
topic of message retention with someone. I think there are a lot of different 
concepts here. In A), tiered storage merely extends the disk space; it does not 
govern the message retention policy. The message retention policy governs how 
long messages can be kept. In B), the backlog quota puts a limit on how many 
unacked messages there can be on a subscription. It prevents a topic from 
growing infinitely if there are too many unacked messages. So the consumer can 
either ack those messages, or TTL will force expired messages to be auto-acked. 
The backlog is per subscription, so there could be multiple backlogs for a topic 
because a topic can have multiple subscriptions.
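For illustration, setting both policies on a namespace with the 2.5.x-era Java 
admin API could look roughly like this (the namespace, limits and service URL 
are just examples):
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.BacklogQuota;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class RetentionAndBacklogQuota {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // assumed admin endpoint
                .build()) {
            String namespace = "public/default";

            // Retention: keep acked (or unsubscribed) messages for 7 days / up to 10 GB.
            admin.namespaces().setRetention(namespace,
                    new RetentionPolicies(7 * 24 * 60, 10 * 1024));

            // Backlog quota: cap the unacked backlog per subscription at 1 GB,
            // evicting the oldest backlog messages when the limit is hit.
            admin.namespaces().setBacklogQuota(namespace,
                    new BacklogQuota(1024L * 1024 * 1024,
                            BacklogQuota.RetentionPolicy.consumer_backlog_eviction));
        }
    }
}
```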
----
2020-05-06 22:38:42 UTC - Kirill Merkushev: also, is there a way to tune the 
producer the same way - e.g. change the hashing to Murmur3 32-bit?
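For reference, the Java producer builder does expose a hashing scheme option; a 
minimal sketch (the topic name is hypothetical):
```
import org.apache.pulsar.client.api.HashingScheme;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class ProducerHashingExample {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
             Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/partitioned-topic")  // hypothetical
                .hashingScheme(HashingScheme.Murmur3_32Hash)  // route keys with Murmur3 32-bit
                .create()) {
            producer.newMessage().key("customer-42").value("hello".getBytes()).send();
        }
    }
}
```
As far as I know, JavaStringHash is the Java client default, so Murmur3_32Hash 
mainly matters when you need routing to be consistent with other clients.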
----
2020-05-06 22:40:08 UTC - Gary Fredericks: @Ming does a backlog imply extra 
resource usage beyond what's already used by the retention?

or more to my use case, is there _any_ benefit to a backlog quota if your 
retention is infinite?
----
2020-05-06 23:08:05 UTC - Ming: Backlog and retention policy are two different 
concepts. In a vanilla Pulsar configuration, only unacked messages on a 
subscription will be kept for consumption. This means acked messages and 
messages on topics with no subscription (i.e. reader only) can be deleted. This 
is why the retention policy was introduced, to allow messages to be retained in 
persistent storage. Pulsar tries to keep unacked messages forever. The backlog 
quota and TTL are really there to prevent the message queue (purposefully I use 
queue instead of topic since these terms are interchangeable in the queuing 
world) from growing indefinitely. Pulsar will delete acked messages and messages 
with no subscription as soon as it can (when a trigger such as a time interval 
is satisfied). So the retention policy counters this behaviour to keep the 
messages. Actually, messages can still be deleted in tiered storage if they no 
longer satisfy the retention policy. You probably already know that messages are 
not deleted individually; instead it is the ledger, a collection of messages, 
that is deleted.
----
2020-05-06 23:10:33 UTC - Gary Fredericks: Does this mean that if the retention 
policy prevents deletion, the backlog quota has no additional effect?
----
2020-05-06 23:10:58 UTC - Kirill Merkushev: and one more question regarding 
functions - is context shared between functions?
----
2020-05-06 23:21:06 UTC - Ming: They work independently. It depends on the 
backlog quota policy: in the case of `producer_exception`, when the backlog 
quota is reached, the producer can no longer send messages; instead it will 
receive an exception from the broker.
----
2020-05-06 23:23:00 UTC - Kirill Merkushev: and btw, how do I enable state for 
the local runner?
----
2020-05-06 23:23:46 UTC - Ming: Meanwhile, data retention could still have 
plenty of room to persist messages. These are two independent problems Pulsar 
tries to tackle, but they interplay too.
----
2020-05-06 23:25:16 UTC - Kirill Merkushev: (as I get exception)
```java.lang.RuntimeException: Failed to increment key 
'85658b96-f126-413a-8f2a-1304604a6902' by amount '1'
        at 
org.apache.pulsar.functions.instance.ContextImpl.incrCounter(ContextImpl.java:277)
 ~[org.apache.pulsar-pulsar-functions-instance-2.5.0.jar:?]
        at 
ru.lanwen.pulsar.functions.SimpleCtxFunction.process(SimpleCtxFunction.java:13) 
~[?:?]```
----
2020-05-06 23:34:12 UTC - Liam Clarke: Hi all,

I'm looking at Pulsar as a Kafka replacement, and I had a question about 
delivery guarantees. From reading the docs, it seems that Pulsar's architecture 
guarantees "at least once" by default - if a producer sends a record to a 
broker, and the broker commits it to BookKeeper and then fails before sending 
the ack to the producer, then the producer will try again. However, if I enable 
deduplication, it looks like it guarantees 'exactly once' - 
<https://pulsar.apache.org/docs/en/cookbooks-deduplication/>

Am I understanding this correctly?

Also, Pulsar IO and Pulsar Functions - am I correct in that they run on the 
brokers, as opposed to Kafka Connect / Kafka Streams which run standalone?
----
2020-05-06 23:37:04 UTC - Alexandre DUVAL: oh maybe it's because I compiled 
with Java 11 T_T
----
2020-05-06 23:37:50 UTC - Alexandre DUVAL: and running with Java 8..
----
2020-05-06 23:48:46 UTC - Ming: It depends on your requirements. Since you are 
looking into an AWS cluster, I guess standalone won't be enough for your PoC. So 
you might need at least 3 bookies and 3 ZooKeeper pods. But can you get away 
with one broker? If it's a PoC, you could use spot instances that are 80 to 90% 
cheaper. You will do fine with m4 or m5; no need to pay a premium for compute- 
or storage-optimized VMs. We have actually been using m4 in a production cluster 
and it is sufficient.
----
2020-05-06 23:50:47 UTC - Chris Hansen: 
<https://github.com/apache/pulsar/issues/6902>
----
2020-05-07 00:49:19 UTC - Joshua Dunham: Hi Everyone, getting an error: Error 
creating ledger for allocating /stream/storage/streamsXXX... is this a disk 
storage issue or a ZooKeeper issue?
----
2020-05-07 03:25:29 UTC - Raphael Enns: @Raphael Enns has joined the channel
----
2020-05-07 04:47:28 UTC - Sijie Guo: Correct. Broker de-duplication can achieve 
exactly-once producing.
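A rough sketch of what that can look like end to end, enabling deduplication on 
a namespace and producing with a stable producer name as the cookbook 
recommends (the URLs, namespace and topic are placeholders):
```
import java.util.concurrent.TimeUnit;

import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class DedupExample {
    public static void main(String[] args) throws Exception {
        // Enable message deduplication for the namespace (it can also be
        // enabled cluster-wide via brokerDeduplicationEnabled in broker.conf).
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // assumed admin endpoint
                .build()) {
            admin.namespaces().setDeduplicationStatus("public/default", true);
        }

        // The deduplication cookbook recommends a stable producer name and an
        // infinite send timeout so retries reuse the same sequence IDs.
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();
             Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/dedup-demo")   // hypothetical topic
                .producerName("dedup-demo-producer")
                .sendTimeout(0, TimeUnit.SECONDS)
                .create()) {
            producer.send("hello".getBytes());
        }
    }
}
```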
----
2020-05-07 04:48:01 UTC - Sijie Guo: Pulsar Functions and Connectors can run 
standalone, alongside brokers, separately in a function worker cluster, or over 
Kubernetes.
----
2020-05-07 04:48:38 UTC - Sijie Guo: ZooKeeper issue.
----
2020-05-07 04:49:20 UTC - Sijie Guo: If you are running standalone, I would 
recommend disabling the state store first. You can run standalone with 
`bin/pulsar standalone -nss`.
----
2020-05-07 07:26:36 UTC - Damien Roualen: I started with the one bundled with 
Pulsar, but the version was an old 0.206, and it was not possible for me to work 
with Java 11.

Exactly, that way I can use the latest version directly.
----
