2020-02-05 10:22:59 UTC - Konstantinos Papalias: Thanks for the update @Sijie 
Guo, is there an open conversation currently or any ongoing PIP?
----
2020-02-05 12:20:02 UTC - Roman Popenov: I cannot speak to that; I would follow 
the issue at hand:
PIP 37: 
<https://github.com/apache/pulsar/wiki/PIP-37:-Large-message-size-handling-in-Pulsar>
and on github: <https://github.com/apache/pulsar/pull/4400#issue-283063448>
+1 : Ryan
----
2020-02-05 12:30:54 UTC - Anonymitaet: Dear Pulsar enthusiasts, 

The first Pulsar live video stream, TGIP-CN, will be available soon! 
:smiley_cat:

Date: Feb 9, 2020 (this Sunday)
Time: 11:00 CST
Topic: Pulsar architecture
Instructor: Sijie Guo from StreamNative
Duration: 40 min
Language: Chinese
Live streaming link: <https://live.bilibili.com/21468418>

Want to chat with a Pulsar core engineer directly and get first-hand 
experience? Do not miss this great opportunity! :raising_hand:
----
2020-02-05 12:33:57 UTC - Fernando: Are the brew binaries for `libpulsar` up to 
date with `2.5.0`? It seems that the latest version in brew is `2.4.2`
----
2020-02-05 15:34:08 UTC - Mikhail Veygman: @Mikhail Veygman has joined the 
channel
----
2020-02-05 15:53:41 UTC - Mikhail Veygman: Hi, new to the forum. I was 
wondering if there is a way to subscribe to the same topic with multiple 
clients and receive all messages that have ever been published to that topic. 
If it helps, Pulsar is running v2.4.1+ with the Java client.
----
2020-02-05 16:20:46 UTC - Ryan Slominski: @Ryan Slominski has joined the channel
----
2020-02-05 16:27:07 UTC - Ryan Slominski: Hi - just trying out Pulsar, 
following the standalone example in the docs, and hit a snag on the second 
step - trying to subscribe:

``` -bash-4.2$ ./pulsar-client consume my-topic -s "first-subscription"
cat: /opt/pulsar/distribution/server/target/classpath.txt: No such file or 
directory
Error: Could not find or load main class 
org.apache.pulsar.client.cli.PulsarClientTool```

----
2020-02-05 16:27:58 UTC - Matteo Merli: is that from a binary distribution?
----
2020-02-05 16:28:29 UTC - Ryan Slominski: Yeah, just downloaded it. I think I 
figured it out - looks like symbolic links in the path are not allowed
----
2020-02-05 16:29:07 UTC - Matteo Merli: yes, it's better to just rename the 
directory
----
2020-02-05 16:33:44 UTC - Sijie Guo: Each client can just use its own 
separate subscription. A subscription will receive all the data.
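A rough sketch with the Java client (the service URL, topic, and subscription 
names below are placeholders):
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionInitialPosition;
import org.apache.pulsar.client.api.SubscriptionType;

public class OwnSubscriptionExample {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        // Each client uses its own subscription name, so each one
        // independently receives every message available on the topic.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/my-topic")
                .subscriptionName("client-A")             // unique per client
                .subscriptionType(SubscriptionType.Exclusive)
                // start from the earliest available message instead of the default (latest)
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            System.out.println("Received: " + new String(msg.getData()));
            consumer.acknowledge(msg);
        }
    }
}
```
A new subscription can only go back as far as data still retained on the 
topic, so configure a retention policy if you need history that other 
subscriptions have already acked.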
----
2020-02-05 16:53:44 UTC - Sam Leung: :thumbsup:
----
2020-02-05 16:59:08 UTC - Mikhail Veygman: For each client or for each topic?
----
2020-02-05 17:12:06 UTC - Fernando: I’m trying to offload topics to S3 but I 
keep getting `No ledgers to offload`. Is it supposed to offload only messages 
that are not acked? Or am I missing some configuration?
----
2020-02-05 17:41:47 UTC - Mikhail Veygman: @Sijie Guo Is this per client or per 
topic?  I can't seem to receive all messages for one of the clients.  Does this 
need to be a Reader or will Consumer do just fine?
----
2020-02-05 17:44:56 UTC - Sijie Guo: it only offloads messages that have not 
been deleted (i.e. messages that are not acked or are still covered by the 
retention policy)
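If you want to poke at it by hand, here is a rough sketch with the Java admin 
client (URLs, namespace, topic, and retention values are placeholders; 
double-check the method names against your client version):
```
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.common.policies.data.RetentionPolicies;

public class OffloadSketch {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")   // placeholder admin URL
                .build();

        // Keep data around even after it is acked so there is something to
        // offload (7 days / 10 GB is only an example value).
        admin.namespaces().setRetention("public/default",
                new RetentionPolicies(7 * 24 * 60, 10 * 1024));

        // Manually trigger offload of everything up to the latest message;
        // progress can be checked later with admin.topics().offloadStatus(topic).
        String topic = "persistent://public/default/my-topic";
        admin.topics().triggerOffload(topic, MessageId.latest);

        admin.close();
    }
}
```
Also, if I remember correctly, offload only considers ledgers that have 
already been closed, so a fresh topic whose single ledger is still open can 
report `No ledgers to offload`.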
----
2020-02-05 17:49:11 UTC - Sijie Guo: your requirement is to have each client 
subscribed to the topic and each client receive all messages, no?
----
2020-02-05 17:49:21 UTC - Sijie Guo: do I misunderstand your requirements?
----
2020-02-05 18:01:19 UTC - Clemens Vasters: @Clemens Vasters has joined the 
channel
----
2020-02-05 18:28:08 UTC - Pradeesh: @Sijie Guo ^^ can you help us out with this 
error?
----
2020-02-05 18:37:58 UTC - Guilherme Perinazzo: is there a way to force the 
client to send more than one message in a batch?
----
2020-02-05 18:38:13 UTC - Guilherme Perinazzo: I'm trying to test something, 
but it always seems to send 1 message batches
----
2020-02-05 18:38:44 UTC - Matteo Merli: it depends on the batching max delay time
----
2020-02-05 18:39:12 UTC - Guilherme Perinazzo: i set it to 10000ms
----
2020-02-05 18:39:33 UTC - Matteo Merli: are you setting delays/keys?
----
2020-02-05 18:40:10 UTC - Matteo Merli: also, are you calling send() or 
sendAsync()
----
2020-02-05 18:40:12 UTC - Matteo Merli: ?
----
2020-02-05 18:40:46 UTC - Guilherme Perinazzo: Oh, i'm doing send, yeah, makes 
sense
----
2020-02-05 18:41:21 UTC - Matteo Merli: you should change to:
• producer.sendAsync()
• producer.sendAsync()
• producer.flush()
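Something along these lines (rough sketch; the URL, topic, and batching 
settings are placeholders):
```
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class BatchingSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder broker URL
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/my-topic")
                .enableBatching(true)
                .batchingMaxPublishDelay(10, TimeUnit.MILLISECONDS)
                .batchingMaxMessages(100)
                .create();

        // sendAsync() queues messages so the client can group them into one
        // batch; a blocking send() waits for each message and defeats batching.
        for (int i = 0; i < 100; i++) {
            producer.sendAsync(("msg-" + i).getBytes());
        }
        producer.flush();   // pushes out any partially filled batch

        producer.close();
        client.close();
    }
}
```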
----
2020-02-05 18:43:04 UTC - Guilherme Perinazzo: yeah, using async worked, thanks!
----
2020-02-05 18:46:49 UTC - Mikhail Veygman: That is correct
----
2020-02-05 18:47:11 UTC - Mikhail Veygman: I think I misunderstood the issue I 
was having.
----
2020-02-05 19:21:35 UTC - Mikhail Veygman: Thank you.
----
2020-02-05 20:31:15 UTC - Ryan Slominski: Hi - I'm experimenting with Pulsar 
2.5.0 and noticed `bin/pulsar-daemon stop standalone` results in some 
exceptions in the log file like:

```15:24:50.952 [Thread-1] ERROR org.apache.distributedlog.BKAbstractLogWriter 
- Completing Log segments encountered exception
java.io.IOException: Failed to close ledger for streams_000000000000000001_000000000000000001_000000000000000000:<default>:inprogress_000000000000000002 : BookKeeper client is closed
        at 
org.apache.distributedlog.BKLogSegmentWriter$6.closeComplete(BKLogSegmentWriter.java:660)
 ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
        at 
org.apache.bookkeeper.client.LedgerHandle$5.lambda$safeRun$0(LedgerHandle.java:552)
 ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 ~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 ~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
 ~[?:1.8.0_232]
        at 
org.apache.bookkeeper.client.LedgerHandle$5.lambda$safeRun$3(LedgerHandle.java:614)
 ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 ~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 ~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
 ~[?:1.8.0_232]
        at 
org.apache.bookkeeper.client.MetadataUpdateLoop.lambda$writeLoop$1(MetadataUpdateLoop.java:146)
 ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
 ~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
 ~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) 
~[?:1.8.0_232]
        at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
 ~[?:1.8.0_232]
        at 
org.apache.bookkeeper.meta.CleanupLedgerManager.lambda$close$1(CleanupLedgerManager.java:246)
 ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at 
java.util.concurrent.ConcurrentHashMap$KeySetView.forEach(ConcurrentHashMap.java:4649)
 ~[?:1.8.0_232]
        at 
org.apache.bookkeeper.meta.CleanupLedgerManager.close(CleanupLedgerManager.java:246)
 ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.client.BookKeeper.close(BookKeeper.java:1410) 
~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at 
org.apache.distributedlog.BookKeeperClient.close(BookKeeperClient.java:271) 
~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]```
Doesn't exactly inspire confidence in a tool specializing in concurrency.   Is 
this a known issue?
----
2020-02-05 20:33:19 UTC - Sijie Guo: I think the BookKeeper client was forced 
to close while there were still cleanup tasks pending. I don’t think it is 
impacting anything, but it is something that can be improved.
----
2020-02-05 20:35:37 UTC - Ryan Slominski: Is this a problem only with 
standalone or does this affect production cluster code too?   I'm simply 
evaluating the software and actually ran the stop command on a completely idle 
instance (no clients connected).   Not getting a good feeling about the 
software stability.
----
2020-02-05 20:48:09 UTC - Antti Kaikkonen: When I'm creating a source connector 
with --processing-guarantees EFFECTIVELY_ONCE I can only achieve throughput of 
~100 msg/s. The ack latency is about 10ms so it seems that every read happens 
only after the previous message has been acknowledged. Is there any way to get 
higher throughput with effectively once guarantees?
----
2020-02-05 20:54:24 UTC - Sijie Guo: It is standalone. Standalone enables all 
the components. Some components are still in developer preview (e.g. the 
function state related components).
----
2020-02-05 20:55:32 UTC - Joe Francis: There was an unclean shutdown, sure. 
But does it recover? That's the real test. Depending on a clean shutdown 
should not be a prerequisite for system stability.
----
2020-02-05 21:09:07 UTC - Alexander Ursu: @Alexander Ursu has joined the channel
----
2020-02-05 21:11:11 UTC - Alexander Ursu: New to Pulsar, is there some sort of 
guide to setting up a multi-node cluster using Docker Swarm?
----
2020-02-05 21:12:02 UTC - Sijie Guo: what is your source connector? I think 
the throughput here mainly depends on how the connector implements the #read 
method. If you can’t change the way the connector reads the data, you can 
scale up the throughput by increasing the parallelism of the function.
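As an illustration only, a hypothetical source whose read() blocks on a 
bounded queue instead of returning records as fast as it can:
```
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.pulsar.functions.api.Record;
import org.apache.pulsar.io.core.Source;
import org.apache.pulsar.io.core.SourceContext;

// Hypothetical connector: the bounded queue paces read() by the real input
// rate, instead of emitting dummy records as fast as the runtime will take them.
public class QueueBackedSource implements Source<String> {
    private final BlockingQueue<Record<String>> queue = new LinkedBlockingQueue<>(1000);

    @Override
    public void open(Map<String, Object> config, SourceContext ctx) throws Exception {
        // start whatever fetch loop fills `queue` here
    }

    @Override
    public Record<String> read() throws Exception {
        return queue.take();   // blocks until data is actually available
    }

    @Override
    public void close() throws Exception {
    }
}
```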
----
2020-02-05 21:13:27 UTC - Sijie Guo: I don’t think there is a guide 
specifically about Docker Swarm. You can try the general guide for bare-metal 
(on-prem) deployment and maybe the Kubernetes deployment guide.

<http://pulsar.apache.org/docs/en/deploy-bare-metal/>
<http://pulsar.apache.org/docs/en/deploy-kubernetes/>
----
2020-02-05 21:22:43 UTC - Antti Kaikkonen: I created my own that instantly 
returns a dummy record. When I tested the same connector with 
--processing-guarantees ATLEAST_ONCE I got such high throughput that I started 
to get Java heap space out-of-memory errors, so I had to introduce a 
Thread.sleep in the read method.

But I'm wondering if ATLEAST_ONCE should achieve the same guarantee as long as 
de-duplication is enabled and I'm implementing the getPartitionId and 
getRecordSequence methods of the Record interface?
----
2020-02-05 21:25:12 UTC - Antti Kaikkonen: I'm using 2.4.2 standalone mode.
----
2020-02-05 21:34:32 UTC - Sijie Guo: Effectively-once is implemented as 
at-least-once plus broker de-duplication. You have to make sure your connector 
implementation returns the partition id and record sequence correctly and 
consistently.
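A minimal sketch of the record side (class and field names are hypothetical):
```
import java.util.Optional;
import org.apache.pulsar.functions.api.Record;

// Hypothetical record emitted by a custom source; the partition id and the
// record sequence must be stable and strictly increasing per partition so
// the broker de-duplication can drop replays.
public class MySourceRecord implements Record<String> {
    private final String partition;   // e.g. "1" for a single-partition source
    private final long sequence;      // strictly increasing per partition
    private final String value;

    public MySourceRecord(String partition, long sequence, String value) {
        this.partition = partition;
        this.sequence = sequence;
        this.value = value;
    }

    @Override
    public String getValue() {
        return value;
    }

    @Override
    public Optional<String> getPartitionId() {
        return Optional.of(partition);
    }

    @Override
    public Optional<Long> getRecordSequence() {
        return Optional.of(sequence);
    }
}
```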
----
2020-02-05 21:58:12 UTC - Alexander Ursu: Might there be a reason why, or is it 
just a not-so-popular choice? I'm led to believe there's some other reason why 
it's hardly mentioned at all when I try to search for one.
----
2020-02-05 22:35:53 UTC - Antti Kaikkonen: Yes I have implemented those. My 
getPartitionId always returns Optional.of("1"); since there is only a single 
partition in the source. I don't think that I can use parallelism to increase 
performance since there is only a single source partition and I need to retain 
ordering.
----
2020-02-05 22:36:45 UTC - Antti Kaikkonen: I tested with ATLEAST_ONCE and the 
deduplication stopped working so that doesn't seem to be an option either.
----
2020-02-05 22:51:50 UTC - Roman Popenov: Do functions leverage pulsar proxy in 
any way? Is there some kind of internal load balancing mechanism going on 
internally between proxies and brokers?
----
2020-02-05 23:23:29 UTC - Sijie Guo: Oh, it is just because most of the 
committers haven’t used Docker Swarm before, and we don’t see many requests 
for deploying to Docker Swarm. But we are definitely happy to see 
contributions for deploying to Docker Swarm. Maybe you can create a GitHub 
issue requesting this feature so people in the community can help out.
----
2020-02-05 23:24:57 UTC - Sijie Guo: Functions don’t rely on brokers or 
proxies directly; functions can talk to a Pulsar cluster via a broker service 
URL or a proxy service URL.

The function worker implementation leverages Pulsar topics and subscriptions 
for load balancing and message routing.

Hope that clarifies your questions.
thanks : Roman Popenov
----
2020-02-05 23:25:21 UTC - Roman Popenov: It does. Thank you
----
2020-02-06 00:08:40 UTC - Guilherme Perinazzo: does the C client expose any way 
to free a string pointer it allocated?
----
2020-02-06 00:09:08 UTC - Matteo Merli: just regular `free()`
----
2020-02-06 00:11:30 UTC - Guilherme Perinazzo: Okay, guess I'll have to find 
out how to call libc from Rust, thanks
----
2020-02-06 00:47:18 UTC - Antti Kaikkonen: Just tested with 2.5.0 standalone 
and got ~90 msg/s with EFFECTIVELY_ONCE and over 300 000 msg/s with 
ATLEAST_ONCE.

Here is the source connector that I'm using for testing: 
<https://pastebin.com/rRkV0mTs>
----
2020-02-06 03:08:24 UTC - Antti Kaikkonen: 
<https://github.com/apache/pulsar/blob/master/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/sink/PulsarSink.java#L237>
I think that `future.join()` is causing this behavior. Is it required for the 
effectively-once guarantee? Would it be possible for a message to be added to a 
topic after a failed message if the messages were sent asynchronously?
----
2020-02-06 07:56:20 UTC - Fernando: How can I migrate my pulsar cluster from 
one k8s cluster to another? I’ve tried copying the files in the mounted volumes 
but that doesn’t work. Also tried offloading topics to S3 but that doesn’t work 
either. It seems there’s some ephemeral data that might be missing in the new 
cluster, for it to recognize the ledgers and the topics correctly. Any advice?
----
2020-02-06 08:25:05 UTC - Yuvaraj Loganathan: Is the 
<https://pulsar.apache.org/> website down?
----
2020-02-06 08:43:51 UTC - Martin Skogevall: I just tried it, and it seems to 
work.
----