Slack digest for #general - 2020-06-13

Apache Pulsar Slack Sat, 13 Jun 2020 02:11:32 -0700

2020-06-12 09:55:37 UTC - skans100: @Penghui Li is there a tentative date for 
getting 2.6.0 released?
----
2020-06-12 09:56:53 UTC - Penghui Li: Yes, we plan to announce it next week if 
there are no blocking issues in the 2.6.0 release.
+1 : skans100
----
2020-06-12 11:03:30 UTC - Nicolas Ha: In here 
<http://pulsar.apache.org/functions-rest-api/?version=2.5.1#operation/registerFunction>
it says
```jar
Path to the JAR file for the Pulsar Function (if the Pulsar Function is written 
in Java). It also supports URL path [http/https/file (file protocol assumes 
that file already exists on worker host)] from which worker can download the 
package.```
And the example input is
```  "jar": java-function-1.0-SNAPSHOT.jar```
Does this mean my jar has to be publicly available?
How does the Java Pulsar client do it?
----
2020-06-12 11:09:30 UTC - Nicolas Ha: I am asking because I cannot seem to use 
this 
<https://github.com/apache/pulsar/blob/2aff473e598fe5e8ba9f8ed0860de35a67718725/pulsar-client-tools/src/main/java/org/apache/pulsar/admin/cli/CmdFunctions.java#L657>
it seems to expect my Java fn to be in the classpath - so I am trying to use 
the API but it isn’t clear to me how to proceed
----
2020-06-12 11:20:40 UTC - Nicolas Ha: from the broker logs
```java.lang.IllegalArgumentException: Function class my.myFunction must be in 
class path```
So the question is: how can I upload a jar / class not in the class path to 
pulsar?
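
A minimal sketch of one way to upload a local jar over the REST API, assuming the register endpoint accepts a multipart/form-data POST with the jar bytes in a part named `data` and the JSON config in a part named `functionConfig`. The part names, the `/admin/v3/functions/...` path, the localhost URL, and the config keys are assumptions to check against the linked API page for your version. The request is only built here, not sent.

```python
import json
import uuid
import urllib.request


def build_register_request(base_url, tenant, namespace, name, jar_bytes, config):
    """Build (but do not send) a multipart POST that uploads a function jar."""
    boundary = uuid.uuid4().hex
    config_json = json.dumps(config).encode("utf-8")
    parts = [
        # Part 1: the jar itself as raw bytes (assumed part name: "data").
        (b'Content-Disposition: form-data; name="data"; filename="function.jar"\r\n'
         b"Content-Type: application/octet-stream\r\n\r\n" + jar_bytes),
        # Part 2: the function config as JSON (assumed part name: "functionConfig").
        (b'Content-Disposition: form-data; name="functionConfig"\r\n'
         b"Content-Type: application/json\r\n\r\n" + config_json),
    ]
    delimiter = b"--" + boundary.encode("ascii")
    body = b"".join(delimiter + b"\r\n" + part + b"\r\n" for part in parts)
    body += delimiter + b"--\r\n"  # closing delimiter ends the multipart body
    url = f"{base_url}/admin/v3/functions/{tenant}/{namespace}/{name}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Hypothetical values for illustration only.
request = build_register_request(
    "http://localhost:8080", "public", "default", "my-function",
    jar_bytes=b"PK\x03\x04 fake jar bytes",
    config={"className": "my.myFunction",
            "inputs": ["persistent://public/default/in"]},
)
# urllib.request.urlopen(request) would send it to a running cluster.
```

If this shape matches your version, the jar travels in the request body, so it would not need to be publicly reachable or already on the worker's class path.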
----
2020-06-12 11:21:21 UTC - sjmittal: Hey folks, I am installing Pulsar for the 
first time on k8s and I get this error, which I suppose is mainly due to the 
bookies not starting:
`11:12:24.704 [pulsar-web-42-5] ERROR 
org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [null] Failed to 
create non-partitioned topic persistent://public/functions/assignments`
`java.util.concurrent.CompletionException: 
org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: 
org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty 
bookies available`
----
2020-06-12 11:21:39 UTC - sjmittal: Any idea what could be the reason behind 
this error?
----
2020-06-12 11:38:13 UTC - Luke Stephenson: Hi @Sijie Guo.  Here is the 
configuration I'm using:
```    managedLedgerOffloadDriver: "aws-s3"
    s3ManagedLedgerOffloadBucket: "goanna-pulsar-topic-offload"
    s3ManagedLedgerOffloadRegion: "ap-southeast-2"
    s3ManagedLedgerOffloadServiceEndpoint: "s3.ap-southeast-2.amazonaws.com"```
----
2020-06-12 11:40:44 UTC - xiaolong.ran: The 
`s3ManagedLedgerOffloadServiceEndpoint` setting is mainly useful for testing. 
Can you configure `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`?
----
2020-06-12 11:40:47 UTC - Luke Stephenson: Where would I find the docker image 
for the 2.6.0 rc?  I'm looking here and don't see it 
<https://hub.docker.com/r/apachepulsar/pulsar-all/tags>
----
2020-06-12 11:41:02 UTC - xiaolong.ran: 
<https://pulsar.apache.org/docs/en/cookbooks-tiered-storage/#authentication-with-aws>
----
2020-06-12 11:42:01 UTC - xiaolong.ran: in broker.conf:


```managedLedgerOffloadDriver: "aws-s3"
    s3ManagedLedgerOffloadBucket: "goanna-pulsar-topic-offload"
    s3ManagedLedgerOffloadRegion: "ap-southeast-2"```
in pulsar_env.sh

```export AWS_ACCESS_KEY_ID=ABC123456789
export AWS_SECRET_ACCESS_KEY=ded7db27a4558e2ea8bbf0bf37ae0e8521618f366c```

----
2020-06-12 11:43:18 UTC - Luke Stephenson: I hadn't set those env vars. I 
assumed they were picked up from the ec2 instance.   Will look at that.
----
2020-06-12 11:51:11 UTC - xiaolong.ran: Sorry, I am not an expert on EC2 
instances, but my understanding is that you should be able to configure 
`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` as environment variables for 
your service, right? If so, you should be able to offload the data to aws-s3.
----
2020-06-12 11:57:27 UTC - Luke Stephenson: I'll give that a go
----
2020-06-12 11:57:45 UTC - xiaolong.ran: Cool
----
2020-06-12 12:34:24 UTC - Luke Stephenson: I tried setting up
> 5. Assuming an IAM role
That didn't help.  Haven't yet worked out how I get `AWS_ACCESS_KEY_ID` into 
the environment with the pulsar helm template
----
2020-06-12 12:36:12 UTC - Luke Stephenson: Think I've spotted what I need to do 
now.
----
2020-06-12 12:44:16 UTC - Luke Stephenson: nah. not sure how to get that into 
the helm template
----
2020-06-12 12:55:15 UTC - Luke Stephenson: I managed to get them into 
`conf/pulsar_env.sh` as
```PULSAR_EXTRA_OPTS=" -Daws.accessKeyId=xxx -Daws.secretKey=xxx "```
However, that didn't help and I'm still seeing the original error message: "The 
authorization header is malformed; the region 'us-east-1' is wrong; expecting 
'ap-southeast-2'"
----
2020-06-12 13:09:14 UTC - Luke Stephenson: Out of curiosity, I tried creating a 
bucket in us-east-1 and configuring `s3ManagedLedgerOffloadRegion: "us-east-1"` 
and it gets past that error (still fails for a different reason, but it 
progresses further)
----
2020-06-12 13:24:50 UTC - Luke Stephenson: It's working with us-east-1.
----
2020-06-12 13:25:16 UTC - Luke Stephenson: Unfortunately my cluster is running 
in ap-southeast-2 though, so that isn't a great setup.
----
2020-06-12 13:48:34 UTC - Luke Stephenson: And now if I switch it back to 
ap-southeast-2 it works.  Seems the library provides useful error messages if 
something is misconfigured when the bucket is in us-east-1.  So short term I 
can point to that region to sort my config out and then switch back to the 
desired region later.
----
2020-06-12 14:24:34 UTC - Marcio Martins: Is there a way to make the S3 
offloads get a prefix with the topic name? I don't want to provision a new 
bucket per topic, and having everything in a single bucket means I can't delete 
a topic and then manually go and delete the S3 objects. Alternatively, maybe 
it would be nice to delete the S3 files when a topic is deleted, as I don't 
think this is currently happening.
writing_hand : Asaf Mesika
----
2020-06-12 14:37:17 UTC - jujugrrr: @Marcio Martins I think I've seen a GitHub 
issue about a cleanup policy for topics offloaded to S3
----
2020-06-12 14:38:01 UTC - Marcio Martins: Thanks, will check it out
----
2020-06-12 15:04:27 UTC - Matteo Merli: It would only be replicated if you had 
a "global" zookeeper as config store. Otherwise you need to create the 
namespace with same config in all 4 instances.
----
2020-06-12 15:05:59 UTC - Aaron Batilo: What about adding a lifecycle policy to 
the s3 objects?
+1 : David Kjerrumgaard
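
A lifecycle rule along those lines could look roughly like this. The sketch only builds the rule document in the shape S3 lifecycle configurations use. The rule ID, the 30-day expiry, and the empty prefix are made-up values. Whether it is safe to expire offloaded ledgers outside of Pulsar depends on your retention settings.

```python
import json


def expiration_rule(rule_id, prefix, days):
    """Return one S3 lifecycle rule expiring objects under `prefix` after `days`."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},  # "" matches every object in the bucket
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


# Hypothetical values: expire everything in the offload bucket after 30 days.
lifecycle = {"Rules": [expiration_rule("expire-offloaded-ledgers", "", 30)]}
print(json.dumps(lifecycle, indent=2))
```

The printed JSON can be saved to a file and applied with `aws s3api put-bucket-lifecycle-configuration`. Since the offloaded objects are not prefixed by topic name, a rule like this applies to the whole bucket rather than per topic.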
----
2020-06-12 15:06:19 UTC - Asaf Mesika: We’re thinking of building a library 
offering a high throughput task queue and scheduled (recurring) tasks, on top 
of Pulsar, in Java (think Quartz, but horizontally scalable, high throughput 
and with a better API for one-off tasks)
1. Does anybody happen to know an existing open source library like that?
2. We’re trying to assess the “operational cost” Apache Pulsar may have (we’ll 
probably start with something like 40k msgs/min): how much do you spend on 
maintaining a running BK cluster and Pulsar cluster? Does it have many errors? 
Are the errors easy to handle? 
----
2020-06-12 15:09:09 UTC - David Kjerrumgaard: The short answer is yes they will 
be fetched. However you have to pay attention to the data retention and message 
expiration policies to ensure that the messages don't get "purged"
----
2020-06-12 15:12:07 UTC - David Kjerrumgaard: You can use the 
<http://pulsar.apache.org/docs/en/pulsar-admin/#stats-1> command to show you 
all the connected topic consumers and their respective consumption rates.
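
A small sketch of pulling per-consumer rates out of that output, assuming the stats JSON has a `subscriptions` map whose entries carry a `consumers` list with `consumerName` and `msgRateOut` fields. The sample document below is made up for illustration.

```python
def consumer_rates(stats):
    """Map (subscription, consumer name) -> msgRateOut from a topic stats document."""
    rates = {}
    for sub_name, sub in stats.get("subscriptions", {}).items():
        for consumer in sub.get("consumers", []):
            key = (sub_name, consumer.get("consumerName", "<unnamed>"))
            rates[key] = consumer.get("msgRateOut", 0.0)
    return rates


# Made-up sample shaped like topic stats output.
sample_stats = {
    "msgRateIn": 120.0,
    "subscriptions": {
        "my-sub": {
            "consumers": [
                {"consumerName": "worker-a", "msgRateOut": 70.0},
                {"consumerName": "worker-b", "msgRateOut": 50.0},
            ]
        }
    },
}
rates = consumer_rates(sample_stats)
```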
----
2020-06-12 15:13:14 UTC - Marcio Martins: Thanks!
----
2020-06-12 15:13:47 UTC - Anup Ghatage: Hey @sjmittal
 `Not enough non-faulty bookies available`
Usually means there aren’t enough bookies for an ensemble - which means either:
1. Your bookkeeper deployment went awry 
2. Service discovery cannot connect to or find the bookies for the ensemble
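
To make the ensemble point concrete: a ledger can only be created when at least as many bookies are available as the configured ensemble size. A toy check, with a made-up bookie address and an ensemble size of 2 chosen purely for illustration:

```python
def can_form_ensemble(available_bookies, ensemble_size):
    """True if enough distinct bookies are available to form an ensemble."""
    return len(set(available_bookies)) >= ensemble_size


# Hypothetical cluster state: one bookie registered, ensemble size of 2 wanted.
bookies = ["bookie-0:3181"]
ok = can_form_ensemble(bookies, ensemble_size=2)
print("enough bookies" if ok else "not enough bookies")
```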
----
2020-06-12 15:39:53 UTC - sjmittal: yeah, figured out the issue: my PVC was not 
set up correctly, so the bookies were not instantiated and the broker was 
complaining about not having enough bookies
+1 : Anup Ghatage, Asaf Mesika
----
2020-06-12 15:43:17 UTC - Enrico Olivelli: Hey I have just received my t-shirt 
for Pulsar Summit! Thank you @Sijie Guo!
Conference is coming!
Everyone, please check it out! <https://pulsar-summit.org>
----
2020-06-12 16:57:01 UTC - Anup Ghatage: Whoa whoa whoa. How do we get one? 
@Sijie Guo
----
2020-06-12 16:58:59 UTC - Ebere Abanonu: @Sijie Guo wouldn't mind one too. Do 
you T-Shirt Pulsar itself?
----
2020-06-12 18:03:16 UTC - Enrico Olivelli: @Anup Ghatage as you are a speaker 
you should have received an email that asked for your size :) maybe it is too 
late
----
2020-06-12 18:05:07 UTC - Anup Ghatage: @Enrico Olivelli unfortunately seems 
like I didn’t get that email :disappointed:
(or missed it, but I'm pretty sure I didn’t since it’s about T-shirts :smile: )
----
2020-06-12 18:05:44 UTC - Daniel Ciocirlan: thanks @Matteo Merli for the 
clarification :slightly_smiling_face:
----
2020-06-12 18:06:20 UTC - Daniel Ciocirlan: one more question if not asking too 
much, if a sync producer fails to send a message to broker due to operation 
timeout, would the producer retry the message or drop it ?
----
2020-06-12 18:09:43 UTC - Matteo Merli: When there are failures, the producer 
retries internally, up to the configured "send-timeout"
----
2020-06-12 18:10:07 UTC - Daniel Ciocirlan: yes, if we exceed that the message 
is dropped ?
----
2020-06-12 18:10:10 UTC - Matteo Merli: after that the message is dropped (that 
doesn't exclude that the message could actually have been published, though)
----
2020-06-12 18:10:26 UTC - Daniel Ciocirlan: cool good to know 
:slightly_smiling_face:
----
2020-06-12 18:10:43 UTC - Matteo Merli: if you want the client to keep trying, 
you need to set send-timeout to 0
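
The semantics described here can be sketched as a toy model (not the client's actual implementation): retry until a deadline derived from the send timeout, and treat a timeout of 0 as "no deadline". `attempt_send`, the injectable clock, and the retry delay are hypothetical names for illustration.

```python
import time


def send_with_timeout(attempt_send, send_timeout_s, clock=time.monotonic,
                      sleep=time.sleep, retry_delay_s=0.1):
    """Retry `attempt_send` until it succeeds or the send timeout elapses.

    A send_timeout_s of 0 means retry indefinitely. Returns True on success
    and False once the timeout has elapsed.
    """
    deadline = None if send_timeout_s == 0 else clock() + send_timeout_s
    while True:
        if attempt_send():
            return True
        if deadline is not None and clock() >= deadline:
            # Given up locally; the broker may still have stored the message.
            return False
        sleep(retry_delay_s)


# Deterministic demo with a fake clock: each "sleep" advances time by the delay.
now = [0.0]

def fake_clock():
    return now[0]

def fake_sleep(seconds):
    now[0] += seconds

attempts = []

def flaky_send():
    attempts.append(1)
    return len(attempts) >= 3  # fails twice, succeeds on the third try

delivered = send_with_timeout(flaky_send, 1.0, fake_clock, fake_sleep)
timed_out = send_with_timeout(lambda: False, 0.3, fake_clock, fake_sleep)
```

In the Java client the corresponding setting is `sendTimeout` on the producer builder.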
----
2020-06-12 18:10:48 UTC - Daniel Ciocirlan: we just started with Pulsar 2 months 
ago and have some questions that we cannot find answered in the docs
----
2020-06-12 18:11:23 UTC - Daniel Ciocirlan: we can't: the upper layer is not 
thread safe, and if we get 400K-500K RPM we run into thread starvation
----
2020-06-12 18:11:36 UTC - Daniel Ciocirlan: thanks for the great help!
----
2020-06-12 18:12:22 UTC - Daniel Ciocirlan: regarding the local ZK, i need to 
create partitioned topics in all regions even if i produce in 1 region  and 
this namespace is geo-replicated to consuming regions
----
2020-06-12 18:22:31 UTC - Matteo Merli: You should do all admin operations in 
each "cluster", so even creating the partitioned topics
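
A small sketch of "repeat the admin operation per cluster": build the same `pulsar-admin` command once for each cluster's admin URL. The cluster names, URLs, topic, and partition count are placeholders.

```python
def partitioned_topic_commands(admin_urls, topic, partitions):
    """Return one `pulsar-admin` argv list per cluster admin URL."""
    return [
        ["pulsar-admin", "--admin-url", url,
         "topics", "create-partitioned-topic", topic,
         "--partitions", str(partitions)]
        for url in admin_urls.values()
    ]


# Placeholder clusters; replace with the real admin endpoint of each region.
clusters = {
    "us-east": "http://pulsar-us-east.example.com:8080",
    "eu-west": "http://pulsar-eu-west.example.com:8080",
}
commands = partitioned_topic_commands(
    clusters, "persistent://my-tenant/my-ns/events", partitions=8)
for argv in commands:
    print(" ".join(argv))  # or subprocess.run(argv, check=True) to execute
```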
----
2020-06-12 18:23:04 UTC - Daniel Ciocirlan: okey so we did it correct 
:slightly_smiling_face:
----
2020-06-12 18:23:35 UTC - Daniel Ciocirlan: my last 2 questions, if not too 
much :
----
2020-06-12 18:24:22 UTC - Daniel Ciocirlan: • do we need any connectors for a 
geo replicated solution, we just use shared subscription for event notification 
in consuming regions
----
2020-06-12 18:26:05 UTC - Daniel Ciocirlan: • i cannot find any documentation 
that details what the proxy actually does. i found a book that says that the 
proxy queries ZK for which broker has the partition, then uses the connection 
pool to forward the producer request to the broker. Is this correct? any other 
functionality?
----
2020-06-12 18:26:25 UTC - Matteo Merli: if you configure the "clusters" URL and 
configure the namespace to be replicated across the clusters, that would be it
----
2020-06-12 18:26:52 UTC - Matteo Merli: the proxy is used to expose broker 
through a load balancer
----
2020-06-12 18:27:14 UTC - Matteo Merli: if you're using standalone Pulsar you 
wouldn't need the proxy anyway
----
2020-06-12 18:27:33 UTC - Daniel Ciocirlan: we have AWS setup with NLB in front
----
2020-06-12 18:27:51 UTC - Daniel Ciocirlan: target is about 1M RPM, replicated 
to 4 regions for consuming
----
2020-06-12 18:29:22 UTC - Daniel Ciocirlan: thanks for your time @Matteo Merli
----
2020-06-12 18:33:49 UTC - Matteo Merli: :+1:
----
2020-06-12 18:51:16 UTC - Anup Ghatage: Update: I did receive the email but 
missed it :cry:
I deserve to not have a shirt since I wasn’t on top of things :sob:
----
2020-06-12 18:51:29 UTC - Curtis Cook: I’ll take your tshirt if you don’t want 
it :wink:
----
2020-06-12 19:04:43 UTC - Anup Ghatage: let’s hold on now..  I’ve messaged 
@Sijie Guo. I might get one if he’s in a good mood :stuck_out_tongue:
----
2020-06-12 19:05:30 UTC - Curtis Cook: i suggest sending cookies
grin : Anup Ghatage
----
2020-06-12 20:44:09 UTC - Gilles Barbier: Hi Asaf, we are currently building 
something like that. A job manager system + a workflow engine to orchestrate 
those jobs. It could be used for async tasks or even microservices orchestration
----
2020-06-12 20:44:53 UTC - Gilles Barbier: With a big focus on observability. 
Scale and resilience should be provided by Pulsar. We plan to open source an 
alpha this summer.
----
2020-06-12 20:46:29 UTC - Gilles Barbier: (we are developing in kotlin, but 
workers could easily be written in any languages supported by Pulsar)
----
2020-06-12 21:14:20 UTC - Asaf Mesika: Ok. Sounds similar yet a bit different. 
My main question to you is: did you run into any surprises? What about 
operating BookKeeper: is it a stable system, or does it require deep knowledge 
to operate?
----
2020-06-12 21:14:41 UTC - Asaf Mesika: We plan to write in Kotlin as well - 
great language 
----
2020-06-12 21:14:52 UTC - Kirill Merkushev: Hello, got an error in the prod - 
`2020-06-12 21:08:26.716 ERROR 13 --- [r-client-io-1-1] 
org.apache.pulsar.client.impl.ClientCnx : [id: 0x834bd319, L:/10.0.5.244:40214 
- R:ip-10-0-5-251.eu-central-1.compute.internal/10.0.5.251:6651] Close 
connection because received internal-server error 
java.lang.IllegalStateException: Namespace bundle 
company/events/0x58000000_0x60000000 is being unloaded`  - is there a way to 
reproduce that intentionally? So that i can test our code on this. And what 
does it mean in general?
----
2020-06-12 21:23:48 UTC - Addison Higham: oh dang, didn't realize I was 
supposed to send an email either :stuck_out_tongue:
----
2020-06-12 21:27:20 UTC - Addison Higham: `pulsar-admin namespaces unload 
<namespace>` will trigger that (or `pulsar-admin topics unload <topic>`)

Unloading is how pulsar moves load across brokers. For a given topic (actually 
a bundle, as topics are grouped together in a bundle to have less of them to 
manage) pulsar periodically keeps stats of how busy the topic (bundles) is and 
how that load is relative to other brokers. It will try and shed load if it 
gets too high.

When the bundle is unloading, all the connections get closed as you now have a 
different broker you need to talk to
+1 : Matteo Merli, Kirill Merkushev
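
For testing, the bundle named in the error can be split back into its namespace and hash range and fed to the unload command. A sketch, assuming the `tenant/namespace/0x..._0x...` layout seen in the error above and the `--bundle` option of `pulsar-admin namespaces unload`:

```python
import re

# Captures "<tenant>/<namespace>" and the "0x..._0x..." hash range.
BUNDLE_RE = re.compile(r"Namespace bundle (\S+)/(0x[0-9a-fA-F]+_0x[0-9a-fA-F]+)")


def unload_command_from_error(message):
    """Extract (namespace, range) from the error and build a pulsar-admin argv."""
    match = BUNDLE_RE.search(message)
    if match is None:
        raise ValueError("no namespace bundle found in message")
    namespace, bundle_range = match.groups()
    return ["pulsar-admin", "namespaces", "unload",
            "--bundle", bundle_range, namespace]


error = ("java.lang.IllegalStateException: Namespace bundle "
         "company/events/0x58000000_0x60000000 is being unloaded")
argv = unload_command_from_error(error)
print(" ".join(argv))
```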
----
2020-06-12 21:34:09 UTC - Cathy Thompson: @Cathy Thompson has joined the channel
----
2020-06-12 22:13:02 UTC - Kirill Merkushev: is it possible to reproduce that 
with a standalone setup? (as it effectively has just one broker)
----
2020-06-12 22:13:31 UTC - Kirill Merkushev: thx for the explanation - even 
better than in docs :smile:
----
2020-06-13 06:01:24 UTC - Gilles Barbier: We did not perform stress tests yet, 
so we have not yet reached this step. The main issue up to now was that presto 
can be really slow
----
2020-06-13 06:03:14 UTC - Asaf Mesika: That depends on how the connector was 
implemented, I guess, but does it have anything to do with the job processing 
framework?
----
2020-06-13 06:05:07 UTC - Gilles Barbier: Our hope was to be able to use presto 
to introspect worker queues
----
2020-06-13 07:53:12 UTC - Philippe Chavanne: @Philippe Chavanne has joined the 
channel
----
