2020-09-11 09:12:24 UTC - Rolf Arne Corneliussen: Is broker-to-broker communication restricted to geo-replication? ---- 2020-09-11 09:45:11 UTC - Vil: what does you mean? ---- 2020-09-11 10:00:42 UTC - Rolf Arne Corneliussen: I could also ask: Do brokers communicate with other brokers within the same cluster? ---- 2020-09-11 10:01:16 UTC - Vil: i think no they do not communicate ---- 2020-09-11 10:04:27 UTC - Rolf Arne Corneliussen: Ok, thanks. There is a section in the `broker.conf` file that suggest otherwise, but the in-line documentation might be wrong or out-dated: `# Authentication settings of the broker itself. Used when the broker connects to other brokers,` `# either in same or other clusters` `brokerClientTlsEnabled=false` `brokerClientAuthenticationPlugin=` `brokerClientAuthenticationParameters=` `brokerClientTrustCertsFilePath=` ---- 2020-09-11 10:08:00 UTC - Vil: not sure then. i have see this setting for the pulsar proxy <https://pulsar.apache.org/docs/en/administration-proxy/> ---- 2020-09-11 10:08:24 UTC - Vil: maybe @Addison Higham knows? ---- 2020-09-11 10:21:38 UTC - Rolf Arne Corneliussen: Yes, the reason for asking is that I want to use a pulsar proxy in front of each cluster, and I want to have a different authentication mechanism when connecting to a remote cluster than the brokers within a cluster are configured to require from connecting clients. If brokers communicate within the same cluster, this does not seem to possible. ---- 2020-09-11 13:35:58 UTC - Frank Kelly: Brokers DO need to talk to brokers within the same cluster so that they can direct the incoming caller to the right broker which maintains the Topic/Partition. +1 : Vil ---- 2020-09-11 13:39:45 UTC - Frank Kelly: See for example <https://github.com/apache/pulsar/blob/dba0b65a2fd03586cb3d17a306fbda5322318d22/pulsar-broker/src/main/java/org/apache/pulsar/broker/lookup/TopicLookupBase.java#L82-L95> ---- 2020-09-11 13:49:43 UTC - Vil: thanks @Frank Kelly. my mistake. i thought about bookeeper bookies. not pulsar brokers ---- 2020-09-11 13:50:14 UTC - Frank Kelly: No worries +1 : Vil ---- 2020-09-11 14:42:39 UTC - Addison Higham: The consumer api exposes a channel based interface, you can see an example in the tests here: <https://github.com/apache/pulsar-client-go/blob/master/pulsar/consumer_test.go#L457>
That then allows you to write a more common select loop somewhat like this example: <https://tour.golang.org/concurrency/5> +1 : Toktok Rambo ---- 2020-09-11 14:45:46 UTC - Addison Higham: @Rolf Arne Corneliussen there are some strategies for this situation, the authentication providers API is chained, so you can provide multiple implementations. You can then configure your brokers to use a different auth plugin from your remote from your external clients and the brokers can validate either +1 : Vil ---- 2020-09-11 15:01:57 UTC - Rolf Arne Corneliussen: Thanks for your input @Frank Kelly. I am not familiar with the code, but it is not obvious to me that the one broker communicates with another to provide a LookupResult? I though the look up process only involves getting metadata from ZooKeeper, but I may be mistaken. ---- 2020-09-11 15:12:54 UTC - Rolf Arne Corneliussen: Thanks @Addison Higham. I will try `authenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderToken,org.apache.pulsar.broker.authentication.AuthenticationProviderTls.` And `brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationTls`. I need the latter for geo-replication. If a broker communicates with a peer within the same cluster, I guess it will use TLS client auth, as there is no settings to get it to use tokens. But I have tried to use token authentication in `conf/client.conf` and that works with `pulsar-admin` and `pulsar-client`. ---- 2020-09-11 15:22:28 UTC - Addison Higham: yes, we should perhaps have an additional set of settings for replication clients to use a different client, but at the moment they share the same props ---- 2020-09-11 15:27:09 UTC - Rolf Arne Corneliussen: Thanks for helping out, @Addison Higham . I might create an issue/PR on github if this turns out to be essential for our setup. ---- 2020-09-11 16:17:13 UTC - Evan Furman: following back up here. Would love to get topic level metrics working ---- 2020-09-11 16:29:15 UTC - Sijie Guo: Sorry that I missed the message. ---- 2020-09-11 16:29:27 UTC - Sijie Guo: The output doesn’t look like a broker metric ---- 2020-09-11 16:29:48 UTC - Sijie Guo: Are you sure that you are curling the `/metrics` endpoint of brokers? ---- 2020-09-11 16:30:00 UTC - Evan Furman: double checking now ---- 2020-09-11 16:33:58 UTC - Evan Furman: those are the node exporter metrics ---- 2020-09-11 16:35:28 UTC - Evan Furman: ```efurman-mbp15 ansible-ops [feature/pulsar●] curl <http://10.3.21.238:8080/metrics> efurman-mbp15 ansible-ops [feature/pulsar●] ``` ---- 2020-09-11 16:38:23 UTC - Evan Furman: It looks like that should be right according to <https://pulsar.apache.org/docs/en/reference-metrics/#broker> ---- 2020-09-11 16:39:05 UTC - Evan Furman: ```[ec2-user@ip-10-3-21-238 ~]$ grep webServicePort /opt/pulsar/conf/broker.conf webServicePort=8080 webServicePortTls=8443``` ---- 2020-09-11 16:40:35 UTC - Evan Furman: ```[ec2-user@ip-10-3-21-238 ~]$ grep -i metric /opt/pulsar/conf/broker.conf | grep -v '#' replicationMetricsEnabled=true exposeTopicLevelMetricsInPrometheus=true exposeConsumerLevelMetricsInPrometheus=true``` ---- 2020-09-11 16:54:12 UTC - Sijie Guo: Can you go to the broker pods and curl the `/metrics` endpoint locally? ---- 2020-09-11 16:54:42 UTC - Evan Furman: we’re not running in k8s — we are using terraform/ansible to provision on ec2 ---- 2020-09-11 16:55:25 UTC - Evan Furman: ```[ec2-user@ip-10-3-21-238 ~]$ curl localhost:8080/metrics [ec2-user@ip-10-3-21-238 ~]$ ``` ---- 2020-09-11 16:55:33 UTC - Evan Furman: this is from a broker ^ ---- 2020-09-11 17:11:28 UTC - Caleb Epstein: I'm a bit confused about the difference between (in the C++ API) the Client.subscribe(string, ...) and Client.subscribe(vector<string> ...) methods? If I try to subscribe for N topics using N subscribe calls (and N Consumers), is this fundamentally different than doing one .subscribe passing a vector of length N? I seem not be be getting any messages in the vector mode. Having a single Consumer to call .receieve makes my implementation simpler (only one thing to call `.receive` on) but I'm not getting any data. ---- 2020-09-11 17:12:49 UTC - Brian Li: @Brian Li has joined the channel ---- 2020-09-11 17:17:17 UTC - Brian Li: Hi experts, I'm new to pulsar. I'm on 2.5.2. Recently I found that when I used jmap -histo:live and found the top hits are from pulsar. Do you have any idea on why it has so many instances? ---- 2020-09-11 17:18:52 UTC - Evan Furman: Yea it doesn’t seem like we’re getting broker metrics ---- 2020-09-11 17:43:54 UTC - jason_webb: @jason_webb has joined the channel ---- 2020-09-11 17:55:20 UTC - Addison Higham: Is this in your application with the pulsar-client? If so, how many producers and consumers are you using? ---- 2020-09-11 18:24:53 UTC - jason_webb: Hi! I am new to pulsar and looking at geo-replication and subscription types to see how we can solve a few problems. From the docs it was a little hard to tell how replicated subscriptions work with different subscription types (key_shared)? ---- 2020-09-11 18:37:56 UTC - Addison Higham: replicated subscriptions simply snapshot their state periodically and replicate it. It is useful in failover scenarios, but does not allow of a true "global" subscription. What problems are you looking to solve? ---- 2020-09-11 19:05:40 UTC - jason_webb: Thanks for the quick response! We were looking at how to replicate data from one system to another. Writes could be active/active across two regions. I was interested to see if there was a way to be able to consume from two regions but shared the data based on a key, so the keyspace is distributed and pinned to the consuming regions. Also, if the region goes down, be able to reassign the keys. I guess the key_shared functionality but with global distribution. ---- 2020-09-11 19:24:21 UTC - Addison Higham: Ah I think your colleague Amit was asking me earlier today along these same lines. One thing I didn't mention to him: do you know which keys should go to which region when you produce? if so, it should be possible to write your own partitioner on the producer, then the consumer would only read from the partitions it needs to for it's regions, then, if it can't connect to it's own region's cluster, it could fallback to reading from the other cluster. But, it is possible to share a subscription across regions if you have your zookeeper and bookkeeper clusters cross regions and then just brokers in each region. That is doable, but does take a fair bit of tuning to get stable, which is something that we could help with ---- 2020-09-11 19:46:28 UTC - Caleb Epstein: Actually it appears that if the same topic appears more than once in that vector, the whole subscription is stuck in the way I describe. Need to make sure the list of topics is unique. ---- 2020-09-11 23:34:41 UTC - Mehdi Laouichi: @Mehdi Laouichi has joined the channel ---- 2020-09-11 23:50:37 UTC - Mehdi Laouichi: My brokers (2.5.1) encounter errors on a specific topic partition with "Failed readOffloaded". I have confirmed the ledger was NOT offloaded to S3 for some reason. The ledger is still readable from the bookies. I tried skipping messages to move the subscription cursor to the next ledger but it eventually fails a couple of ledgers later. Any idea on how to unblock the situation? (Trace in replies) ---- 2020-09-11 23:51:08 UTC - Mehdi Laouichi: ```23:37:47.908 [pulsar-io-26-2] INFO org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - [<persistent://xxx/xxx/topic_name-partition-1> / SubscriptionName] Retrying read operation 23:37:47.986 [offloader-OrderedScheduler-0-0] ERROR org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader - Failed readOffloaded: java.lang.NullPointerException: null at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader.lambda$new$1(BlobStoreManagedLedgerOffloader.java:153) ~[?:?] at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreBackedReadHandleImpl.open(BlobStoreBackedReadHandleImpl.java:196) ~[?:?] at org.apache.bookkeeper.mledger.offload.jcloud.impl.BlobStoreManagedLedgerOffloader.lambda$readOffloaded$5(BlobStoreManagedLedgerOffloader.java:556) ~[?:?] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252] at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) [com.google.guava-guava-25.1-jre.jar:?] at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) [com.google.guava-guava-25.1-jre.jar:?] at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) [com.google.guava-guava-25.1-jre.jar:?] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_252] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_252] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.48.Final.jar:4.1.48.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252] 23:37:48.016 [bookkeeper-ml-workers-OrderedExecutor-1-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [xxx/xxx/persistent/topic_name-partition-1] Error opening ledger for reading at position 25424:0 - org.apache.bookkeeper.mledger.ManagedLedgerException: Unknown exception 23:37:48.016 [bookkeeper-ml-workers-OrderedExecutor-1-0] WARN org.apache.bookkeeper.mledger.impl.OpReadEntry - [xxx/xxx/persistent/topic_name-partition-1][SubscriptionName] read failed from ledger at position:25424:0 : Unknown exception 23:37:48.016 [bookkeeper-ml-workers-OrderedExecutor-1-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentDispatcherMultipleConsumers - [<persistent://xxx/xxx/persistent/topic_name-partition-1> / SubscriptionName] Error reading entries at 25424:0 : Unknown exception, Read Type Normal - Retrying to read in 55.893 seconds``` ---- 2020-09-11 23:51:24 UTC - Mehdi Laouichi: ```$> pulsarctl bookkeeper ledger get 25424 { "25424": { "storeCtime": false, "hasPassword": false, "metadataFormatVersion": 3, "ensembleSize": 2, "writeQuorumSize": 2, "ackQuorumSize": 2, "length": 2875, "lastEntryId": 4, "ctime": 1599309659578, "cToken": 0, "state": "CLOSED", "digestType": "CRC32C", "allEnsembles": { "0": [ { "port": 3181, "hostname": "xxx-3.us-west-2-bookie.pulsar.svc.cluster.local" }, { "port": 3181, "hostname": "xxx-4.us-west-2-bookie.pulsar.svc.cluster.local" } ] }, "currentEnsemble": null, "password": "", "customMetadata": { "application": "cHVsc2Fy", "component": "bWFuYWdlZC1sZWRnZXI=", "pulsar/managed-ledger": "c3BlbmRsYWJzLXNhcy1kZXYvcGd3L3BlcnNpc3RlbnQvdHJhbnNhY3Rpb25zX2NsZWFyZWQtcGFydGl0aW9uLTE=" } } }``` ---- 2020-09-11 23:51:59 UTC - Addison Higham: @Mehdi Laouichi what does `pulsar-admin topics stats-internal <topic>` give you? ---- 2020-09-11 23:52:41 UTC - Addison Higham: that will tell you if pulsar thinks it is offloaded or not. Also, there is a thing called "offload deletion lag" which is a period where the ledger has been offloaded but not yet deleted from bookkeeper ---- 2020-09-11 23:54:14 UTC - Mehdi Laouichi: <https://gist.github.com/sruon/9968781ccdd10077a8698c395d632967> ---- 2020-09-12 00:09:20 UTC - Addison Higham: hrm... perhaps the format of 2.5.1 doesn't tell if the ledgers were offloaded, there is an `offloaded: true/false` for each ledger version in what I have locally (2.6.1). The easiest way I can to think of getting that data then would be to look at zookeeper data ---- 2020-09-12 00:13:11 UTC - Mehdi Laouichi: any idea how it would look in zookeeper? Been browsing the tree but not sure where to look for it ---- 2020-09-12 00:13:38 UTC - Addison Higham: if you can get onto one of the zookeeper boxes, you can run `pulsar zookeeper-shell` and then in the `/managed-ledgers/<tenant>/<namespace>/persistent/<topic>` it should have a record there I do believe ---- 2020-09-12 05:50:15 UTC - Sijie Guo: you need to run `curl -L localhost:8080/metrics` ----
