2020-06-02 10:12:10 UTC - Ebere Abanonu: @Sijie Guo can I suggest that for your 
next release, you create a separate section detailing features and changes for 
developers. Having all the information in one place will help.
----
2020-06-02 12:00:12 UTC - Ankush: Hi everyone,
I have a few questions about KeyShared subscriptions, which I stumbled across 
while playing in dev:
1. When using the key shared policy `KeySharedPolicy.autoSplitHashRange()`, what 
is Pulsar's internal process for rebalancing hash ranges when we add/remove a 
consumer?
2. When using the key shared policy `KeySharedPolicy.stickyHashRange()`, the 
documentation says that if we cannot cover the complete range [0, 65535], the 
cursor will rewind. What does that mean? How does Pulsar handle restarts of a 
node when using this policy? (For us, we have 4 consumers in k8s, and restarting 
1 node can take around 1 minute.)
Thanks!
----
2020-06-02 13:48:53 UTC - Ebere Abanonu: @Sijie Guo remember there was a time I 
faced a SchemaValidation exception even when the schema was correct? I have 
been able to reproduce it, with better understanding this time. For instance, I 
had a topic named Students with a schema registered. I then tried testing a 
different schema, with a different definition, on a topic I called 
Students-Test, and an exception was thrown; but if I changed the name to 
something that does not have "Students" in it, it succeeded. When I first 
encountered this earlier, I was able to resolve it by restarting Pulsar. For 
the current issue, I was able to resolve it by changing the name to something 
not containing the existing topic's name. Could it have to do with caching?
----
2020-06-02 15:56:47 UTC - Alexandre DUVAL: I think I misunderstand something 
around TLS and proxying:
----
2020-06-02 15:58:44 UTC - Alexandre DUVAL: throwing:
----
2020-06-02 16:00:06 UTC - Alexandre DUVAL: should broker_url not contain the 
scheme?
----
2020-06-02 16:00:44 UTC - Alexandre DUVAL: the throw occurs from this call 
<https://github.com/apache/pulsar/blob/master/pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/DirectProxyHandler.java#L126>
----
2020-06-02 16:00:59 UTC - Alexandre DUVAL: and this doesn't take care of 
SSL/non-SSL?
<https://github.com/apache/pulsar/blob/master/pulsar-proxy/src/main/java/org/apache/pulsar/proxy/server/DirectProxyHandler.java#L119>
----
2020-06-02 16:01:27 UTC - Alexandre DUVAL: WDYT? Issue?
----
2020-06-02 16:12:48 UTC - Chris DiGiovanni: This morning I went to expand 
storage on 3 of 4 of my bookies.  These are the steps I followed:
1. Add disk to OS, and mounted to a new ledger dir
2. Added ledger dir to the bookkeeper.conf
3. Ran bin/bookkeeper shell updatecookie -expandstorage
4. Disabled Autorecovery
5. Restarted the bookie
6. Once the process connected and was stable, enabled autorecovery
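Step 2 in the list above amounts to appending the new mount to the `ledgerDirectories` entry in `bookkeeper.conf`; a sketch (the paths here are illustrative, adjust to your actual mounts):

```
# bookkeeper.conf -- example paths only
# before:
#   ledgerDirectories=/mnt/ledgers1
# after adding the new disk mounted at /mnt/ledgers2:
ledgerDirectories=/mnt/ledgers1,/mnt/ledgers2
```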
I repeated this process for the 3 bookies, waiting about 3-5 minutes between 
each one.  All expanded their size appropriately, though now I have 
underreplicated ledgers that say they have 3 missing replicas.  Unsure how this 
is possible, since all my namespaces have set
```--bookkeeper-ack-quorum 2 --bookkeeper-ensemble 3 --bookkeeper-write-quorum 3```
Here is an example of an underreplicated ledger I'm seeing from 
`listunderreplicated`:
```3698657
        Ctime : 1591105532954
        MissingReplica : chhq-vudpulbk02.us.drwholdings.com:3181
        MissingReplica : chhq-vudpulbk01.us.drwholdings.com:3181
        MissingReplica : chhq-vudpulbk03.us.drwholdings.com:3181```
I don't understand how this is possible or how to fix it.  My bookies all show 
readwrite as well.  Any help on steps for recovery would be appreciated.
----
2020-06-02 16:23:48 UTC - Sijie Guo: Noted with thanks. @Penghui Li ^^
----
2020-06-02 16:30:09 UTC - Alexandre DUVAL: nvm, the issue was in pulsar-rs 
:slightly_smiling_face: the broker URL must not contain the scheme
----
2020-06-02 16:33:41 UTC - Addison Higham: @Alexandre DUVAL that code path is 
using raw sockets (via netty). It doesn't really use the pulsar-client, which 
is what uses the URI scheme to determine TLS. If you look below, it doesn't 
really do anything with that URI other than pull out the host and port.

If you trace through that code, you can see that based on your settings it will 
create the sslHandlerSupplier.

As for your issue, what sort of discovery are you using for brokers? However 
your targetBrokerUrl is getting found, it appears to lack port information
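For what it's worth, a toy sketch (not Pulsar's actual code) of why a scheme in a raw `host:port` target corrupts the parsed host, assuming a naive split on the last `:`:

```python
# Toy illustration (NOT Pulsar's actual code): naive "host:port" parsing
# breaks when the target URL carries a scheme like "pulsar://".
def parse_host_port(target: str, default_port: int = 6650):
    host, sep, port = target.rpartition(":")
    if not sep:
        return target, default_port  # no port given, fall back to default
    return host, int(port)

print(parse_host_port("broker1.example.com:6650"))
# -> ('broker1.example.com', 6650)
print(parse_host_port("pulsar://broker1.example.com:6650"))
# -> ('pulsar://broker1.example.com', 6650): the scheme leaks into the
#    "host", which is not a resolvable hostname
```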
----
2020-06-02 16:37:25 UTC - Addison Higham: is the creation time of the missing 
ledgers before you performed the maintenance or during?
----
2020-06-02 16:39:33 UTC - Penghui Li: 1. The current implementation splits the 
largest hash range, and there is a new implementation based on consistent 
hashing: <https://github.com/apache/pulsar/pull/6791>
2. Rewind means resetting the read position to the last acknowledged position. 
If the broker can't find any consumer to dispatch some messages to, the other 
consumers will stop consuming until those messages can be delivered.
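A toy sketch of the "split the largest hash range" behavior described in point 1 (illustrative only, not Pulsar's actual implementation; the 65536-slot hash space matches the Key_Shared docs):

```python
# Toy model (NOT Pulsar's implementation): each new consumer takes half of
# the currently largest hash range; ranges are (start, end, consumer_id).
def add_consumer(ranges):
    if not ranges:
        return [(0, 65536, 0)]  # first consumer owns the whole hash space
    # find the largest range and split it in half
    i = max(range(len(ranges)), key=lambda j: ranges[j][1] - ranges[j][0])
    start, end, owner = ranges[i]
    mid = (start + end) // 2
    new_id = len(ranges)  # id for the newly added consumer
    return ranges[:i] + [(start, mid, owner), (mid, end, new_id)] + ranges[i + 1:]

ranges = []
for _ in range(3):
    ranges = add_consumer(ranges)
print(ranges)
# -> [(0, 16384, 0), (16384, 32768, 2), (32768, 65536, 1)]
```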
----
2020-06-02 16:47:51 UTC - Chris DiGiovanni: Yes it looks to be during the 
maintenance.
----
2020-06-02 16:56:42 UTC - Alexandre DUVAL: targetBrokerUrl must not contain the 
scheme
----
2020-06-02 16:56:49 UTC - Alexandre DUVAL: then it works
----
2020-06-02 16:59:28 UTC - Ankush: Thanks a lot. This is good help.
----
2020-06-02 16:59:44 UTC - Chris DiGiovanni: When I try to recover that ledger 
using the `readledger` bookie shell command, I see lines like this:
`2020-06-02 11:58:56.936 [BookieClientScheduler-OrderedExecutor-0-0] ERROR 
org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand - Failed to 
read entry 167 -- No such ledger exists on Bookies`
----
2020-06-02 17:00:10 UTC - Chris DiGiovanni: I ran that against all my bookies 
and it says the ledger doesn't exist.
----
2020-06-02 17:33:47 UTC - Addison Higham: do you have any logs from brokers or 
metadata from the ledgers to see where they came from?
----
2020-06-02 17:39:46 UTC - Chris DiGiovanni: I'll need to look... Curious what 
the course of action would be if I find the logs or metadata for these ledgers 
that can't be read.
----
2020-06-02 17:41:12 UTC - Addison Higham: can you look at the disks and see if 
you can find any evidence of them there?
----
2020-06-02 17:47:13 UTC - Chris DiGiovanni: Here is the LedgerMetadata for a 
ledger that is missing replicas from all three:
```ledgerID: 2817853
LedgerMetadata{formatVersion=2, ensembleSize=3, writeQuorumSize=3, 
ackQuorumSize=2, state=CLOSED, length=692009, lastEntryId=168, 
digestType=CRC32C, password=base64:, 
ensembles={0=[chhq-vudpulbk03.us.drwholdings.com:3181, 
chhq-vudpulbk01.us.drwholdings.com:3181, 
chhq-vudpulbk02.us.drwholdings.com:3181]}, 
customMetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, 
pulsar/managed-ledger=base64:ZmlvL3NpZ25hbC9wZXJzaXN0ZW50L3ZvbGFy, 
application=base64:cHVsc2Fy}}```
----
2020-06-02 17:55:36 UTC - Chris DiGiovanni: Unfortunately it looks like the 
logs rolled off as they are being spammed pretty hard because of the missing 
ledgers.
----
2020-06-02 18:03:32 UTC - Chris DiGiovanni: Brokers never showed anything in 
their logs and I have logs for the brokers going back 2 days.
----
2020-06-02 18:31:43 UTC - Raphael Enns: I was looking at 
<https://pulsar.apache.org/docs/en/deploy-bare-metal/>. We don't need any data 
redundancy, as the data we're sending doesn't need to last long. We're also not 
pushing data at a high volume or frequency. What would you recommend for a 
simple, stable production setup? Would 1 zookeeper process, 1 bookkeeper 
process, and 1 pulsar broker process all running on the same machine work?
----
2020-06-02 19:35:43 UTC - Frank Kelly: Newbie question - where in the Java 
client is the tenant / namespace specified, in either the producer or the 
consumer? <https://pulsar.apache.org/docs/en/client-libraries-java/>
----
2020-06-02 19:38:21 UTC - Addison Higham: the `topic` setting. Topics can be 
specified very simply with just a string like `my-topic`, but that then uses 
your default tenant and namespace; a topic name of `my-namespace/my-topic` will 
be in the default tenant, but in the namespace `my-namespace`.

What most people usually do is use fully qualified topic names like:
`persistent://my-tenant/my-namespace/my-topic`
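The resolution rules above can be sketched like this (the `public`/`default` defaults are Pulsar's standard defaults; the function itself is just an illustration, not actual client code):

```python
# Sketch of how short topic names expand to fully qualified ones, following
# the rules described above (NOT the actual Pulsar client implementation).
def qualify(topic, tenant="public", namespace="default", domain="persistent"):
    if "://" in topic:
        return topic  # already fully qualified
    parts = topic.split("/")
    if len(parts) == 1:   # "my-topic" -> default tenant and namespace
        return f"{domain}://{tenant}/{namespace}/{parts[0]}"
    if len(parts) == 2:   # "my-namespace/my-topic" -> default tenant
        return f"{domain}://{tenant}/{parts[0]}/{parts[1]}"
    if len(parts) == 3:   # "my-tenant/my-namespace/my-topic"
        return f"{domain}://{parts[0]}/{parts[1]}/{parts[2]}"
    raise ValueError(f"invalid topic name: {topic}")

print(qualify("my-topic"))               # persistent://public/default/my-topic
print(qualify("my-namespace/my-topic"))  # persistent://public/my-namespace/my-topic
```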
----
2020-06-02 19:38:54 UTC - Frank Kelly: Gotcha - thanks - the examples in the 
doc were a little confusing.
----
2020-06-03 00:59:12 UTC - Hiroyuki Yamada: Hi, I’m testing Pulsar auto recovery 
feature in a k8s environment deployed with Helm.
I’m using the default configuration so the number of replicas of bookie is 4 
and Ensemble/WQ/AQ = (3,3,2) and there is an auto recovery pod and auto 
recovery is on.

To test the auto recovery behavior, after I created some data with pulsar-perf, 
I deleted all the files under ledgers (`rm -rf 
/pulsar/data/bookkeeper/ledgers/current/*`) on bookie-0 to simulate a disk 
failure. During the test, I was watching all the logs of the autorecovery and 
bookie pods via `kubectl logs`, but nothing really happened after the ledger 
data was deleted (so it looks like the deleted data is not recovered).

Is this the expected behavior? Am I missing something?
It would be great if anyone could help me out.
----
2020-06-03 01:43:37 UTC - Rounak Jaggi: We have deployed a Pulsar cluster using 
Terraform/Ansible with 3 brokers, 3 bookies, 3 zookeepers, and 2 proxies, on 
Pulsar version 2.5.0. I have two questions:
1. We now want to add 2 more brokers, 2 more bookies, 2 more zookeepers, and 1 
more proxy to the existing cluster. I was able to build those new instances 
using Terraform easily, just by increasing the count of those components. How 
can we configure only those new components with the latest version of Pulsar, 
without affecting the existing components running the older version, using 
Ansible? Is there a way to do this?
2. How do we do an upgrade/migration using Terraform/Ansible in an AWS 
environment?
----
2020-06-03 04:07:50 UTC - Ken Huang: Hi, how do I set the IP address that the 
bookie registers in zookeeper?
----
2020-06-03 05:30:52 UTC - Pushkar Sawant: Is there a way to improve the 
distribution of ledgers across bookkeeper servers? I have a 6-node bookkeeper 
cluster with 1 TB of ledger storage on each node. At the moment, 3 nodes are at 
around 60% storage utilization and the other 3 nodes are at around 87%.
----
2020-06-03 06:09:53 UTC - Dhakshin: @Dhakshin has joined the channel
----
2020-06-03 06:49:16 UTC - Sijie Guo: auto recovery consists of two tasks. One 
is the bookie audit task, which detects bookies that are gone; the other is the 
ledger audit, which detects missing entries. The ledger audit is scheduled at a 
very long interval.

For the bookie-gone case, just kill one bookie, keep it down, and you will see 
how auto recovery works. For the entries-gone case, reduce the ledger audit 
interval.
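The audit intervals are set in `bookkeeper.conf`; the names below are the standard BookKeeper auditor settings, and the values are just examples of shortening them:

```
# bookkeeper.conf -- auditor intervals, in seconds (example values)
# how often the auditor checks for lost bookies (default 86400 = 1 day)
auditorPeriodicBookieCheckInterval=3600
# how often the auditor scans ledgers for missing entries (default 604800 = 1 week)
auditorPeriodicCheckInterval=86400
```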
----
2020-06-03 06:50:21 UTC - Sijie Guo: it seems like more of a terraform question?
----
2020-06-03 06:50:35 UTC - Sijie Guo: advertisedAddress
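In `bookkeeper.conf` that looks like the following (the address is an example):

```
# bookkeeper.conf -- the address the bookie advertises and registers in zookeeper
advertisedAddress=10.0.0.12
```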
----
2020-06-03 06:51:25 UTC - Sijie Guo: How many partitions do you have?
----
2020-06-03 07:30:49 UTC - Dhakshin: Hi,
I'm unable to load consumer metrics in Prometheus after enabling the 
"exposeTopicLevelMetricsInPrometheus=true" and 
"exposeConsumerLevelMetricsInPrometheus=true" properties in broker.conf.
----
2020-06-03 07:31:59 UTC - Hiroyuki Yamada: Thank you !
----
