2020-05-27 09:24:25 UTC - lihan: @lihan has joined the channel
----
2020-05-27 12:01:46 UTC - Patrik Kleindl: Just for reference, I had to do the
following to use Python 3 in the container:
`docker exec --privileged focused_merkle update-alternatives --install
/usr/bin/python python /usr/bin/python3.7 10`
`docker exec focused_merkle python -m pip install --upgrade pip`
`docker exec focused_merkle python -m pip install pulsar-client`
`docker exec focused_merkle python -m pip install launchpadlib==1.10.6`
----
2020-05-27 12:30:32 UTC - Deepak Sah: @Penghui Li Hi, I list topics and see none,
but when I try to create the same topic it gives me an error
----
2020-05-27 12:33:44 UTC - Penghui Li: You can list the partitioned topic; the
broker only deletes the partitions, not the partitioned topic itself.
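For example, a hedged check with the admin CLI (assuming a namespace `public/default`
and a partitioned topic `my-topic`; adjust names to your setup):
```# Individual topics (including partitions) currently present in the namespace
bin/pulsar-admin topics list public/default

# Partitioned-topic metadata is listed separately
bin/pulsar-admin topics list-partitioned-topics public/default

# Number of partitions recorded for a given partitioned topic
bin/pulsar-admin topics get-partitioned-topic-metadata persistent://public/default/my-topic```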
----
2020-05-27 12:40:45 UTC - Deepak Sah: So, if I try to publish a message on that
topic, is that supposed to work?
----
2020-05-27 12:41:51 UTC - Penghui Li: I think yes, the partition is auto
created by the broker.
----
2020-05-27 12:58:20 UTC - Deepak Sah: Thanks for your help. Let me try; I'll
let you know
----
2020-05-27 12:58:40 UTC - Penghui Li: You are welcome
----
2020-05-27 16:31:55 UTC - Deepak Sah: Hi, when I do `stats-internal` on a topic
name it says it does not exist, but when I specify a particular partition within
that topic it works. The docs don't mention this :face_with_rolling_eyes:
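For reference, a hedged sketch of the difference (assuming a partitioned topic
`persistent://public/default/my-topic`; names are illustrative):
```# Fails on the parent partitioned topic, which is metadata only and has no ledgers of its own
bin/pulsar-admin topics stats-internal persistent://public/default/my-topic

# Works on an individual partition
bin/pulsar-admin topics stats-internal persistent://public/default/my-topic-partition-0

# Aggregated (non-internal) stats across partitions
bin/pulsar-admin topics partitioned-stats persistent://public/default/my-topic```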
----
2020-05-27 16:37:54 UTC - Sébastien de Melo: Hi, we are running Pulsar 2.3.2 on
AWS EKS, with 3 ZK, 4 BK, 3 brokers and 3 proxies. It usually works well, but
we encountered an issue with our 4th bookkeeper (bookkeeper-3) yesterday. It
suddenly took a lot of resources (12.5 CPU and 15 GB RAM, instead of 0.25 CPU
and 2.5 GB in normal circumstances) and our entire cluster went down. No more
messages were streamed. There were 2 stack traces:
```14:51:10.064 [io-write-scheduler-OrderedScheduler-1-0] ERROR
org.apache.bookkeeper.statelib.impl.rocksdb.checkpoint.RocksCheckpointer -
Failed to restore rocksdb
000000000000000007/000000000000000001/000000000000000000
java.io.FileNotFoundException:
000000000000000007/000000000000000001/000000000000000000/checkpoints/338f76da-b345-42ba-9cd7-72ffeafb1ffd/metadata
at
org.apache.bookkeeper.statelib.impl.rocksdb.checkpoint.dlog.DLCheckpointStore.openInputStream(DLCheckpointStore.java:92)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.rocksdb.checkpoint.RocksCheckpointer.getLatestCheckpoint(RocksCheckpointer.java:117)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.rocksdb.checkpoint.RocksCheckpointer.restore(RocksCheckpointer.java:52)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.kv.RocksdbKVStore.loadRocksdbFromCheckpointStore(RocksdbKVStore.java:161)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.kv.RocksdbKVStore.init(RocksdbKVStore.java:223)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$initializeLocalStore$5(AbstractStateStoreWithJournal.java:202)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:471)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[?:1.8.0_181]
at
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
[com.google.guava-guava-21.0.jar:?]
at
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
[com.google.guava-guava-21.0.jar:?]
at
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
[com.google.guava-guava-21.0.jar:?]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[?:1.8.0_181]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[?:1.8.0_181]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[?:1.8.0_181]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_181]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_181]
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]```
and
----
2020-05-27 16:37:54 UTC - Sébastien de Melo: ```14:51:10.066
[io-write-scheduler-OrderedScheduler-1-0] WARN
org.apache.bookkeeper.stream.storage.impl.sc.StorageContainerRegistryImpl -
De-registered StorageContainer ('7') when failed to start
java.util.concurrent.CompletionException:
org.apache.bookkeeper.statelib.api.exceptions.StateStoreException: Failed to
restore rocksdb 000000000000000007/000000000000000001/000000000000000000
at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
~[?:1.8.0_181]
at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
~[?:1.8.0_181]
at
java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:943)
~[?:1.8.0_181]
at
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:926)
~[?:1.8.0_181]
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
~[?:1.8.0_181]
at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
~[?:1.8.0_181]
at
org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:474)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[?:1.8.0_181]
at
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
[com.google.guava-guava-21.0.jar:?]
at
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
[com.google.guava-guava-21.0.jar:?]
at
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
[com.google.guava-guava-21.0.jar:?]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[?:1.8.0_181]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[?:1.8.0_181]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[?:1.8.0_181]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[?:1.8.0_181]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_181]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_181]
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.apache.bookkeeper.statelib.api.exceptions.StateStoreException:
Failed to restore rocksdb
000000000000000007/000000000000000001/000000000000000000
at
org.apache.bookkeeper.statelib.impl.rocksdb.checkpoint.RocksCheckpointer.restore(RocksCheckpointer.java:84)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.kv.RocksdbKVStore.loadRocksdbFromCheckpointStore(RocksdbKVStore.java:161)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.kv.RocksdbKVStore.init(RocksdbKVStore.java:223)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$initializeLocalStore$5(AbstractStateStoreWithJournal.java:202)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:471)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
... 12 more
Caused by: java.io.FileNotFoundException:
000000000000000007/000000000000000001/000000000000000000/checkpoints/338f76da-b345-42ba-9cd7-72ffeafb1ffd/metadata
at
org.apache.bookkeeper.statelib.impl.rocksdb.checkpoint.dlog.DLCheckpointStore.openInputStream(DLCheckpointStore.java:92)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.rocksdb.checkpoint.RocksCheckpointer.getLatestCheckpoint(RocksCheckpointer.java:117)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.rocksdb.checkpoint.RocksCheckpointer.restore(RocksCheckpointer.java:52)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.kv.RocksdbKVStore.loadRocksdbFromCheckpointStore(RocksdbKVStore.java:161)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.kv.RocksdbKVStore.init(RocksdbKVStore.java:223)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$initializeLocalStore$5(AbstractStateStoreWithJournal.java:202)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
at
org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:471)
~[org.apache.bookkeeper-statelib-4.9.2.jar:4.9.2]
... 12 more```
----
2020-05-27 16:38:16 UTC - Sébastien de Melo: We tried to restart the bookkeeper
but the error came back. We also tried the recover subcommand (in bookkeeper
shell, once with bookkeeper-3 as the argument, once with bookkeeper-0) but
without success.
Then we chose to delete the volumes of bookkeeper-3, hoping that it would
rebuild itself at restart. But instead the pod crashed in a loop because of the
following error:
```There are directories without a cookie, and this is neither a new
environment, nor is storage expansion enabled. Empty directories are
[data/bookkeeper/journal/current, data/bookkeeper/ledgers/current]```
We finally redeployed the entire cluster.
We found issues related to this problem:
<https://github.com/apache/pulsar/issues/5668>,
<https://github.com/apache/pulsar/issues/6894>,
<https://github.com/apache/pulsar/issues/3121>.
It seems this is not resolved yet, but is there a way to fix it in case it
happens again (e.g. a command to run)?
----
2020-05-27 17:18:54 UTC - Kai Levy: Is there a way for me to check when a
ledger can be deleted? I am having trouble with a topic that is holding onto
more data than expected. The only cursor on the topic points at the most
recent ledger, but it still has many full ledgers sitting around
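A rough way to inspect this (hedged sketch; namespace and topic names are placeholders):
```# Retention keeps acknowledged data around even after all cursors have moved past it
bin/pulsar-admin namespaces get-retention public/default

# stats-internal lists the ledgers still attached to the topic and the cursor positions
bin/pulsar-admin topics stats-internal persistent://public/default/my-topic```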
----
2020-05-27 17:19:25 UTC - Matteo Merli: Is there time retention set?
----
2020-05-27 17:24:13 UTC - Kai Levy: Yes, there is
----
2020-05-27 17:51:25 UTC - Sijie Guo: @Ken Huang I don’t think functions support
geo-replication yet.
----
2020-05-27 17:52:42 UTC - Sijie Guo: Because function implementation is done
with pulsar topics, the topic metadata is actually geo-replicated. So you can
get it “partially” work.
----
2020-05-27 17:52:55 UTC - Sijie Guo: However I think there are still many
problems unresolved.
----
2020-05-27 17:53:28 UTC - Sijie Guo: So I wouldn’t recommend you doing that at
this moment. Instead, can you raise a Github issue?
----
2020-05-27 17:53:35 UTC - Sijie Guo: So we can look into how to support it.
----
2020-05-27 17:54:50 UTC - Sijie Guo: Do you use state storage for pulsar
functions?
----
2020-05-27 17:55:02 UTC - Sijie Guo: If not, I would recommend disabling state
storage.
----
2020-05-27 17:55:45 UTC - Sijie Guo: Set `extraServerComponents` to empty.
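A hedged sketch of what that looks like in `bookkeeper.conf` (or the corresponding
helm value), assuming the default Pulsar distribution where the table service is
enabled through this setting:
```# Default in the Pulsar-packaged bookkeeper.conf (enables stream/state storage):
# extraServerComponents=org.apache.bookkeeper.stream.server.StreamStorageLifecycleComponent

# Leave it empty to disable state storage if functions don't use state:
extraServerComponents=```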
----
2020-05-27 18:24:40 UTC - Adriaan de Haan: Hi, today I did a demo for my CEO on
how cool Kubernetes and Pulsar are - so as part of the ad-hoc "demo" I scaled
down the number of bookies to 2 and then scaled them back up to 3. It seems to
have worked without a hitch, but now my topics are not working anymore... I
keep getting:
```Received send error from server: PersistenceError :
org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty
bookies available```
I checked the bookies and they seem to be OK - I saw some errors like:
```ERROR org.apache.bookkeeper.proto.WriteEntryProcessor - Attempt to write to
fenced ledger```
in the bookie logs, but that seems to have been just temporary (probably during
the scaling of the bookies)
----
2020-05-27 18:25:29 UTC - Matteo Merli: I think that `stats-internal` is not
printing the timestamp associated with a ledger, but we should add that
----
2020-05-27 18:25:36 UTC - Matteo Merli: since the info is known internally
ok_hand : Konstantinos Papalias
----
2020-05-27 18:25:40 UTC - Adriaan de Haan: I ran the sanity check and "simple test"
on bookkeeper; both seem to work just fine
----
2020-05-27 18:31:04 UTC - Kai Levy: yes the timestamp would definitely be
useful..
----
2020-05-27 18:33:53 UTC - Adriaan de Haan: Any advice on where to look next to
troubleshoot this? I have a 3-node setup with ensemble size 3, write quorum 3
and ack quorum 2
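For reference, a hedged sketch of where those defaults live on the broker side
(assuming the stock `broker.conf`; with only 3 bookies and ensemble = write
quorum = 3, any unavailable bookie blocks writes):
```# In broker.conf -- defaults applied to newly created managed ledgers
managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=3
managedLedgerDefaultAckQuorum=2```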
----
2020-05-27 18:34:15 UTC - Adriaan de Haan: v2.5.2 deployed on kubernetes
----
2020-05-27 18:36:36 UTC - Addison Higham: @Adriaan de Haan a few questions, are
you using the helm charts?
----
2020-05-27 18:36:52 UTC - Addison Higham: also, did your bookkeeper pods get
rescheduled at all?
----
2020-05-27 18:39:03 UTC - Adriaan de Haan: yes, I am using the helm install
----
2020-05-27 18:40:04 UTC - Adriaan de Haan: and the scale down took the one
bookie down and the scale up (back to 3) brought a new one up successfully.
----
2020-05-27 18:40:05 UTC - Addison Higham: oh NM I misread that, so yes @Adriaan
de Haan there is a known issue with brokers connecting to bookies during an IP
address change. Because bookies are keyed by hostname and the IP address
changes, the sockets stay open to the old bookie IP address and it can take up
to 11 minutes for the OS to time out the sockets (due to TCP keepalive).
----
2020-05-27 18:41:08 UTC - Adriaan de Haan: This problem is still present, hours
later I still cannot communicate with the topics even on a totally new PC
----
2020-05-27 18:41:42 UTC - Addison Higham: you can try restarting your brokers,
that should fix the issue
----
2020-05-27 18:42:30 UTC - Adriaan de Haan: let me try that
----
2020-05-27 18:48:20 UTC - Addison Higham: assuming that this is the issue
(which it sounds like, but I could be wrong!), there are some OS-level tweaks to
help mitigate the problem, but I think there may be some discussions about how
to handle this in pulsar/bookkeeper
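As an illustration of the kind of OS-level tweak meant here, a hedged sketch of
the Linux TCP keepalive sysctls (values are illustrative, not a recommendation):
```# Start probing idle connections after 60s instead of the default 7200s,
# probe every 10s, and give up after 6 failed probes
sysctl -w net.ipv4.tcp_keepalive_time=60
sysctl -w net.ipv4.tcp_keepalive_intvl=10
sysctl -w net.ipv4.tcp_keepalive_probes=6```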
----
2020-05-27 18:48:59 UTC - Frank Kelly: @Addison Higham is there a Github issue
for that known issue? Thanks. CC: @Andy Papia Interesting :point_up::point_up:
----
2020-05-27 18:51:34 UTC - Addison Higham: oh, do you know which issue? We
confirmed that lowering the OS-level keepalive helps the situation. It seems
like the right answer (there has been some conversation in <#C5ZSVEN4E|dev>
about it) would be to implement heartbeats for bookkeeper connections
----
2020-05-27 18:55:03 UTC - Frank Kelly: Perhaps
<https://github.com/apache/pulsar/issues/6154> ? Although the content of the
ticket suggests a different root cause (even if the symptom is the same).
----
2020-05-27 19:17:55 UTC - Adriaan de Haan: That solved the problem... yeah, I was
about to ask how I can ensure my cluster remains healthy in an automated fashion
- it seems a bit fragile
----
2020-05-27 19:37:33 UTC - Sijie Guo: I think it is a different thing in the
above issue.
"Not enough non-faulty bookies" can be thrown in the following situations (a
quick check for both is sketched below):
1. The number of writable bookies is smaller than the ensemble size. Bookies can
turn read-only when disk usage is above 95%, or they can be gone entirely.
2. There are enough writable bookies in the cluster, but the broker is not able
to connect to some of them (due to a network issue or TCP connection issue). So
from the broker's perspective, there are not enough bookies.
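A hedged way to check which of the two situations you are in, assuming the
bookkeeper shell that ships with Pulsar (output format varies by version):
```# Bookies currently registered as writable
bin/bookkeeper shell listbookies -rw

# Bookies that have flipped to read-only (e.g. disk usage over the threshold)
bin/bookkeeper shell listbookies -ro```
If enough bookies show up as writable here but brokers still report the error,
that points at the connectivity case (situation 2).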
----
2020-05-27 19:37:59 UTC - Sijie Guo: The issue you attached is the first
situation.
----
2020-05-27 19:38:15 UTC - Sijie Guo: The issue Addison discussed in the general
channel is the second situation.
----
2020-05-27 19:44:02 UTC - Addison Higham: 2.5.2 (along with the latest helm
charts) has a number of fixes that have improved things a fair bit. There are
options on how to do this: the default helm chart has a healthcheck which
ensures the brokers are up, but they may not be writable (this bookkeeper
issue). In our clusters, we run a full healthcheck that ensures you can write,
and we replace brokers if they aren't writable. This has the upside that we can
recover from this issue (brokers will get restarted), but has the downside that
you can't serve read requests and can have complete downtime
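As a rough illustration of that kind of probe (hedged sketch; the exact checks
in the helm chart and in our setup may differ):
```# Basic liveness: is the broker serving its admin endpoint?
bin/pulsar-admin brokers healthcheck

# A stricter "can we actually write?" check could produce to a probe topic, e.g.:
bin/pulsar-client produce persistent://public/default/healthcheck-probe -m "ping" -n 1```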
----
2020-05-27 19:45:18 UTC - Frank Kelly: Thanks for the clarification @Sijie Guo
- is there a GitHub issue to track Addison's situation?
----
2020-05-27 19:45:50 UTC - Addison Higham: writes are really important for us,
so currently we opt for that trade-off. Running on k8s is great, but there are
a few unique challenges that are getting sussed out and are pretty high
priority to get fixed (AFAICT, lots of work has happened in the last few months
already)
+1 : Adriaan de Haan, Konstantinos Papalias
----
2020-05-27 19:50:26 UTC - Addison Higham: there are a number of situations in
which it can happen; one that I know of is being tracked in BK with an open
issue. But there likely needs to be a PIP (or BIP, if that is a thing?) with
some plan for handling it more generally (perhaps via a heartbeat, which is how
pulsar handles this problem with clients)
----
2020-05-27 20:07:31 UTC - Sijie Guo: I don't think we need a heartbeat there. A
simple fix is in the bookie watcher: when a new node joins, check whether the
IP address has changed for a given bookie id.
----
2020-05-27 20:07:38 UTC - Josh Haft: Any other thoughts on this proxy 502
issue? Has anyone else seen this? When it begins happening, I observe that no
requests for the admin API are being sent to one of the brokers. It seems like
this could indicate a stale connection being reused by the proxy, which
probably corresponds to the occasional 502 being returned to the client.
There are no obvious issues in the broker logs and pulsar requests seem to come
through fine. Requests to other brokers for the admin API are forwarded fine,
and those return 200 to the client.
----
2020-05-27 20:11:54 UTC - Addison Higham: @Josh Haft this very much sounds like
the case where using zookeeper for broker discovery leaves proxies out of sync.
What we have seen happen specifically:
- a broker changes, which triggers a zookeeper watch
- the watch causes a refresh of broker metadata, which makes multiple requests
to zookeeper
- one of these requests can fail, resulting in the old broker list staying around
- there is no periodic refresh, so proxies will not get re-synced unless you
restart them
----
2020-05-27 20:15:20 UTC - Addison Higham: we moved to using `brokerServiceURL`
and moved away from zookeeper discovery. If that is an option, I recommend it
(a hedged config sketch follows); if it isn't, you can do a few things:
- put a health check on proxies that tests a certain endpoint and triggers a
restart of the proxy
- add work in the pulsar proxy for an alternate mechanism that keeps the list
of brokers consistent by periodic polling rather than just having watches
trigger a sync. That would at least make it so that the proxy eventually gets
the correct list of brokers
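A hedged sketch of the proxy configuration meant here (in `proxy.conf`;
hostnames and ports are placeholders, adjust to your deployment and TLS setup):
```# Point the proxy directly at the brokers (or a DNS name / load balancer in
# front of them) instead of discovering them through zookeeper
brokerServiceURL=pulsar://broker01:6650,broker02:6650,broker03:6650
brokerWebServiceURL=http://broker01:8080,broker02:8080,broker03:8080```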
----
2020-05-27 20:17:06 UTC - Addison Higham: @Sijie Guo it seems like lots of
other things could cause connection issues though? It might not JUST be an IP
address change but also a networking fault that kills sockets and leaves the
broker waiting on socket keepalive?
----
2020-05-27 20:17:22 UTC - Sijie Guo: Correct.
----
2020-05-27 20:17:58 UTC - Sijie Guo: A heartbeat is a mid-term to long-term
solution, which has compatibility considerations as well.
----
2020-05-27 20:18:30 UTC - Sijie Guo: What I proposed is a short-term fix that
is specific to this situation; it will not have any compatibility issues.
----
2020-05-27 20:18:56 UTC - Addison Higham: gotcha, yeah, protocol changes are
more tricky for sure
----
2020-05-27 20:19:22 UTC - Josh Haft: @Addison Higham Appreciate the feedback. I
see that recommendation for k8s environments, is that where you saw these
issues? Our brokers are running on VMs.
----
2020-05-27 20:22:55 UTC - Addison Higham: yes, we run in k8s. If you aren't
running in k8s, any sort of load balancer or name resolution works as well. But
if none of those are options, your options to fix it without changing pulsar
are either to add health checks that restart proxies when they get into a
weird state and/or to tune zookeeper to really minimize the likelihood of
failed queries. Obviously, this is all assuming quite a bit! If you see this
happen after a broker change, this is likely the problem; otherwise, it could
be some other issue. The logs make it pretty clear, though I don't have a
sample handy
----
2020-05-27 20:23:19 UTC - Addison Higham: maybe the other thing the proxy
should just do is not swallow the exceptions if it fails and instead crash...
----
2020-05-27 20:26:30 UTC - Sijie Guo: @Josh Haft do you have a DNS name over the
brokers? You can use DNS or just use a multi-host service URL
----
2020-05-27 20:26:54 UTC - Sijie Guo: The service URL is just the “bootstrap”
entrypoint for the proxy to discover brokers.
----
2020-05-27 20:52:50 UTC - Josh Haft: Thanks. Is the multi-host option just
comma-delimited URIs, i.e.
"brokerWebServiceURL=<https://broker01:8443>,<https://broker02:8443>" ?
----
2020-05-27 20:53:47 UTC - Sijie Guo: <https://broker01:8443>,broker02:8443
+1 : Josh Haft
----
2020-05-27 22:52:41 UTC - Paul Wagner: @Paul Wagner has joined the channel
----
2020-05-27 23:59:07 UTC - Sam Xie: Hi, I'm running a pulsar EKS cluster with 9
bookies and 8 brokers. I am doing a performance test with one topic, 8 bundles
and 64 partitions. There always seem to be a couple of bookies and brokers
inactive. Does anyone have an idea why?
----
2020-05-28 00:08:56 UTC - Luke Stephenson: Hello. I'm having an issue with
both pulsar 2.5.0 and 2.5.2 which is the same as
<https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1586210696311600>. During
the initial cluster deployment, which has 9 bookies, several of the nodes throw
an NPE in `NetUtils.resolveNetworkLocation`. After startup those bookies are
never used by the pulsar cluster (i.e. they show low CPU usage) until they are
restarted, and even the restart might just result in another
`NetUtils.resolveNetworkLocation` NPE. Am I correct in assuming we need to
wait for the next bookkeeper release before this is fixed? Is there
something I can do on my end to make those calls more likely to succeed?
----
2020-05-28 01:12:23 UTC - Ken Huang: ok
----
2020-05-28 02:18:51 UTC - Ken Huang: hi @Tanner Nilsson what version do you use?
When I use your code I get a 400 error on version 2.5.1.
----
2020-05-28 02:30:57 UTC - Tymm: tried that with 2.5.2, but the problem is still
there ...
----
2020-05-28 02:36:59 UTC - renjiemin: @renjiemin has joined the channel
----
2020-05-28 07:43:39 UTC - Sunny Chan: I am running some tests to see the maximum
rate at which a publisher can publish into a Pulsar node, and it seems that each
publisher thread using the same client can only publish around 300 20-byte
messages/second, while my consumer can handle much more if I create multiple
publisher threads.
I have seen online documentation saying that there is message throttling to
reduce the chance of a Pulsar node going OOM, but I can't find any settings
relating to the publisher message rate. Can someone point me to documentation
on tuning this?
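As a point of comparison, a hedged sketch using the `pulsar-perf` tool that
ships with Pulsar (topic name and flags are illustrative; per-thread producer
throughput usually hinges on batching and async sends rather than on a
broker-side publish throttle):
```# Publish 20-byte messages at a target rate of 100k msg/s to a test topic
bin/pulsar-perf produce --rate 100000 --size 20 persistent://public/default/perf-test```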
----
2020-05-28 07:50:56 UTC - Penghui Li: Are you running in standalone mode?
----