2018-10-24 15:00:31 UTC - Yiwei Chen: @Yiwei Chen has joined the channel
----
2018-10-24 15:10:17 UTC - George Wilk: We have a namespace w/o TTL set for
backlog. Our retention policy is configured as:
"retentionTimeInMinutes" : 0,
"retentionSizeInMB" : 0
We have recently begun to experience backlog quota issues ("back log quota
exceeded") preventing us from publishing more messages. Is there an implicit
backlog quota limit set by default?
We have yet to configure our TTL and refine our retention policy, but in the
interim we wanted to keep the backlog growing. Any advice on how to make it
possible?
----
2018-10-24 15:24:59 UTC - Matteo Merli: Yes, there is a default system-wide
per-topic backlog quota defined in broker.conf
----
2018-10-24 15:25:26 UTC - Matteo Merli: Default is 10GB, which is probably low
for most cases
----
2018-10-24 15:26:12 UTC - Matteo Merli: You can override it at the namespace level
----
2018-10-24 15:27:15 UTC - Matteo Merli:
<http://pulsar.apache.org/docs/en/cookbooks-retention-expiry/#backlog-quotas>
----
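A minimal sketch of the namespace-level override mentioned above, assuming the `pulsar-admin namespaces set-backlog-quota` command described in the linked cookbook. The helper name, the namespace, and the 50G limit are hypothetical; the function only builds the argument list, so nothing runs against a cluster unless you pass the result to `subprocess.run`.

```python
# Policies accepted by set-backlog-quota, as listed in the retention/expiry cookbook.
VALID_POLICIES = {
    "producer_request_hold",
    "producer_exception",
    "consumer_backlog_eviction",
}


def set_backlog_quota_cmd(namespace, limit, policy="producer_request_hold"):
    """Build the pulsar-admin argv that overrides the backlog quota on a namespace.

    `namespace` is "tenant/namespace"; `limit` is a size string such as "50G".
    """
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown backlog quota policy: {policy}")
    return [
        "pulsar-admin", "namespaces", "set-backlog-quota", namespace,
        "--limit", limit,
        "--policy", policy,
    ]


# Hypothetical namespace and limit, for illustration only.
cmd = set_backlog_quota_cmd("my-tenant/my-ns", "50G")
print(" ".join(cmd))
# To apply it for real: subprocess.run(cmd, check=True)
```

Choosing `producer_request_hold` keeps producers waiting instead of failing fast once the quota is reached; the other two policies either raise an exception on the producer or evict the oldest backlog.
----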
2018-10-24 15:41:38 UTC - George Wilk: Thanks, @Matteo Merli!
----
2018-10-24 16:27:31 UTC - Ivan Kelly: @Matteo Merli with the message listener
on reader, is there a way to provide backpressure?
----
2018-10-24 16:33:26 UTC - Matteo Merli: Yes, blocking the listener thread will
apply backpressure
----
2018-10-24 16:34:31 UTC - Ivan Kelly: but is that listener thread shared
between consumers?
----
2018-10-24 16:35:55 UTC - Matteo Merli: Yes, you can configure the size of the
listener thread pool in the client instance. Each consumer/reader listener is
pinned to a thread from that pool, but a thread can be shared across multiple
consumers/readers.
----
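The backpressure behaviour described above can be sketched with a toy model. This is not the Pulsar client: a bounded `queue.Queue` stands in for a consumer's receiver queue, one Python thread stands in for the listener thread, and all names are hypothetical. It shows that while the listener callback blocks, only a bounded number of messages is accepted.

```python
import queue
import threading


def run_demo(receiver_queue_size=2):
    """Return (messages accepted while the listener was blocked, messages handled)."""
    inbox = queue.Queue(maxsize=receiver_queue_size)  # stand-in for the receiver queue
    first_seen = threading.Event()  # set once the listener holds its first message
    gate = threading.Event()        # the listener blocks until this is set
    handled = []

    def listener(msg):
        handled.append(msg)
        first_seen.set()
        gate.wait()  # simulate a slow handler that blocks the listener thread

    def listener_thread():
        while True:
            msg = inbox.get()
            if msg is None:  # sentinel: stop the thread
                return
            listener(msg)

    worker = threading.Thread(target=listener_thread)
    worker.start()

    inbox.put(0)
    first_seen.wait()  # the listener is now blocked inside the callback
    accepted = 1
    next_msg = 1
    while True:
        try:
            inbox.put_nowait(next_msg)  # the "broker" tries to push more messages
        except queue.Full:
            break  # backpressure: nothing more is accepted
        accepted += 1
        next_msg += 1

    gate.set()       # unblock the listener so the queue drains
    inbox.put(None)
    worker.join()
    return accepted, handled


print(run_demo())
```

Because a listener thread can be shared, a callback that blocks would presumably also delay any other consumer or reader pinned to the same thread. In the Java client the pool size is set with `listenerThreads` on the client builder.
----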
2018-10-24 17:50:07 UTC - Ryan Samo: Hey guys, is it true that the “Pulsar
Proxy” does not support WebSockets? If so, is there any harm in using
NGINX/HAProxy to ingress traffic directly into the cluster? I know the “Pulsar
Proxy” is similar to a smart L7 load balancer, I’m just trying to have a way
where consumers could use thick clients like Java or Python to consume or thin
clients over WebSockets if they have a system written in JavaScript for
example. Any pointers would be greatly appreciated.
----
2018-10-24 18:36:29 UTC - Ryan Samo: Sorry never mind, I just realized that you
guys have the Pulsar proxy and a websocket proxy out there, 2 separate proxies.
They both work nicely!
----
2018-10-24 18:54:40 UTC - Beast in Black: Hi guys (and @Matteo Merli) - After a
server reboot, I am seeing failures in my C++ app (which uses the pulsar cpp
client) when trying to connect a producer and consumer to a previously existing
topic. The error I'm seeing is `ServiceUnitNotReady` and from
<https://godoc.org/github.com/apache/incubator-pulsar/pulsar-client-go/pulsar>
I see that this means that: `Service Unit unloaded between client did lookup
and producer/consumer got created`
My questions are:
1. What could cause this to happen? Note that bookkeeper and zookeeper on that
server were also affected although I have deployed 2 other instances (3 total)
of both bookkeeper and zookeeper on 2 other servers.
2. Is there any way to recover from this error?
----
2018-10-24 18:58:02 UTC - Ali Ahmed: is the service healthy ?
----
2018-10-24 19:05:43 UTC - Beast in Black: @Ali Ahmed how can I check that?
----
2018-10-24 19:07:57 UTC - Ali Ahmed: check the list of brokers:
```pulsar-admin brokers list default```
----
2018-10-24 19:11:56 UTC - Ankit Parashar: @Ankit Parashar has joined the channel
----
2018-10-24 19:13:28 UTC - Beast in Black: @Ali Ahmed thank you, I'll try this
and report back.
----
2018-10-24 19:29:15 UTC - Beast in Black: @Ali Ahmed all 3 of my brokers appear
to be up and available per the command you gave me (first two octets of IP
addresses redacted):
```
X.Y.1.36:8080
X.Y.2.5:8080
X.Y.3.40:8080
```
----
2018-10-24 19:30:02 UTC - Ali Ahmed: I would check the logs of the services to
see if there are any errors
----
2018-10-24 19:42:44 UTC - Beast in Black: @Ali Ahmed thanks, will do. I checked
the broker logs on the rebooted server and apart from some error messages
related to a local namespace for non-persistent topics (see below) I do not see
any errors especially any related to my persistent topic where I had the
`ServiceUnitNotReady` issue. I will check the bookie and zookeeper logs on the
same server.
Errors related to local namespace hosting non-persistent topic (not currently
used):
```
org.apache.pulsar.broker.admin.v2.NonPersistentTopics - [null] Namespace bundle
is not owned by any broker <LOCAL_NS>
```
----
2018-10-24 19:44:44 UTC - Ali Ahmed: what is your tenant name ?
----
2018-10-24 19:59:53 UTC - Beast in Black: @Ali Ahmed tenant is called `_mm`,
and the global and local NSes under it are `_mm/c8global` and `_mm/c8local`
respectively
----
2018-10-24 20:03:30 UTC - Beast in Black: I also have the following additional
namespaces under the tenant:
global: `_mm/c8global._system`
local: `_mm/c8local._system`
The `<LOCAL_NS>` mentioned above, where it complains about the NS bundle
ownership, is the `_mm/c8local._system` one
----
2018-10-24 20:44:24 UTC - Matteo Merli: @Beast in Black can you do a lookup:
`pulsar-admin topics lookup $TOPIC`
----
2018-10-24 21:00:17 UTC - Beast in Black: @Matteo Merli it returned
`"<pulsar://X.Y.3.40:6650>"`
----
2018-10-24 21:00:45 UTC - Matteo Merli: But does that broker still return the error?
----
2018-10-24 21:01:05 UTC - Matteo Merli: Try to force topic reassignment:
`pulsar-admin topics unload $TOPIC`
----
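The checks suggested in this thread (list brokers, look up the topic owner, then unload the topic) can be collected into one small script. This is a sketch: the three `pulsar-admin` commands are the ones quoted above, while the helper names, the cluster name, and the topic name are hypothetical. The runner is pluggable so the sequence can be tried without a cluster.

```python
def diagnostic_steps(cluster, topic):
    """Return the pulsar-admin invocations from the thread, in the order suggested."""
    return [
        ["pulsar-admin", "brokers", "list", cluster],   # are all brokers registered?
        ["pulsar-admin", "topics", "lookup", topic],    # which broker owns the topic?
        ["pulsar-admin", "topics", "unload", topic],    # force topic reassignment
    ]


def run_steps(steps, runner):
    """Run each step with `runner(argv)` and collect the results in order."""
    return [runner(argv) for argv in steps]


# Hypothetical topic under the tenant/namespace mentioned in the thread.
steps = diagnostic_steps("default", "persistent://_mm/c8global/my-topic")

# Dry run: print the commands instead of executing them.
run_steps(steps, lambda argv: print(" ".join(argv)))

# Against a real cluster one could use, for example:
#   import subprocess
#   run_steps(steps, lambda argv: subprocess.run(argv, capture_output=True, text=True).stdout)
```

The lookup step matters because the broker that owns the topic, and therefore the log worth reading, may not be the one on the rebooted server.
----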
2018-10-24 21:02:06 UTC - Beast in Black: @Matteo Merli thanks, I'll try that.
Under what circumstances would producers/consumers get the
`ServiceUnitNotReady` error?
----
2018-10-24 21:05:02 UTC - Matteo Merli: It should not happen. It means the
broker is trying to shut down that topic (or group of topics) and failed for
some reason
----
2018-10-24 21:05:22 UTC - Matteo Merli: Can you check that broker log for any
hint?
----
2018-10-24 21:06:41 UTC - Beast in Black: @Matteo Merli the broker log on the
rebooted server does not contain any references at all to my persistent topic
for which the cpp client reported the `ServiceUnitNotReady` error. That is why
I was confused :slightly_smiling_face:
----
2018-10-24 21:07:43 UTC - Matteo Merli: Is that the broker at `"<pulsar://X.Y.3.40:6650>"`?
----
2018-10-24 21:10:22 UTC - Beast in Black: @Matteo Merli `3.40` actually refers
to one of my other brokers (have 3) on a server which was rebooted earlier and
appeared to come back up with no issues. The broker on the rebooted server with
the issue is `X.Y.1.36` - let me check the broker logs on `3.40`
----
2018-10-24 21:12:25 UTC - Beast in Black: I'm running this under kubernetes on
AWS so the broker IPs are the IPs of the broker k8s pods on the AWS instance
nodes (one of which is what was rebooted using a sysrq reboot as part of a test)
----
2018-10-24 21:44:56 UTC - Beast in Black: @Matteo Merli On the broker logs on
`3.40` I'm seeing some interesting errors related to bookie and ledger recovery
after node reboots.
The bookie errors look like this:
```
Caused by:
org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException:
org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty
bookies available
<SNIP>
```
The ledger errors look like this:
```
java.util.concurrent.CompletionException:
org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException:
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering
ledger
```
I will need more time to go through this to see if I can figure out the
problem. I have 3 bookies running as k8s pods on 3 separate AWS nodes (2 nodes
of which were rebooted one after another and I started seeing the problem after
the reboot of the second node). I am wondering whether I need to up the number
of bookies to 4 according to the documentation at
<https://bookkeeper.apache.org/docs/4.7.2/admin/bookies/#requirements>
----
2018-10-24 21:46:10 UTC - Matteo Merli: Yes, that looks like the problem
----
2018-10-24 21:46:30 UTC - Matteo Merli: the topic cannot get online because of
those errors
----
2018-10-24 21:55:54 UTC - Beast in Black: @Matteo Merli thanks for your help
and the tip about the lookup which pointed me to the right broker. I will
investigate further and see if it is a problem with the way we have configured
pulsar, bookie and zookeeper. Thanks again!
+1 : Matteo Merli
----
2018-10-25 00:06:06 UTC - durga: Hi guys - General question… why do we need 4
bookies (instead of 3) for high availability? If bookies use quorum then
shouldn’t 3 be sufficient?
I see a brief mention in the docs of why 4 bookies are needed, but it is not
clear. What is a `generic` entry and what is a `self-verifying` entry? Is it a
configuration parameter on the server side?
----
2018-10-25 00:07:35 UTC - Matteo Merli: It mainly depends on how many copies of
your data you want. If you use ensembleSize=2, then 3 bookies are enough to
sustain 1 bookie being down
+1 : durga
----
2018-10-25 00:08:20 UTC - Matteo Merli: In general, you need to have E bookies
up to write to ledgers with ensembleSize=E
+1 : durga
----
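The rule above is simple arithmetic: new ledgers with ensembleSize=E can only be written while at least E bookies are up. A minimal sketch, with hypothetical helper names, that only models this one rule and ignores recovery of existing ledgers:

```python
def can_write_new_ledger(bookies_up, ensemble_size):
    """A new ledger needs `ensemble_size` bookies available."""
    return bookies_up >= ensemble_size


def tolerated_bookie_failures(total_bookies, ensemble_size):
    """How many bookies can be down while new ledgers can still be created."""
    return max(total_bookies - ensemble_size, 0)


# 3 bookies with ensembleSize=2: one bookie can be down.
print(tolerated_bookie_failures(3, 2))
# 3 bookies with ensembleSize=3: no spare bookie, so losing one blocks writes.
print(tolerated_bookie_failures(3, 3))
# 4 bookies with ensembleSize=3: one bookie can be down again.
print(tolerated_bookie_failures(4, 3))
```

This matches the earlier "Not enough non-faulty bookies available" error: with 3 bookies and one of them down, an ensemble of 3 cannot be formed.
----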
2018-10-25 00:08:56 UTC - Matteo Merli: Where did you get the generic vs
self-verifying entry?
----
2018-10-25 00:10:24 UTC - durga: @Matteo Merli - Excellent… thanks I will check
with the team.
I got the `generic` vs `self-verifying` entry from BookKeeper docs. Link:
<https://bookkeeper.apache.org/docs/4.7.2/admin/bookies/>
You should add what you said above to the docs. Your explanation is easier
to understand.
----
2018-10-25 00:12:07 UTC - Matteo Merli: Yes, I don’t understand that part
either :slightly_smiling_face:
----
2018-10-25 00:12:08 UTC - Grant Wu: PRs are accepted :slightly_smiling_face:
+1 : Matteo Merli
----
2018-10-25 00:12:25 UTC - Grant Wu: (Note: I do not work for streamlio/am not a
core maintainer)
----
2018-10-25 00:13:52 UTC - Matteo Merli: I think the self-verifying part
refers to the MAC code. By default we use CRC32-C for checksumming the entry
payloads, since it’s many times faster.
Though I don’t see how that affects the ensemble size
----
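To illustrate the difference between a plain checksum and a MAC, here is a small sketch. The CRC32-C routine is a straightforward bitwise implementation of the Castagnoli polynomial, far slower than the optimized versions a broker would use. HMAC-SHA256 is only a stand-in for a keyed MAC and is not necessarily the algorithm BookKeeper uses; the key and payload are made up.

```python
import hashlib
import hmac


def crc32c(data: bytes) -> int:
    """Bitwise CRC32-C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def mac(key: bytes, data: bytes) -> str:
    """Keyed digest: it cannot be recomputed without the key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


payload = b"entry payload"

# A checksum detects accidental corruption, but anyone can recompute it.
print(hex(crc32c(payload)), hex(crc32c(payload + b"!")))

# A MAC also depends on a secret, so a different key yields a different tag.
print(mac(b"ledger-password", payload) == mac(b"other-password", payload))
```

The trade-off mentioned above is speed: a CRC is much cheaper to compute than a keyed hash, at the cost of offering no protection against deliberate tampering.
----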
2018-10-25 00:16:38 UTC - durga: @Matteo Merli - What is ensembleSize? Is it
the number of replicas?
----
2018-10-25 00:18:42 UTC - Matteo Merli: not exactly (though it typically
coincides)
ensembleSize: How many bookies to use for a specific ledger
writeQuorum: How many bookies each entry is written to
ackQuorum: How many responses to wait for before considering a write successful
+1 : durga, Rodrigo Malacarne
----
2018-10-25 00:19:23 UTC - Matteo Merli: if `E > W` the client will be
striping entries for a single ledger across multiple bookies
----
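A small sketch of what striping means when `E > W`, assuming a simple round-robin placement in which entry `i` starts at bookie `i mod E` and is written to the next `W` bookies in the ensemble. The function and bookie names are hypothetical, and the real BookKeeper client may place entries differently.

```python
def write_set(entry_id, ensemble, write_quorum):
    """Bookies that receive `entry_id` under the assumed round-robin striping."""
    size = len(ensemble)
    if not 1 <= write_quorum <= size:
        raise ValueError("write quorum must be between 1 and the ensemble size")
    start = entry_id % size
    return [ensemble[(start + i) % size] for i in range(write_quorum)]


ensemble = ["bookie-1", "bookie-2", "bookie-3"]  # E = 3

# With W = 2, consecutive entries land on different pairs of bookies.
for entry_id in range(4):
    print(entry_id, write_set(entry_id, ensemble, 2))

# With W = E = 3, every bookie stores every entry, so there is no striping.
print(write_set(0, ensemble, 3))
```

With E=3 and W=2 each bookie stores roughly two thirds of the ledger's entries, which spreads the write load while still keeping two copies of every entry.
----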
2018-10-25 00:21:20 UTC - Matteo Merli: btw: I strongly suggest this blog post
that drills down to all the components in Pulsar & BookKeeper:
<https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works>
+1 : durga, Rodrigo Malacarne, Igor Zubchenok
----
2018-10-25 00:22:09 UTC - durga: @Matteo Merli - Thanks. The above is helpful.
I will read the blog post as well.
----
2018-10-25 00:23:02 UTC - Matteo Merli: also the following was a good one :wink:
<https://jack-vanlightly.com/blog/2018/10/21/how-to-not-lose-messages-on-an-apache-pulsar-cluster>
+1 : durga, Rodrigo Malacarne, Igor Zubchenok
----
2018-10-25 00:27:42 UTC - durga: Thanks @Matteo Merli. Will read them.
----
2018-10-25 00:45:37 UTC - Aniket: @Aniket has joined the channel
----
2018-10-25 01:55:47 UTC - durga: @Matteo Merli - Thanks. The above posts are
very useful.
----