Slack digest for #general - 2018-10-25

Apache Pulsar Slack Thu, 25 Oct 2018 02:12:01 -0700

2018-10-24 15:00:31 UTC - Yiwei Chen: @Yiwei Chen has joined the channel
----
2018-10-24 15:10:17 UTC - George Wilk: We have a namespace w/o TTL set for 
backlog.  Our retention policy is configured as:
"retentionTimeInMinutes" : 0,
"retentionSizeInMB" : 0


We have recently begun to experience backlog quota issues ("back log quota 
exceeded") preventing us from publishing more messages.  Is there an implicit 
backlog quota limit set by default?
We have yet to configure our TTL and refine our retention policy, but in the 
interim we wanted to keep the backlog growing.  Any advice on how to make it 
possible?
----
2018-10-24 15:24:59 UTC - Matteo Merli: Yes, there is a default system-wide 
per-topic backlog quota defined in broker.conf
----
2018-10-24 15:25:26 UTC - Matteo Merli: Default is 10GB, which is probably low 
for most cases
----
2018-10-24 15:26:12 UTC - Matteo Merli: You can override it at the namespace level
----
2018-10-24 15:27:15 UTC - Matteo Merli: 
<http://pulsar.apache.org/docs/en/cookbooks-retention-expiry/#backlog-quotas>
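
A minimal sketch of the namespace-level override mentioned above, written as a 
small Python helper that assembles the `pulsar-admin namespaces 
set-backlog-quota` invocation described in the linked cookbook. The helper 
name, the `my-tenant/my-ns` namespace, and the `50G` limit are placeholders 
chosen for illustration, not values from this thread.

```python
# Hypothetical helper: builds (but does not run) the pulsar-admin command
# that sets a namespace-level backlog quota.
VALID_POLICIES = {
    "producer_request_hold",
    "producer_exception",
    "consumer_backlog_eviction",
}


def backlog_quota_command(namespace, limit, policy="producer_request_hold"):
    """Return the argv list for `pulsar-admin namespaces set-backlog-quota`."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown backlog quota policy: {policy}")
    return [
        "pulsar-admin", "namespaces", "set-backlog-quota", namespace,
        "--limit", limit,
        "--policy", policy,
    ]


if __name__ == "__main__":
    # Placeholder namespace and limit; pass the list to subprocess.run() to apply it.
    print(" ".join(backlog_quota_command("my-tenant/my-ns", "50G")))
```

The policy decides what happens once the quota is reached: holding or 
rejecting producers, or evicting the oldest backlog.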
----
2018-10-24 15:41:38 UTC - George Wilk: Thanks, @Matteo Merli!
----
2018-10-24 16:27:31 UTC - Ivan Kelly: @Matteo Merli with the message listener 
on reader, is there a way to provide backpressure?
----
2018-10-24 16:33:26 UTC - Matteo Merli: Yes, blocking the listener thread will 
apply backpressure
----
2018-10-24 16:34:31 UTC - Ivan Kelly: but is that listener thread shared 
between consumers?
----
2018-10-24 16:35:55 UTC - Matteo Merli: Yes, you can configure the size of the 
listener thread pool in the client instance. Each consumer/reader listener is 
pinned to a thread from that pool, but a thread can be shared across multiple 
consumers/readers.
----
2018-10-24 17:50:07 UTC - Ryan Samo: Hey guys, is it true that the “Pulsar 
Proxy” does not support WebSockets? If so, is there any harm in using 
NGINX/HAProxy to ingress traffic directly into the cluster? I know the “Pulsar 
Proxy” is similar to a smart L7 load balancer, I’m just trying to have a way 
where consumers could use thick clients like Java or Python to consume or thin 
clients over WebSockets if they have a system written in JavaScript for 
example. Any pointers would be greatly appreciated.
----
2018-10-24 18:36:29 UTC - Ryan Samo: Sorry never mind, I just realized that you 
guys have the Pulsar proxy and a websocket proxy out there, 2 separate proxies. 
They both work nicely!
----
2018-10-24 18:54:40 UTC - Beast in Black: Hi guys (and @Matteo Merli) - After a 
server reboot, I am seeing failures in my C++ app (which uses the pulsar cpp 
client) when trying to connect a producer and consumer to a previously existing 
topic. The error I'm seeing is `ServiceUnitNotReady` and from 
<https://godoc.org/github.com/apache/incubator-pulsar/pulsar-client-go/pulsar> 
I see that this means that: `Service Unit unloaded between client did lookup 
and producer/consumer got created`

My questions are:
1. What could cause this to happen? Note that bookkeeper and zookeeper on that 
server were also affected although I have deployed 2 other instances (3 total) 
of both bookkeeper and zookeeper on 2 other servers.
2. Is there any way to recover from this error?
----
2018-10-24 18:58:02 UTC - Ali Ahmed: is the service healthy ?
----
2018-10-24 19:05:43 UTC - Beast in Black: @Ali Ahmed how can I check that?
----
2018-10-24 19:07:57 UTC - Ali Ahmed: check the list of brokers:
```
pulsar-admin brokers list default
```
----
2018-10-24 19:11:56 UTC - Ankit Parashar: @Ankit Parashar has joined the channel
----
2018-10-24 19:13:28 UTC - Beast in Black: @Ali Ahmed thank you, I'll try this 
and report back.
----
2018-10-24 19:29:15 UTC - Beast in Black: @Ali Ahmed all 3 of my brokers appear 
to be up and available per the command you gave me (first two octets of IP 
addresses redacted):
```
X.Y.1.36:8080
X.Y.2.5:8080
X.Y.3.40:8080
```
----
2018-10-24 19:30:02 UTC - Ali Ahmed: I would check the logs of the services to 
see if there are any errors
----
2018-10-24 19:42:44 UTC - Beast in Black: @Ali Ahmed thanks, will do. I checked 
the broker logs on the rebooted server and apart from some error messages 
related to a local namespace for non-persistent topics (see below) I do not see 
any errors especially any related to my persistent topic where I had the 
`ServiceUnitNotReady` issue. I will check the bookie and zookeeper logs on the 
same server.

Errors related to local namespace hosting non-persistent topic (not currently 
used):
```
org.apache.pulsar.broker.admin.v2.NonPersistentTopics - [null] Namespace bundle 
is not owned by any broker <LOCAL_NS>
```
----
2018-10-24 19:44:44 UTC - Ali Ahmed: what is your tenant name ?
----
2018-10-24 19:59:53 UTC - Beast in Black: @Ali Ahmed tenant is called `_mm`, 
and the global and local NSes under it are `_mm/c8global` and `_mm/c8local` 
respectively
----
2018-10-24 20:03:30 UTC - Beast in Black: I also have the following additional 
namespaces under the tenant:
global: `_mm/c8global._system`
local: `_mm/c8local._system`

The `<LOCAL_NS>` mentioned above where it complains about the NS bundle 
ownership is the `_mm/c8local._system` one
----
2018-10-24 20:44:24 UTC - Matteo Merli: @Beast in Black can you do a lookup: 
`pulsar-admin topics lookup $TOPIC`
----
2018-10-24 21:00:17 UTC - Beast in Black: @Matteo Merli it returned 
`"pulsar://X.Y.3.40:6650"`
----
2018-10-24 21:00:45 UTC - Matteo Merli: But that broker still returns the error?
----
2018-10-24 21:01:05 UTC - Matteo Merli: Try to force topic reassignment: 
`pulsar-admin topics unload $TOPIC`
----
2018-10-24 21:02:06 UTC - Beast in Black: @Matteo Merli thanks, I'll try that. 
Under what circumstances would producers/consumers get the 
`ServiceUnitNotReady` error?
----
2018-10-24 21:05:02 UTC - Matteo Merli: It should not happen. It means the 
broker is trying to shut down that topic (or group of topics) and failed for 
some reason
----
2018-10-24 21:05:22 UTC - Matteo Merli: Can you check that broker log for any 
hint?
----
2018-10-24 21:06:41 UTC - Beast in Black: @Matteo Merli the broker log on the 
rebooted server does not contain any references at all to my persistent topic 
for which the cpp client reported the `ServiceUnitNotReady` error. That is why 
I was confused :slightly_smiling_face:
----
2018-10-24 21:07:43 UTC - Matteo Merli: Is that the broker at `"pulsar://X.Y.3.40:6650"`?
----
2018-10-24 21:10:22 UTC - Beast in Black: @Matteo Merli `3.40` actually refers 
to one of my other brokers (have 3) on a server which was rebooted earlier and 
appeared to come back up with no issues. The broker on the rebooted server with 
the issue is `X.Y.1.36` - let me check the broker logs on `3.40`
----
2018-10-24 21:12:25 UTC - Beast in Black: I'm running this under kubernetes on 
AWS so the broker IPs are the IPs of the broker k8s pods on the AWS instance 
nodes (one of which is what was rebooted using a sysrq reboot as part of a test)
----
2018-10-24 21:44:56 UTC - Beast in Black: @Matteo Merli In the broker logs on 
`3.40` I'm seeing some interesting errors related to bookie and ledger recovery 
after node reboots.

The bookie errors look like this:
```
Caused by: 
org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: 
org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty 
bookies available
<SNIP>
```

The ledger errors look like this:
```
java.util.concurrent.CompletionException: 
org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: 
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while recovering 
ledger
```

I will need more time to go through this to see if I can figure out the 
problem. I have 3 bookies running as k8s pods on 3 separate AWS nodes (2 nodes 
of which were rebooted one after another and I started seeing the problem after 
the reboot of the second node). I am wondering whether I need to up the number 
of bookies to 4 according to the documentation at 
<https://bookkeeper.apache.org/docs/4.7.2/admin/bookies/#requirements>
----
2018-10-24 21:46:10 UTC - Matteo Merli: Yes, that looks like the problem
----
2018-10-24 21:46:30 UTC - Matteo Merli: the topic cannot get online because of 
those errors
----
2018-10-24 21:55:54 UTC - Beast in Black: @Matteo Merli thanks for your help 
and the tip about the lookup which pointed me to the right broker. I will 
investigate further and see if it is a problem with the way we have configured 
pulsar, bookie and zookeeper. Thanks again!
+1 : Matteo Merli
----
2018-10-25 00:06:06 UTC - durga: Hi guys - General question… why do we need 4 
bookies (instead of 3) for high availability? If bookies use quorum then 
shouldn’t 3 be sufficient?

I see in the docs brief mention about why 4 bookies are needed but it is not 
clear. What is a `generic` entry and what is `self-verifying` entry? Is it a 
configuration parameter on server side?
----
2018-10-25 00:07:35 UTC - Matteo Merli: It mainly depends on how many copies of 
your data you want. If you use ensembleSize=2, then 3 bookies are enough to 
sustain 1 bookie being down
+1 : durga
----
2018-10-25 00:08:20 UTC - Matteo Merli: In general, you need to have E bookies 
up to write to ledgers with ensembleSize=E
+1 : durga
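
The sizing rule above can be sketched as a small Python calculation. This is 
only the arithmetic from the two messages above (a write needs E live 
bookies); the function names are hypothetical and it ignores placement 
policies such as rack awareness.

```python
def can_write(total_bookies, bookies_down, ensemble_size):
    """True if enough bookies are still up to form an ensemble of size E."""
    return total_bookies - bookies_down >= ensemble_size


def tolerated_bookie_failures(total_bookies, ensemble_size):
    """How many bookies can be down while ledgers can still be written."""
    if not 1 <= ensemble_size <= total_bookies:
        raise ValueError("ensemble_size must be between 1 and total_bookies")
    return total_bookies - ensemble_size


# 3 bookies with ensembleSize=2 tolerate one bookie being down.
print(tolerated_bookie_failures(3, 2))
# 3 bookies with ensembleSize=3 tolerate none; a 4th bookie restores the margin.
print(tolerated_bookie_failures(3, 3), tolerated_bookie_failures(4, 3))
```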
----
2018-10-25 00:08:56 UTC - Matteo Merli: Where did you get the generic vs 
self-verifying entry?
----
2018-10-25 00:10:24 UTC - durga: @Matteo Merli - Excellent… thanks I will check 
with the team.

I got the `generic` vs `self-verifying` entry from BookKeeper docs. Link: 
<https://bookkeeper.apache.org/docs/4.7.2/admin/bookies/>

You should add what you said above to the docs. Your explanation is easier 
to understand.
----
2018-10-25 00:12:07 UTC - Matteo Merli: Yes, I don’t understand that part 
either :slightly_smiling_face:
----
2018-10-25 00:12:08 UTC - Grant Wu: PRs are accepted :slightly_smiling_face:
+1 : Matteo Merli
----
2018-10-25 00:12:25 UTC - Grant Wu: (Note: I do not work for streamlio/am not a 
core maintainer)
----
2018-10-25 00:13:52 UTC - Matteo Merli: The self-verifying part, I think, 
refers to the MAC code. By default we use CRC32-C for checksumming the entries' 
payloads, since it’s many times faster.
Though I don’t see how that affects the ensemble size
----
2018-10-25 00:16:38 UTC - durga: @Matteo Merli - What is ensembleSize? Is it 
number of replicas?
----
2018-10-25 00:18:42 UTC - Matteo Merli: not exactly (though it typically 
coincides)

ensembleSize : How many bookies to use for a specific ledger
writeQuorum: How many bookies to write to
ackQuorum: How many responses to wait for before considering a write successful
+1 : durga, Rodrigo Malacarne
----
2018-10-25 00:19:23 UTC - Matteo Merli: if `E > W` the client will be 
striping entries for a single ledger across multiple bookies
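
A minimal Python sketch of what striping looks like when `E > W`. It assumes 
a simple round-robin placement, where entry `n` goes to `W` consecutive 
bookies starting at index `n mod E`; the function name is hypothetical and 
the real client may place entries differently.

```python
def write_set(entry_id, ensemble_size, write_quorum):
    """Indexes (within the ensemble) of the bookies that store this entry.

    Assumes round-robin striping: start at entry_id mod E and take
    W consecutive bookies, wrapping around the ensemble.
    """
    if not 1 <= write_quorum <= ensemble_size:
        raise ValueError("need 1 <= write_quorum <= ensemble_size")
    return [(entry_id + i) % ensemble_size for i in range(write_quorum)]


# E=3, W=2: each entry lands on 2 of the 3 bookies, rotating per entry.
for entry_id in range(4):
    print(entry_id, write_set(entry_id, ensemble_size=3, write_quorum=2))
```

With `E == W` every entry goes to every bookie in the ensemble, so there is 
no striping.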
----
2018-10-25 00:21:20 UTC - Matteo Merli: btw: I strongly suggest this blog post 
that drills down to all the components in Pulsar & BookKeeper:

<https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works>
+1 : durga, Rodrigo Malacarne, Igor Zubchenok
----
2018-10-25 00:22:09 UTC - durga: @Matteo Merli - Thanks. The above is helpful. 
I will read the blog post as well.
----
2018-10-25 00:23:02 UTC - Matteo Merli: also the following was a good one :wink:

<https://jack-vanlightly.com/blog/2018/10/21/how-to-not-lose-messages-on-an-apache-pulsar-cluster>
+1 : durga, Rodrigo Malacarne, Igor Zubchenok
----
2018-10-25 00:27:42 UTC - durga: Thanks @Matteo Merli. Will read them.
----
2018-10-25 00:45:37 UTC - Aniket: @Aniket has joined the channel
----
2018-10-25 01:55:47 UTC - durga: @Matteo Merli - Thanks. The above posts are 
very useful.
----
