2020-10-21 09:26:46 UTC - Johannes Wienke: Hi again. Currently trying to
evaluate the Pulsar Debezium integration with MongoDB. For that purpose I have
created a docker-compose file:
```version: '3'
services:
  mongo:
    image: mongo:4.2
    ports:
      - 27017:27017
    command: ["mongod", "--replSet", "rep0", "--bind_ip", "localhost,mongo"]
    depends_on:
      # Crude hack to ensure that everyone gets a working MongoDB replica set.
      # The init container tries to connect until it succeeds.
      - mongo-repl-init
  mongo-repl-init:
    image: mongo:4.2
    command: |
      sh -c "while ! mongo mongodb://mongo --eval 'rs.initiate({ _id: \"rep0\", version: 1, members: [{ _id: 0, host: \"mongo\" }] });'; do sleep 2; done"
  pulsar:
    image: apachepulsar/pulsar:2.6.1
    command:
      - bin/pulsar
      - standalone
    ports:
      - 6650:6650
      - 8080:8080
  cdc:
    image: apachepulsar/pulsar-all:2.6.1
    command:
      - bin/pulsar-admin
      - source
      - localrun
      - --source-config-file
      - /config.yml
    volumes:
      - ./config.yml:/config.yml:ro```
With the respective debezium config:
```tenant: "public"
namespace: "default"
name: "debezium-mongodb-source"
topicName: "debezium-mongodb-topic"
archive: "connectors/pulsar-io-debezium-mongodb-2.6.1.nar"
parallelism: 1
configs:
  mongodb.hosts: "rep0/mongo:27017"
  mongodb.name:
  database.whitelist: "cdcevents"
  database.history.pulsar.service.url: "pulsar://pulsar:6650"
  pulsar.service.url: "pulsar://pulsar:6650"```
This always tries to connect to localhost for the pulsar broker:
> org.apache.pulsar.client.api.PulsarClientException:
java.util.concurrent.CompletionException:
io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..)
failed: Connection refused: localhost/127.0.0.1:6650
What am I missing here?
----
2020-10-21 10:13:26 UTC - Lari Hotari: I pushed the PR now and it's ready for
review. Please review <https://github.com/apache/pulsar/pull/8326>
----
2020-10-21 11:36:29 UTC - Praveen Sannadi: @Addison Higham What are the ideal
resource requests and limits for a moderate cluster setup? Is there a way to
decide on these resource requests and limits for all Pulsar components in the
cluster? If you have any docs that help figure this out, please share them.
----
2020-10-21 13:06:26 UTC - Praveen Sannadi: Hi All,
Can anyone help me with the ideal resource requests and limits for a
moderate cluster setup? Is there a way to decide on these resource requests and
limits for all Pulsar components in the cluster? If you have any docs that
help figure this out, please share them. For example, in the Helm chart's
BookKeeper deployment we have
resources:
  requests:
    memory: 512Mi
    cpu: 0.2
These values are from the apache-pulsar-helm charts repo. I am trying to set
them for our clusters, so I am looking for docs on which parameters should
drive these values.
----
2020-10-21 14:07:12 UTC - Alexandre DUVAL: Does `Schema.JSON(MyClass.class)`
work well with `java.util.Optional`? I get `{"present":true}` as the field
value instead of the Optional's value.
----
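2020-10-21 14:42:54 UTC - Alexandre DUVAL: Schema.JSON would need an
ObjectMapper with the Jdk8Module registered:
```
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jdk8.Jdk8Module;
ObjectMapper mapper = new ObjectMapper();
mapper.registerModule(new Jdk8Module());```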
----
2020-10-21 15:05:26 UTC - Lari Hotari: Pulsar 2.6.0+ supports a custom
ObjectMapper instance. There's a Kotlin code example in
<https://github.com/apache/pulsar/issues/6528#issuecomment-701410483> showing how
to use it. I haven't tried it myself, but there's a note that it doesn't work
with the shaded client and that you should use the pulsar-client-original
dependency instead of pulsar-client.
----
2020-10-21 15:14:24 UTC - Milos Matijasevic: Hello, I'm trying to copy messages
from one Pulsar cluster to another, and it is pretty slow. Here is the code for
that: pulsar-topic-replicator
(<https://gist.github.com/milos-matijasevic/e29d00e279b5d1a540552864c7ce1321>).
Do you have any idea why it is so slow?
----
2020-10-21 15:18:13 UTC - Joshua Decosta: Functions is for sure turned off
----
2020-10-21 15:41:53 UTC - Joshua Decosta: There def seems to be something off.
My metrics are showing that I don't have topics, but I can see via pulsar-admin
that I do indeed have some.
----
2020-10-21 15:42:18 UTC - Joshua Decosta: Is there somewhere I could look in
the source code to see how Prometheus is configured?
----
2020-10-21 16:02:37 UTC - Joshua Decosta: Does disabling the metrics auth on
proxy trickle down to broker?
----
2020-10-21 16:17:14 UTC - Addison Higham: Are you going over an unstable
network connection somewhere? A few things that could be causing it:
1) an unstable network connection with high packet loss
2) your Pulsar broker is overloaded; check whether you are running into GC
problems or have high CPU
3) issues with your ZooKeeper: if it is underpowered, opening a new ledger can
add latency, resulting in timeouts
----
2020-10-21 16:19:58 UTC - Addison Higham: @VanderChen
1. Yes, you can use the Java admin lib, see
<http://pulsar.apache.org/api/admin/2.6.0-SNAPSHOT/org/apache/pulsar/client/admin/Functions.html#uploadFunction-java.lang.String-java.lang.String->
2. Nothing is preventing you from using threads. Note that in
2.6.x you also have the option of returning a CompletableFuture
----
2020-10-21 16:22:05 UTC - Addison Higham: Hi @Konrad Łyś the log4j.yaml file
controls the log format. What I would suggest is either customizing your
image by replacing the built-in log4j.yaml, or using a combination of the
`PULSAR_LOG_CONF` environment variable and a Kubernetes mount to point at a
custom log4j.yaml file.
AFAIK, the same log4j conf should be used for both BK and brokers
----
2020-10-21 16:22:34 UTC - Konrad Łyś: ok
----
2020-10-21 16:24:12 UTC - Addison Higham: @Johannes Wienke the pulsar
standalone takes a minute to start up, it is possible that the localrun is
trying to run before pulsar finishes, you can use something like
<https://github.com/vishnubob/wait-for-it> to ensure pulsar is ready before
starting local run
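
The wait-for-it idea can be sketched in a few lines of Python. This is a minimal sketch, not the wait-for-it script itself: `wait_for_port` is a hypothetical helper name, and an open TCP port only shows that something is listening, not that the broker has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or unreachable: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With the compose file above, a wrapper could call `wait_for_port("pulsar", 6650)` and only then start `bin/pulsar-admin source localrun`.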
----
2020-10-21 16:27:00 UTC - Addison Higham: You should be using async methods for
both receive and send; throughput will be *much* higher.
But one thing: You can use replication to do this, you don't need to have a
global zookeeper if you manually set up the metadata on each cluster, see
<https://gist.github.com/sijie/79364497eaa349bf58d9fb760561f930> for details on
that
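
To illustrate why async calls help, here is a small Python simulation. It does not use any Pulsar client: `fake_send` is a hypothetical stand-in for one network round trip, and the 10 ms latency is an assumed value. The point is only that keeping many sends in flight avoids paying the round-trip time once per message.

```python
import asyncio
import time


async def fake_send(message: str, latency: float = 0.01) -> str:
    """Stand-in for a round trip to a broker (hypothetical, not a Pulsar API)."""
    await asyncio.sleep(latency)
    return message


async def copy_one_by_one(messages):
    # Wait for each send to complete before starting the next one.
    return [await fake_send(m) for m in messages]


async def copy_pipelined(messages):
    # Start all sends first, then wait for them together; results keep input order.
    return await asyncio.gather(*(fake_send(m) for m in messages))


def timed(coro_fn, messages):
    """Run the coroutine function and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(messages))
    return result, time.perf_counter() - start
```

Calling `timed(copy_one_by_one, msgs)` and `timed(copy_pipelined, msgs)` on the same list shows the gap: the first takes roughly the latency times the number of messages, the second roughly one latency.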
----
2020-10-21 16:28:12 UTC - Guillaume: Hi,
I deployed Pulsar on Kubernetes using the official Helm chart, but I do not
have Proxy Metrics in Grafana. Everything seems to be working except for Proxy
Metrics. Is there something to activate to get them working?
Thank you.
----
2020-10-21 16:39:41 UTC - Pushkar Sawant: Thanks for your response.
1. Both the producer and the Pulsar cluster are on the same cluster, separated by namespaces
2. The GC pauses are usually 1 sec or less. Max CPU utilization is at around
20%.
3. Zookeeper cluster memory utilization is around 50%, cpu utilization is 5% or
less. Peak GC pauses are at around 500ms.
----
2020-10-21 16:41:59 UTC - Addison Higham: that is somewhat of a high GC pause
for the broker, have you tried adding more memory?
----
2020-10-21 16:57:21 UTC - Milos Matijasevic: thank you! also i thought about
calling ack in thread
```go consumer.Ack()```
btw, we are moving from old cluster v1.0 to newest one and we use helm chart so
we can't do it manually and also we are refactoring namespaces, so that's why
we use this approach
----
2020-10-21 17:21:04 UTC - Johannes Wienke: I know, the test case is not perfect
there and I managed that manually. Still the option for the pulsar broker
doesn't seem to have any effect. (127.0.0.1 != pulsar). I was able to work
around this by providing the `--broker-service-url` command line flag. Is that
really required despite the different declarations in the config file?
----
2020-10-21 17:35:44 UTC - Addison Higham: ah apologies, didn't notice that. You
may need it because with localrun it may connect in 2 different places
----
2020-10-21 17:50:38 UTC - Joshua Decosta: Are you using authentication at all?
----
2020-10-21 17:51:00 UTC - Guillaume: Yes I am
----
2020-10-21 17:52:43 UTC - Guillaume: Is there something specific to configure
when using authentication?
----
2020-10-21 17:58:35 UTC - Pushkar Sawant: Our memory usage is at 50%. The GC
times are 10 minutes cumulative
----
2020-10-21 18:21:45 UTC - Joshua Decosta: @Addison Higham perhaps there are
ghost topics being created in the metrics? I’m seeing the topic count climb
with each additional topic created
----
2020-10-21 23:28:10 UTC - Addison Higham: apologies @Joshua Decosta, I was
speaking at a conference so I haven't had a chance to be online as much.
TBH, I am not as familiar with how we generate Prometheus metrics. Have you
tried looking directly at the Prometheus metrics coming out of the metrics
endpoint to see whether some of the metrics that are keyed per topic are set?
As far as missing metrics go, one thing to note is that each broker reports
metrics for a different set of topics. It only reports for the topics that it
currently owns
----
2020-10-21 23:28:31 UTC - Addison Higham: This might be a case where 30
minutes of time on a call would be helpful as well
----
2020-10-21 23:41:58 UTC - Devin G. Bost: I configured retention on a topic
that's getting a heartbeat so I could test the retention setting. However, it
appears from the graph that the topic's storage is clearing every 4 hours. I'd
expect it to not drop to 0 until the retention period has passed. When I get
the retention for the topic, it looks like this:
```bin/pulsar-admin namespaces get-retention public/default
{
  "retentionTimeInMinutes" : 43200,
  "retentionSizeInMB" : 0
}```
Are the messages not actually getting retained?
----
2020-10-22 00:30:55 UTC - Addison Higham: you need to set both
----
2020-10-22 00:31:06 UTC - Addison Higham: time and size
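
One way to read the reply above is as a simple check on the two values returned by `get-retention`. This is a minimal Python sketch under that assumption: `retention_is_effective` is a hypothetical helper, not part of Pulsar, and it treats a value of 0 as "not set".

```python
def retention_is_effective(retention_time_minutes: int, retention_size_mb: int) -> bool:
    """True only when neither limit is left at 0 (assumption taken from the reply above)."""
    return retention_time_minutes != 0 and retention_size_mb != 0
```

Applied to the output shown earlier (43200 minutes, 0 MB), the check returns False, which would match the storage being cleared before the 30 days have passed.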
----
2020-10-22 07:25:14 UTC - Johannes Wienke: Ok, maybe that should be added to
the docs. That was really unexpected with all the existing broker URI
declarations already contained in the config files.
----
2020-10-22 07:56:26 UTC - Konrad Łyś: Here are my logs from pulsar broker
```07:51:45.452 [pulsar-web-42-1] INFO org.eclipse.jetty.server.RequestLog -
127.0.0.1 - - [22/Oct/2020:07:51:45 +0000] "GET
/admin/v2/persistent/spain/_system/_signals_ingest/stats HTTP/1.1" 200 5654 "-"
"curl/7.64.0" 3
07:51:49.348 [pulsar-load-manager-3-1] INFO
org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Only 1
broker available: no load shedding will be performed
07:51:49.550 [pulsar-load-manager-3-1] INFO
org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Writing
local data to ZooKeeper because maximum change 14.762450912068523% exceeded
threshold 10%; time since last report written is 60.0 seconds
07:51:49.576 [pulsar-ordered-OrderedExecutor-5-0-EventThread] INFO
org.apache.pulsar.zookeeper.ZooKeeperDataCache - [State:CONNECTED Timeout:30000
sessionid:0x1001dc776af0015 local:/10.8.35.4:37244
remoteserver:spain-zookeeper-0.spain-zookeeper-headless/10.8.137.137:2181
lastZxid:1781 xid:512 sent:512 recv:532 queuedpkts:0 pendingresp:0
queuedevents:0] Received ZooKeeper watch event: WatchedEvent
state:SyncConnected type:NodeDataChanged
path:/loadbalance/brokers/spain-pulsar-broker-0.spain-pulsar-broker.spain.svc.cluster.local:8080```
Namely from
`kubectl logs spain-pulsar-broker-0 -c spain-pulsar-broker -n spain`
----
2020-10-22 07:56:35 UTC - Konrad Łyś: spain is my namespace
----
2020-10-22 07:57:20 UTC - Konrad Łyś: As you can see some logs are following
the general format and some are not
----
2020-10-22 07:57:34 UTC - Konrad Łyś: Is this the desired behaviour?
----