2020-10-19 11:54:18 UTC - Lari Hotari: I created a PR to address the race
condition in updating ManagedCursorImpl.readPosition in the
ManagedCursorImpl.setAcknowledgePosition method:
<https://github.com/apache/pulsar/pull/8299> . @hangc @Penghui Li @jia zhai
@Sijie Guo please review
+1 : hangc
----
2020-10-19 11:56:32 UTC - hangc: :+1:
----
2020-10-19 13:55:42 UTC - Marcio Martins: Hey guys, do you know what could
cause a broker to suddenly start consuming an exaggerated amount of resources?
I am on 2.5.1, and suddenly my brokers are all crashing in rotation, running
out of both direct memory and heap memory. They had been running stable for
over 2 weeks. There is no extra load of any kind that I can see. Is there a way
I can pinpoint where this memory is going?
----
2020-10-19 13:57:28 UTC - Lari Hotari: You can create a heap dump with jmap to
find out where the heap is being consumed. It can be analysed with Eclipse MAT,
for example.
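A minimal shell sketch, assuming the JDK tools are on the PATH and you know the
broker's PID (the output path here is just a placeholder):
```# dump only live objects from the broker JVM to a .hprof file
jmap -dump:live,format=b,file=/tmp/pulsar-broker-heap.hprof <broker-pid>```
Opening the resulting .hprof in Eclipse MAT and running its "Leak Suspects"
report usually points straight at the dominating retained set.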
----
2020-10-19 13:57:57 UTC - Lari Hotari: I created a feature request to support
cancelling the message/batch futures returned by the Reader and Consumer Java
APIs: <https://github.com/apache/pulsar/issues/8300>
----
2020-10-19 14:07:55 UTC - Ming Fang: I’m following the instructions here
<https://pulsar.apache.org/docs/en/io-debezium-source/#configuration> to use
Debezium to source data from Postgres into Pulsar. I was able to see topics
created for my tables, but they disappeared soon after I stopped the Debezium
connector. My guess is that it has something to do with the retention policy.
Is there a way to configure the Debezium connector to create topics that
don't get deleted automatically?
----
2020-10-19 14:10:17 UTC - Joshua Decosta: Has anyone explored the default
metrics in Grafana at all and noticed inconsistencies?
----
2020-10-19 16:12:56 UTC - Shivji Kumar Jha: @Matteo Merli @Jerry Peng @Sijie
Guo Reminder: Please review <https://github.com/apache/pulsar/pull/8173> when
you are free.
ok_hand : Sijie Guo
----
2020-10-19 16:41:56 UTC - Addison Higham: @Joshua Decosta I haven't seen or
heard anything like that before, but it could well be possible. Can you give a
bit more context about which metrics you are seeing issues with?
----
2020-10-19 16:42:04 UTC - Sijie Guo: @Ming Fang
1. You can disable deleting inactive topics by setting
`brokerDeleteInactiveTopicsEnabled` to `false`.
2. Configure the retention policy either at the cluster level or at the
namespace level (a namespace-level example follows below). The cluster-level
defaults in broker.conf are:
```# Default message retention time
defaultRetentionTimeInMinutes=0
# Default retention size
defaultRetentionSizeInMB=0```
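Note that those defaults of 0 minutes / 0 MB mean acknowledged messages are not
retained at all. A minimal sketch of both knobs, assuming the stock file layout
(the namespace and the 7d/10G values are placeholders, not recommendations):
```# in broker.conf: keep inactive topics from being garbage-collected
brokerDeleteInactiveTopicsEnabled=false

# namespace-level retention via the admin CLI, e.g. keep data 7 days or up to 10 GB
bin/pulsar-admin namespaces set-retention public/default --time 7d --size 10G```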
----
2020-10-19 16:42:39 UTC - Joshua Decosta: The default messaging metrics list
many more topics than actually exist
----
2020-10-19 16:43:01 UTC - Joshua Decosta: It happens each time I add multiple
namespaces and new topics
----
2020-10-19 16:47:20 UTC - Ming Fang: @Sijie Guo Thanks. I will try your
suggestion.
I have another question regarding Debezium.
I’m trying to consolidate multiple topics into one using the Reroute SMT, like
this:
``` "transforms": "Reroute",
    "transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
    "transforms.Reroute.key.enforce.uniqueness": "false",
    "transforms.Reroute.topic.regex": "(.*)_([0-9]+)",
    "transforms.Reroute.topic.replacement": "$1"```
It is not working and I’m not seeing any errors.
Can you confirm that this should work?
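For reference, a rough sketch of where those keys would sit in a Pulsar IO
Debezium source config YAML, assuming the `configs` map is passed through to
Debezium verbatim (which is exactly the part that needs confirming); the file
name, archive path, and connection values are all placeholders:
```# debezium-postgres-source-config.yaml (hypothetical)
tenant: "public"
namespace: "default"
name: "debezium-postgres-source"
parallelism: 1
archive: "connectors/pulsar-io-debezium-postgres-2.5.1.nar"
configs:
  database.hostname: "postgres"
  database.port: "5432"
  database.user: "postgres"
  database.password: "changeme"
  database.dbname: "mydb"
  database.server.name: "dbserver1"
  transforms: "Reroute"
  transforms.Reroute.type: "io.debezium.transforms.ByLogicalTableRouter"
  transforms.Reroute.key.enforce.uniqueness: "false"
  transforms.Reroute.topic.regex: "(.*)_([0-9]+)"
  transforms.Reroute.topic.replacement: "$1"```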
----
2020-10-19 17:21:42 UTC - Sijie Guo: @Ming Fang Ideally it should work as we
just pass the config to debezium. But I will have to double-check. At the same
time, I am wondering if you can create an issue for it.
+1 : Ming Fang
----
2020-10-19 17:33:46 UTC - Addison Higham: @Marcio Martins that sounds like a
strange one, do you see anything different in your logs?
----
2020-10-19 17:34:39 UTC - Addison Higham: interesting... do you have a sample
of the names?
----
2020-10-19 17:43:06 UTC - Addison Higham: @Jim M. I just wanted to follow up
with you on this. Were you able to figure it out?
----
2020-10-19 17:44:35 UTC - Addison Higham: @Priyath Gregory just wanted to
follow up, I think there is an issue here. What should be happening is that it
uses the mechanism where we provide the message object, so the cursor isn't
advanced until we see an ack
----
2020-10-19 17:53:20 UTC - Shivji Kumar Jha: @Sijie Guo Hi, here is the github
issue for the same: <https://github.com/apache/pulsar/issues/8301>
If the proposal looks fine, we could send it out on the dev mailing list or
perhaps add a PIP. Alternatively, a high-level review for obvious errors, if
any, would be really appreciated.
+1 : Sijie Guo, Johannes Wienke
----
2020-10-19 17:54:46 UTC - Joshua Decosta: I usually just make up names. I could
screen-cap some admin commands and show the dashboard to reflect it, if that
helps
----
2020-10-19 18:01:35 UTC - Addison Higham: if you are seeing things like you
make a topic named `foobar` and you see `foobar-partition-0`,
`foobar-partition-1`, that would be expected. If the names are entirely
unrelated to your original topics, that would be surprising
----
2020-10-19 18:02:18 UTC - Joshua Decosta: This behavior is all from
non-partitioned topics
----
2020-10-19 19:10:48 UTC - Addison Higham: Hrm, yes, some commands and screen
caps would be helpful
----
2020-10-19 19:48:45 UTC - Marcio Martins: no, in fact that broker is mostly
idle...
----
2020-10-19 19:49:30 UTC - Marcio Martins: the heap usage went down after a
while, but direct memory is still at 1024MB/1024MB on 3 of the 6 brokers... all
of which are mostly idle
----
2020-10-19 19:49:53 UTC - Marcio Martins: before this started happening, it was
using less than 168MB :confused:
----
2020-10-19 19:54:58 UTC - Jeff Schneller: Seeing an odd result with a simple
shared subscription. I have 3 clients running, each with what should be a
randomized delay before ack'ing a message. However, each client is receiving
every third message: client 1 received 1, 4, 7, 10; client 2 received 2, 5, 8;
client 3 received 3, 6, 9. Is this the intended result? I thought that, given
the random delays, each client would be receiving a random mix of messages.
----
2020-10-19 20:04:21 UTC - Addison Higham: Are all of those messages already in
the topic? If so, Pulsar will attempt to deliver them in a round-robin fashion:
multiple messages are sent to each client at once, and the broker just waits
until all the acks come back.
----
2020-10-19 20:16:45 UTC - Jeff Schneller: yes they are in the topic already
----
2020-10-19 20:17:36 UTC - Jeff Schneller: can the number of messages delivered
be changed to only allow 1 or 2?
----
2020-10-19 20:17:54 UTC - Jeff Schneller: Also, what happens when a client
crashes after messages were already delivered to it?
----
2020-10-19 20:22:37 UTC - Addison Higham: if the client has not yet acked the
message, once the client disconnects, the broker will notice that and then
re-send the message. There is also an ack-timeout which will re-send the
message even if the client is still connected
----
2020-10-19 20:24:26 UTC - Jeff Schneller: perfect... I see it now. I had 3
clients, stopped 1 of them, and ran with 2. Now I see one of the remaining
clients picked up all the messages that would have gone to client 3.
----
2020-10-19 20:24:39 UTC - Addison Higham: and yes, you can force a single
message at a time by lowering the receiver queue size, see
<http://pulsar.apache.org/api/client/2.6.0-SNAPSHOT/org/apache/pulsar/client/api/ConsumerBuilder.html#receiverQueueSize-int->
----
2020-10-19 20:24:58 UTC - Addison Higham: > Setting the consumer queue size
as zero
> Decreases the throughput of the consumer, by disabling pre-fetching of
messages. This approach improves the message distribution on shared
subscription, by pushing messages only to the consumers that are ready to
process them. Neither Consumer.receive(int, TimeUnit) nor Partitioned Topics
can be used if the consumer queue size is zero. Consumer.receive() function
call should not be interrupted when the consumer queue size is zero.
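A minimal Java sketch of dialing the prefetch down on a shared subscription
(the service URL, topic, subscription name, and queue size of 2 are made-up
examples, not recommendations):
```import org.apache.pulsar.client.api.*;

public class LowPrefetchWorker {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder service URL
                .build();

        // Keep at most 2 unprocessed messages buffered locally, so a slow
        // consumer does not hoard work that a free consumer could take.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/jobs")
                .subscriptionName("job-workers")
                .subscriptionType(SubscriptionType.Shared)
                .receiverQueueSize(2)
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            // ... process the job ...
            consumer.acknowledge(msg);
        }
    }
}```
With a small receiver queue, the broker only pushes another message to a
consumer once it has room, so a bogged-down consumer stops accumulating a local
backlog that an idle one could have handled.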
----
2020-10-19 20:26:42 UTC - Jeff Schneller: ok... so the default size is 1000.
This topic won't have that many messages at any one time, so we may want to
reduce it to somewhere between 2 and 5. Thanks again for your help.
----
2020-10-19 20:27:19 UTC - Addison Higham: once again, it will round robin
*first*
----
2020-10-19 20:27:29 UTC - Addison Higham: what are you trying to achieve @Jeff
Schneller?
----
2020-10-19 20:30:58 UTC - Jeff Schneller: we are sending messages to a long
running process in some cases. In other cases the process could be completed
in seconds. Say we have 3 servers running the process. If server 2 is bogged
down we don't want a bunch of messages going there waiting to be processed.
Instead server 1 or 3 could handle them if they are free and ready to accept
more messages
----
2020-10-19 20:40:03 UTC - Addison Higham: okay, tuning the receive queue can
work, especially if you know how many messages a box can/should have in flight,
but for those short jobs it will reduce throughput a fair amount, so as long
as that is okay, it is a good way to go. Just for your information, there are
some other ways you could achieve similar results (a rough sketch of both
follows below):
1. Use explicit negative acknowledgements. If you have some knowledge of when a
single consumer is bogged down, you can explicitly "nack" a message, which will
cause the broker to re-send it. This allows you to keep the receiver queue size
high, which increases throughput, but you would need to build your own
mechanism to decide when a consumer is too busy to accept any more messages.
2. Use ack-timeout, which will cause the broker to redeliver a message if it is
not acked within a certain time. This works well if you have a consistent
amount of time each job can take (if you want to use it to better distribute
load), but it can still be used as an upper bound for retrying jobs that look
stuck.
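A rough Java sketch of both options; the 10-minute timeout, the `tooBusy()`
check, and `runJob()` are placeholders for whatever signal and job logic the
application actually has:
```import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class JobWorker {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder service URL
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/jobs")
                .subscriptionName("job-workers")
                .subscriptionType(SubscriptionType.Shared)
                // Option 2: anything not acked within 10 minutes is redelivered
                .ackTimeout(10, TimeUnit.MINUTES)
                .subscribe();

        while (true) {
            Message<byte[]> msg = consumer.receive();
            if (tooBusy()) {
                // Option 1: hand the message back so the broker redelivers it,
                // most likely to a less loaded consumer
                consumer.negativeAcknowledge(msg);
            } else {
                runJob(msg);
                consumer.acknowledge(msg);
            }
        }
    }

    // Stand-ins for whatever "too busy" signal and job logic the application has
    private static boolean tooBusy() { return false; }
    private static void runJob(Message<byte[]> msg) { /* ... */ }
}```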
----
2020-10-19 20:43:13 UTC - Jeff Schneller: Thanks. We will be using the
ack-timeout if a job takes longer than we think it should. The nack we are
using when we know the consumer can't handle it (i.e. resources or something on
the server aren't right). We are honestly dealing with 100-200 messages in a
day or half a day right now. If we got into the 1000s in a day, we'd have
bigger things to worry about.
----
2020-10-20 00:08:18 UTC - Jim M.: Thanks for the follow-up. Sadly no, we ended
up flipping back to Kafka and setting in-flight messages to one.
----
2020-10-20 00:23:50 UTC - Addison Higham: okay, well... if you ever end up back
here, I'd be happy to help figure out what was happening; that definitely seems
weird. Remind me, were you using Ceph?
----