2020-03-30 09:16:51 UTC - Jeon.DeukJin: Hello, I'm using pulsar-manager.
How do I create another admin user after creating the super-admin?
----
2020-03-30 09:17:11 UTC - Jeon.DeukJin: ```curl -H "Content-Type: application/json" -X PUT http://172.21.42.34:7750/pulsar-manager/users/superuser -d '{"name": "admin", "password": "apachepulsar", "description": "test", "email": "[email protected]"}'
{"error":"Super user role is exist, this interface is no longer available"}```
----
2020-03-30 09:18:23 UTC - Jeon.DeukJin: ```then,
curl -H "Content-Type: application/json" -X PUT http://172.21.42.34:7750/pulsar-manager/users/admin -d '{"name": "adminuser", "password": "adminuser", "description": "adminuser", "email": "[email protected]"}'
{"message":"Please login."}```
----
2020-03-30 09:24:15 UTC - tuteng: Yes, the super user should have been created
successfully; you can try to log in.
----
2020-03-30 09:27:19 UTC - tuteng: If you cannot find your username and
password, you can try to redeploy: delete the dbdata directory and restart the
service
```./build/distributions/pulsar-manager/bin/pulsar-manager```
----
2020-03-30 09:28:05 UTC - tuteng: Reinitialize superuser and password:
```curl -H "Content-Type: application/json" -X PUT http://172.21.42.34:7750/pulsar-manager/users/superuser -d '{"name": "admin", "password": "apachepulsar", "description": "test", "email": "[email protected]"}'```
----
2020-03-30 13:38:11 UTC - Sergii Zhevzhyk: It seems that reader fits a lot
better for your use case. The performance should be the same as for consumer,
because "Internally, the reader interface is implemented as a consumer using an
exclusive, non-durable subscription to the topic with a randomly-allocated
name." https://pulsar.apache.org/docs/en/concepts-clients/#reader-interface
----
2020-03-30 14:38:02 UTC - Evan Furman: Hi! Is there any way to get around the
expensive rebalance after adding a new consumer? I’m hoping to hear we are
doing something wrong. Our end goal is to make consumers able to autoscale, but
during our tests it seems this might not be possible. This was the main problem
we had with Kafka as well.
We configured 200 consumers to read from a topic and a single producer to write
to that topic. Upon adding only a few more consumers, though, we see huge
latencies. I understand some latency is expected, but what I was hoping to be on
the order of milliseconds is actually on the order of full seconds.
Performance takes a HUGE hit as the consumers are regrouping.
cc: @Tim Corbett
----
2020-03-30 16:00:14 UTC - David Kjerrumgaard: @Alex Sim Are you running in
standalone mode or with a multi-node cluster? Can you share the relevant
properties and the code you are using to "peek" at the messages?
----
2020-03-30 16:12:17 UTC - Sijie Guo: Pulsar manager?
----
2020-03-30 16:14:52 UTC - Sijie Guo: How did you measure the latency? What
subscription mode are you using?
----
2020-03-30 16:19:00 UTC - Evan Furman: We’re using `KEY_SHARED` and testing
with the `pulsar-perf` tool.
```/pulsar/bin/pulsar-perf consume persistent://public/default/mp-explicit-partitioned --subscriber-name mp-pulsar-consumer -i 5 -t 8 -st Key_Shared -u pulsar://broker:6650/```
----
2020-03-30 16:24:05 UTC - Sijie Guo: How did you measure the latency?
Adding consumers to a key_shared subscription will split the key range. It
doesn’t do any regrouping like Kafka does. What did you observe?
----
2020-03-30 16:25:34 UTC - Evan Furman: We were observing the msg/sec while
tailing the output of the perf monitoring tool. Give me a sec and I will fire
it back up so we can pull some logs.
----
2020-03-30 17:08:25 UTC - Evan Furman: here you can really see it
----
2020-03-30 17:09:37 UTC - Evan Furman: This is while scaling the consumers…
normal is ~10-11 ms mean latency. Then all of a sudden we start seeing numbers
like 66126.846 ms :scream:
----
2020-03-30 18:16:30 UTC - Sijie Guo: Oh, I see. I don’t think the latency is
coming from “re-grouping”; the latency is actually coming from “redelivery”.
When a new consumer joins the subscription, it splits the key range and a key
range is moved to the new consumer. Upon moving the key range, the broker
re-dispatches the “unacked” messages of that range to the new consumer.
@Penghui Li can help check how we can improve here.
----
2020-03-31 01:22:09 UTC - Evan Furman: Thank you! Going to try again with
setting the `-o` parameter on the consumer to something more sane than the
default of `1000`. Will report back with findings
----
2020-03-31 01:28:14 UTC - Alex Sim: @Sijie Guo @David Kjerrumgaard Hi, thanks
for your replies! Yes, it's Pulsar Manager, running in standalone mode. As for
the properties and code I've used, it's just setting pulsar.peek.messages
to true in application.properties. I'm an absolute beginner tasked with upgrading
our Pulsar server from 2.4.0 to 2.5.0 as well as enabling peek messages, so I'm
really a bit stumped, sorry ><
----
2020-03-31 01:29:56 UTC - Alex Sim: I assumed this would be enough to
enable it, as I don't see any documentation indicating otherwise. Perhaps I didn't
restart the (correct) relevant scripts?
----
2020-03-31 01:39:29 UTC - Sijie Guo: Which version of pulsar manager are you
using? I don’t think peek message was released.
----
2020-03-31 01:39:35 UTC - Sijie Guo: @tuteng?
----
2020-03-31 01:50:39 UTC - tuteng: Yes, if you want to turn on peek-message, all
you need to do is turn on the configuration and restart with the following
command:
```/pulsar-manager/pulsar-manager/bin/pulsar-manager --spring.config.location=your-path/application.properties```
----
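For reference, enabling the feature is a one-line change in application.properties (a sketch; the property name is taken from the discussion above, and all other settings are omitted):

```properties
# Pulsar Manager application.properties -- enable the peek-messages feature
pulsar.peek.messages=true
```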
2020-03-31 01:52:12 UTC - tuteng: If you don't want to specify a configuration
file on the command line, you can rebuild the package directly and start it
with the following commands:
```./gradlew build -x test
./build/distributions/pulsar-manager/bin/pulsar-manager```
----
2020-03-31 02:24:09 UTC - Alex Sim: The first solution worked! I initially
rebuilt the package (per the second solution provided) and it didn't
work for me. I'm not sure why, but it's probably just something I'm not aware
of. Thank you so much! Really appreciate this!
----
2020-03-31 02:50:31 UTC - Raman Gupta: I'm interested in this scenario too, and
agree that consumer rebalancing was one of the biggest pain points with Kafka.
@Sijie Guo why would adding consumers cause any redeliveries? Shouldn't adding
consumers just cause new messages to be routed to the new consumers? Why would
it cause any work being done by existing consumers to be redelivered?
----
2020-03-31 03:05:01 UTC - Sijie Guo: > Shouldn’t adding consumers just cause
new messages to be routed to the new consumers?
For a shared subscription, adding a new consumer doesn’t cause redelivery.
For failover and key_shared subscriptions, due to the ordering constraint, when
a consumer is added, a partition or a key range might be re-assigned to a new
consumer. When the re-assignment happens, messages are re-dispatched from the
last consumption state; hence messages that were already dispatched but not yet
acknowledged will be redelivered.
If there were no ordering constraint, we could optimize this very easily. That
said, this is an area where we have been looking into optimizing the existing
behavior. /cc @Penghui Li
----
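One way to picture the reassignment described above (a hypothetical, simplified model for illustration only; the real dispatcher in Pulsar's broker is more involved): each Key_Shared consumer owns a slice of the 0-65535 key-hash space, adding a consumer splits an existing slice, and messages that were dispatched but not yet acked in the moved slice get redelivered to the new owner.

```python
# Simplified model of Key_Shared range splitting and redelivery.
# Illustrative only -- not Pulsar's actual implementation.
HASH_RANGE_SIZE = 65536  # Pulsar hashes message keys into [0, 65536)

class Subscription:
    def __init__(self):
        self.ranges = {}       # consumer name -> (range_start, range_end), inclusive
        self.unacked = []      # dispatched-but-unacked: (key_hash, consumer)
        self.redelivered = []  # messages re-dispatched after a split

    def add_consumer(self, name):
        if not self.ranges:
            self.ranges[name] = (0, HASH_RANGE_SIZE - 1)
            return
        # Split the widest existing range in half.
        victim, (lo, hi) = max(self.ranges.items(),
                               key=lambda kv: kv[1][1] - kv[1][0])
        mid = (lo + hi) // 2
        self.ranges[victim] = (lo, mid)
        self.ranges[name] = (mid + 1, hi)
        # Unacked messages whose key hash moved to the new consumer are
        # redelivered -- this is the latency source discussed in the thread.
        for key_hash, owner in self.unacked:
            if owner == victim and mid + 1 <= key_hash <= hi:
                self.redelivered.append((key_hash, name))

    def dispatch(self, key_hash):
        for name, (lo, hi) in self.ranges.items():
            if lo <= key_hash <= hi:
                self.unacked.append((key_hash, name))
                return name

sub = Subscription()
sub.add_consumer("A")   # A owns [0, 65535]
sub.dispatch(40000)     # dispatched to A, never acked
sub.add_consumer("B")   # splits A's range; [32768, 65535] moves to B
print(sub.ranges)       # {'A': (0, 32767), 'B': (32768, 65535)}
print(sub.redelivered)  # [(40000, 'B')] -- the unacked message is redelivered
```

Note that this models the behavior as Sijie describes it; Penghui's later messages qualify when exactly the redelivery happens.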
2020-03-31 03:11:23 UTC - Tim Corbett: I work with @Evan Furman and I believe
we just completed another test with a receive queue depth of 50 (assuming that's
what `-q` does in the pulsar-perf tool), but still saw 30000ms+ mean latencies.
I'm not sure why being behind by so few messages, even if resent, should take so
long. Also, it seems likely this mechanism would cause duplicate messages
every time it reassigns consumers, right?
----
2020-03-31 03:15:34 UTC - Tim Corbett: (half-baked idea) It would almost be
better if it waited for the consumer it was taking messages away from to fully
acknowledge the outstanding messages of that hash range before sending any
messages to the new consumer. New consumers would have a delay in start-up
time but otherwise the system would kind of chug along.
----
2020-03-31 03:19:44 UTC - Penghui Li: I have not tested the scenario of a 50
receive queue size before. Do you also see this delay with other subscription types?
----
2020-03-31 03:20:31 UTC - Tim Corbett: Our use case would rely on either
failover or keyshared, and I believe we have only tested keyshared so far.
----
2020-03-31 03:20:57 UTC - Tim Corbett: The reduced queue size was simply an
attempt to have fewer messages unacked to redeliver
----
2020-03-31 03:25:31 UTC - Penghui Li: I have checked the key_shared
subscription message dispatch code: a new consumer does not result in message
duplication, but it can result in messages arriving out of order, and there is an
issue tracking the improvement: https://github.com/apache/pulsar/issues/6554
----
2020-03-31 03:27:40 UTC - Tim Corbett: Oh, so my half-baked idea was already a
fully-fledged idea by someone else. That's good news.
----
2020-03-31 03:29:31 UTC - Penghui Li: Let me try on my laptop.
----
2020-03-31 03:30:28 UTC - Tim Corbett: If there is no duplication and it
currently allows messages out-of-order (fine for our use case pretty much),
does that mean the measured latency is due to those out-of-order messages being
redelivered at some leisurely pace? I can attempt to write a test tool to
isolate that measurement if so, though it will take a couple of days.
+1 : Penghui Li
----
2020-03-31 03:37:01 UTC - Tim Corbett: I am still confused how a redelivery
(out-of-order or not) would not cause duplicate message processing at the
consumer side, however. Assuming I'm consumer A and I received messages 1, 2,
and 3, and ack'ed 1, then 2 and 3 were redelivered to consumer B, what's to
stop both consumers from attempting processing?
----
2020-03-31 03:40:16 UTC - Penghui Li: Message redelivery would cause duplicates.
What I mean is: I checked the source code, and assigning a new consumer to a hash
range does not redeliver the unacked messages of the old consumer. But if the old
consumer disconnects, the unacked messages may be redelivered to the new consumer.
----
2020-03-31 03:43:50 UTC - Tim Corbett: Hmm, now I'm confused (sorry!). I
thought redelivery was a stated reason for the high latency reported after a
reassignment?
+1 : Penghui Li
----
2020-03-31 03:44:54 UTC - Penghui Li: So, if message A is dispatched to
consumer A but not acked, only new messages will be dispatched to consumer B
(the new consumer B that connected and is responsible for half of consumer A's
key hash range). If consumer A requests redelivery of message A, or consumer A
crashes, message A may be sent to consumer B.
----
2020-03-31 03:45:53 UTC - Penghui Li: > I thought redelivery was a stated
reason for the high latency reported after a reassignment?
I'm not sure right now, I need to test it.
----
2020-03-31 03:46:04 UTC - Tim Corbett: Gotcha. Thanks for all your help!
----
2020-03-31 03:46:17 UTC - Penghui Li: You are welcome.
----
2020-03-31 03:51:22 UTC - Penghui Li: I have a question: how do you add
consumers when using pulsar-perf? I want to know more details about the test so
that we can stay in sync. If you use -n to increase the consumer count, you would
have to stop pulsar-perf, update -n, and then start pulsar-perf again, right?
----
2020-03-31 03:51:51 UTC - Tim Corbett: Nope. We are using an ECS cluster and
spinning up docker tasks for each consumer
----
2020-03-31 03:52:26 UTC - Tim Corbett: We don't seem to specify a -n
----
2020-03-31 03:52:53 UTC - Tim Corbett: The idea is definitely to be able to
keep the bulk of the consumers stable when adding a new one
----
2020-03-31 03:54:10 UTC - Penghui Li: Yes, the current behavior is to find the
largest key hash range and split it, so in theory it only affects one consumer.
----
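A minimal sketch of the split-selection rule mentioned above (an assumed model, not Pulsar's actual code): pick the consumer with the widest hash range and hand half of it to the newcomer, leaving every other consumer's range untouched.

```python
# Illustrative sketch of auto_split: the widest range is split in half,
# so adding a consumer disturbs exactly one existing consumer.
def split_largest(ranges, new_consumer):
    """ranges: dict of consumer name -> (lo, hi), inclusive hash range.
    Splits the widest range and returns the name of the affected consumer."""
    victim = max(ranges, key=lambda n: ranges[n][1] - ranges[n][0])
    lo, hi = ranges[victim]
    mid = (lo + hi) // 2
    ranges[victim] = (lo, mid)
    ranges[new_consumer] = (mid + 1, hi)
    return victim

ranges = {"c1": (0, 32767), "c2": (32768, 49151), "c3": (49152, 65535)}
affected = split_largest(ranges, "c4")
print(affected)                       # c1 -- it held the widest range
print(ranges["c1"], ranges["c4"])     # (0, 16383) (16384, 32767)
```

c2 and c3 keep their ranges, which matches the "only affects one consumer" claim.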
2020-03-31 03:57:01 UTC - Tim Corbett: The numbers we pull are just from some
random consumer, generally; I don't know if we've checked multiple yet
----
2020-03-31 04:27:22 UTC - Raman Gupta: Wow, I just assumed no messages would be
delivered to a new consumer until all previous already delivered messages in
that keyspace had been acked. Good to know that's not the case, as ordering is
important for my use case. The docs should have a big warning about this.
+1 : Hiroyuki Yamada
----
2020-03-31 05:11:42 UTC - Penghui Li: @Raman Gupta Maybe you can try a sticky
consumer, i.e. a consumer with a fixed key hash range.
----
2020-03-31 05:18:39 UTC - Kartik Gupta: Hey folks, need to confirm one thing:
we know that for client producer/consumer calls, one broker can ask the client to
connect to the topic-partition owner broker. Can such broker redirection
happen for admin calls also, or will any individual broker be able to satisfy
all the admin calls for any topic?
----
2020-03-31 05:21:09 UTC - Raman Gupta: That doesn't easily solve the scaling
up/down problem, unless I implement all the relevant logic at the application
level.
+1 : Hiroyuki Yamada, Poul Henriksen
----
2020-03-31 05:28:52 UTC - Penghui Li: Yes, it's better to improve the
auto_split behavior.
----
2020-03-31 05:34:38 UTC - Poul Henriksen: We are in the process of migrating to
Pulsar, and this issue would be a deal-breaker for us if not fixed. Is it
realistic that this will be fixed with the next release?
+1 : Hiroyuki Yamada, Raman Gupta
----
2020-03-31 05:38:44 UTC - Penghui Li: @Tim Corbett @Evan Furman Could you
please send me the topic stats after the problem happens? I checked the log
file that @Evan Furman sent in the thread; the consume throughput dropped a
lot at that moment. So, are there many unacked messages on the consumers or a
large backlog on the subscription?
I tested it on my laptop: I started a producer to publish messages and 200
consumers to consume them. If I disable message batching, the message
consumption rate can't catch up with the publish rate, and that results in high
consume latency. The publish rate is almost 50k/s and the consume rate almost
18k/s. I changed the subscription type to Shared, and the result is the same.
So, is your test run with high throughput?
----
2020-03-31 05:39:37 UTC - Penghui Li: > We are in the process of migrating
to Pulsar, and this issue would be a deal-breaker for us if not fixed. Is it
realistic that this will be fixed with the next release?
Version 2.5.1 is cut, I think we can fix it at version 2.5.2 or 2.6.0
+1 : Hiroyuki Yamada, Raman Gupta, Poul Henriksen
----
2020-03-31 05:59:25 UTC - Poul Henriksen: Question about message retention: The
message retention policies can be configured at the instance or namespace level,
but at what granularity does the message retention policy apply? Namespace,
topic, subscription, or something else? If topic/subscription, then how does it
apply to partitioned topics?
----
2020-03-31 06:22:00 UTC - Tim Corbett: I can't get you logs right now; that
will have to wait until tomorrow. @Evan Furman can assist, hopefully. As far
as publishing goes, we are not testing under particular stress. I believe we
capped our publisher at 10k msgs/s, but even lower rates should probably
reproduce the results. As you can see in our logs, when we have 200 consumers,
consumption is down pretty low per (topic)/consumer (around 90-100 msgs/s each).
----
2020-03-31 06:24:12 UTC - Tim Corbett: One other thing to note: we could not
use pulsar-perf to publish the messages, because it did not seem to include any
routing/partitioning key diversity, so messages tended to clump up on only
one or a few consumers.
----
2020-03-31 06:28:14 UTC - Ken Huang: Hi, I used a Helm chart to deploy 3 Pulsar
clusters and I want to enable geo-replication.
Do I need to deploy a global ZooKeeper? I don't know how to do that.
Or can I just add the other clusters in each cluster?