2020-06-15 09:51:41 UTC - Tushar Sonawane: @Tushar Sonawane has joined the 
channel
----
2020-06-15 11:00:39 UTC - Dave Winstone: @Dave Winstone has joined the channel
----
2020-06-15 11:10:21 UTC - Grzegorz: @Grzegorz has joined the channel
----
2020-06-15 11:18:07 UTC - Grzegorz: Hi guys. I have a question regarding topics in Pulsar. I don't understand how partitioned and non-partitioned topics relate to persistence.
According to the docs #1, both non-partitioned and partitioned topics must be explicitly created. But here in #2 I can see that topics are automatically created.
How are these two categories connected? Are auto-created topics non-partitioned by default? If so, why does #1 say non-partitioned topics must be created explicitly? Can I have a topic that is both persistent and partitioned?

#1 <https://pulsar.apache.org/docs/en/2.5.0/admin-api-non-partitioned-topics/>
#2 <https://pulsar.apache.org/docs/en/2.5.0/concepts-messaging/>
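For reference, here is roughly what I would expect to write with the Java admin client to create a topic that is both persistent and partitioned (just a sketch; the `localhost:8080` admin endpoint and topic name are placeholders):
```
import org.apache.pulsar.client.admin.PulsarAdmin;

public class CreatePartitionedTopic {
    public static void main(String[] args) throws Exception {
        // Placeholder admin endpoint; replace with your broker's web service URL.
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            // "persistent://" and partitioning are independent choices:
            // this explicitly creates a persistent topic with 4 partitions.
            admin.topics().createPartitionedTopic(
                    "persistent://public/default/my-partitioned-topic", 4);
        }
    }
}
```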
----
2020-06-15 11:21:24 UTC - Ankur Jain: New to Pulsar (have worked with Kafka in production) and have been going through docs/talks on the internet around the same. We are evaluating it for replacing a bunch of Kafka deployments with a multi-tenant messaging platform and are impressed with the Kafka pain points it can potentially solve. Throughput requirements for such a platform are ~5M messages/sec, but cumulative across topics including log/metric ingestion, data streaming, etc. We are yet to run some benchmarks. I had a few concerns around Pulsar's setup and would love to get some pointers from the community:
1. Reading from a partition in Pulsar is tied to a broker, similar to Kafka. We have ordering requirements, so scaling out consumption in Kafka means adding new partitions, but old partitions continue to remain a bottleneck. I believe this is the same with Pulsar? Any guidelines on how to configure partitions for a topic and increase them for ordered streams? Fanouts are not always known to us up front, and a high-throughput topic can become popular later.
2. Lagging consumers compete for disk I/O on Kafka brokers, which impacts produce throughput as well as consume throughput. Since brokers are stateless in Pulsar, it would mean a network hop to a bookie (old messages will not be in the broker cache) and random disk I/O on the bookie to read the fragment. Is my understanding correct, and how does Pulsar ensure optimal performance (better read latency/throughput than Kafka?) for catch-up consumers? We unfortunately have systems where catch-up consumers are a reality.
3. We have machines with 1 HDD in our datacenter. SSD-specific machines are available but costlier and lower capacity, which would force aggressive retention that we may not want. Recommended bookie configurations suggest 1 disk (SSD?) for the WAL and 1 disk for the ledger mounted on the same instance. If we don't have such instance types available, how much impact can we expect on reads/writes through BookKeeper compared to the Kafka brokers we run today on these single-HDD instances? I understand that Kafka does not fsync, and we are OK with the same level of durability that disabling fsync in BookKeeper would achieve. Keeping that in mind, can the instance types we have make the cut for bookies, and how can that be tuned? Official docs do not mention anything around this.
----
2020-06-15 11:41:56 UTC - Sankararao Routhu: yes @Sijie Guo they are in the 
same VPC
----
2020-06-15 13:25:14 UTC - Konstantinos Papalias: re 1. One strategy on both Kafka and Pulsar would be to always over-provision partitions for a topic. Did you have any restrictions preventing you from doing that on Kafka? E.g. a large number of partitions across all topics?
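(For illustration, the partition count of an existing topic can also be increased later with the Java admin client; a minimal sketch, where the endpoint and topic name are placeholders:)
```
import org.apache.pulsar.client.admin.PulsarAdmin;

public class IncreasePartitions {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder endpoint
                .build()) {
            // The partition count can only grow. Existing data stays where it is,
            // but the key-to-partition mapping changes for newly produced messages,
            // which matters if you rely on per-key ordering.
            admin.topics().updatePartitionedTopic(
                    "persistent://public/default/orders", 12);
        }
    }
}
```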
----
2020-06-15 14:17:03 UTC - Raphael Enns: Hi. I'm getting a lot of WARN log lines in my pulsar-broker.log file. It is always wrapped in a set of log lines like the ones below (the 2nd line is the warning):
```10:13:07.025 [pulsar-io-22-1] INFO  
org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:50580] Subscribing on 
topic <persistent://public/default/4736a6614c9a4e29aa73871956180ef1> / 
reader-02257d519f
10:13:07.025 [pulsar-io-22-1] WARN  
org.apache.pulsar.broker.service.BrokerService - No autoTopicCreateOverride 
policy found for <persistent://public/default/4736a6614c9a4e29aa73871956180ef1>
10:13:07.025 [pulsar-io-22-1] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[<persistent://public/default/4736a6614c9a4e29aa73871956180ef1>][reader-02257d519f]
 Creating non-durable subscription at msg id 13144:1:-1:0
10:13:07.025 [pulsar-io-22-1] INFO  
org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - 
[public/default/persistent/4736a6614c9a4e29aa73871956180ef1-reader-02257d519f] 
Rewind from 79:0 to 79:0
10:13:07.025 [pulsar-io-22-1] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[<persistent://public/default/4736a6614c9a4e29aa73871956180ef1>] There are no 
replicated subscriptions on the topic
10:13:07.025 [pulsar-io-22-1] INFO  
org.apache.pulsar.broker.service.persistent.PersistentTopic - 
[<persistent://public/default/4736a6614c9a4e29aa73871956180ef1>][reader-02257d519f]
 Created new subscription for 2077315
10:13:07.025 [pulsar-io-22-1] INFO  org.apache.pulsar.broker.service.ServerCnx 
- [/127.0.0.1:50580] Created subscription on topic 
<persistent://public/default/4736a6614c9a4e29aa73871956180ef1> / 
reader-02257d519f```
Is something wrong happening here? Is there anything I can do to fix it?
Thanks
----
2020-06-15 14:27:11 UTC - Ankur Jain: Thanks for the response! On one of the bigger Kafka clusters we have, there are ~4K topics. Over-provisioning partitions means increased ZK load on fetch requests, longer broker startup times, etc. So while we do over-provision, there is always an element of caution attached. Seeing how Pulsar uses ZK for bookkeeping as well, I assumed similar concerns would apply here too.
Hoping to get some response on the remaining concerns also :slightly_smiling_face:
ok_hand : Konstantinos Papalias
----
2020-06-15 15:25:01 UTC - Marcio Martins: Thanks @Sijie Guo
----
2020-06-15 15:32:23 UTC - Sankararao Routhu: when a single broker is given for brokerServiceUrl, the proxy-to-broker lookup works, but when the NLB is given, the proxy-to-broker lookup does not work
----
2020-06-15 17:13:31 UTC - Addison Higham: Question for y'all: what are your key 
metrics for seeing how close a pulsar cluster is to capacity? I know that it is 
a simple question with a complicated answer, but just curious what sort of 
top-line metrics you pay the most attention to (or better yet, what you alert 
on to indicate it may be time to scale up)
+1 : Anup Ghatage
----
2020-06-15 17:43:01 UTC - Asaf Mesika: Very interesting design choice
----
2020-06-15 17:43:17 UTC - Asaf Mesika: Wondering the motivation to go this way?
----
2020-06-15 17:43:55 UTC - Asaf Mesika: So this code will only “live” as a Pulsar function, or will you also have library code running in your microservice?
----
2020-06-15 17:45:55 UTC - Asaf Mesika: Also, another question: how did you solve the double-submission issue? Say you have a job (e.g. Send Mail) and you launch it with parameters like emailAddress and body. Now presume you submitted it twice with the same parameters (say you did something but didn't commit, the app got killed, restarted, and did it again). How did you plan to protect against running the same job twice (or even, because of this, in parallel, since now you have two messages representing the same execution)?
----
2020-06-15 17:49:02 UTC - Gilles Barbier: The intention is mainly to minimize 
the need for any infrastructure beyond Pulsar itself. I know from experience 
that scaling distributed systems is hard and I prefer to rely on Pulsar for 
that.
----
2020-06-15 17:52:12 UTC - Gilles Barbier: Regarding your question, each client producing a command includes a unique id with it. If a "Send Email" command is sent twice with the same id, the second one will be discarded by the engine function that maintains the state of this task.
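As a rough sketch of the idea (not the actual engine code), a function could use the built-in Functions state API to drop a command id it has already seen; the message format and state key below are made up for illustration:
```
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Hypothetical "engine" sketch: the incoming message is assumed to be just the command id.
public class DedupEngineFunction implements Function<String, Void> {
    @Override
    public Void process(String commandId, Context context) {
        String stateKey = "seen-" + commandId;
        ByteBuffer alreadySeen = context.getState(stateKey);
        if (alreadySeen != null) {
            // Duplicate submission: discard instead of running the task twice.
            return null;
        }
        context.putState(stateKey, ByteBuffer.wrap("1".getBytes(StandardCharsets.UTF_8)));
        // ... dispatch the task (e.g. actually send the email) here ...
        return null;
    }
}
```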
----
2020-06-15 17:52:49 UTC - Asaf Mesika: ok. What happens if you have 1M unique 
IDs every min?
----
2020-06-15 17:53:07 UTC - Asaf Mesika: How is the state managed? In-memory in a 
single machine?
----
2020-06-15 17:56:20 UTC - Gilles Barbier: Yeah, at that scale Pulsar is still missing a feature, but it should be a relatively easy one to add: a key-shared subscription for functions. With it, and using the id as the key, you should be able to scale your engine functions horizontally and even keep state in memory if needed
----
2020-06-15 17:58:50 UTC - Gilles Barbier: I have opened an issue for that: <https://github.com/apache/pulsar/issues/6527>
----
2020-06-15 18:03:45 UTC - Gilles Barbier: (note that even today it's possible to run the engine on a key-shared subscription as a "normal" consumer, but then you need to add your own storage for state)
----
2020-06-15 18:07:44 UTC - Asaf Mesika: So the equivalent in my case, which is only library code, is to have shared state like a persisted KV store
----
2020-06-15 18:08:55 UTC - Asaf Mesika: I really wish Pulsar would allow dedup like they do, but based on my own key
----
2020-06-15 18:09:13 UTC - Asaf Mesika: And not an always-increasing key
----
2020-06-15 18:20:43 UTC - Gilles Barbier: Are you aware that Pulsar also has a message deduplication feature? <http://pulsar.apache.org/docs/en/cookbooks-deduplication/>
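For example, with deduplication enabled on the namespace (see the cookbook), a producer sketch along these lines relies on a stable producer name plus a monotonically increasing sequence id; the service URL, topic, and names below are placeholders:
```
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;

public class DedupProducerSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder service URL
                .build();
             Producer<String> producer = client.newProducer(Schema.STRING)
                     // Broker-side dedup is keyed on (producerName, sequenceId),
                     // so a stable producer name is required across restarts.
                     .producerName("mail-submitter-1")
                     .topic("persistent://public/default/jobs")
                     .create()) {
            long sequenceId = 42L; // must be monotonically increasing per producer
            producer.newMessage()
                    .sequenceId(sequenceId)
                    .value("send-mail:someone@example.com")
                    .send();
        }
    }
}
```
This is also why it can't be used with an arbitrary business key: the broker only remembers the highest sequence id per producer.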
----
2020-06-15 18:21:34 UTC - Gilles Barbier: oh I see, it's what you mean by "like 
they do but based on my own key"
----
2020-06-15 18:22:18 UTC - Manjunath Ghargi: For question #2: In Pulsar there is a separate read cache vs. write cache, hence even if there are millions of backlogged messages, producer throughput is not affected. Consumers can consume at their own pace.
----
2020-06-15 18:36:20 UTC - Manjunath Ghargi: Hi,
How do we equally balance the load across all the brokers?
We have 3 brokers configured with 9 topics * 3 partitions each. The load is getting divided equally across partitions but not across brokers; sometimes, for a given topic, all 3 partitions are being served by a single broker, which causes CPU to go as high as 80% on one broker while the other 2 brokers have less than 20% CPU.
On the producer client end, we are setting (MessageRoutingMode=RoundRobinPartition).
Can someone suggest the right configuration for balancing the load across brokers equally?
----
2020-06-15 18:54:34 UTC - Manjunath Ghargi: @Sijie Guo: Can you please suggest 
the right configuration?
----
2020-06-15 18:55:16 UTC - Asaf Mesika: Btw: can you live comfortably without the transactions that are coming in 2.7.0? Your function can produce a message but then fail to ack.
----
2020-06-15 18:59:56 UTC - Alexandre DUVAL: Hi, is there a way to tune the max HTTP header content length/size? :D
----
2020-06-15 19:14:56 UTC - Gilles Barbier: As far as I can tell, the worst case is duplicate task processing
----
2020-06-15 19:22:05 UTC - Addison Higham: are all these topics in the same 
namespace @Manjunath Ghargi?
----
2020-06-15 19:24:13 UTC - Addison Higham: topics get grouped into "bundles", which is just a hash ring. Bundles are what get mapped to brokers. By default, a namespace has 4 bundles. Bundles should eventually split naturally, but it can take a while, so if they are all in the same namespace, it may make a lot of sense to turn those 4 bundles into something like 12 bundles, giving you more "units" that can be distributed across the cluster.
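For example, if the namespace can be (re)created, a minimal Java admin sketch that starts it with more bundles might look like this (endpoint and namespace name are placeholders; an existing namespace would instead need its bundles split via the admin tooling):
```
import org.apache.pulsar.client.admin.PulsarAdmin;

public class CreateNamespaceWithBundles {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder endpoint
                .build()) {
            // Start the namespace with 12 bundles instead of the default 4,
            // so the load manager has more units to spread across brokers.
            admin.namespaces().createNamespace("public/high-throughput", 12);
        }
    }
}
```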
----
2020-06-15 20:06:05 UTC - Alexandre DUVAL: (on proxy)
----
2020-06-15 20:41:04 UTC - Sree Vaddi: 
<https://pulsar-summit.org/schedule/first-day>
----
2020-06-15 20:48:30 UTC - David Lanouette: Will this be recorded?  There's lots 
of interesting talks, but I've gotta work during those times. :disappointed:
----
2020-06-15 20:55:13 UTC - Ankur Jain: @Manjunath Ghargi Won't the write cache need to be flushed to the entry log disk while reads are happening from the same disk for catch-up consumers, causing contention similar to Kafka? Maybe I am missing something here?
----
2020-06-15 21:31:33 UTC - Addison Higham: is there a way to force major 
compaction on a bookie?
----
2020-06-15 22:24:44 UTC - Anup Ghatage: You can set the config param `isForceGCAllowWhenNoSpace=true` if you want to force GC when you’re running out of space.
Other than that, I’ve not done this myself, but there are configuration options we can play around with for this,
viz. `majorCompactionInterval`, `majorCompactionThreshold`, and `gcWaitTime`.

Even if you do set these, you’d have to restart the bookies with these config options set to whatever values you want.
----
2020-06-15 23:18:21 UTC - Sijie Guo: I think there is also an HTTP endpoint to do so.
----
2020-06-15 23:19:07 UTC - Sijie Guo: It will be recorded and uploaded to YouTube after the summit. Follow @PulsarSummit for updates :slightly_smiling_face:
----
2020-06-15 23:20:30 UTC - Sijie Guo: 1. Disk space
2. Storage size & msg backlog
3. Latency
----
2020-06-15 23:21:33 UTC - Sijie Guo: I think the warning logs are introduced by a new feature. They are annoying. @Penghui Li @jia zhai we should consider reducing the logging level.
----
2020-06-15 23:49:47 UTC - Matteo Merli: You mean to impose a max size?
----
2020-06-15 23:51:54 UTC - Alexandre DUVAL: there is currently a max size and 
i'd like to configure it
----
2020-06-15 23:52:11 UTC - Alexandre DUVAL: ```Jun 15 21:41:31 yo-pulsar-c1-n9 
pulsar-proxy[28315]: 21:41:31.096 [pulsar-external-web-5-6] DEBUG 
org.eclipse.jetty.client.HttpSender - Request shutdown output HttpRequest[GET 
/admin/v2/namespaces/user_7684cfc9-f54e-4e09-848c-1953af6e3e89/pulsar_c61f4274-4725-4e9f-8196-3cd7edcd77e5/retention
 HTTP/1.1]@1965e4b
7
Jun 15 21:41:31 yo-pulsar-c1-n9 pulsar-proxy[28315]: 21:41:31.096 
[pulsar-external-web-5-6] DEBUG org.eclipse.jetty.client.HttpSender - Request 
failure HttpRequest[GET 
/admin/v2/namespaces/user_7684cfc9-f54e-4e09-848c-1953af6e3e89/pulsar_c61f4274-4725-4e9f-8196-3cd7edcd77e5/retention
 HTTP/1.1]@1965e4b7 HttpEx
change@6c710cea req=COMPLETED/org.eclipse.jetty.http.BadMessageException: 500: 
Request header too large@6e4cc94a res=PENDING/null@null on 
HttpChannelOverHTTP@47ba053f(exchange=HttpExchange@6c710cea 
req=COMPLETED/org.eclipse.jetty.http.BadMessageException: 500: Request header 
too large@6e4cc94a res=PENDING/null@null)[s
end=HttpSenderOverHTTP@7545f435(req=FAILURE,snd=FAILED,failure=org.eclipse.jetty.http.BadMessageException:
 500: Request header too 
large)[HttpGenerator@571d274{s=END}],recv=HttpReceiverOverHTTP@1c455293(rsp=IDLE,failure=null)[HttpParser{s=START,0
 of -1}]]: {}
Jun 15 21:41:31 yo-pulsar-c1-n9 pulsar-proxy[28315]: 21:41:31.096 
[pulsar-external-web-5-2] DEBUG org.eclipse.jetty.io.ManagedSelector - Created 
SocketChannelEndPoint@3ac82237{yo-pulsar-c1-n2/192.168.10.11:2005<->/192.168.10.14:55540,OPEN,fill=FI,flush=-,to=1/30000}{io=1/1,kio=1,kro=8}->SslConnection
@2d469d74{NOT_HANDSHAKING,eio=-1/-1,di=-1,fill=INTERESTED,flush=IDLE}~>DecryptedEndPoint@56e4c405{yo-pulsar-c1-n2/192.168.10.11:2005<->/192.168.10.14:55540,OPEN,fill=FI,flush=-,to=2/30000}=>HttpConnectionOverHTTP@492f6c59(l:/192.168.10.14:55540
 <-> r:yo-pulsar-c1-n2/192.168.10.11:2005,closed=false)=>
HttpChannelOverHTTP@47ba053f(exchange=HttpExchange@6c710cea 
req=COMPLETED/org.eclipse.jetty.http.BadMessageException: 500: Request header 
too large@6e4cc94a 
res=PENDING/null@null)[send=HttpSenderOverHTTP@7545f435(req=FAILURE,snd=FAILED,failure=org.eclipse.jetty.http.BadMessageException:
 500: Request header too large)[
HttpGenerator@571d274{s=END}],recv=HttpReceiverOverHTTP@1c455293(rsp=IDLE,failure=null)[HttpParser{s=START,0
 of -1}]]
Jun 15 21:41:31 yo-pulsar-c1-n9 pulsar-proxy[28315]: 21:41:31.096 
[pulsar-external-web-5-6] DEBUG org.eclipse.jetty.client.HttpExchange - 
Terminated request for HttpExchange@6c710cea 
req=TERMINATED/org.eclipse.jetty.http.BadMessageException: 500: Request header 
too large@6e4cc94a res=PENDING/null@null, result
: null```
----
2020-06-15 23:52:17 UTC - Alexandre DUVAL: @Matteo Merli ^
----
2020-06-15 23:52:56 UTC - Alexandre DUVAL: so there is a limitation on the 
request from proxy to broker, right?
----
2020-06-15 23:53:03 UTC - Matteo Merli: I think there should be one that we 
configure on jetty 
----
2020-06-15 23:53:51 UTC - Alexandre DUVAL: currently i think it comes from 
jetty defaults
----
2020-06-15 23:53:54 UTC - Alexandre DUVAL: ```        value = 
config.getInitParameter("requestBufferSize");
        if (value != null)
            client.setRequestBufferSize(Integer.parseInt(value));```
----
2020-06-15 23:54:44 UTC - Alexandre DUVAL: (if it's the request from proxy to broker that throws this)
----
2020-06-15 23:56:01 UTC - Matteo Merli: In the exception above, is it thrown by 
broker or proxy?
----
2020-06-15 23:56:43 UTC - Alexandre DUVAL: proxy
----
2020-06-15 23:57:14 UTC - Alexandre DUVAL: I tried to run the request without the proxy; it works well and doesn't throw this
----
2020-06-15 23:57:34 UTC - Alexandre DUVAL: so I assumed it was happening during the proxy-to-broker request
----
2020-06-15 23:59:11 UTC - Alexandre DUVAL: so I think the only way is to set `client.setRequestBufferSize(32000);` on the AdminProxyHandler HTTP client
----
2020-06-15 23:59:18 UTC - Alexandre DUVAL: as quickfix
----
2020-06-16 00:14:17 UTC - jia zhai: sure
----
2020-06-16 00:21:11 UTC - Greg Methvin: I think we’re talking about the 
subscription backlog, which is `msgBacklog` in the topic stats. My 
understanding is that this is actually the number of message _batches_ rather 
than the actual number of messages. Is that still the case?
----
2020-06-16 00:30:16 UTC - Penghui Li: This is fixed by <https://github.com/apache/pulsar/pull/7080> and will be released in 2.6.0
----
2020-06-16 00:48:54 UTC - David Lanouette: Great!  Thanks for the response.
----
2020-06-16 00:54:28 UTC - Matteo Merli: Got it. We should then probably make 
that configurable.
----
2020-06-16 00:56:25 UTC - Matteo Merli: @Ankur Jain

>  We have ordering requirements, so scaling out consumption in Kafka means adding new partitions, but old partitions continue to remain a bottleneck. I believe this is the same with Pulsar?
You can use the KeyShared subscription type to scale consumers within a single partition while retaining ordering (per key).
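A minimal consumer sketch (service URL, topic, and subscription name are placeholders):
```
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.SubscriptionType;

public class KeySharedConsumerSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder service URL
                .build();
             Consumer<String> consumer = client.newConsumer(Schema.STRING)
                     .topic("persistent://public/default/orders")
                     .subscriptionName("order-processors")
                     // Several consumers can share one partition; messages with the
                     // same key always go to the same consumer, preserving per-key order.
                     .subscriptionType(SubscriptionType.Key_Shared)
                     .subscribe()) {
            Message<String> msg = consumer.receive();
            // ... process the message ...
            consumer.acknowledge(msg);
        }
    }
}
```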
----
2020-06-16 00:56:43 UTC - Matteo Merli: So, no need to over-provision partitions
----
2020-06-16 01:01:22 UTC - Matteo Merli: >  Won't the write cache need to be flushed to the entry log disk while reads are happening from the same disk for catch-up consumers, causing contention similar to Kafka? Maybe I am missing something here?
The way bookies write to disk is very different from Kafka. All writes the bookie does are sequential.

The critical writes are on the journal, which is dedicated to writes, so there is no interference from reads.

When we flush the write cache, it happens in the background, so it does not impact request latency. Also, it's a bulk write to one single big file (or a few files), so it's as efficient as writing to disk can get.

In Kafka, you're basically relying on the page cache to do background flushes, but it's writing to multiple files at the same time (e.g. 5 files per partition, between data and indices), so it's really generating a random-write IO workload, which doesn't work very well even on SSDs. When the page cache is under pressure (since the reads will pull in old files), this *will* definitely impact your write latency.
----
2020-06-16 01:03:50 UTC - Alexandre DUVAL: indeed
----
2020-06-16 01:59:18 UTC - Rich Adams: @Rich Adams has joined the channel
----
2020-06-16 05:28:54 UTC - Hiroyuki Yamada: @Penghui Li @Matteo Merli I reported it on the GitHub issue page.
<https://github.com/apache/pulsar/issues/5819#issuecomment-644510797>

As far as I checked, the issue is still happening, but it doesn't happen if consistent hashing is used.
Please take a look.
----
2020-06-16 05:42:16 UTC - Narayan: @Narayan has joined the channel
----
2020-06-16 05:44:01 UTC - Penghui Li: Thanks for your update, maybe there are some problems with the auto-split mechanism. I have reopened <https://github.com/apache/pulsar/issues/6554|#6554> to track this issue.
man-bowing : Hiroyuki Yamada
+1 : Hiroyuki Yamada
----
2020-06-16 06:27:04 UTC - Isaiah Rairdon: Looking for some guidance with the liveness probe. I was trying to add a liveness probe to our Helm chart. I haven't been able to get the OOTB probe to work since we are using mTLS for auth. I have curl calls working for basic admin calls like
curl <https://pulsar-broker:8443/admin/clusters> --key server.key --cacert issue_ca.crt --cert server.crt
but doing the same call against the route for status.html gives me a 404. Am I doing something wrong with the route or the call here?
curl <https://pulsar-broker:8443/status.html> --key server.key --cacert issue_ca.crt --cert server.crt
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /status.html. Reason:
<pre>    Not Found</pre></p><hr><a href="<http://eclipse.org/jetty>">Powered by Jetty:// 9.4.20.v20190813</a><hr/>

</body>
</html>
----
2020-06-16 07:17:21 UTC - Sankararao Routhu: <!here> Is there a plugin or any other way to add a whitelisting solution to pulsar-proxy? We want to allow connections only from trusted/known networks/accounts.
----
