2020-06-17 09:47:42 UTC - Mathias Karlsson: @Mathias Karlsson has joined the 
channel
----
2020-06-17 11:42:54 UTC - Marcio Martins: How would I disable it? I didn't find 
anything in the docs.
----
2020-06-17 11:43:01 UTC - Marcio Martins: @Sijie Guo
----
2020-06-17 14:10:08 UTC - jujugrrr: Hi, I'm testing losing all of BookKeeper and recovering with only tiered-storage offloaded messages. It works well; however, the last ledger was not offloaded, so my reader is blocking trying to read it. I've added some other messages now, so I have something like:
```ledger 1 offloaded
ledger 2 offloaded
ledger 3 not offloaded (no data, lost when I deleted local storage)
ledger 4 offloaded
ledger 5 offloaded
ledger 6 opened```
Is there a way to remove ledger 3 so my reader can keep going and jump from 2 to 4? I've tried `./bin/bookkeeper delete ledger`, `./bin/pulsar-managed-ledger-admin delete-managed-ledger-ids`, and deleting `/ledgers/00/0000/L003` in ZooKeeper, but it looks like there are still references to it: `pulsar-admin topics stats-internal` still shows the ledger. Is it not possible to remove it?
----
2020-06-17 14:28:44 UTC - Fred George: > You can use KeyShared subscription 
type to be able to scale consumers within a single partition, while retaining 
ordering (per key)
This isn't strictly true unless it's been fixed recently. Messages can be delivered to multiple consumers during rehashing, causing ordering anomalies.
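A toy sketch of why that can happen (illustrative only, not Pulsar's actual dispatcher; `md5` and the 64K range stand in for the real hashing): when a consumer joins, a slice of the hash range changes owner, so keys in that slice can have messages still in flight to the old consumer while newer messages go to the new one.

```python
# Illustrative only: key -> consumer assignment by splitting a hash range
# evenly. Not Pulsar's real dispatcher; md5 stands in for the actual hash.
import hashlib

HASH_RANGE = 65536  # assumed range size for this sketch

def slot(key: str) -> int:
    # Deterministic position of a key in the hash range.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % HASH_RANGE

def owner(key: str, num_consumers: int) -> int:
    # Even split of the range across consumers (simplified).
    return slot(key) * num_consumers // HASH_RANGE

keys = [f"key-{i}" for i in range(1000)]
before = {k: owner(k, 2) for k in keys}   # 2 consumers attached
after = {k: owner(k, 3) for k in keys}    # a 3rd consumer joins
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed owner")
```

Messages for the keys in `moved` that were already pushed to their old consumer can still be getting processed after the new consumer starts receiving newer messages for the same keys, which is the anomaly described above.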
----
2020-06-17 15:09:20 UTC - Matteo Merli: That's correct. I've been working on fixing several issues in KeyShared delivery lately. The 2.6 release will have a more solid story there.
----
2020-06-17 15:22:32 UTC - Addison Higham: Does it work with a normal consumer (not a reader) if you manually set the position before ledger 3?
----
2020-06-17 15:24:09 UTC - jujugrrr: I haven't tried, as I focused on my use case. Let me give it a go a bit later.
----
2020-06-17 15:39:52 UTC - Marshall Brandt: @Marshall Brandt has joined the 
channel
----
2020-06-17 16:02:23 UTC - Caito Scherr: @Caito Scherr has joined the channel
----
2020-06-17 16:07:04 UTC - Manpreet Babbra: @Manpreet Babbra has joined the 
channel
----
2020-06-17 16:07:28 UTC - Mike: @Mike has joined the channel
----
2020-06-17 16:09:14 UTC - Craig Haywood: @Craig Haywood has joined the channel
----
2020-06-17 16:10:30 UTC - Leonard Ge: @Leonard Ge has joined the channel
----
2020-06-17 16:53:08 UTC - Joao Oliveirinha: @Joao Oliveirinha has joined the 
channel
----
2020-06-17 16:53:56 UTC - Jesse Anderson: To hit the potential of millions of topics in Pulsar, would the cluster have to be scaled to handle the metadata? E.g. could a 5-node cluster have 5 million topics, provided the message load could be handled?
----
2020-06-17 16:57:00 UTC - Matteo Merli: It depends on various aspects of the workload, but yes, there's a recommended limit on topics per broker.

That is for memory reasons, as well as for how fast you need your failovers to be (e.g. the worst-case acceptable publish latency).

For low-latency requirements, we typically put the recommended ballpark for max topics/broker at ~100K.
+1 : Jesse Anderson, Julius S, Shivji Kumar Jha, Tamer
----
2020-06-17 16:58:22 UTC - Andrew: @Andrew has joined the channel
----
2020-06-17 17:46:24 UTC - PLarboulette: @PLarboulette has joined the channel
----
2020-06-17 18:33:45 UTC - Kate Kinnear: @Kate Kinnear has joined the channel
----
2020-06-17 18:42:14 UTC - Patrik Kleindl: @Matteo Merli Just for my understanding: for partitioned topics, would this mean 100k partitions per broker, or is it independent of partitioning?
----
2020-06-17 18:43:09 UTC - Matteo Merli: Correct, that would be 100k partitions. From the broker's perspective, a partition is exactly the same as a non-partitioned topic.
+1 : Patrik Kleindl
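As a back-of-the-envelope consequence (using the ~100K/broker ballpark from above, which is a guideline rather than a hard limit), capacity scales with broker count, and partitions rather than logical topics are what count:

```python
# Back-of-the-envelope: each partition counts against the per-broker limit.
PER_BROKER_LIMIT = 100_000  # ballpark from the discussion above, not a hard limit

def cluster_capacity(brokers: int) -> int:
    # Total partitions the cluster can host at the recommended density.
    return brokers * PER_BROKER_LIMIT

def brokers_needed(total_partitions: int) -> int:
    # Round up: the last broker may be only partially filled.
    return -(-total_partitions // PER_BROKER_LIMIT)

print(cluster_capacity(5))        # a 5-broker cluster at this density
print(brokers_needed(5_000_000))  # brokers for 5M partitions at this density
```

So the earlier 5-node / 5-million-topics question would, at this density, call for roughly 50 brokers rather than 5, leaving aside ZooKeeper metadata and failover-time considerations.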
----
2020-06-17 18:49:27 UTC - Muljadi: @Muljadi has joined the channel
----
2020-06-17 18:52:14 UTC - Abhishek: @Abhishek has joined the channel
----
2020-06-17 18:54:49 UTC - Vijay Bhore: @Vijay Bhore has joined the channel
----
2020-06-17 18:57:05 UTC - Yezen: @Yezen has joined the channel
----
2020-06-17 18:57:24 UTC - Simon Crosby: @Simon Crosby has joined the channel
----
2020-06-17 19:20:09 UTC - Ankit Jain: @Ankit Jain has joined the channel
----
2020-06-17 19:23:16 UTC - maurice barnum: @maurice barnum has joined the channel
----
2020-06-17 19:29:06 UTC - Joe Francis: @Jesse Anderson We have gone beyond a million with PIP-8, and we will mention it in our talk tomorrow.
----
2020-06-17 19:33:12 UTC - Jesse Anderson: @Joe Francis are you sticking to 
~100k topics per broker?
----
2020-06-17 19:36:31 UTC - Joe Francis: We have stringent start-up limits; ours is 60K or so. It is all about how quickly you want to do a cold start. We have actually done DC power-loss recovery scenarios, and this number is based on our recovery-time guarantees for the whole cluster.
----
2020-06-17 19:53:38 UTC - Mihir Rane: @Mihir Rane has joined the channel
----
2020-06-17 19:57:53 UTC - Philip Ittmann: @Philip Ittmann has joined the channel
----
2020-06-17 21:05:02 UTC - Olivier Brazet: @Olivier Brazet has joined the channel
----
2020-06-17 21:33:28 UTC - Ankur Jain: I was attending the summit and hearing success stories of moving from Kafka to Pulsar. I had one question around equivalence with Kafka when it comes to consumer groups.
If the processing capacity of one consumer is a bottleneck, how can we scale to N consumers in Pulsar for a partitioned topic (with M partitions, where M >= N) while maintaining strict ordering guarantees for unacked messages (similar to a consumer group in Kafka)? This would mean that if I were to add or remove consumers, partitions should be auto-balanced among the available consumers.
----
2020-06-17 22:19:24 UTC - Kalyn Coose: @Kalyn Coose has joined the channel
----
2020-06-17 22:20:07 UTC - Jesse Anderson: Pulsar works differently from Kafka in that scenario. For this, you'd use Key_Shared subscriptions <http://pulsar.apache.org/docs/en/concepts-messaging/#key_shared>. I'll talk about it in my talk tomorrow. IMHO, this is a huge feature difference between Pulsar and Kafka.
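A toy model of what Key_Shared gives you over partition-per-consumer (illustrative only, not the Pulsar client API; `md5` stands in for the real key hash): messages from a single partition are fanned out to N consumers by key, so each key keeps publish order at exactly one consumer while different keys are processed in parallel.

```python
# Toy model of Key_Shared dispatch: one partition's message stream is split
# across N consumers by key hash. Illustrative only, not the client API.
from collections import defaultdict
import hashlib

def route(key: str, num_consumers: int) -> int:
    # Stand-in for Pulsar's key hashing; any stable hash works for the sketch.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_consumers

def dispatch(messages, num_consumers):
    """messages: (key, seq) pairs in publish order -> one queue per consumer."""
    queues = defaultdict(list)
    for key, seq in messages:
        queues[route(key, num_consumers)].append((key, seq))
    return queues

# Interleaved publishes for two keys on the same partition.
msgs = [("user-a", 0), ("user-b", 0), ("user-a", 1), ("user-b", 1), ("user-a", 2)]
queues = dispatch(msgs, num_consumers=4)

# Each key's messages stay in publish order within a single queue.
for q in queues.values():
    seen = defaultdict(list)
    for key, seq in q:
        seen[key].append(seq)
    assert all(v == sorted(v) for v in seen.values())
```

With the real clients this is the Key_Shared subscription type (e.g. `SubscriptionType.Key_Shared` in the Java consumer builder); as noted earlier in the channel, pre-2.6 releases had rehashing edge cases when consumers joined or left.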
----
2020-06-18 01:09:10 UTC - Renault: @Renault has joined the channel
----
2020-06-18 01:14:16 UTC - Oleg Kozlov: Hello everyone. I have a question for the Pulsar developers: is there any way to update the scheduled delivery time of a previously produced delayed message? Basically, if I send a message with a 1-hour delivery delay, and then 20 minutes later want to change that delay to 2 hours, what are my options?
----
2020-06-18 01:18:26 UTC - Matteo Merli: It's tough because messages are 
immutable
----
2020-06-18 01:22:59 UTC - Renault: Hi. Has anyone used the <https://pulsar.apache.org/docs/en/io-kinesis-source/|Kinesis source connector> functionality? I have a Pulsar cluster running via Helm, but I'm seeing two different errors when creating a Kinesis source, depending on how I'm running the `pulsar-admin source create` command. Thanks in advance!

Error 1 - likely due to misconfiguration of the `source create` command; it seems like the broker doesn't know that kinesis-logging-source is a Kinesis stream
```root@pulsar-toolset-0:/pulsar# bin/pulsar-admin source status --name 
kinesis-logging-source
{
  "numInstances" : 1,
  "numRunning" : 0,
  "instances" : [ {
    "instanceId" : 0,
    "status" : {
      "running" : false,
      "error" : "UNAVAILABLE: Unable to resolve host 
pf-public-default-kinesis-logging-source-0.pf-public-default-kinesis-logging-source.pulsar-system.svc.cluster.local",
      "numRestarts" : 0,
      "numReceivedFromSource" : 0,
      "numSystemExceptions" : 0,
      "latestSystemExceptions" : [ ],
      "numSourceExceptions" : 0,
      "latestSourceExceptions" : [ ],
      "numWritten" : 0,
      "lastReceivedTime" : 0,
      "workerId" : 
"c-pulsar-fw-pulsar-broker-2.pulsar-broker.pulsar-system.svc.cluster.local-8080"
    }
  } ]
}```
Error 2 - the broker throws a 500 error, possibly due to running out of direct memory
```00:31:08.165 [pulsar-web-44-3] WARN  org.eclipse.jetty.server.HttpChannel - 
/admin/v3/source/public/default/kinesis-logging-source
javax.servlet.ServletException: javax.servlet.ServletException: 
org.glassfish.jersey.server.ContainerException: 
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 
byte(s) of direct memory (used: 251658247, max: 268435456)
...
00:31:08.167 [pulsar-web-44-3] INFO  org.eclipse.jetty.server.RequestLog - 
127.0.0.1 - - [18/Jun/2020:00:30:56 +0000] "POST 
/admin/v3/source/public/default/kinesis-logging-source HTTP/1.1" 500 382 "-" 
"Pulsar-Java-v2.5.2" 11427```
----
2020-06-18 01:32:06 UTC - Oleg Kozlov: Right. We don't really need to change the message body, just its schedule. Either that, or cancel/delete that message altogether, so that we can produce a new one with a new delay value.
----
2020-06-18 01:33:35 UTC - Oleg Kozlov: Either update the delivery time in the DelayedDeliveryTracker described here <http://pulsar.apache.org/docs/en/concepts-messaging/#delayed-message-delivery>. But if I can just cancel a message before its scheduled delivery, that would work too
----
2020-06-18 02:35:57 UTC - Matteo Merli: If you know the message id, you should 
be able to ack the message and therefore it would be "cancelled"
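A toy model of why acking works as a cancel (illustrative; this is not Pulsar's actual DelayedDeliveryTracker implementation): delayed messages wait in a time-ordered queue, and an entry whose message id has already been acked is dropped when it becomes due instead of being dispatched. You would need the `MessageId` returned when the delayed message was produced.

```python
# Toy model, not Pulsar's DelayedDeliveryTracker: delayed messages wait in a
# min-heap ordered by deliver-at time; acking an id before it is due makes
# the tracker drop it at dispatch time, i.e. the message is "cancelled".
import heapq

class ToyDelayedTracker:
    def __init__(self):
        self._heap = []        # (deliver_at, message_id)
        self._acked = set()

    def add(self, message_id, deliver_at):
        heapq.heappush(self._heap, (deliver_at, message_id))

    def ack(self, message_id):
        # The "cancel": remember the id so its entry is skipped when due.
        self._acked.add(message_id)

    def pop_due(self, now):
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, mid = heapq.heappop(self._heap)
            if mid not in self._acked:
                due.append(mid)
        return due

tracker = ToyDelayedTracker()
tracker.add("msg-1", deliver_at=60)  # scheduled an hour out (times in minutes)
tracker.add("msg-2", deliver_at=60)
tracker.ack("msg-1")                 # acked 20 minutes in, before it is due
print(tracker.pop_due(now=60))       # only msg-2 is dispatched
```

Updating the delay then becomes ack-then-reproduce: ack the old message id and publish a new message with a fresh delay (e.g. `deliverAfter` on the Java message builder).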
----
2020-06-18 03:13:59 UTC - Oleg Kozlov: I can ack it even before it's delivered to the consumer?
----
2020-06-18 03:52:49 UTC - Madhu A: @Madhu A has joined the channel
----
2020-06-18 04:12:58 UTC - Sankararao Routhu: @Luke Stephenson @Matteo Merli any thoughts?
----
2020-06-18 04:35:14 UTC - Luke Stephenson: Not from me, I just shared what I 
had done in case it could help
----
2020-06-18 07:17:34 UTC - Patrik Kleindl: Yet another question: as Pulsar uses ZooKeeper too, are the limits in Pulsar related to ZK usage, as they are in Kafka, or is there a difference?
----
2020-06-18 07:47:58 UTC - Pavels Sisojevs: @Pavels Sisojevs has joined the 
channel
----
2020-06-18 08:18:21 UTC - Pavels Sisojevs: hello, I've noticed some interesting behaviour in topic clean-up, which looks like a bug to me:

Scenario A:
The system publishes messages to topic A. When there are no consumers and the publisher stops emitting messages, topic A is garbage-collected.

Scenario B:
The system publishes messages to topic B, but a Pulsar Function also publishes messages to topic B. When there are no consumers or publishers in the system, and the function does not emit any messages to topic B either, I would expect topic B to be garbage-collected, but it is not. Topic B stays there forever. E.g. I have a topic which hasn't had any consumers or publishers for 3 days, but I can still see it when listing topics.
Also, it might be important that I'm using `org.apache.pulsar.functions.api.Function` (Java API) and sending the message using the `newOutputMessage` method
----
2020-06-18 08:39:30 UTC - Pushkar Sawant: @Sijie Guo Any guidance here? Right now I have one node at about 99% utilization. I had to force the node into read-only mode; it did not transition to read-only mode on its own at 95% usage.
Other servers are showing variable usage between 60% and 80%
----
2020-06-18 09:00:11 UTC - Pushkar Sawant: Now the bookie cannot start, with the error `java.io.IOException: Error open RocksDB database`
Caused by: `org.rocksdb.RocksDBException: While open a file for appending: /mnt/bookie-hdd/current/ledgers/MANIFEST-000014: No space left on device`
----
2020-06-18 09:04:59 UTC - Pushkar Sawant: There are about 1872 unreplicated ledgers. The number is slowly going down.
----
