2020-07-27 10:29:32 UTC - Giacomo Porro: @Giacomo Porro has joined the channel ----
2020-07-27 11:35:11 UTC - Giacomo Porro: Hi everyone, first of all let me express my appreciation for this project, really really cool stuff! I don't know if this is the right place to ask, but here's my question: I am trying to use the BACKWARD_COMPATIBILITY schema check strategy, which forces me to update my consumers first, then my producers. I found the flow chart on this page on the Pulsar website <https://pulsar.apache.org/docs/en/schema-understand/>. Thing is: given an already existing schema on a certain topic which my consumer is subscribed to, when I try to update it by deploying the consumer with the new schema, Pulsar raises an exception like this one: "Exception: Pulsar error: IncompatibleSchema". I checked all my configurations and according to the flow chart the schema should be updated. What am I doing wrong? Thanks a lot folks! P.S. I am using Pulsar v2.6.0 with the Python client v2.6.0 as well ----
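For context, here is a minimal sketch of the consumer-first update flow being described, using the Python client's Avro `Record` schema support. The topic name, subscription name, field names, and the particular schema change (dropping a field, which BACKWARD compatibility generally allows) are all illustrative assumptions, not taken from the original message:

```python
import pulsar
from pulsar.schema import AvroSchema, Record, String

# Suppose the schema already registered on the topic was:
#   class Example(Record):
#       name = String()
#       city = String()
#
# Under BACKWARD compatibility the *new* schema must still be able to read data
# written with the old one: deleting a field is allowed, while adding a field
# without a default value is not and is rejected with IncompatibleSchema.
class Example(Record):
    name = String()

client = pulsar.Client('pulsar://localhost:6650')

# Deploying the consumer with the new schema triggers the compatibility check
# against the latest schema version registered for the topic.
consumer = client.subscribe(
    'persistent://public/default/example-topic',
    subscription_name='example-sub',
    schema=AvroSchema(Example),
)
```

It can also be worth confirming which strategy is actually applied at the namespace level (e.g. with `pulsar-admin namespaces get-schema-compatibility-strategy <tenant>/<namespace>`), since the check is enforced per namespace rather than per topic.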
2020-07-27 13:41:00 UTC - Roy Tarantino: @Roy Tarantino has joined the channel ----
2020-07-27 16:39:06 UTC - Addison Higham: @Walter how many ledgers are left? If you had ledgers that were only available on a single bookie, then the recovery process will never finish. You can use this command <https://bookkeeper.apache.org/docs/4.10.0/reference/cli/#bookkeeper-shell-listunderreplicated> to see the list of under-replicated ledgers, then you can run this command <https://bookkeeper.apache.org/docs/4.10.0/reference/cli/#bookkeeper-shell-ledgermetadata> on the remaining ledgers to see details of what the ensemble size was. If the ensemble size is 1, then you have likely lost the ledger and can delete it (using <https://bookkeeper.apache.org/docs/4.10.0/reference/cli/#bookkeeper-shell-deleteledger>) to get back to all ledgers being replicated ----
2020-07-27 16:46:28 UTC - alex kurtser: Hi everyone. May I know which ZooKeeper and BookKeeper versions Pulsar 2.6.0 is using? ----
2020-07-27 16:50:32 UTC - Varghese C: @Shivji Kumar Jha let's get this started please! :slightly_smiling_face: +1 : Shivji Kumar Jha ----
2020-07-27 16:54:11 UTC - Shivji Kumar Jha: I am really sorry for the delay, but I am happy to be reminded :slightly_smiling_face: Got busy securing our Pulsar cluster and this slipped my mind... ----
2020-07-27 16:55:12 UTC - Addison Higham: <https://github.com/apache/pulsar/blob/v2.6.0/pom.xml#L157> <- you can always find the current versions in the top-level pom file ----
2020-07-27 16:55:58 UTC - alex kurtser: :+1: ----
2020-07-27 16:56:02 UTC - alex kurtser: thanks ----
2020-07-27 16:56:36 UTC - Addison Higham: np :slightly_smiling_face: ----
2020-07-27 18:04:15 UTC - Varghese C: Thank you! ----
2020-07-27 18:07:05 UTC - Ryan: Interested in your thoughts: Has anyone considered decoupling message content (BLOB) data from Pulsar messages, storing the BLOB data in an external store/repository and simply storing URIs/pointers to the BLOB data in the Pulsar messages, then lazy-loading the BLOB data on demand when the client needs to retrieve it? My thought is to use BookKeeper for BLOB storage, as is done at <https://github.com/diennea/blobit>. This could eliminate the complicated existing Pulsar large-BLOB chunking strategies (e.g. subscription limitations, transaction support, etc.), reduce overall network usage from transmitting BLOB data unnecessarily and ensure Pulsar messages remain lightweight. ----
2020-07-27 18:34:55 UTC - Addison Higham: I think that is a fairly common pattern, called the "claim check" pattern; this doc <https://docs.microsoft.com/en-us/azure/architecture/patterns/claim-check> talks about it more. I implemented that same idea previously using S3 for the objects. As far as using BookKeeper, I think that makes sense but does have a trade-off in that your client must now be able to communicate directly with BookKeeper. For that reason, I think it might be a challenge to standardize that, as it won't work for all situations. What might be an interesting discussion is to see if it is a pattern common enough to include direct support in the client for offloading and "re-hydrating" large messages. There actually is support already for something that can do that, via "interceptors", see <http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/ProducerInterceptor.html> and <http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/ConsumerInterceptor.html> +1 : Ryan ----
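A minimal application-level sketch of the claim-check idea discussed above, using the Python client. The Python client does not expose interceptor hooks, so this wraps the send/receive calls directly; the in-memory `BlobStore`, the size threshold, the topic name and the `claim-check` property key are all hypothetical stand-ins for a real object store such as S3 or BlobIt:

```python
import uuid
import pulsar

# Hypothetical external blob store; in practice this would be S3, BlobIt, NFS, etc.
class BlobStore:
    def __init__(self):
        self._blobs = {}

    def put(self, key: str, payload: bytes) -> None:
        self._blobs[key] = payload

    def get(self, key: str) -> bytes:
        return self._blobs[key]

LARGE_THRESHOLD = 1 << 20  # 1 MiB; anything larger is offloaded to the blob store

blob_store = BlobStore()
client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('persistent://public/default/claim-check-demo')
consumer = client.subscribe('persistent://public/default/claim-check-demo', 'demo-sub')

def send_with_claim_check(payload: bytes) -> None:
    if len(payload) > LARGE_THRESHOLD:
        key = str(uuid.uuid4())
        blob_store.put(key, payload)
        # Send only the "claim check": a pointer to the externally stored blob.
        producer.send(key.encode('utf-8'), properties={'claim-check': 'true'})
    else:
        producer.send(payload)

def receive_with_rehydration() -> bytes:
    msg = consumer.receive()
    if msg.properties().get('claim-check') == 'true':
        # Lazy-load: the large payload is fetched only when actually needed.
        payload = blob_store.get(msg.data().decode('utf-8'))
    else:
        payload = msg.data()
    consumer.acknowledge(msg)
    return payload
```

Keeping only a pointer plus a message property in Pulsar keeps the topic lightweight; a "lazy-load" flag like the one described below would simply move the `blob_store.get()` call from receive time to first access.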
2020-07-27 18:45:22 UTC - Ryan: Yes, the claim check pattern, thank you. I didn't know about the interceptors, very interesting, definitely will have to dig further. My thought on supporting BookKeeper as a primary option (S3 makes sense too) is that Pulsar already uses BookKeeper, so there would not be any additional infrastructure to support (in non-cloud environments). If the interceptors work as you describe, then you could leverage claim-check capabilities transparently, via a "lazy-load" configuration flag. From the producer/consumer perspective, it should be transparent. Lazy vs. non-lazy load would be configurable per use case. ----
2020-07-27 20:29:25 UTC - Bre Gielissen: @Bre Gielissen has joined the channel ----
2020-07-27 22:25:32 UTC - Kalyn Coose: Hey all, what would be a typical use case for message chunking in Pulsar? ----
2020-07-28 00:39:55 UTC - Thomas O'Neill: @Thomas O'Neill has joined the channel ----
2020-07-28 02:54:27 UTC - Ryan: Messages larger than 5MB that you intend to send through Pulsar, because you do not have an alternative means of storage (e.g. S3, NFS, etc.) or your architecture/use cases do not support external storage access. ----
2020-07-28 04:44:34 UTC - Takahiro Hozumi: Hi, I have just updated a Pulsar cluster (5 nodes of ZK, bookie and broker) from 2.5.0, which seems to have a compaction problem ( <https://github.com/apache/pulsar/issues/6173> ), to 2.5.2. I have a topic with 300GB of retained data, which has many duplicated keys. After updating, a compaction of the topic started, and I noticed that the brokers became unstable, maybe due to the load of compaction. The brokers disappear and appear repeatedly in the results of `bin/pulsar-admin brokers list mycluster`. I think this instability is okay if it is a one-time problem, but I am concerned that compaction will always affect broker availability. It would be helpful if the compaction load could be limited to a predictable degree without manual operation. ----
2020-07-28 04:56:27 UTC - Sijie Guo: Did you see any other behaviors besides the brokers disappearing and appearing in the `bin/pulsar-admin brokers list` result? ----
2020-07-28 05:02:37 UTC - Takahiro Hozumi: The HTTP service on `brokers:8080` has been going down repeatedly. And here is the result of the top command on a node that runs the ZK, bookie and broker containers. ----
2020-07-28 05:39:14 UTC - Luke Stephenson: Thanks @xiaolong.ran. Looking forward to it ----
2020-07-28 05:44:56 UTC - Kadoi Takemaru: @Kadoi Takemaru has joined the channel ----
2020-07-28 07:06:36 UTC - Sijie Guo: Did the broker restart? ----
2020-07-28 07:11:25 UTC - Takahiro Hozumi: Yes, according to the docker status. ----
2020-07-28 08:13:28 UTC - Takahiro Hozumi: And I noticed that the `msgBacklog` of the `__compaction` subscription has not changed. Is compaction being processed? I'm thinking that the restarts might keep resetting the compaction progress over and over again. ----
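One way to answer the "is compaction being processed?" question is to poll the broker's admin interface. A small sketch, assuming the admin REST endpoints that back `pulsar-admin topics compaction-status` and `pulsar-admin topics stats`; the admin URL, topic name, and exact REST paths are assumptions and may differ by Pulsar version:

```python
import requests

# Illustrative values; point these at your broker's admin port and topic.
ADMIN_URL = 'http://localhost:8080'
TOPIC = 'persistent/public/default/my-topic'   # tenant/namespace/topic

# Compaction status of the topic (assumed REST path; this is roughly what
# `pulsar-admin topics compaction-status` queries).
status = requests.get(f'{ADMIN_URL}/admin/v2/{TOPIC}/compaction').json()
print(status)  # expected to report something like NOT_RUN / RUNNING / SUCCESS / ERROR

# Topic stats, to watch whether the __compaction cursor's backlog moves over time.
stats = requests.get(f'{ADMIN_URL}/admin/v2/{TOPIC}/stats').json()
print(stats.get('subscriptions', {}).get('__compaction', {}).get('msgBacklog'))
```

If the reported status stays in a running state while `msgBacklog` never moves across broker restarts, that would be consistent with the suspicion above that each restart resets the compaction progress.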
