2020-05-21 10:05:18 UTC - fenghao007: @fenghao007 has joined the channel ---- 2020-05-21 11:51:47 UTC - Ermir Zaimi: Hi, I configured Pulsar standalone with JWT according to the documentation, but we get HTTP 401 Unauthorized on starting the standalone Pulsar service. Any suggestions? ---- 2020-05-21 12:20:53 UTC - Raman Gupta: @Patrik Kleindl I don't believe it's about treating Slack as a "free support" channel. In a healthy Slack community, the community owner(s) get as much out of the interactions as users do, which is also why I don't believe scaling up here for Pulsar is that big a challenge. As one counter-example, the Kotlin Slack community is far, far larger than the Confluent Slack community, and it generally works amazingly well. Don't forget, it's Confluent that is raising "size of community" as a comparison point in their favor. My point is that "health of community" != "size of community". ---- 2020-05-21 12:51:45 UTC - Patrik Kleindl: @Raman Gupta As I said, I am not a Confluent employee, nor do I agree with everything they say or do. I just do not agree with your statement about the Kafka community. The scalability issue I see is that currently the most senior Pulsar developers have time to do community work, which is great, but that becomes limited as soon as they have to do paid work elsewhere or the volume simply exceeds their capacity. ---- 2020-05-21 12:55:34 UTC - Raman Gupta: And like I said, community is not just about the unwashed masses begging for help from a few senior devs. Let's agree to disagree. ---- 2020-05-21 13:19:08 UTC - Hiroyuki Yamada: Hi, I’ve asked a question before about backup in Pulsar and learned there was no backup solution except for (Auto) Recovery. I really feel we need some snapshot of bookie data as discussed <https://github.com/apache/pulsar/issues/4942|here>, and I’m wondering whether backing up closed ledgers (for example, with `rsync`) could be a feasible solution. Does anyone know about this? I’m also wondering how people run and operate Pulsar in production without backup. What if a disk of one of the nodes is broken and needs to be replaced? It would be great if anyone can help me. Thanks. ---- 2020-05-21 14:24:58 UTC - Deepa: Hi @Sijie Guo,
As mentioned above, even without changing any properties on keepAlive (on both the broker and client side), connections get closed after 60 seconds and a new connection is established automatically and used for further produce/consume operations. Is there an option to keep the connections alive for a given time period, as in JMS? (I see this happening only when I pause the attached program in debug mode.) Attaching the program used and the log here ---- 2020-05-21 16:03:20 UTC - Sijie Guo: If you pause the program, the JVM will stop, which means the client will stop sending keep-alive messages. ---- 2020-05-21 16:04:58 UTC - Sijie Guo: Did you see a problem if you didn’t pause the program? ---- 2020-05-21 16:48:36 UTC - David Kjerrumgaard: I have a working standalone Docker image here with JWT security enabled that you can compare to your configuration. <https://github.com/david-streamlio/pulsar-in-action/tree/master/docker-images/pulsar-standalone> ---- 2020-05-21 16:49:53 UTC - David Kjerrumgaard: Data is automatically replicated inside the BookKeeper layer itself. Therefore, you have multiple copies of the same data available even in the event of disk or even bookie failure. ---- 2020-05-21 16:51:08 UTC - David Kjerrumgaard: <https://pulsar.apache.org/docs/en/concepts-architecture-overview/#ledgers>. "A ledger is an append-only data structure with a single writer that is assigned to multiple BookKeeper storage nodes, or bookies. Ledger entries are replicated to multiple bookies." ---- 2020-05-21 16:52:33 UTC - Ermir Zaimi: Thanks, I will look at it ---- 2020-05-21 17:41:02 UTC - Matt Mitchell: I’m investigating a way to implement request/reply using Pulsar, where a producer sends a request and consumers are subscribed via an exclusive subscription (only 1 consumer needs to “reply”). Right now, a consumer replies to a “replies” topic and subscribers of that topic do so via a Shared subscription (every reply consumer receives the reply). What I’d like to do instead is change the reply behavior so that the server/JVM that sent the request is the only reply consumer that receives the reply. Is that possible? Are there any examples of the request/reply pattern implemented using Pulsar? ---- 2020-05-21 17:52:53 UTC - Addison Higham: two ideas: 1. if you don't have that many unique hosts/processes/etc, it may not be that insane to have a reply topic per host/process. Topics in Pulsar are *fairly* cheap; having a few thousand really isn't too big of a deal. Topics can also be transient: if there aren't active subscriptions/producers and no retained messages, topics can be configured to be automatically deleted. AFAIK, this is pretty much how this works in RabbitMQ with transient reply topics. If you have potentially tens of thousands of topics this may get scary 2. there was talk recently (not finding it right away) of implementing a server-side filter of keys for subscriptions. That would allow you to have a single reply *topic* but an exclusive subscription filtered just to a given key. I think that would be ideal, but once again, not yet implemented. ---- 2020-05-21 17:56:13 UTC - David Kjerrumgaard: You could have the producers each subscribe to their own second "control" topic, e.g. `<persistent://my-tenant/my-ns/producer-1-control-topic>` Then you have each producer embed the name of their control topic inside of the message properties that they send. 
The consumer can read the properties to get the control topic name and publish a response directly to that topic (which only that particular producer is subscribed to) ---- 2020-05-21 17:57:25 UTC - David Kjerrumgaard: That is more of the "return address" pattern, but I think it would meet your requirements (if I understand them correctly) ---- 2020-05-21 18:27:54 UTC - Matt Mitchell: Ok, that sounds very straightforward (each requesting server subscribing to its own dedicated topic). This system will never have more than 10-20 requesting servers, so it should work fine. Time to give it a try. Thank you both! ---- 2020-05-21 18:29:53 UTC - Matt Mitchell: Actually, one question… how do I configure topics to be transient? ---- 2020-05-21 19:06:57 UTC - Franck Schmidlin: This blog from @Kirill Merkushev has proven super useful in writing automated integration tests around my first Pulsar function. Thank you! <https://lanwen.ru/posts/pulsar-functions-how-to-debug-with-testcontainers/|https://lanwen.ru/posts/pulsar-functions-how-to-debug-with-testcontainers/> slightly_smiling_face : Patrik Kleindl, Kirill Merkushev, David Kjerrumgaard clap : David Kjerrumgaard, Karthik Ramasamy ok_hand : Konstantinos Papalias ---- 2020-05-21 20:05:37 UTC - Kirill Merkushev: Glad to be helpful! :) ---- 2020-05-21 20:44:14 UTC - David Kjerrumgaard: @Matt Mitchell what do you mean by transient? ---- 2020-05-21 21:04:17 UTC - Matt Mitchell: Based on what @Addison Higham mentioned here: <https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1590083573316300> - curious to know how topics can be auto-deleted if there are no active consumers/producers. ---- 2020-05-21 21:07:24 UTC - Jeff Schneller: Completely new to Pulsar myself, but could the producer put the topic that the consumer should respond to in the message? The topic name could be a GUID so it is unique. Then the consumer does what it needs to do and replies to the topic that the producer said to reply on. You would need to do some topic cleanup or auto-delete if there are no messages over a certain period of time (I think that is a possibility). ---- 2020-05-21 21:12:00 UTC - David Kjerrumgaard: Ah, so configuring the topics to get deleted if no subscriptions are available? ---- 2020-05-21 21:14:06 UTC - David Kjerrumgaard: The default behavior is to delete topics without any data or active subscriptions, and is controlled by the `brokerDeleteInactiveTopicsEnabled` property in the broker.conf file (it defaults to `true`). ---- 2020-05-21 21:14:22 UTC - Patrik Kleindl: @David Kjerrumgaard Thanks for the <http://lenses.io|lenses.io> article, but from what I understand the problems described would not change with Pulsar. A complex processing topology is a challenge, and Pulsar Functions are not comparable to Kafka Streams. ---- 2020-05-21 21:16:41 UTC - David Kjerrumgaard: Sure, I just intended to demonstrate that there are several "horror" stories with Kafka out there that can be used to counter the "Kafka is easy" narrative that Confluent is spreading. Having a long history is a double-edged sword. It makes your product more "mature" but also exposes more weaknesses over time ---- 2020-05-21 21:17:23 UTC - David Kjerrumgaard: "It got to the point where the CEO would be asking whether it was a Kafka issue every time there was a problem with the data flow. “In 99% of the cases, the answer was yes,” Schipka said." 
my favorite quote ---- 2020-05-21 21:19:43 UTC - Raman Gupta: However, they then went on to say that Lenses helped them figure out they were doing it wrong in the first place, so I'm not sure that in particular is the best example. That being said, it's pretty much the same story at our startup with Kafka. +1 : David Kjerrumgaard ---- 2020-05-21 21:19:55 UTC - David Kjerrumgaard: While <http://lenses.io|lenses.io> solved this particular case, there are dozens of R/T pipelines with similar struggles due to Kafka. Their solution was to go with a managed service offering, because Kafka was too complex to manage themselves, and to use K8s. ---- 2020-05-21 21:21:25 UTC - David Kjerrumgaard: Definitely not the best generic example, but one that a lot of people can relate to on some level when working with Kafka, which has reliability and scalability issues. +1 : Raman Gupta ---- 2020-05-21 21:24:24 UTC - Raman Gupta: They talked about seeing how complex Kafka Streams made their topology through Lenses. It was a bit naive on their part. Didn't they see the tens of intermediate topics Kafka creates for any mildly complex stream? It's kind of crazy. ---- 2020-05-21 21:25:07 UTC - Raman Gupta: This tool is great for seeing how ridiculous things get: <https://zz85.github.io/kafka-streams-viz/> grinning : David Kjerrumgaard ---- 2020-05-21 21:28:56 UTC - Patrik Kleindl: We had a very similar situation, but our problem was not with Kafka as a platform, which was really stable, but with the complexity of stream processing, which I doubt is much better in other tools. And yes, what Raman just mentioned helps to visualize things. Better visibility than that is usually the domain of commercial products. ---- 2020-05-21 21:30:31 UTC - Raman Gupta: I've had tonnes of issues with Kafka as a platform. I believe I've reported close to 10 issues to the Kafka project in the last year. ---- 2020-05-21 21:31:11 UTC - Tanner Nilsson: I'm trying to create a Python function using the REST API, but I can't figure out how to do it and send a local `.py` file the way you can with pulsar-admin.... ---- 2020-05-21 21:31:14 UTC - Tanner Nilsson: With pulsar-admin I would do ```bin/pulsar-admin functions create \ --tenant <tenant> \ --namespace <namespace> \ --name <function_name> \ --py <path_to_py_or_zip> \ --className <className> --inputs <inputs> \ --output <output>``` With the REST API, I've done it with ```curl -X POST \ -H "Authorization: Bearer <token>" \ -F functionConfig='{ "tenant":"<tenant>", "namespace":"<namespace>", "className":"<className>", "runtime":"PYTHON", "inputs":"<inputs>", "output":"<output>"};type=application/json' \ -F url='http://<url_to_file>;type=application/text' \ http://<pulsar_host>/admin/v3/functions/<tenant>/<namespace>/<function_name>``` but that only works if you can provide a URL where the file can be downloaded. Can the REST API be used for a local file (local to where the POST is originating, not on the broker/function-worker)? ---- 2020-05-21 21:31:54 UTC - Raman Gupta: They've fixed quite a few of them, and any complex system has bugs, but still, dealing with its quirks and odd behaviors, even short of bugs, is not easy. ---- 2020-05-21 21:32:02 UTC - Patrik Kleindl: We had questions like the one from the CEO above, and of course the first suspect was usually Kafka. It turned out more often to be misconfigured OSes, triple-mirrored storage systems, and lots of other things. ---- 2020-05-21 21:33:01 UTC - Raman Gupta: Isn't that the point though @Patrik Kleindl? 
Any system that requires that level of dedication to its infrastructure can't claim to be easy to manage. ---- 2020-05-21 21:33:44 UTC - Patrik Kleindl: I have reported issues and helped fix some of them too. It's still a community project :wink: ---- 2020-05-21 21:34:10 UTC - Raman Gupta: I think I remember seeing a comment from you on one of my issues, IIRC :slightly_smiling_face: ---- 2020-05-21 21:36:28 UTC - Raman Gupta: IMO, a tool like Pulsar/Kafka should strive to fail fast or demonstrate poor performance in the face of infrastructure issues or misconfigurations. Kafka unfortunately more often than not just blows up in weird and crazy ways. ---- 2020-05-21 21:36:48 UTC - Patrik Kleindl: I bet Pulsar or BK can be wrecked by the same things. There's no free lunch. ---- 2020-05-21 21:37:10 UTC - David Kjerrumgaard: Well, I am glad you are using your experience to help the Pulsar community! ---- 2020-05-21 21:37:22 UTC - Raman Gupta: I haven't used Pulsar as much as Kafka, but so far it's been rock-solid in comparison to Kafka. ---- 2020-05-21 21:37:40 UTC - Raman Gupta: (Once the initial setup was done, which admittedly, wasn't easy) ---- 2020-05-21 21:38:08 UTC - David Kjerrumgaard: Helm chart didn't work? ^^^ ---- 2020-05-21 21:38:37 UTC - Raman Gupta: I had issues with the initial k8s setup, but I was using the obsolete templates in the pulsar repo, not the helm chart. ---- 2020-05-21 21:39:37 UTC - Raman Gupta: I had one issue the other day that I thought was Pulsar's fault, but it turned out, no, a Kafka consumer reset its offsets for no particular reason and wrote a bunch of stuff to Pulsar that it shouldn't have. ---- 2020-05-21 21:39:52 UTC - Patrik Kleindl: @David Kjerrumgaard There are still so many companies which run on-prem and without k8s. And running k8s without dedication won't help with Pulsar or Kafka :upside_down_face: ---- 2020-05-21 21:40:34 UTC - Raman Gupta: I run both Pulsar and Kafka on k8s (with dedication), so I'm comparing apples to apples. ---- 2020-05-21 21:43:13 UTC - David Kjerrumgaard: @Patrik Kleindl While that is definitely true, the overwhelming trend I am seeing these days is to migrate as much as possible to the cloud. Even the traditional on-prem software vendors have moved to cloud-based offerings due to customer demand. ---- 2020-05-21 21:51:44 UTC - Patrik Kleindl: The vendors yes, but at least here in Europe customer adoption is slow. ---- 2020-05-21 21:59:23 UTC - David Kjerrumgaard: Why is that? ---- 2020-05-21 22:01:03 UTC - David Kjerrumgaard: There are also a lot of BYOK8s solutions now as well that allow you to run your own K8s environment, such as <https://gravitational.com/gravity/docs/>. ---- 2020-05-21 22:03:51 UTC - Patrik Kleindl: FUD regarding the cloud, mainly from GDPR and corporate policies about having your data with American companies, plus lots of old-school on-prem operations with reluctance to change. And it doesn't help if you only run your streaming infrastructure in the cloud; your applications and other stuff have to move too. ---- 2020-05-21 22:23:37 UTC - Greg Methvin: Did you try `-F 'data=@<path_to_file>'`? ---- 2020-05-21 22:24:59 UTC - Luke Stephenson: @Matteo Merli Thanks for looking into this. Here are the broker logs during startup. 
---- 2020-05-21 22:25:35 UTC - Matteo Merli: There you go :slightly_smiling_face: ```2020-05-21T06:21:19.620Z,i-0f0dcca57497394e9,pulsar-all,[conf/broker.conf] Applying config managedLedgerDefaultWriteQuorum = 3 2020-05-21T06:21:19.620Z,i-0f0dcca57497394e9,pulsar-all,[conf/broker.conf] Applying config managedLedgerDefaultEnsembleSize = 3 2020-05-21T06:21:19.620Z,i-0f0dcca57497394e9,pulsar-all,[conf/broker.conf] Applying config managedLedgerDefaultAckQuorum = 2``` ---- 2020-05-21 22:25:54 UTC - Matteo Merli: `managedLedgerDefaultWriteQuorum = 3` and `managedLedgerDefaultAckQuorum = 2` ---- 2020-05-21 22:26:19 UTC - Matteo Merli: that's what I was suspecting ---- 2020-05-21 22:26:59 UTC - Greg Methvin: Interestingly the functions API doesn’t appear to be documented here: <http://pulsar.apache.org/admin-rest-api/?version=2.5.1> ---- 2020-05-21 22:27:05 UTC - Matteo Merli: I'd suggest changing ensembleSize and writeQuorum to 2, matching the ack quorum ---- 2020-05-21 22:27:36 UTC - Greg Methvin: I’m not sure how I figured out the parameter I needed was named `data`, but I have code that uses that name so I guess that’s right. 100 : Tanner Nilsson ---- 2020-05-21 22:27:42 UTC - Matteo Merli: that will prevent the BK client from accumulating messages in memory when 1 of the bookies is slow/timing out ---- 2020-05-21 22:28:58 UTC - Matteo Merli: If these values are the defaults in the helm chart.. then the helm chart should be fixed ASAP :slightly_smiling_face: ---- 2020-05-21 22:34:50 UTC - Matteo Merli: @Luke Stephenson <https://github.com/apache/pulsar-helm-chart/pull/13> ---- 2020-05-21 23:45:14 UTC - Hiroyuki Yamada: @David Kjerrumgaard Thank you. Sorry, my explanation was not enough. Yes, I know the data in a bookie is replicated, but that is another thing. I’m wondering how easily I can recover a bookie node to keep the data fully replicated. For example, with 3 replicas on bookie nodes, 1 node crashed and unfortunately lost all its data, so now there are only 2 replicas. How do you recover to a state where 3 replicas are fully replicated? I think there are usually multiple ways to do this in distributed data management systems, such as: 1. Bring data from other nodes (like Bookie (Auto) recovery) 2. Use a backup/snapshot to restore to a certain point, then bring data from other nodes. As far as I have investigated and used Pulsar so far, only option 1 is supported and option 2 is not. So, my first question is: does everyone do option 1 for such a case? If there are other options (except for geo-replication, since that's another story again), I would like to know. My second question is: can we utilize closed ledgers as a kind of option 2 with the current implementation, since ledger data is immutable? ---- 2020-05-21 23:48:19 UTC - David Kjerrumgaard: I am only aware of people using option #1, and relying on the BookKeeper <https://bookkeeper.apache.org/docs/latest/admin/autorecovery/|auto-recovery feature> to self-heal. +1 : Hiroyuki Yamada man-bowing : Hiroyuki Yamada ---- 2020-05-22 00:31:35 UTC - Luke Stephenson: Thanks. I'll give the config changes in your PR a go ---- 2020-05-22 00:52:04 UTC - Matt Mitchell: Thanks @David Kjerrumgaard! ---- 2020-05-22 01:26:46 UTC - Luke Stephenson: Seems to have made a huge difference to stability. +1 : Matteo Merli ---- 2020-05-22 01:46:43 UTC - Matteo Merli: There are a few problems in backing up data for bookies: 1. If the traffic is non-trivial, the backup system needs to be very performant, as in a "log storage system"; otherwise it might not be able to keep up 2. 
There are 2 parts to back up: the data and the metadata. It's not easy to take an "atomic" snapshot of the two and reconstruct a consistent view. ---- 2020-05-22 02:41:05 UTC - snowcrumble: @snowcrumble has joined the channel ---- 2020-05-22 02:41:13 UTC - Sijie Guo: <http://pulsar.apache.org/functions-rest-api/?version=2.5.1> ---- 2020-05-22 02:41:28 UTC - Sijie Guo: The function endpoints were split out into a separate swagger file. ---- 2020-05-22 03:53:28 UTC - Hiroyuki Yamada: @David Kjerrumgaard Thank you. @Matteo Merli Thank you for the reply. Ok, backing up immutable ledger data seems not very easy to do due to the 2nd problem. Hmm, do we have any plans to support such an atomic snapshot? (it doesn't seem to be too difficult) Or do you think the current recovery with (auto) recovery is good enough? ---- 2020-05-22 04:01:03 UTC - Matteo Merli: > Hmm, do we have any plans to support such an atomic snapshot? (it doesn't seem to be too difficult) Oh, it's a very difficult problem! :slightly_smiling_face: In the past we gave some thought to implementing a kind of "rollback" operation to protect against accidental data deletion operations (eg: rollback a topic to the same exact state where it was 1h ago), though that's a slightly different goal. > Or do you think the current recovery with (auto) recovery is good enough? To protect against node failures, yes. There are other variants of this approach that we are playing with, though these are more geared towards cloud deployments where disks can be lost more frequently. ---- 2020-05-22 04:34:10 UTC - Hiroyuki Yamada: @Matteo Merli > Oh, it's a very difficult problem! Oh, excuse me, I wasn't really sure about it. Since it is an in-node issue, I thought it seemed relatively less complex (like by taking some locks or something, which might kill the performance). Anyways, I got it. Thank you. > To protect against node failures, yes. There are other variants of this approach that we are playing with, though these are more geared towards cloud deployments where disks can be lost more frequently. You mean `node failures` includes node failure due to disk failures? I'm planning to use it in a cloud environment and will store possibly lots of data for at least several years, so I'm concerned that auto recovery possibly can't catch up. ---- 2020-05-22 04:35:34 UTC - Matteo Merli: > Since it is an in-node issue, No, the metadata (pointing to these ledgers) is kept in ZooKeeper. Even if we restore the data on a new node, we need to update the metadata to make it point to the new node. man-bowing : Hiroyuki Yamada scream : Hiroyuki Yamada ---- 2020-05-22 04:35:57 UTC - Matteo Merli: > I'm planning to use it in a cloud environment and will store possibly lots of data for at least several years, so I'm concerned that auto recovery possibly can't catch up. If you use EBS-like storage volumes, you shouldn't be worrying about it. ---- 2020-05-22 04:36:44 UTC - Matteo Merli: In the sense that the disk itself is already replicated, so you can always restart a new container/VM and mount that same volume. ---- 2020-05-22 04:42:40 UTC - Hiroyuki Yamada: OK, thank you very much for your support. ---- 2020-05-22 05:56:32 UTC - Deepa: I don't see it when I run it normally with a Thread.sleep(90*1000) for the wait. So with this, if the client is idle and not producing any messages, the connection is still intact and doesn't get terminated. 
But if the client goes into a hung state, the current connection will be terminated at the broker, and whenever the client comes back a new connection is established automatically (I didn't have to create a new connection, and the program didn't terminate; messages were produced using the same client object). Please correct me if my understanding is wrong here. ---- 2020-05-22 07:54:01 UTC - VanderChen: I have set keyBasedBatcher as follows, but it still doesn't work. ```producer = client.newProducer() .batcherBuilder(BatcherBuilder.KEY_BASED) .enableBatching(true) .topic("my-topic") .create();``` ---- 2020-05-22 08:28:54 UTC - Ken Huang: Hi, I want to do <https://www.splunk.com/en_us/blog/it/geo-replication-in-apache-pulsar-part-1-concepts-and-features.html|synchronous geo-replication>. I set this in the broker: ```bookkeeperClientRegionawarePolicyEnabled: "true" bookkeeperClientReorderReadSequenceEnabled: "true"``` Do I need to run set-bookie-rack for the bookies? ----
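On the key-based batching question above: a minimal Java sketch, assuming a broker at `pulsar://localhost:6650` and a topic named `my-topic` (both placeholders, not anything confirmed in this thread). The point it illustrates is that `BatcherBuilder.KEY_BASED` only has something to group once the messages themselves carry keys, and it usually matters in combination with a Key_Shared subscription on the consumer side.

```
import org.apache.pulsar.client.api.*;
import java.util.concurrent.TimeUnit;

public class KeyBasedBatchingExample {
    public static void main(String[] args) throws Exception {
        // Placeholder broker URL for a local standalone instance.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // KEY_BASED batching groups messages that share the same key into the same batch,
        // so each message must actually carry a key for the batcher to have any effect.
        Producer<byte[]> producer = client.newProducer()
                .topic("my-topic")
                .enableBatching(true)
                .batcherBuilder(BatcherBuilder.KEY_BASED)
                .batchingMaxPublishDelay(10, TimeUnit.MILLISECONDS)
                .create();

        for (int i = 0; i < 100; i++) {
            producer.newMessage()
                    .key("key-" + (i % 5))           // without a key, KEY_BASED batching cannot group messages
                    .value(("msg-" + i).getBytes())
                    .sendAsync();
        }
        producer.flush();

        // Key_Shared consumers are the usual reason to use KEY_BASED batching:
        // within the subscription, each key is dispatched to a single consumer.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("my-topic")
                .subscriptionName("key-shared-sub")
                .subscriptionType(SubscriptionType.Key_Shared)
                .subscribe();

        Message<byte[]> msg = consumer.receive(5, TimeUnit.SECONDS);
        if (msg != null) {
            consumer.acknowledge(msg);
        }

        consumer.close();
        producer.close();
        client.close();
    }
}
```

This is only a sketch of the mechanics; it does not claim to reproduce or diagnose the exact setup described in the question.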

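And for the request/reply discussion earlier in the log, a rough sketch of the "return address" pattern David described: the requester embeds the name of its own reply topic in a message property, and the responder reads that property and publishes the reply directly to it. The topic names, the `reply-topic` property key, and the subscription names below are all made-up placeholders, and both sides are shown in one process only to keep the example self-contained.

```
import org.apache.pulsar.client.api.*;
import java.nio.charset.StandardCharsets;

public class ReturnAddressExample {
    private static final String REPLY_TOPIC_PROP = "reply-topic";    // hypothetical property key

    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")               // placeholder broker URL
                .build();

        // Each requesting server owns a dedicated reply topic and is its only subscriber.
        String replyTopic = "persistent://my-tenant/my-ns/replies-server-1";

        Consumer<byte[]> replyConsumer = client.newConsumer()
                .topic(replyTopic)
                .subscriptionName("server-1-replies")
                .subscriptionType(SubscriptionType.Exclusive)
                .subscribe();

        Producer<byte[]> requestProducer = client.newProducer()
                .topic("persistent://my-tenant/my-ns/requests")
                .create();

        // Send the request with the reply topic embedded as a message property.
        requestProducer.newMessage()
                .property(REPLY_TOPIC_PROP, replyTopic)
                .value("do-something".getBytes(StandardCharsets.UTF_8))
                .send();

        // --- Responder side (normally a different process) ---
        Consumer<byte[]> requestConsumer = client.newConsumer()
                .topic("persistent://my-tenant/my-ns/requests")
                .subscriptionName("workers")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Message<byte[]> request = requestConsumer.receive();
        String replyTo = request.getProperty(REPLY_TOPIC_PROP);

        // Publish the reply directly to the requester's own topic.
        try (Producer<byte[]> replyProducer = client.newProducer().topic(replyTo).create()) {
            replyProducer.send("done".getBytes(StandardCharsets.UTF_8));
        }
        requestConsumer.acknowledge(request);

        // Back on the requester: only this server receives the reply.
        Message<byte[]> reply = replyConsumer.receive();
        replyConsumer.acknowledge(reply);
        System.out.println("reply: " + new String(reply.getValue(), StandardCharsets.UTF_8));

        client.close();
    }
}
```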