2020-08-25 09:32:28 UTC - Evion Cane: Hi guys. I am trying to configure
authentication and authorization in Pulsar to connect PulsarAdmin (Java API) to
it.
I have modified the broker.conf file with the following properties:
```authenticationEnabled=true
authenticationProviders=org.apache.pulsar.broker.authentication.AuthenticationProviderToken
authorizationEnabled=true
authorizationProvider=org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider
superUserRoles=admin
tokenPublicKey=my-public.key```
The problem is that when I test authentication and authorization, they do not
seem to work. I can access tenants from Pulsar without any authentication, as
follows:
```import java.util.List;
import org.apache.pulsar.client.admin.PulsarAdmin;

// No authentication configured on this client, yet the call succeeds
PulsarAdmin admin = PulsarAdmin.builder()
    .serviceHttpUrl(url)
    .build();
List<String> tenants = admin.tenants().getTenants();```
From what I understand from the documentation, you need to authenticate to get
the tenants, but I can still access the default tenants without authentication.
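(For comparison, a token-authenticated admin client would look roughly like
this; the token value is a placeholder for a JWT signed with the private key
that matches tokenPublicKey:)
```import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.client.impl.auth.AuthenticationToken;

// Placeholder token; in practice a JWT signed with the matching private key
PulsarAdmin authenticatedAdmin = PulsarAdmin.builder()
    .serviceHttpUrl(url)
    .authentication(new AuthenticationToken("eyJhbGciOiJSUzI1NiJ9..."))
    .build();```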
----
2020-08-25 11:38:43 UTC - Ravi Shah: Hi guys,
We are facing an issue similar to "Consumer stop receive messages from broker"
(<https://github.com/apache/pulsar/issues/3131>).
We are using Logstash and the input plugin
<https://github.com/se7enkings/logstash-input-pulsar>.
----
2020-08-25 11:39:05 UTC - Ravi Shah: Any help would be appreciated.
----
2020-08-25 14:43:40 UTC - Joshua Decosta: I don’t have that enabled but it
seems like it is
----
2020-08-25 14:43:50 UTC - Joshua Decosta: The functions are being created as
pods
----
2020-08-25 14:52:09 UTC - Joshua Decosta: Is this running in standalone or is
this deployed in a cloud cluster?
----
2020-08-25 14:53:05 UTC - Joshua Decosta: There seem to be a few missing
configurations regardless. You will need the brokerClient configs as well. It
also doesn’t look like you’re passing a token in your admin calls, which would
lead to an error as well
----
2020-08-25 15:00:24 UTC - Evion Cane: I am running it in standalone. I was
testing authentication, so I deliberately did not pass a token, expecting an
error, but none was thrown. If I am using the Java admin API, do I need to
configure the brokerClient (client.config) settings as well?
----
2020-08-25 15:06:23 UTC - Joshua Decosta: You need to configure all of those
aspects in the standalone conf
----
2020-08-25 15:06:49 UTC - Joshua Decosta: And yes, you need the brokerClient
configs as well, since those are what the broker uses to communicate and
authenticate
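(A hedged sketch of the client-side settings in standalone.conf, assuming
token auth; the token file path is a placeholder:)
```# Auth used by the broker's own internal client when it calls itself
brokerClientAuthenticationPlugin=org.apache.pulsar.client.impl.auth.AuthenticationToken
brokerClientAuthenticationParameters=file:///path/to/admin-token.txt```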
----
2020-08-25 15:26:43 UTC - Chris DiGiovanni: If you have set a retention of `-1`
and you are offloading to S3, is there anything that would prevent those
messages from getting deleted? I want to make sure that backlog quotas and/or
TTLs can't prevent Pulsar from deleting ledgers from a bookie even when
retention is infinite and you are offloading. Currently my bookie disk space
usage doesn't look right: Pulsar is reporting 54G of ledgers on disk, I have
a 3x replica set and a combined raw storage of 1.3TB, yet I'm sitting at
roughly 1TB of used disk space...
----
2020-08-25 15:27:25 UTC - Chris DiGiovanni: This is running 2.5.2 of Pulsar
----
2020-08-25 15:46:59 UTC - Addison Higham: Yeah, if you have logs from the
stateful sets. The role your functions run as is, by default, the role you
deployed your functions with. That role needs the "functions" permissions
to download the package
----
2020-08-25 15:48:23 UTC - Joshua Decosta: Do these aspects rely on the AuthZ
methods at all?
----
2020-08-25 15:49:30 UTC - Joshua Decosta: Is there no way to configure some
default functions permission to use with function creation?
----
2020-08-25 15:52:21 UTC - Addison Higham: You can check which ledgers have been
offloaded with `pulsar-admin topics stats-internal <topic>`; the output
includes details on which ledgers have been offloaded.
As for BookKeeper disk usage, see this doc
<https://bookkeeper.apache.org/docs/4.11.0/getting-started/concepts/>,
specifically the "disk compaction" section. Essentially, BookKeeper packs
multiple ledgers into entry log files. When you delete a ledger, you don't
immediately get the space back; an entry log file is only compacted once the
ratio of deleted entries in it rises above a certain threshold. You can tune
those settings as needed
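(A hedged sketch of the relevant BookKeeper settings; the values shown are
illustrative and defaults vary by version:)
```# How long the garbage collector waits between runs, in ms
gcWaitTime=900000
# Entry logs whose share of live data drops below these thresholds
# become candidates for minor/major compaction
minorCompactionThreshold=0.2
minorCompactionInterval=3600
majorCompactionThreshold=0.8
majorCompactionInterval=86400```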
----
2020-08-25 15:55:24 UTC - Addison Higham: It does rely on AuthZ, see
`org.apache.pulsar.broker.authorization.AuthorizationProvider#allowFunctionOpsAsync`.
And yes, you can implement another interface to get much more control over
function auth, see
`org.apache.pulsar.functions.auth.KubernetesFunctionAuthProvider`
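(A rough sketch of hooking that method, assuming the 2.6-era signature;
`my-functions-role` is a hypothetical role name:)
```import java.util.concurrent.CompletableFuture;

import org.apache.pulsar.broker.authentication.AuthenticationDataSource;
import org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider;
import org.apache.pulsar.common.naming.NamespaceName;

public class CustomAuthorizationProvider extends PulsarAuthorizationProvider {
    @Override
    public CompletableFuture<Boolean> allowFunctionOpsAsync(
            NamespaceName namespaceName, String role, AuthenticationDataSource authData) {
        // Hypothetical policy: a dedicated functions role is always allowed;
        // everything else falls back to the default namespace-level check
        if ("my-functions-role".equals(role)) {
            return CompletableFuture.completedFuture(true);
        }
        return super.allowFunctionOpsAsync(namespaceName, role, authData);
    }
}```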
----
2020-08-25 15:56:54 UTC - Joshua Decosta: Ok, but when creating the function,
does it take my client auth data and use that for the function's permissions?
Considering I’m using tokens that will expire, that would be a problem for the
functions
----
2020-08-25 15:56:58 UTC - Addison Higham: The way the default implementation
works:
- it snags the token you used to deploy the function
- it stores that token in a Kubernetes secret
- the statefulset mounts that secret and uses it to interact with the cluster
If you aren't using token auth, you need to implement your own
`KubernetesFunctionAuthProvider`
+1 : Joshua Decosta
----
2020-08-25 15:59:25 UTC - Joshua Decosta: Is there a way to set the function
creds within config files?
----
2020-08-25 15:59:33 UTC - Joshua Decosta: Similar to the brokerClient, etc?
----
2020-08-25 15:59:57 UTC - Addison Higham:
<https://pulsar.apache.org/docs/en/next/functions-runtime/#kubernetes-functions-authentication>
<- this doc isn't in 2.6.1 yet (we should backport docs to release
branches) but here is the doc on master
+1 : Joshua Decosta
----
2020-08-25 16:00:14 UTC - Joshua Decosta: Thanks
----
2020-08-25 16:07:24 UTC - Addison Higham: That doc adds a lot more context, so
hopefully it helps.
I have thought about this problem a fair bit, and with the current extension
points I think you have 3 primary options:
1. Implement the `KubernetesFunctionAuthProvider`. It could easily talk to
another service to fetch a token, and it can update tokens, but it doesn't have
a built-in way to refresh them automatically. However, you do have access to
modify the stateful set in that API, so adding a sidecar pod that refreshes
tokens and stores them in a shared volume would be doable.
2. You could also use the
`org.apache.pulsar.functions.runtime.kubernetes.KubernetesManifestCustomizer`
to do much the same and modify the statefulset spec to add in a sidecar or init
container (see the sketch after this list).
3. Use the default implementation, then have some other task outside Pulsar
that refreshes tokens by manually updating the secrets. Because the secret is
mounted as a volume, it does pick up updates, so you could simply have a k8s
cron job that updates the secrets. The biggest downside here is that you still
have the restriction that at least the initial token is the one that was used
to upload the function
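(For option 2, a minimal sketch assuming the `customizeStatefulSet` hook and
the 2.6-era Kubernetes client models; the sidecar image is a placeholder:)
```import io.kubernetes.client.models.V1Container;
import io.kubernetes.client.models.V1StatefulSet;
import org.apache.pulsar.functions.proto.Function;
import org.apache.pulsar.functions.runtime.kubernetes.KubernetesManifestCustomizer;

public class TokenSidecarCustomizer implements KubernetesManifestCustomizer {
    @Override
    public V1StatefulSet customizeStatefulSet(
            Function.FunctionDetails funcDetails, V1StatefulSet statefulSet) {
        // Hypothetical sidecar that periodically refreshes the auth token
        // into a volume shared with the function container
        V1Container sidecar = new V1Container()
                .name("token-refresher")
                .image("example.com/token-refresher:latest"); // placeholder image
        statefulSet.getSpec().getTemplate().getSpec().addContainersItem(sidecar);
        return statefulSet;
    }
}```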
----
2020-08-25 16:18:25 UTC - Joshua Decosta: Is it not possible to configure the
clientAuth* keys in the function-worker to allow for a default token setup used
with the functions?
----
2020-08-25 16:43:55 UTC - Ming: Nor does it support topic regexes, because of
URL formatting restrictions.
----
2020-08-25 17:04:36 UTC - Evion Cane: Ok, thank you. I will try that.
----
2020-08-25 17:05:08 UTC - Joshua Decosta: I’ve managed to get this working in
standalone before, so feel free to reach back here
----
2020-08-25 18:20:05 UTC - Tim Corbett: In trying to minimize unnecessary
cross-rack/region traffic to minimize bandwidth costs, does the broker have any
mechanism to prefer reads from its local bookie? Does the broker even have a
notion of which rack/region it is in?
----
2020-08-25 19:54:36 UTC - Addison Higham: @Tim Corbett that is possible using
`bookkeeperClientReorderReadSequenceEnabled` and by enabling a
rackAware/regionAware placement policy: see
<https://github.com/apache/pulsar/pull/3171> and the original issue for more
details
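(A hedged sketch of the broker.conf side; the key names are real broker
settings, the values illustrative:)
```# Reorder read requests so closer bookies are tried first
bookkeeperClientReorderReadSequenceEnabled=true
# Enable rack-aware (or region-aware) ensemble placement
bookkeeperClientRackawarePolicyEnabled=true
bookkeeperClientRegionawarePolicyEnabled=false```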
+1 : Tim Corbett
----
2020-08-25 20:03:20 UTC - Aaron: What draft of JSON Schema does Pulsar use?
----
2020-08-25 20:57:25 UTC - Addison Higham: I think it might be possible if you
overwrite the start command, but I am not completely sure whether there is a
more fine-grained way of doing it. See
<https://github.com/apache/pulsar/blob/3c5a4232461720b657c6be16f3d789d410da243b/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/kubernetes/KubernetesRuntime.java#L236>
for where that command is built
----
2020-08-25 20:58:12 UTC - Joshua Decosta: Is the function-worker.conf only
meant for standalone?
----
2020-08-25 21:04:56 UTC - Addison Higham: No, it gets loaded by the functions
worker regardless of how the functions worker is run.
However, it is important to remember that functions executed in their own pod
*do not* run the function worker. It is a bit poorly named, but that's an
outgrowth of functions starting out with modes that either ran in the same VM
or forked a VM.
With Kubernetes, you might think of it as a "function-coordinator", as all it
does is basically manage the creation and deletion of stateful sets.
----
2020-08-25 21:05:51 UTC - Pushkar Sawant: Is anyone running pulsar-manager with
JWT authentication? I am getting this error when I try to list tenants on a
2.6.0 cluster.
```<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 401 Authentication required</title>
</head>
<body><h2>HTTP ERROR 401</h2>
<p>Problem accessing /admin/v2/tenants/public. Reason:
<pre>    Authentication required</pre></p><hr><a
href="http://eclipse.org/jetty">Powered by Jetty://
9.4.20.v20190813</a><hr/>
</body>
</html>```
+1 : Ryan Nowacoski
----
2020-08-25 21:06:56 UTC - Joshua Decosta: If I don’t have “functionAsPods”
enabled, how should the pods operate?
----
2020-08-25 21:07:14 UTC - Joshua Decosta: Because I don’t have it enabled, but
I keep seeing function pods showing up
----
2020-08-25 21:07:30 UTC - Addison Higham: How are they named?
----
2020-08-25 21:07:55 UTC - Joshua Decosta: I’ll have to respond to that next
week because I destroyed my cluster
----
2020-08-25 21:08:12 UTC - Addison Higham: intentionally I hope :wink:
----
2020-08-25 21:08:26 UTC - Joshua Decosta: For sure
----
2020-08-25 21:08:49 UTC - Joshua Decosta: Currently trying to determine the
best way to run those token scripts during cluster configuration
----
2020-08-25 21:09:38 UTC - Addison Higham: oh, it looks like both `functions`
and `functionsAsPods` configure the functions worker to execute functions as
pods
----
2020-08-25 21:09:50 UTC - Addison Higham: (I haven't worked with the chart a
ton recently)
----
2020-08-25 21:10:31 UTC - Addison Higham: they should be created like
`pf-<functionname>-<parallel number>`
----
2020-08-25 21:10:48 UTC - Joshua Decosta: That looks like the naming convention
I did see
----
2020-08-25 21:11:09 UTC - Joshua Decosta: Is that for when they aren’t running
as pods?
----
2020-08-25 21:12:08 UTC - Addison Higham: they are running as pods, I was just
mistaken: `functionsAsPods: true` and `functions: true` now configure it the
exact same way, so both run functions as pods
----
2020-08-25 21:13:21 UTC - Joshua Decosta: Oh, is that on purpose?
----
2020-08-25 21:15:54 UTC - Addison Higham: yes
----
2020-08-25 21:16:25 UTC - Addison Higham: we basically assume that if you want
to run functions on k8s, you want to run them as pods; otherwise it runs the
functions as processes in the broker, which isn't ideal
----
2020-08-25 21:17:06 UTC - Joshua Decosta: That makes sense, although the custom
class is tossing a wrench into my function stuff
----
2020-08-25 21:17:24 UTC - Joshua Decosta: Every time I think I’ve finished my
classes, I then need to implement another class
----
2020-08-25 21:18:08 UTC - Tim Corbett: In trying to validate some performance
characteristics of the .NET Pulsar client, we're comparing against the
pulsar-perf tool. However, we're seeing results that don't match, and one
large difference is that the pulsar-perf tool does not seem to set/vary its
routing keys, so all messages produced go to only one consumer, even when
setting the consumers to key-shared mode. Is there a way to force it to set
random keys, or some other way to validate performance with a key-routed
workload?
----
2020-08-25 21:29:31 UTC - Addison Higham: I assume you are talking about the
performance producer? It does not set any keys, but it does use round-robin,
meaning it will spread load evenly across partitioned topics.
With a consumer against partitioned topics, that is actually multiple consumers.
I would assume what you need to do is enable round-robin in your sending mode
----
2020-08-25 21:33:04 UTC - Tim Corbett: Specifically talking about the included
pulsar-perf tool, and yes, in producer mode. Round-robin across partitions is
fine and all, but what we also need is random or incrementing routing keys so
KeyShared subscriptions can route to all available consumers.
----
2020-08-25 21:33:32 UTC - Tim Corbett: For any given partition, if we have say
200 consumers connected, currently all messages are going to only one of them.
----
2020-08-25 21:36:47 UTC - Addison Higham: pulsar-perf does not currently have a
way to set keys on messages, but I am sure it would be a useful feature!
If you wanted to open an issue for it, that would be wonderful. If you are
interested in contributing it, it should be fairly straightforward: we already
generate a payload, so it would make sense to add an argument that hashes the
payload to set a key
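(For illustration, setting a per-message key with the Java client looks roughly
like this; the service URL, topic name, and `payload` are placeholders:)
```import java.util.UUID;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

PulsarClient client = PulsarClient.builder()
    .serviceUrl("pulsar://localhost:6650") // placeholder service URL
    .build();
Producer<byte[]> producer = client.newProducer()
    .topic("perf-test-topic") // placeholder topic
    .create();
// A random key per message spreads load across KeyShared consumers
producer.newMessage()
    .key(UUID.randomUUID().toString())
    .value(payload)
    .sendAsync();```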
----
2020-08-25 21:37:37 UTC - Tim Corbett: Alright, just wanted to make sure it
wasn't something I was missing. Thanks!
----
2020-08-26 06:14:45 UTC - Guillaume Audic: Hi, we are facing an issue with
event processing.
We have a batch job which produces a lot of messages (thousands of messages) to
a topic, and a scaled microservice which consumes them.
To trigger another event, we have to know when all of the messages of the batch
job have been consumed.
I found this approach with Kafka:
<https://medium.com/wix-engineering/6-event-driven-architecture-patterns-part-2-455cc73b22e1>
in part 6.
I'm looking for something similar with Pulsar, but I don't know if it exists.
We are thinking of using a KV store to hold the state of the job (id of job,
number of messages produced, number of messages consumed, status).
When the producer starts producing events, it puts the number of produced
messages in the KV store.
The consumer increments the number of consumed messages in the KV store.
Once the number of produced messages equals the number of consumed messages,
we set the status in the KV store to complete.
If there is no native approach with Pulsar, can you give me some advice?
----
2020-08-26 07:04:28 UTC - Julius S: A (possible?) generic solution: instead of
storing external state, put the state incrementally into each message. If your
producer knows enough about the batch it is working on, then the producer could
place some metadata into each message about the job_id and the number of the
message within the job, e.g. (job_id xyz, msg 3/10), (job_id xyz, msg 4/10)...
Your consumer can publish to another topic to update its status on how the job
is going. Once the consumer hits the msg with 10/10 it can act accordingly.
Re: your question about a native solution, the Pulsar Functions framework and
the state store it offers via Pulsar’s own BookKeeper is definitely something
to look at if you really need to store state.
Other ideas: Pulsar also has message chunking, so depending on the size of the
batch you might actually be able to put the entire job into a single message.
Also, Pulsar’s individual message acks (instead of just a high-water mark like
Kafka) will help you toward a cleaner solution to a queuing task like this.
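(A minimal sketch of that metadata idea using message properties; the
`job_id`/`seq` names just follow the example above:)
```import org.apache.pulsar.client.api.Producer;

// Tag each message with the job id and its position in the batch,
// e.g. (job_id xyz, msg 3/10)
void sendBatch(Producer<byte[]> producer, String jobId, byte[][] payloads)
        throws Exception {
    int total = payloads.length;
    for (int i = 0; i < total; i++) {
        producer.newMessage()
                .property("job_id", jobId)
                .property("seq", (i + 1) + "/" + total)
                .value(payloads[i])
                .send();
    }
}```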
----
2020-08-26 07:35:32 UTC - Olivier Chicha: @Addison Higham No, sorry, I am kind
of overloaded. Now that we have identified the issue, we have implemented a
workaround and I had to move on to something else.
Regards,
Olivier
----