2020-09-16 13:50:49 UTC - Marcio Martins: Hey guys, is it possible (in Java) to consume/read from a topic with a schema, without knowing the schema, and then post it back to another topic with the original schema? i.e. I want to do a sort of join operator, but I would like it to be as generic as possible and only look at the message metadata, without (de)serialization of the data. ---- 2020-09-16 14:02:22 UTC - Ebere Abanonu: for consumers there is `AutoConsumeSchema` and for producers something like `AutoProduceBytesSchema` ---- 2020-09-16 14:02:59 UTC - Sankararao Routhu: Hi, when synchronous replication is enabled between two clusters (west and east), is it possible for two brokers to be the owners of a given topic? I will have my publisher clients active-active in west and east, and both publishers connect to the same topic. Can my publisher in west connect to a west broker and my publisher in east connect to an east broker with sync replication? We chose sync replication to avoid duplicate messages on the consumer side between west and east ---- 2020-09-16 14:35:10 UTC - Marcio Martins: thanks, I will have a look at it later ---- 2020-09-16 14:55:29 UTC - Marcio Martins: Is there a way to get the list of topics in a namespace through the Java API? LookupService is not part of it :[ ---- 2020-09-16 14:58:35 UTC - Evan Furman: Hi guys, anything else we can check here? Is it because we’re using `public/default`, potentially?
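A minimal sketch of the pass-through pattern Ebere describes, assuming a standalone cluster at `pulsar://localhost:6650` and hypothetical `in`/`out` topics: `Schema.AUTO_CONSUME()` resolves the topic's schema from the broker at runtime, while `Schema.AUTO_PRODUCE_BYTES()` validates raw bytes against the target topic's schema, so the payload is forwarded without ever being decoded in application code.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.schema.GenericRecord;

public class PassThrough {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // AUTO_CONSUME fetches the topic's schema from the broker at runtime
        Consumer<GenericRecord> consumer = client.newConsumer(Schema.AUTO_CONSUME())
                .topic("persistent://public/default/in")      // hypothetical topic
                .subscriptionName("pass-through")
                .subscribe();

        // AUTO_PRODUCE_BYTES validates the raw bytes against the target topic's schema
        Producer<byte[]> producer = client.newProducer(Schema.AUTO_PRODUCE_BYTES())
                .topic("persistent://public/default/out")     // hypothetical topic
                .create();

        while (true) {
            Message<GenericRecord> msg = consumer.receive();
            producer.send(msg.getData()); // forward the payload without decoding it
            consumer.acknowledge(msg);
        }
    }
}
```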
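On Marcio's follow-up about listing topics: the topic list is exposed through the admin (HTTP) API rather than the client API, so `PulsarAdmin` is the piece to reach for. A sketch, assuming the admin endpoint lives at `http://localhost:8080`:

```java
import java.util.List;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class ListTopics {
    public static void main(String[] args) throws Exception {
        // the admin API speaks HTTP on the web service port, not the binary 6650 port
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            List<String> topics = admin.topics().getList("public/default");
            topics.forEach(System.out::println);
        }
    }
}
```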
---- 2020-09-16 15:43:37 UTC - Addison Higham: no, a topic can be owned by only a single broker. You could do something somewhat like this, though, involving pulsar functions and tenant isolation: essentially, you would have topic-east and topic-west and use the isolation features to ensure those topics only live on east and west brokers respectively, and then pulsar functions could be used to replicate messages written by east producers to the west topic and vice versa.
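A hedged sketch of the function half of that idea; the tenant/namespace/topic names are made up, and the `mirrored` property guard is one possible way to keep the two directions from ping-ponging messages back and forth (built-in geo-replication handles this with replication markers). One instance would be configured with topic-east as its input, and a mirror-image function would cover west to east:

```java
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Configured with topic-east as its input topic.
public class EastToWestMirror implements Function<byte[], Void> {
    @Override
    public Void process(byte[] payload, Context context) throws Exception {
        // skip messages we mirrored ourselves so the two functions don't loop
        if ("true".equals(context.getCurrentRecord().getProperties().get("mirrored"))) {
            return null;
        }
        context.newOutputMessage("persistent://my-tenant/my-ns/topic-west", Schema.BYTES)
               .value(payload)
               .property("mirrored", "true")
               .sendAsync();
        return null;
    }
}
```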
---- 2020-09-16 15:43:43 UTC - Andy Walker: @Andy Walker has joined the channel ---- 2020-09-16 15:48:18 UTC - Andy Walker: I am a full-time Go developer, I like the idea of Pulsar a lot, and I am excited to start using it, but I was looking into Pulsar Functions and it seems the Go SDK is _really_ feature-poor. Is there much point to using it in its current state, as opposed to just using a client? Are functions mainly intended for zippy transforms, or are they more general than that? Who would be a good person to talk to about bringing the Go SDK into parity with the others? ---- 2020-09-16 16:01:53 UTC - Addison Higham: oh hey, I just tweeted at you to join here, looks like you are on it already ---- 2020-09-16 16:02:33 UTC - Andy Walker: If I'm going to complain, I should be prepared to follow through the_horns : Rich Adams ---- 2020-09-16 16:02:35 UTC - Andy Walker: so here I am. ---- 2020-09-16 16:02:55 UTC - Andy Walker: Only fair. ---- 2020-09-16 16:04:22 UTC - Addison Higham: heh, no problem. The go runtime is pretty far behind the java and python runtimes, so that's totally reasonable feedback, and your willingness to follow through and help work on it is awesome :slightly_smiling_face: ---- 2020-09-16 16:08:57 UTC - Addison Higham: so some background on functions:
- it is pretty much a lightweight wrapper over the SDK, so adding new functionality is mostly just a matter of improving the little runtime wrapper
- functions are mostly for what we call "lightweight transformations"; they aren't intended to be a full-fledged processing engine like flink or ksql/kstreams. If you need that, we recommend flink
- that said, we still think there are a lot of use cases that should be supported in functions, like transformations, fan-out to multiple topics, and small stateful transformations (though state is in an alpha stage, not recommended for production right now)
- functions are also the building block for pulsar "connectors", which are currently all implemented in java, but there isn't a reason that couldn't change in the future ---- 2020-09-16 16:09:50 UTC - Andy Walker: > functions are mostly for what we call "lightweight transformations"; they aren't intended to be a full-fledged processing engine like flink or ksql/kstreams. If you need that, we recommend flink
That's what I needed to hear ---- 2020-09-16 16:10:01 UTC - Andy Walker: At least in the short term ---- 2020-09-16 16:10:28 UTC - Addison Higham: the biggest reason why you might want to use functions over a client is that pulsar takes care of running the function, either inside the broker, on a pool of deployed function workers, or on kubernetes, which is where functions really shine ---- 2020-09-16 16:10:38 UTC - Addison Higham: what is your use case? ---- 2020-09-16 16:13:07 UTC - Andy Walker: I have a stream of incoming JSON documents from various sources that users need to be able to run matching, transforming and simple enrichment workflows on. ---- 2020-09-16 16:15:37 UTC - Andy Walker: so, for example, one workflow might say "alert me immediately on anything on the foo topic that matches this jsonpath/jmespath rule", while another might say "if something on the bar topic has 'baz' contained in some.collection, enrich the document with a query from this data source and then pass it on to another topic or alert on it" ---- 2020-09-16 16:16:45 UTC - Andy Walker: And I need to design it such that they can set up these workflows themselves (more or less). So in that way the "functions" are appealing, because I could just set up a function that takes a config and sets up that internal pipeline, and then pulsar takes care of deploying it and checking on its metrics and logs ---- 2020-09-16 16:16:50 UTC - Andy Walker: but I guess that's not in the cards? ---- 2020-09-16 16:20:37 UTC - Andy Walker: Also, my impression was that Flink's Go support was nonexistent ---- 2020-09-16 16:24:26 UTC - Rich Adams: My two cents: if I were going to try to do Go in Flink, I would probably go through Apache Beam. Although they have marked their Go API as "experimental", Beam is backed by Google, and Google likes Go. ---- 2020-09-16 16:28:22 UTC - Andy Walker: Yep. That's what I figured already ---- 2020-09-16 16:29:10 UTC - Andy Walker: was hoping Pulsar functions might free me from that, but it's not looking like it. Also, Beam's Go support is...okay. I've reached out to some of the people working on its SDK, but haven't heard back yet. ---- 2020-09-16 16:29:37 UTC - Addison Higham: (sorry, a few other conversations going on) I think that is a great use case for how functions are intended to work. The "different" part about golang is that, because it is a single compiled binary rather than a dependency that can be linked against and injected at runtime, it was different enough that it hasn't happened yet, but it should in fact be quite straightforward work ---- 2020-09-16 16:30:28 UTC - Andy Walker: even with the whole "reaching out to databases for enrichment" thing? ---- 2020-09-16 16:30:46 UTC - Andy Walker: earlier you said they were intended for lightweight transforms, and this is definitely not ---- 2020-09-16 16:30:52 UTC - Andy Walker: at least some of the use cases aren't ---- 2020-09-16 16:31:04 UTC - Rich Adams: I'm curious about the join as well, that seems like the complicated part. ---- 2020-09-16 16:31:06 UTC - Andy Walker: and if it's not one-size-fits-all, I feel like I'm better off just doing a client ---- 2020-09-16 16:40:32 UTC - Addison Higham: maybe the best way to talk about functions is what they aren't: they are not trying to represent big, long, complicated DAGs with multiple re-keyings and re-partitionings of work like flink, nor are they intended for complex aggregations of previous messages with watermarks, exactly-once state semantics, etc. If you are mostly taking a single message, enriching it by talking to an external service, applying some matching rules, etc., then that is a good fit, assuming it isn't tons of steps that you would want to be decoupled at the compute layer.
So when I say lightweight, that might be relative to some of the really long and complicated workflows that flink is optimal for, or to cases where you need to do really sophisticated things with aggregating a stream +1 : Rich Adams
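To make "lightweight transformation" concrete, here is a sketch of the kind of function Andy describes, written against the Java Functions API (the Go runtime would mirror this once it reaches parity). The topic name and the `field`/`value` config keys are made up for illustration; a real version might compile a user-supplied jsonpath/jmespath rule instead of this simple equality check:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class MatchAndAlert implements Function<String, Void> {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public Void process(String json, Context context) throws Exception {
        JsonNode doc = MAPPER.readTree(json);
        // the matching rule comes from user config; "field"/"value" are made-up keys
        String field = (String) context.getUserConfigValue("field").orElse("severity");
        String value = (String) context.getUserConfigValue("value").orElse("critical");
        if (value.equals(doc.path(field).asText())) {
            // route matching documents to an alert topic (name is hypothetical)
            context.newOutputMessage("persistent://my-tenant/my-ns/alerts", Schema.STRING)
                   .value(json)
                   .sendAsync();
        }
        return null;
    }
}
```

Each user workflow would then be one deployed function instance with its rule passed in via the function's user config, which is roughly the self-service setup Andy is after.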
---- 2020-09-16 16:44:22 UTC - Andy Walker: That mostly makes sense, but I am mostly new to streaming workflows like these, so it might help to break down a couple of these concepts and dive more into what "sophisticated" might mean, to see if that's the direction I'm interested in going in or not. ---- 2020-09-16 16:45:31 UTC - Andy Walker: at the very least I feel like I would need metrics and user configuration to work. In addition, I would probably need to allow users to "test" their functions somehow, and to throttle them and alert if they start to misbehave. ---- 2020-09-16 16:56:42 UTC - Addison Higham: re: the golang improvements, I can check with the rest of the core devs on whether there is already a timeline for getting the golang SDK on par with the others. That would give you metrics and the like. As far as throttling goes, there are rate limits in pulsar that apply to functions the same as they do to other consumers. If you are interested in helping with the golang improvements, I can help myself (or put you in touch with one of the other devs) to hopefully kick-start that process. As mentioned, it should be pretty self-contained and isn't super difficult. Also happy to answer any more questions about stream processing in general and how you might use pulsar vs flink or whatever; in fact, if a 30-minute call would help accelerate that, we could certainly make it happen :slightly_smiling_face: ---- 2020-09-16 17:03:47 UTC - Vijay: @Vijay has joined the channel ---- 2020-09-16 17:04:29 UTC - Andy Walker: I would like that, yes. I will PM you ---- 2020-09-16 19:34:40 UTC - Vijay: #pulsartls Hi Community, I am pretty excited about pulsar and wanted to know more about it. Lately I have been trying to set up TLS for pulsar using the helm chart (referring to the Apache pulsar helm chart). I tried with cert-manager and was able to achieve it: I was able to log into the toolset pod and create a consumer and producer to check whether it was working. All I had to do was enable cert-manager and the TLS configs for the components. But now I have modified the TLS setup to not use cert-manager to create the certs; instead I have a cert for each component stored as kube secrets. For parity I kept the same cert names for the components as in the chart (the same names as when cert-manager is enabled and creates the certs). I was able to get the cluster up with TLS enabled, create a producer and consumer using the toolset component, and produce/consume messages over pulsar+ssl to the broker. Does this mean my TLS config is properly set up? Or is there any way to confirm, component by component, that TLS is enabled, maybe using some curl commands or something? I want to verify my TLS setup for each component. Any help is appreciated! ---- 2020-09-16 19:42:31 UTC - Addison Higham: for the broker, you can simply check with curl against the admin APIs. Do I understand correctly that you also set up TLS for bookkeeper and zookeeper? To validate those, you can use `openssl s_client`, which will just do a TCP connection and validate the cert ---- 2020-09-16 19:42:57 UTC - Addison Higham: but if you are using the helm chart and still seeing everything pass and are able to produce, that should be exercising all of the TLS endpoints
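For anyone who would rather script that endpoint-by-endpoint check, the same handshake `openssl s_client` performs can also be driven from Java. A sketch; host names and ports are deployment-specific (for example 8443 for the broker's https admin endpoint and 6651 for pulsar+ssl), and with a private CA the handshake will only succeed if the CA is in the JVM truststore (`-Djavax.net.ssl.trustStore=...`), so a failure here is itself a signal:

```java
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;

public class TlsProbe {
    public static void main(String[] args) throws Exception {
        // e.g. "pulsar-broker 6651"; host and port depend on your chart's values
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake(); // throws if the endpoint isn't speaking trusted TLS
            for (Certificate cert : socket.getSession().getPeerCertificates()) {
                System.out.println(((X509Certificate) cert).getSubjectX500Principal());
            }
        }
    }
}
```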
---- 2020-09-16 22:27:11 UTC - Jim Martin: Hello all. Question: I have a process where 12 pods are trying to push events to a non-partitioned topic. When I view the stats, it only shows 5 producers, not 12. I see the setting "maxConsumersPerSubscription" : "5", but that is for consumers; is there an equivalent for producers? If so, can you tell me how to find/edit it? The other 7 producers are crashing when trying to push events, getting an `Unknown Error` from Pulsar. It feels like there is a setting I am missing. ---- 2020-09-16 22:33:24 UTC - Addison Higham: There is a namespace setting, shown by `pulsar-admin namespaces get-max-producers-per-topic`, which does set a limit on the number of producers on a given topic. The other thing you can do is run `pulsar-admin namespaces policies`, which will give you all the configuration settings for your namespace. If it doesn't look like that is the cause, you should be able to look at the broker logs and likely see more context about what the error is ---- 2020-09-16 22:34:46 UTC - Jim Martin: thanks. Are there recommended numbers that we should not go past for producers / consumers? ---- 2020-09-16 22:41:55 UTC - Addison Higham: that is pretty dependent on your cluster size, but even a fairly small pulsar cluster can handle hundreds of producers and consumers on a single topic ---- 2020-09-16 22:45:07 UTC - Jim Martin: Thanks for the info. ---- 2020-09-17 00:44:37 UTC - xiewz1112: @xiewz1112 has joined the channel ---- 2020-09-17 01:02:39 UTC - Jipei Wang: @Jipei Wang has joined the channel ---- 2020-09-17 02:37:08 UTC - cruiseplus: @cruiseplus has joined the channel ---- 2020-09-17 05:15:26 UTC - Jianyun Zhao: @Jianyun Zhao has joined the channel ---- 2020-09-17 05:38:23 UTC - Rahul Vashishth: > Do I understand that you also set up TLS for bookkeeper and zookeeper?
yes, TLS is enabled for all components ---- 2020-09-17 07:14:08 UTC - Enrico: @Addison Higham I know about the consumer type, but I was asking about KeySharedPolicy :slightly_smiling_face: ---- 2020-09-17 08:35:25 UTC - Alan Hoffmeister: Hello everyone! So `(-1,-1,-1,-1)` is the string representation of `MessageId.earliest`, but what does that mean? ---- 2020-09-17 08:35:45 UTC - Alan Hoffmeister: also, how do I craft my own ids? ---- 2020-09-17 08:37:56 UTC - Alan Hoffmeister: I was expecting the message id to be a simple uint64 or something similar, but it looks like a data structure :thinking_face: ---- 2020-09-17 08:44:30 UTC - Alan Hoffmeister: oh nvm, it is indeed an int64, I'm so stupid ----
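To spell out what those four numbers are: a Pulsar MessageId is a composite of ledger id, entry id, partition index, and batch index, and the `earliest` sentinel sets all of them to -1, which is where `(-1,-1,-1,-1)` comes from. So it isn't a single uint64, although the ledger and entry ids are each 64-bit. A Java sketch; note that `MessageIdImpl` is an implementation class rather than stable public API, so treat the hand-crafting below as an assumption about the current client:

```java
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.impl.MessageIdImpl;

public class MessageIds {
    public static void main(String[] args) throws Exception {
        // the earliest/latest sentinels carry -1 in every component
        System.out.println(MessageId.earliest);

        // hand-crafting an id from its parts (ledgerId, entryId, partitionIndex);
        // MessageIdImpl lives in the impl package and may change between releases
        MessageId crafted = new MessageIdImpl(42L, 7L, -1);

        // the portable way to persist and rebuild an id is the byte-array form
        byte[] bytes = crafted.toByteArray();
        MessageId restored = MessageId.fromByteArray(bytes);
        System.out.println(restored.equals(crafted)); // true
    }
}
```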
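And to close the loop on Jim's producer-limit thread above: the same namespace policy the CLI shows is reachable from the Java admin API, so the cap can be inspected and raised programmatically. A sketch, assuming the admin endpoint at `http://localhost:8080` and the `public/default` namespace:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class ProducerLimits {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            String ns = "public/default";
            // read the current namespace-level cap on producers per topic
            System.out.println(admin.namespaces().getMaxProducersPerTopic(ns));
            // raise it so all 12 pods can attach a producer to the same topic
            admin.namespaces().setMaxProducersPerTopic(ns, 20);
        }
    }
}
```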
