2020-09-16 13:50:49 UTC - Marcio Martins: Hey guys, is it possible (in Java) to consume/read from a topic with a schema, without knowing the schema, and then post it back to another topic with the original schema? i.e. I want to do a sort of join operator, but I would like it to be as generic as possible and only look at the message metadata, without (de)serialization of the data. ---- 2020-09-16 14:02:22 UTC - Ebere Abanonu: for consumers there is `AutoConsumeSchema` and for producers something like `AutoProduceBytesSchema` ---- 2020-09-16 14:02:59 UTC - Sankararao Routhu: Hi, when synchronous replication is enabled between two clusters (west and east), is it possible for two brokers to be the owners of a given topic? I will have my publisher clients active-active in west and east, and both publishers connect to the same topic. Can my publisher in west connect to a west broker and my publisher in east connect to an east broker with sync replication? We chose sync replication to avoid duplicate messages on the consumer side between west and east ---- 2020-09-16 14:35:10 UTC - Marcio Martins: thanks, I will have a look at it later ---- 2020-09-16 14:55:29 UTC - Marcio Martins: Is there a way to get the list of topics in a namespace through the Java API? LookupService is not part of it :[ ---- 2020-09-16 14:58:35 UTC - Evan Furman: Hi guys, anything else we can check here? Is it because we’re using `public/default`, potentially?
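A minimal sketch of the pass-through pattern Ebere describes, assuming a standalone cluster at `pulsar://localhost:6650` and hypothetical `in`/`out` topics: `Schema.AUTO_CONSUME()` resolves the topic's schema from the broker at runtime, while `Schema.AUTO_PRODUCE_BYTES()` validates raw bytes against the target topic's schema, so the payload is forwarded without ever being decoded in application code.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.client.api.schema.GenericRecord;

public class PassThrough {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // AUTO_CONSUME fetches the topic's schema from the broker at runtime
        Consumer<GenericRecord> consumer = client.newConsumer(Schema.AUTO_CONSUME())
                .topic("persistent://public/default/in")      // hypothetical topic
                .subscriptionName("pass-through")
                .subscribe();

        // AUTO_PRODUCE_BYTES validates the raw bytes against the target topic's schema
        Producer<byte[]> producer = client.newProducer(Schema.AUTO_PRODUCE_BYTES())
                .topic("persistent://public/default/out")     // hypothetical topic
                .create();

        while (true) {
            Message<GenericRecord> msg = consumer.receive();
            producer.send(msg.getData()); // forward the payload without decoding it
            consumer.acknowledge(msg);
        }
    }
}
```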
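On Marcio's follow-up about listing topics: the topic list is exposed through the admin (HTTP) API rather than the client API, so `PulsarAdmin` is the piece to reach for. A sketch, assuming the admin endpoint lives at `http://localhost:8080`:

```java
import java.util.List;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class ListTopics {
    public static void main(String[] args) throws Exception {
        // the admin API speaks HTTP on the web service port, not the binary 6650 port
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            List<String> topics = admin.topics().getList("public/default");
            topics.forEach(System.out::println);
        }
    }
}
```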
---- 2020-09-16 15:43:37 UTC - Addison Higham: no, a topic can be owned by only a single broker. You could do something somewhat like this, though, involving pulsar functions and tenant isolation: essentially, you would have topic-east and topic-west and use the isolation features to ensure those topics only live on east and west brokers respectively, and then pulsar functions could be used to replicate messages written by east producers to the west topic and vice versa.
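A hedged sketch of the function half of that idea; the tenant/namespace/topic names are made up, and the `mirrored` property guard is one possible way to keep the two directions from ping-ponging messages back and forth (built-in geo-replication handles this with replication markers). One instance would be configured with topic-east as its input, and a mirror-image function would cover west to east:

```java
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Configured with topic-east as its input topic.
public class EastToWestMirror implements Function<byte[], Void> {
    @Override
    public Void process(byte[] payload, Context context) throws Exception {
        // skip messages we mirrored ourselves so the two functions don't loop
        if ("true".equals(context.getCurrentRecord().getProperties().get("mirrored"))) {
            return null;
        }
        context.newOutputMessage("persistent://my-tenant/my-ns/topic-west", Schema.BYTES)
               .value(payload)
               .property("mirrored", "true")
               .sendAsync();
        return null;
    }
}
```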
---- 2020-09-16 15:43:43 UTC - Andy Walker: @Andy Walker has joined the channel ---- 2020-09-16 15:48:18 UTC - Andy Walker: I am a full-time Go developer, I like the idea of Pulsar a lot, and I am excited to start using it, but I was looking into Pulsar Functions and it seems the Go SDK is _really_ feature-poor. Is there much point to using it in its current state, as opposed to just using a client? Are functions mainly intended for zippy transforms, or are they more general than that? Who would be a good person to talk to about bringing the Go SDK into parity with the others? ---- 2020-09-16 16:01:53 UTC - Addison Higham: oh hey, I just tweeted at you to join here, looks like you are on it already ---- 2020-09-16 16:02:33 UTC - Andy Walker: If I'm going to complain, I should be prepared to follow through the_horns : Rich Adams ---- 2020-09-16 16:02:35 UTC - Andy Walker: so here I am. ---- 2020-09-16 16:02:55 UTC - Andy Walker: Only fair. ---- 2020-09-16 16:04:22 UTC - Addison Higham: heh, no problem. The go runtime is pretty far behind the java and python runtimes, so that's totally reasonable feedback, and your willingness to follow through and help work on it is awesome :slightly_smiling_face: ---- 2020-09-16 16:08:57 UTC - Addison Higham: so some background on functions:
- it is pretty much a lightweight wrapper over the SDK, so adding new functionality is mostly just a matter of improving the little runtime wrapper
- functions are mostly for what we call "lightweight transformations"; they aren't intended to be a full-fledged processing engine like flink or ksql/kstreams. If you need that, we recommend flink
- that said, we still think there are a lot of use cases that should be supported in functions, like transformations, fan-out to multiple topics, and small stateful transformations (though state is in an alpha stage, not recommended for production right now)
- functions are also the building block for pulsar "connectors", which are currently all implemented in java, but there isn't a reason that couldn't change in the future ---- 2020-09-16 16:09:50 UTC - Andy Walker: > functions are mostly for what we call "lightweight transformations"; they aren't intended to be a full-fledged processing engine like flink or ksql/kstreams. If you need that, we recommend flink
That's what I needed to hear ---- 2020-09-16 16:10:01 UTC - Andy Walker: At least in the short term ---- 2020-09-16 16:10:28 UTC - Addison Higham: the biggest reason why you might want to use functions over a client is that pulsar takes care of running the function, either inside the broker, on a pool of deployed function workers, or on kubernetes, which is where functions really shine ---- 2020-09-16 16:10:38 UTC - Addison Higham: what is your use case? ---- 2020-09-16 16:13:07 UTC - Andy Walker: I have a stream of incoming JSON documents from various sources that users need to be able to run matching, transforming and simple enrichment workflows on. ---- 2020-09-16 16:15:37 UTC - Andy Walker: so, for example, one workflow might say "alert me immediately on anything on the foo topic that matches this jsonpath/jmespath rule", while another might say "if something on the bar topic has 'baz' contained in some.collection, enrich the document with a query from this data source and then pass it on to another topic or alert on it" ---- 2020-09-16 16:16:45 UTC - Andy Walker: And I need to design it such that they can set up these workflows themselves (more or less). So in that way the "functions" are appealing, because I could just set up a function that takes a config and sets up that internal pipeline, and then pulsar takes care of deploying it and checking on its metrics and logs ---- 2020-09-16 16:16:50 UTC - Andy Walker: but I guess that's not in the cards? ---- 2020-09-16 16:20:37 UTC - Andy Walker: Also, my impression was that Flink's Go support was nonexistent ---- 2020-09-16 16:24:26 UTC - Rich Adams: My two cents: if I were going to try to do Go in Flink, I would probably go through Apache Beam. Although they have marked their Go API as "experimental", Beam is backed by Google, and Google likes Go. ---- 2020-09-16 16:28:22 UTC - Andy Walker: Yep. That's what I figured already ---- 2020-09-16 16:29:10 UTC - Andy Walker: was hoping Pulsar functions might free me from that, but it's not looking like it. Also, Beam's Go support is...okay. I've reached out to some of the people working on its SDK, but haven't heard back yet. ---- 2020-09-16 16:29:37 UTC - Addison Higham: (sorry, a few other conversations going on) I think that is a great use case for how functions are intended to work. The "different" part about golang is that, because it is a single compiled binary rather than a dependency that can be linked against and injected at runtime, it was different enough that it hasn't happened yet, but it should in fact be quite straightforward work ---- 2020-09-16 16:30:28 UTC - Andy Walker: even with the whole "reaching out to databases for enrichment" thing? ---- 2020-09-16 16:30:46 UTC - Andy Walker: earlier you said they were intended for lightweight transforms, and this is definitely not ---- 2020-09-16 16:30:52 UTC - Andy Walker: at least some of the use cases aren't ---- 2020-09-16 16:31:04 UTC - Rich Adams: I'm curious about the join as well, that seems like the complicated part. ---- 2020-09-16 16:31:06 UTC - Andy Walker: and if it's not one-size-fits-all, I feel like I'm better off just doing a client ---- 2020-09-16 16:40:32 UTC - Addison Higham: maybe the best way to talk about functions is what they aren't: they are not trying to represent big, long, complicated DAGs with multiple re-keyings and re-partitionings of work like flink, nor are they intended for complex aggregations of previous messages with watermarks, exactly-once state semantics, etc. If you are mostly taking a single message, enriching it by talking to an external service, applying some matching rules, etc., then that is a good fit, assuming it isn't tons of steps that you would want to be decoupled at the compute layer.
So when I say lightweight, that might be relative to some of the really long and complicated workflows that flink is optimal for, or to cases where you need to do really sophisticated things with aggregating a stream +1 : Rich Adams
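To make "lightweight transformation" concrete, here is a sketch of the kind of function Andy describes, written against the Java Functions API (the Go runtime would mirror this once it reaches parity). The topic name and the `field`/`value` config keys are made up for illustration; a real version might compile a user-supplied jsonpath/jmespath rule instead of this simple equality check:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.pulsar.client.api.Schema;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

public class MatchAndAlert implements Function<String, Void> {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public Void process(String json, Context context) throws Exception {
        JsonNode doc = MAPPER.readTree(json);
        // the matching rule comes from user config; "field"/"value" are made-up keys
        String field = (String) context.getUserConfigValue("field").orElse("severity");
        String value = (String) context.getUserConfigValue("value").orElse("critical");
        if (value.equals(doc.path(field).asText())) {
            // route matching documents to an alert topic (name is hypothetical)
            context.newOutputMessage("persistent://my-tenant/my-ns/alerts", Schema.STRING)
                   .value(json)
                   .sendAsync();
        }
        return null;
    }
}
```

Each user workflow would then be one deployed function instance with its rule passed in via the function's user config, which is roughly the self-service setup Andy is after.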
---- 2020-09-16 16:44:22 UTC - Andy Walker: That mostly makes sense, but I am mostly new to streaming workflows like these, so it might help to break down a couple of these concepts and dive more into what "sophisticated" might mean, to see if that's the direction I'm interested in going in or not. ---- 2020-09-16 16:45:31 UTC - Andy Walker: at the very least I feel like I would need metrics and user configuration to work. In addition, I would probably need to allow users to "test" their functions somehow, and to throttle them and alert if they start to misbehave. ---- 2020-09-16 16:56:42 UTC - Addison Higham: re: the golang improvements, I can check with the rest of the core devs on whether there is already a timeline for getting the golang SDK on par with the others. That would give you metrics and the like. As far as throttling goes, there are rate limits in pulsar that apply to functions the same as they do to other consumers. If you are interested in helping with the golang improvements, I can help myself (or put you in touch with one of the other devs) to hopefully kick-start that process. As mentioned, it should be pretty self-contained and isn't super difficult. Also happy to answer any more questions about stream processing in general and how you might use pulsar vs flink or whatever; in fact, if a 30-minute call would help accelerate that, we could certainly make it happen :slightly_smiling_face: ---- 2020-09-16 17:03:47 UTC - Vijay: @Vijay has joined the channel ---- 2020-09-16 17:04:29 UTC - Andy Walker: I would like that, yes. I will PM you ---- 2020-09-16 19:34:40 UTC - Vijay: #pulsartls Hi Community, I am pretty excited about pulsar and wanted to know more about it. Lately I have been trying to set up TLS for pulsar using the helm chart (referring to the Apache pulsar helm chart). I tried with cert-manager and was able to achieve it: I was able to log into the toolset pod and create a consumer and producer to check whether it was working. All I had to do was enable cert-manager and the TLS configs for the components. But now I have modified the TLS setup to not use cert-manager to create the certs; instead I have a cert for each component stored as kube secrets. For parity I kept the same cert names for the components as in the chart (the same names as when cert-manager is enabled and creates the certs). I was able to get the cluster up with TLS enabled, create a producer and consumer using the toolset component, and produce/consume messages over pulsar+ssl to the broker. Does this mean my TLS config is properly set up? Or is there any way to confirm, component by component, that TLS is enabled, maybe using some curl commands or something? I want to verify my TLS setup for each component. Any help is appreciated! ---- 2020-09-16 19:42:31 UTC - Addison Higham: for the broker, you can simply check with curl against the admin APIs. Do I understand correctly that you also set up TLS for bookkeeper and zookeeper? To validate those, you can use `openssl s_client`, which will just do a TCP connection and validate the cert ---- 2020-09-16 19:42:57 UTC - Addison Higham: but if you are using the helm chart and still seeing everything pass and are able to produce, that should be exercising all of the TLS endpoints
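For anyone who would rather script that endpoint-by-endpoint check, the same handshake `openssl s_client` performs can also be driven from Java. A sketch; host names and ports are deployment-specific (for example 8443 for the broker's https admin endpoint and 6651 for pulsar+ssl), and with a private CA the handshake will only succeed if the CA is in the JVM truststore (`-Djavax.net.ssl.trustStore=...`), so a failure here is itself a signal:

```java
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;

public class TlsProbe {
    public static void main(String[] args) throws Exception {
        // e.g. "pulsar-broker 6651"; host and port depend on your chart's values
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.startHandshake(); // throws if the endpoint isn't speaking trusted TLS
            for (Certificate cert : socket.getSession().getPeerCertificates()) {
                System.out.println(((X509Certificate) cert).getSubjectX500Principal());
            }
        }
    }
}
```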
---- 2020-09-16 22:27:11 UTC - Jim Martin: Hello all. Question: I have a process where 12 pods are trying to push events to a non-partitioned topic. When I view the stats, it only shows 5 producers, not 12. I see the setting "maxConsumersPerSubscription" : "5", but that is for consumers; is there an equivalent for producers? If so, can you tell me how to find/edit it? The other 7 producers are crashing when trying to push events, getting an `Unknown Error` from Pulsar. It feels like there is a setting I am missing. ---- 2020-09-16 22:33:24 UTC - Addison Higham: There is a namespace setting, shown by `pulsar-admin namespaces get-max-producers-per-topic`, which does set a limit on the number of producers on a given topic. The other thing you can do is run `pulsar-admin namespaces policies`, which will give you all the configuration settings for your namespace. If it doesn't look like that is the cause, you should be able to look at the broker logs and likely see more context about what the error is ---- 2020-09-16 22:34:46 UTC - Jim Martin: thanks. Are there recommended numbers that we should not go past for producers / consumers? ---- 2020-09-16 22:41:55 UTC - Addison Higham: that is pretty dependent on your cluster size, but even a fairly small pulsar cluster can handle hundreds of producers and consumers on a single topic ---- 2020-09-16 22:45:07 UTC - Jim Martin: Thanks for the info. ---- 2020-09-17 00:44:37 UTC - xiewz1112: @xiewz1112 has joined the channel ---- 2020-09-17 01:02:39 UTC - Jipei Wang: @Jipei Wang has joined the channel ---- 2020-09-17 02:37:08 UTC - cruiseplus: @cruiseplus has joined the channel ---- 2020-09-17 05:15:26 UTC - Jianyun Zhao: @Jianyun Zhao has joined the channel ---- 2020-09-17 05:38:23 UTC - Rahul Vashishth: > Do I understand that you also set up TLS for bookkeeper and zookeeper?
yes, TLS is enabled for all components ---- 2020-09-17 07:14:08 UTC - Enrico: @Addison Higham I know about the consumer type, but I was asking about KeySharedPolicy :slightly_smiling_face: ---- 2020-09-17 08:35:25 UTC - Alan Hoffmeister: Hello everyone! So `(-1,-1,-1,-1)` is the string representation of `MessageId.earliest`, but what does that mean? ---- 2020-09-17 08:35:45 UTC - Alan Hoffmeister: also, how do I craft my own ids? ---- 2020-09-17 08:37:56 UTC - Alan Hoffmeister: I was expecting the message id to be a simple uint64 or something similar, but it looks like a data structure :thinking_face: ---- 2020-09-17 08:44:30 UTC - Alan Hoffmeister: oh nvm, it is indeed an int64, I'm so stupid ----
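To spell out what those four numbers are: a Pulsar MessageId is a composite of ledger id, entry id, partition index, and batch index, and the `earliest` sentinel sets all of them to -1, which is where `(-1,-1,-1,-1)` comes from. So it isn't a single uint64, although the ledger and entry ids are each 64-bit. A Java sketch; note that `MessageIdImpl` is an implementation class rather than stable public API, so treat the hand-crafting below as an assumption about the current client:

```java
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.impl.MessageIdImpl;

public class MessageIds {
    public static void main(String[] args) throws Exception {
        // the earliest/latest sentinels carry -1 in every component
        System.out.println(MessageId.earliest);

        // hand-crafting an id from its parts (ledgerId, entryId, partitionIndex);
        // MessageIdImpl lives in the impl package and may change between releases
        MessageId crafted = new MessageIdImpl(42L, 7L, -1);

        // the portable way to persist and rebuild an id is the byte-array form
        byte[] bytes = crafted.toByteArray();
        MessageId restored = MessageId.fromByteArray(bytes);
        System.out.println(restored.equals(crafted)); // true
    }
}
```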
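And to close the loop on Jim's producer-limit thread above: the same namespace policy the CLI shows is reachable from the Java admin API, so the cap can be inspected and raised programmatically. A sketch, assuming the admin endpoint at `http://localhost:8080` and the `public/default` namespace:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;

public class ProducerLimits {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            String ns = "public/default";
            // read the current namespace-level cap on producers per topic
            System.out.println(admin.namespaces().getMaxProducersPerTopic(ns));
            // raise it so all 12 pods can attach a producer to the same topic
            admin.namespaces().setMaxProducersPerTopic(ns, 20);
        }
    }
}
```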
