2020-04-08 09:27:19 UTC - xue: I think there is a problem with how function applications are created. Suppose the jar of a function application contains configuration files, such as XML or properties files, on its classpath: they can't be found at all. Only files inside the Java instance.jar are found, so you have to put these configuration files into the Java instance.jar, which is not reasonable ---- 2020-04-08 09:27:49 UTC - Sijie Guo: I see. So it seems that it points to the wrong ledgers root path. I will have to take a closer look at it. ---- 2020-04-08 09:29:55 UTC - Sijie Guo: They are ignored. +1 : Poul Henriksen ---- 2020-04-08 09:34:09 UTC - Sijie Guo: I think it is related to how you run your applications. Are you running the producers/consumers in multiple AZs? If your producers and consumers are mostly in one AZ, then geo-replication makes more sense.
If your producers and consumers are already spread across multiple AZs, then a single cluster spanning multiple AZs makes more sense. You can also leverage rack-aware placement in BookKeeper to configure data placement across multiple AZs, and a broker isolation strategy to configure failover of topics between brokers in different AZs. Be aware, though, of the network cost between AZs. +1 : Hiroyuki Yamada man-bowing : Hiroyuki Yamada ---- 2020-04-08 09:34:19 UTC - Sijie Guo: Hope that is useful to you. ---- 2020-04-08 09:36:53 UTC - Sijie Guo: @Ken Huang the offloader only offloads closed ledgers. It doesn’t offload the latest active ledger. ---- 2020-04-08 09:37:57 UTC - Sijie Guo: For experimental purposes, try unloading the topic first, then produce messages to create a new ledger. Unload and produce multiple times to create multiple ledgers. Then you can trigger the offload command to verify that the old ledgers are offloaded. ---- 2020-04-08 11:55:11 UTC - Ganga Lakshmanasamy: Hi, I am trying to connect to Pulsar from a WildFly server. Both are running in Docker and I am able to ping the Pulsar container from WildFly. But when trying to connect from the app deployed inside WildFly, the error below is thrown: org.apache.pulsar.shade.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: pulsar/172.27.0.6:6650 ---- 2020-04-08 12:16:59 UTC - Hiroyuki Yamada: Thank you very much @Sijie Guo for the quick reply. It helps a lot but I still have some unknowns. First, are they very different from an availability or performance perspective? Second, are use cases that fit geo-replication supposed to manage topics in a more isolated way? e.g. producers in AZ-a only create messages for topic-a in cluster-a in AZ-a, producers in AZ-b only create messages for topic-b in cluster-b in AZ-b, and so on? My application isn’t like that: every producer (possibly spread across multiple AZs) creates messages for a single topic. 
Also, it would be great if you could point me to some references about `broker isolation strategy`. I can’t find anything with that keyword. ---- 2020-04-08 13:25:59 UTC - Damien: Hi, I am new to Pulsar and I came across this tweet: <https://twitter.com/sijieg/status/1202727993145098247> and wondered if the architecture presented in the slide already exists, and if I could already directly query the “stream reader”, “sub-stream reader”, or “segment reader” that @Sijie Guo is talking about for the Flink integration? ---- 2020-04-08 13:38:39 UTC - Jon P: @Jon P has joined the channel ---- 2020-04-08 13:52:44 UTC - Rattanjot Singh: Need to build a custom Docker image for Pulsar. `<https://github.com/apache/pulsar/blob/master/docker/pulsar/Dockerfile>` What should we pass as the argument PULSAR_TARBALL? ---- 2020-04-08 13:58:35 UTC - Addison Higham: @Rattanjot Singh maven will build the docker image for you. Do `mvn package -Pdocker -DskipTests` ---- 2020-04-08 14:04:59 UTC - Michael Gokey: @Michael Gokey has joined the channel ---- 2020-04-08 15:03:46 UTC - Jon P: Hi. I’m trying to implement OpenTracing/Jaeger tracing across Pulsar topics. I came across this page (<https://github.com/apache/pulsar/wiki/PIP-23:-Message-Tracing-By-Interceptors>), which starts well but unfortunately gets a little light on detail when it comes to (de-)serializing trace context from the Message and the implementation of the concrete producer/consumer interceptors. I had a look at the code for the PR and started an implementation based on ProducerInterceptor<T>, but that class has now been deprecated in favour of an un-typed version ‘ProducerInterceptor’ with no indication why in the comments. I’m grabbing the repo to check the commit history, but can anyone explain the move to the un-typed ProducerInterceptor, which means I can’t get a type-safe reference to my custom message class? Also, any suggestions where I might find good code examples of interceptors/tracing? Many thanks. 
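Jon P’s question above (and the answer later in the thread) boils down to carrying the trace context in message properties. The following is a minimal, self-contained sketch of that inject/extract pattern with no Pulsar or OpenTracing dependency; the property keys (`x-trace-id`, `x-span-id`) and class name are illustrative assumptions, not a Pulsar convention:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: propagating a trace context through message properties,
// analogous to OpenTracing's TEXT_MAP carrier. Keys are hypothetical.
public class TracePropagation {

    // Producer side: copy the current span context into the
    // properties map that travels with the message.
    public static Map<String, String> inject(String traceId, String spanId) {
        Map<String, String> properties = new HashMap<>();
        properties.put("x-trace-id", traceId);
        properties.put("x-span-id", spanId);
        return properties;
    }

    // Consumer side: rebuild the parent context from the received
    // message properties so a child span can be started from it.
    public static String[] extract(Map<String, String> properties) {
        return new String[] {
            properties.get("x-trace-id"),
            properties.get("x-span-id")
        };
    }
}
```

With a real client, the producer side would put these entries into the message properties at send time and the consumer side would read them back; the map here stands in for those properties.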
---- 2020-04-08 16:10:01 UTC - Sijie Guo: Have you tried telnet to that port? ---- 2020-04-08 16:20:39 UTC - Sijie Guo: > are they very different from availability or performance perspective I think it is more about availability. Performance-wise, based on my experience, cross-zone latency should be fine. > use cases that fit geo-replication are supposed to manage topics in more isolated way Yes, in some sense. So in your use case, you are more looking for a single cluster spanning multiple AZs. > it would be great if you can point me to some references about `broker isolation strategy` It is called namespace isolation strategy; sorry for my typo. Currently we have very little documentation about it. My team is working on filling up the documentation <http://pulsar.apache.org/docs/en/pulsar-admin/#ns-isolation-policy> man-bowing : Hiroyuki Yamada +1 : Hiroyuki Yamada ---- 2020-04-08 16:40:45 UTC - Sijie Guo: @Damien Yes. It already exists in the current Pulsar architecture. • “Stream Reader” is the Pulsar reader, which reads the stream of events of a single topic (partition). Events are read in order. • “Sub-Stream Reader” is the key_range reader, which was introduced recently and will be released as part of the 2.6.0 release. A key-range reader reads a subset of events in a stream (a topic partition); you can have multiple readers reading one stream. • “Segment Reader” is a bypass-broker reader that reads data directly from BookKeeper or tiered storage. The segment reader is currently part of the pulsar/bookkeeper storage API. Pulsar already uses it in implementing the Presto connector, known as Pulsar SQL. If you are looking for the individual reader APIs, they are already there, but they are not well connected to present a whole picture. If you are looking for a more concrete example of how these APIs are used, the pulsar-flink integration (<https://github.com/streamnative/pulsar-flink>) we are doing is a good example. 
This is the connector that we are contributing to upstream Flink as part of FLIP-72. Eventually we will do the same for other integrations, such as pulsar-spark and pulsar-presto. ---- 2020-04-08 16:48:13 UTC - Damien: thanks very much for your quick answer @Sijie Guo! OK, I understand now. I first thought that pulsar-flink was going to be “plugged” into Pulsar through these readers only for the needs of the incoming “pulsar state backend”. By the way, besides FLIP-72 (which does not seem to mention the state backend integration), where can I follow more of the development, specifically about the state backend integration? Any GitHub branch? Another Jira ticket? ---- 2020-04-08 16:52:16 UTC - Damien: about “bypass-broker reader that reads data directly from bookkeeper or tiered storage”, is it transparent from a client point of view when querying the *Segment Reader*? This is the feature I’m mainly interested in. ---- 2020-04-08 17:00:59 UTC - Sijie Guo: @Jon P > gets a little light on detail when it comes to (de-)serializing trace context from the Message & implementation of the concrete Producer/Consumer Interceptor You can (de-)serialize the trace context into message properties. Currently the `Message` interface doesn’t provide any methods to mutate the message, so you might have to check whether the `Message` is a `MessageImpl` and cast it. The MessageImpl has a MessageBuilder underneath, which you can use for adding properties. ```MessageImpl msg = (MessageImpl) message;
msg.getMessageBuilder()
   .addProperties(KeyValue.newBuilder().setKey(tag).setValue(tagValue));```
This is not an elegant solution, but it is something you can achieve with the current API. We can look into how to expose this builder to the interceptors so people can easily access it. 
> class has now been deprecated in favour of an un-typed version ‘ProducerInterceptor’ with no indication why in the comments The `api.ProducerInterceptor<T>` was deprecated in favor of `api.interceptor.ProducerInterceptor` because of multiple-schema support in the producer: 1) it is hard to support multiple schemas in a single producer if the interceptor is strongly typed to one type; 2) in most interceptors, the `value` of a message ideally shouldn’t be changed after interception; most of the time people use interceptors to enrich a message. > Also, any suggestions where I might find good code examples of interceptors/tracing? @Penghui Li - we can think about providing an open-tracing interceptor example for your reference. --- The interceptor was added a few releases ago. We are still filling the gaps between code and documentation. Sorry for the lack of documentation and code examples. You are also welcome to contribute to either the code examples or the documentation. ---- 2020-04-08 17:17:58 UTC - Sijie Guo: Currently FLIP-72 is focused more on getting the existing pulsar-flink integration, including source, sink, and catalog, into upstream Flink, so people can consume an “official” pulsar-flink connector from the Flink distribution. It doesn’t include the state integration. --- Until FLIP-72 is done, I think most discussions and integrations will be added to <https://github.com/streamnative/pulsar-flink>. We are working with the Flink community to first get FLIP-72 done. Once FLIP-72 is done, we can move the flink-pulsar discussions to the Flink development list and get help from the Flink community. There used to be a GitHub issue tracking the roadmap of the features we are adding to this integration. Currently we are still designing the whole state integration, because we want to make sure that whatever state integration we do in pulsar-flink can be commonly used by both Pulsar Functions and pulsar-flink (or any integrations with other data processing engines). 
You can star and follow the repo :wink: Some references: <https://cwiki.apache.org/confluence/display/FLINK/FLIP-72%3A+Introduce+Pulsar+Connector> <https://issues.apache.org/jira/browse/FLINK-14146> <https://github.com/apache/flink/pull/10875> <https://github.com/apache/flink/pull/10455> If you are interested in this, I would suggest commenting on the JIRA issue, so the Flink community can see the need for this integration. ---- 2020-04-08 17:19:45 UTC - Damien: awesome, thank you again! I will try to contribute as much as possible :wink: ---- 2020-04-08 17:22:28 UTC - Shivji Kumar Jha: Hi @Sijie Guo Do you suggest that we move the flush to catch instead of finally? Will that be okay? Reference: <https://github.com/apache/pulsar/pull/6695> ---- 2020-04-08 17:29:39 UTC - Shivji Kumar Jha: +1 on this. I was trying the same some time back but dropped it to pick up later. It would be a useful addition for us. ```we can think about providing an open-tracing interceptor example for your reference.``` ---- 2020-04-08 17:36:41 UTC - p: @p has joined the channel ---- 2020-04-08 17:53:01 UTC - Sijie Guo: Can flush throw an exception? If flush also throws an exception, then we need to know where the exception is thrown and whether we need to call flush. Alternatively, we can consider adding a flag indicating whether the flush succeeded. +1 : Shivji Kumar Jha ---- 2020-04-08 18:05:46 UTC - Frank Kelly: Newbie question here - we're considering using Pulsar to stream binary data (audio) in a continuous stream back to clients. Is there a recommended implementation/design to realize that? I've seen mention of WebSocket but was wondering if there are alternatives. Thanks in advance. 
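The flush discussion above (around PR 6695) suggests tracking whether the write path threw before deciding to flush. A minimal sketch of that flag idea, using a hypothetical `Sink` interface rather than the actual Pulsar connector API:

```java
// Sketch of the flush-flag idea: record whether the write succeeded,
// only flush when it did, and fold a flush failure into the result so
// callers can distinguish "nothing happened" from "written and flushed".
// The Sink interface here is illustrative, not a Pulsar type.
public class FlushFlag {

    interface Sink {
        void write(String record) throws Exception;
        void flush() throws Exception;
    }

    // Returns true only when both the write and the flush succeeded.
    public static boolean writeAndFlush(Sink sink, String record) {
        boolean succeeded = false;
        try {
            sink.write(record);
            succeeded = true;
        } catch (Exception e) {
            // write failed; the flag stays false and we skip the flush
        } finally {
            if (succeeded) {
                try {
                    sink.flush();
                } catch (Exception e) {
                    succeeded = false; // the flush itself failed
                }
            }
        }
        return succeeded;
    }
}
```

Keeping the flush in `finally` but guarding it with the flag avoids flushing after a failed write, while still reporting a flush failure separately.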
---- 2020-04-08 18:19:55 UTC - Matteo Merli: You could consider using non-persistent topics ---- 2020-04-08 18:21:32 UTC - Frank Kelly: My limited understanding of Pulsar is that, like most pub/sub systems, by default it transmits discrete "chunks" of data rather than a stream - but I could be wrong ---- 2020-04-08 18:23:35 UTC - p: Has anyone here had issues with Pulsar corrupting data? Also, a colleague of mine tried copying data from one Pulsar instance to a new one, but hasn't been successful in starting the new instance (using standalone). ---- 2020-04-08 18:23:40 UTC - Matteo Merli: That's correct. Regarding persistent vs non-persistent: you could see the difference like TCP vs UDP ---- 2020-04-08 18:24:08 UTC - Frank Kelly: OK thanks, I will take a look ---- 2020-04-08 18:24:41 UTC - Matteo Merli: persistent: data is replicated and stored on disk; non-persistent: best-effort in-memory delivery ---- 2020-04-08 18:25:58 UTC - p: You aren't going to get much out of Pulsar unless you are dealing with audio segments that you want to store in order. Pulsar is going to deliver 1000s of events to your consumer at a time, and you will go through them one by one and stream the corresponding data to your clients. ---- 2020-04-08 18:26:08 UTC - p: the streaming part is on you; Pulsar isn't going to help you +1 : Frank Kelly ---- 2020-04-08 18:29:30 UTC - p: Also, I have a standalone instance running, and it's taking 10% CPU even though I'm not doing anything with it. Is this normal? ---- 2020-04-08 18:31:21 UTC - Matteo Merli: @p if you read the description of non-persistent topics, you can see that the above statement is not necessarily true. Also, ordering is not the only reason one would use Pulsar ---- 2020-04-08 18:32:44 UTC - p: I know it's not the only reason. The topic is on streaming, though; I think the topic type is actually off-topic in this thread ---- 2020-04-08 18:33:26 UTC - Matteo Merli: no, it's not ---- 2020-04-08 18:33:35 UTC - p: how so? 
---- 2020-04-08 18:33:44 UTC - Matteo Merli: non-persistent topic behavior is fundamentally different ---- 2020-04-08 18:34:56 UTC - Matteo Merli: and no: non-persistent topics are really not about "data streaming" ---- 2020-04-08 18:35:20 UTC - p: Pulsar isn't really about data streaming, in general ---- 2020-04-08 18:35:34 UTC - Matteo Merli: uhm.. ---- 2020-04-08 18:35:43 UTC - Matteo Merli: really? ---- 2020-04-08 18:36:28 UTC - p: yes ---- 2020-04-08 18:36:45 UTC - p: I have a streaming API on top of Pulsar as my data layer ---- 2020-04-08 18:37:05 UTC - p: Pulsar didn't help very much in the streaming part ---- 2020-04-08 18:37:30 UTC - p: also, Pulsar sends huge chunks of messages to its clients at a time ---- 2020-04-08 18:37:56 UTC - Matteo Merli: how big is controllable ---- 2020-04-08 18:38:06 UTC - p: the default is 1000 messages at a time ---- 2020-04-08 18:39:08 UTC - p: unless you are breaking up your data into multiple events, I don't see how one would even get the illusion of streaming from Pulsar ---- 2020-04-08 18:41:47 UTC - p: Pulsar isn't going to stream you the bytes of an event; you get the whole event and work with it one whole event at a time. ---- 2020-04-08 18:42:04 UTC - Matteo Merli: are you still referring to data streaming or audio streaming? ---- 2020-04-08 18:43:08 UTC - p: is there a difference? ---- 2020-04-08 18:43:33 UTC - Matteo Merli: Of course there is. ---- 2020-04-08 18:43:48 UTC - Matteo Merli: A message is the fundamental atomic unit. You can have messages as small as needed ---- 2020-04-08 18:47:14 UTC - p: sure, you can cut up your binary data into a bunch of messages, and maybe that'll help you; I did mention this. But Pulsar isn't going to help you with the streaming. You will have to figure out how to get your Pulsar events working with whatever streaming protocol you are using (HTTP, or something else). ---- 2020-04-08 19:27:27 UTC - Jon P: Thanks @Sijie Guo, that's useful background. 
I’d started down a different route - adding `Map<String, String> spanContext` to my data/schema (Message.getValue()) for injecting/extracting the context, as we hope to trace over multiple different messaging systems. If the builder provides a way to set properties on MessageImpl, that gives me another option - I’ll give it a try. I did manage to find a reference to the ticket in the commit history, so I read the detail on why the move to the untyped ProducerInterceptor - I take the point about supporting multiple schemas in one interceptor. If you were able to provide an OpenTracing (soon to be OpenTelemetry, but I’m sure the principles are the same) producer/consumer interceptor pair, that would be really useful, although I’m getting there slowly - mostly trying to get my head around how to use inject/extract to link up multiple spans across services/transports right now! One last minor question - do you plan to support the `Format.Builtin.TEXT_MAP_INJECT` & `Format.Builtin.TEXT_MAP_EXTRACT` carrier types in pulsar-client? Thanks again for the help! ---- 2020-04-08 19:59:29 UTC - Sijie Guo: > for injecting/extracting the context as we hope to trace over multiple different messaging systems. Yeah, good point. I think most messaging systems already support properties or headers, so it should be ‘safe’ to make this assumption and use properties to propagate the tracing context. > do you plan to support the `Format.Builtin.TEXT_MAP_INJECT` & `Format.Builtin.TEXT_MAP_EXTRACT` carrier types in pulsar-client? If we are mapping the properties as the carrier, it seems most natural that we will support TEXT_MAP_INJECT and TEXT_MAP_EXTRACT. We might be able to support other carrier types as well. ---- 2020-04-08 20:52:23 UTC - Markus Kobler: @Markus Kobler has joined the channel ---- 2020-04-08 20:57:13 UTC - Rahul: I'm wondering if there is any way to authorize a JWT token with multiple roles (an array of valid Pulsar roles). 
Is there a workaround if there is no inbuilt support for it? ---- 2020-04-08 21:02:52 UTC - Rahul: Is it possible to use a nested value for the `tokenAuthClaim` broker configuration? ---- 2020-04-08 21:04:12 UTC - Sijie Guo: Currently I don’t think it supports that yet. What is your use case? ---- 2020-04-08 21:13:41 UTC - Rahul: Currently the upstream system is issuing us a JWT token with an array of roles. Right now we have only one role in that array; later on it is possible we get multiple roles. This will help us keep the roles at a very granular level, while at the same time we can compose multiple roles to grant more permissions. @Sijie Guo ---- 2020-04-08 21:30:26 UTC - Matteo Merli: You can make that work by using a custom AuthorizationProvider. There you will be able to access the token data and decide whether it has access to the specific topic ---- 2020-04-08 22:27:54 UTC - Sijie Guo: @Rahul yeah, as Matteo mentioned, it can be done in a customized AuthorizationProvider. If you think it is a generic use case, I would also suggest creating a GitHub issue for it. ---- 2020-04-09 02:47:46 UTC - Hiroyuki Yamada: Thank you very much! It is very helpful. ---- 2020-04-09 03:48:22 UTC - Rahul: @Sijie Guo @Matteo Merli I have already considered the `AuthorizationProvider` interface, but it takes the role as a string parameter, so that's where I'm stuck. ---- 2020-04-09 04:43:27 UTC - Ganga Lakshmanasamy: Nope. I tried ping. Let me try telnet ---- 2020-04-09 05:00:09 UTC - Sijie Guo: @Rahul I think you also need to implement AuthenticationProvider: interpret the token and return a string that represents a list of roles. Then the AuthorizationProvider can interpret that string as the list of roles. ---- 2020-04-09 05:02:17 UTC - Rahul: @Sijie Guo got it. It's a hack but this is helpful for the time being. Thanks. ---- 2020-04-09 05:03:45 UTC - Sijie Guo: Yes. It is a hack. 
It would be great if you could create an issue for us, so that we can consider enhancing our interfaces. ---- 2020-04-09 05:04:26 UTC - Rahul: @Sijie Guo sure. Will do that. +1 : Sijie Guo ---- 2020-04-09 07:10:57 UTC - Frans Guelinckx: @Frans Guelinckx has joined the channel ---- 2020-04-09 07:13:53 UTC - Jon P: Apologies, I realised my last question about TEXT_MAP_INJECT was daft - it’s Jaeger which needs to support the additional carrier types, not Pulsar! ----
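The multi-role workaround from the JWT thread above (a custom AuthenticationProvider returning one string that encodes a list of roles, which a custom AuthorizationProvider then decodes) can be sketched as the round-trip below. The comma delimiter, class, and method names are assumptions for illustration, not Pulsar APIs; a real delimiter must be one that cannot appear inside a role name.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the role-list "hack": flatten the token's role array into a
// single delimited string on the authentication side, split it back on
// the authorization side, and authorize if any decoded role is allowed.
public class RoleListCodec {

    // AuthenticationProvider side: ["reader", "writer"] -> "reader,writer"
    public static String encode(List<String> roles) {
        return String.join(",", roles);
    }

    // AuthorizationProvider side: "reader,writer" -> ["reader", "writer"]
    public static List<String> decode(String role) {
        return Arrays.asList(role.split(","));
    }

    // Grant access when at least one of the decoded roles is permitted.
    public static boolean isAuthorized(String role, List<String> allowedRoles) {
        return decode(role).stream().anyMatch(allowedRoles::contains);
    }
}
```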

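For the audio-streaming thread above, p's suggestion of cutting binary data into multiple smaller messages can be sketched as a simple chunk/reassemble pair; the class name and chunk size are illustrative only, and in practice each chunk would be the payload of one Pulsar message consumed in order.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a binary payload (e.g. audio) into fixed-size pieces so
// each piece fits in one message, then concatenate them back in order
// on the consumer side.
public class AudioChunker {

    public static List<byte[]> chunk(byte[] data, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int len = Math.min(chunkSize, data.length - offset);
            byte[] piece = new byte[len];
            System.arraycopy(data, offset, piece, 0, len);
            chunks.add(piece);
        }
        return chunks;
    }

    // Consumer side: rebuild the original byte stream from the chunks.
    public static byte[] reassemble(List<byte[]> chunks) {
        int total = chunks.stream().mapToInt(c -> c.length).sum();
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, pos, c.length);
            pos += c.length;
        }
        return out;
    }
}
```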