2019-04-09 09:38:00 UTC - Laurent Chriqui: Hello. Any updates on that matter @Sijie Guo? Did we get it right? Would there be an easier way to implement this? ---- 2019-04-09 12:32:15 UTC - Alexandre DUVAL: Hi, I see only getters in the Record and Context classes for Pulsar functions. Is there a way to add properties to a message during function processing? ---- 2019-04-09 12:34:37 UTC - John Crawford: just fyi, we ended up just downgrading to 2.2.1 ---- 2019-04-09 12:37:23 UTC - Sijie Guo: you mean adding properties to the result messages? ---- 2019-04-09 12:37:46 UTC - Alexandre DUVAL: Yes ---- 2019-04-09 12:39:10 UTC - Sijie Guo: currently it doesn’t support this yet, but it should be straightforward to add this feature. do you mind creating an issue for pulsar, so the community can pick it up? ---- 2019-04-09 12:43:52 UTC - Alexandre DUVAL: <https://github.com/apache/pulsar/issues/4009> ---- 2019-04-09 13:10:11 UTC - young: with the pulsar-client tools, if I run the producer first and then run the consumer, the first message is lost, but the second message is not. ---- 2019-04-09 13:14:18 UTC - Sijie Guo: by default, a brand new subscription starts from the latest messages. so if you want to consume all the messages in a topic, you have two options:
a) create the subscription before messages are published. b) create the consumer by specifying `.subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)`. ---- 2019-04-09 14:42:01 UTC - Chris DiGiovanni: Does the managedLedgerOffloadMaxThreads apply to reads and writes? ---- 2019-04-09 15:54:21 UTC - Emma Pollum: Having trouble with the prometheus metrics, continually getting this error while linting: text format parsing error in line 189: second TYPE line for metric name "pulsar_subscriptions_count", or TYPE reported after samples ---- 2019-04-09 15:54:26 UTC - Emma Pollum: is there a setting I have to change? ---- 2019-04-09 15:59:27 UTC - Sijie Guo: what version of prometheus are you using? if you are using a 1.x version, it might have problems handling duplicated metrics. try upgrading to version 2.4.x or above. ---- 2019-04-09 16:00:01 UTC - Sijie Guo: or a temp workaround at broker side by setting `exposeTopicLevelMetricsInPrometheus=false` ---- 2019-04-09 16:17:38 UTC - Emma Pollum: :thumbsup: I will try to upgrade. I did turn off topic level metrics and I'm still getting the error on pulsar_topics_count. It looks like there is one for the cluster, and one for the namespace level ---- 2019-04-09 16:22:42 UTC - Emma Pollum: I'm running promtool with v. 2.7.1 and still getting the issue. ---- 2019-04-09 17:44:19 UTC - Emma Pollum: @Sijie Guo after updating prometheus to 2.7.1, prometheus cannot scrape if I have topic level metrics on. I need those metrics though to see subscription backlog counts... do you know of a workaround or a fix that is coming soon? ---- 2019-04-09 19:35:09 UTC - Devin G. Bost: @Grant Wu Do you have links to any of the documentation for the k8s tooling that you're referring to (that overlaps with my proposal for the manifest approach)? ---- 2019-04-09 19:35:43 UTC - Grant Wu: So the tool that we use is <https://helm.sh/docs/helm/> ---- 2019-04-09 19:35:56 UTC - Devin G. Bost: Thanks! ---- 2019-04-09 19:37:55 UTC - Devin G. 
Bost: Is there a YAML file somewhere for deploying Pulsar components via Helm? ---- 2019-04-09 19:38:08 UTC - Kenan Dalley: Is there an example of a kafka source ingestion? I've been working to consume from an existing SSL-based Kafka cluster, and am running into issues. It looks like I may have finally got the source to connect to my cluster correctly, but no pulsar function was created. I see no errors in either the pulsar log or the function log, and actually see the, but when I do a pulsar-admin functions list, nothing shows up. pulsar-admin source list shows my source. ---- 2019-04-09 19:38:53 UTC - Grant Wu: <https://github.com/apache/pulsar/tree/master/deployment/kubernetes/helm> Pulsar supplies helm charts for Pulsar here ---- 2019-04-09 19:39:10 UTC - Devin G. Bost: I see YAML files for the broker, bookkeeper, grafana, prometheus, and related services, but I don't see anything for sinks or sources (or functions). ---- 2019-04-09 19:39:40 UTC - Grant Wu: I don’t know anything about sinks or sources. Functions being missing is what I was referring to when I said “Pulsar Functions are the exception here because they’re not materialized as k8s resources anywhere” ---- 2019-04-09 19:40:12 UTC - Devin G. Bost: Gotcha. What about tenants or namespaces? ---- 2019-04-09 19:40:26 UTC - Grant Wu: Those aren’t materialized as k8s resources either :confused: ---- 2019-04-09 19:40:38 UTC - Devin G. Bost: Okay. It sounds like there's no overlap then. ---- 2019-04-09 19:40:38 UTC - Grant Wu: Probably want to talk to @Matteo Merli to see what sort of roadmap/plans they have ---- 2019-04-09 19:40:45 UTC - Devin G. Bost: Good point. ---- 2019-04-09 19:41:00 UTC - Grant Wu: So when you said “components” you weren’t referring to the broker/bookkeeper/etc.? ---- 2019-04-09 19:41:13 UTC - Devin G. Bost: Correct. ---- 2019-04-09 20:10:06 UTC - Ryan Samo: Has anyone attempted to calculate the time difference between when messages are produced and when they are consumed? 
Trying to figure out if there are any built-in metrics for this or if I need to roll my own. All I can think of is starting a reader and measuring the difference between the message's produced timestamp and the time of read. Any suggestions or other ways this might be calculated internally? Even if it’s the produce-to-consume ratio every minute, that would be enough for my use case. ---- 2019-04-09 20:11:19 UTC - Kenan Dalley: I'm basing this on the Cassandra sink tutorial from the website, which says that a function should be created. ---- 2019-04-09 20:13:03 UTC - Ali Ahmed: are you using the kafka source ? ---- 2019-04-09 20:15:12 UTC - Kenan Dalley: Yes, I'm using the pulsar-io-kafka.2.3.0.nar that I pulled from the website. It's the only connector that I have in the "connectors" folder currently. ---- 2019-04-09 20:17:32 UTC - Ali Ahmed: I don’t think there is an example, but it is used in the integration tests; you can take a look at that ---- 2019-04-09 20:18:08 UTC - Kenan Dalley: I assume that's in the source? ---- 2019-04-09 20:19:56 UTC - Ali Ahmed: yes ---- 2019-04-09 20:20:28 UTC - Kenan Dalley: Ok, I'll take a look there. Thanks. ---- 2019-04-09 20:20:30 UTC - Ali Ahmed: it’s using docker to model e2e scenarios ---- 2019-04-09 20:32:50 UTC - Ryan Samo: Settled on using a reader to compare times, but please feel free to share if anyone has better ideas! :+1: ---- 2019-04-10 00:14:47 UTC - Matteo Merli: Functions and IO connectors (which are based on functions) are not instantiated through the Helm chart. 
Rather, the worker service (or broker) will just create a K8S deployment when the function is created ---- 2019-04-10 00:15:48 UTC - Grant Wu: Oh, I didn't realize it created a k8s deployment ---- 2019-04-10 00:17:04 UTC - Matteo Merli: Yes, it depends on the deployment mode (thread, process or K8S) ---- 2019-04-10 02:39:26 UTC - Steve Kim: Does anyone have guidance on configuring pulsar to improve read performance when reading from segments that have been offloaded to object storage (e.g. S3)? Of course reading from object storage will be slower than reading from bookies. However, I am surprised by how slow it is when I compare it to reading data directly from cloud storage without going through pulsar. ---- 2019-04-10 02:40:56 UTC - Steve Kim: I see that there is a configuration parameter `s3ManagedLedgerOffloadReadBufferSizeInBytes`. Are there other relevant configuration parameters? Should I be adjusting the size of the offloaded segments? ---- 2019-04-10 02:57:42 UTC - Sanjeev Kulkarni: @Ivan Kelly @jia zhai might be able to help you @Steve Kim ---- 2019-04-10 03:14:01 UTC - jia zhai: @Steve Kim, you are right, s3ManagedLedgerOffloadReadBufferSizeInBytes is the parameter. ---- 2019-04-10 06:17:58 UTC - Olivier Chicha: Hello, Is there a way to have a sticky distribution with pulsar? The idea is to have a shared subscription, but instead of a round robin distribution, I would like each message to be distributed based on a hash function (on one of the properties of the message). I think that this is feasible with Kafka through partitions, but I don't see how it can be achieved with Pulsar. We initially thought that we could use a Pulsar partitioned topic as well, but after re-reading the doc we realized that it is not the case (from our understanding) ---- 2019-04-10 06:23:25 UTC - Ali Ahmed: @Olivier Chicha you want a specific message to go into a specific partition ? 
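The hash-based distribution asked about above can be sketched in a few lines. This is only an illustration of the idea (same key, same partition), not Pulsar's routing code: `KeyRouting` and `partitionFor` are hypothetical names, and using `String.hashCode()` as the hash is an assumption, since the client may use a different hash function.

```java
// Hypothetical helper for illustration; not part of the Pulsar client API.
final class KeyRouting {
    // Map a message key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Assumption: String.hashCode() stands in for whatever hash is really used.
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        String[] keys = {"user-1", "user-2", "user-1"};
        for (String key : keys) {
            // The two "user-1" lines always print the same partition.
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

With this kind of mapping, all messages sharing a key land on one partition, so whichever consumer owns that partition sees every message for the key.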
---- 2019-04-10 06:30:59 UTC - Matteo Merli: @Olivier Chicha A failover subscription on a partitioned topic will achieve the same ---- 2019-04-10 06:56:14 UTC - Olivier Chicha: @Matteo Merli Thanks a lot for your answer. So if I create a failover subscription on a partitioned topic, it will not be the same consumer that is the master on each partition? I thought that it would be the same for each partition, based on what is written in the documentation of "failover subscription": "In failover mode, multiple consumers can attach to the same subscription. The consumers will be lexically sorted by the consumer's name and the first consumer will initially be the only one receiving messages. This consumer is called the master consumer" How are masters distributed over the partitions? Is there documentation about it? ---- 2019-04-10 07:02:41 UTC - Matteo Merli: Uhm.. apparently the javadocs publishing on the website got stuck some time back. Take a look at: <https://github.com/apache/pulsar/blob/c79fd728cf27417ca117ca220dd07dc4319d4c46/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/SubscriptionType.java#L46> ---- 2019-04-10 07:04:11 UTC - Matteo Merli: > How are masters distributed over the partitions? Brokers will pick active consumers such that partitions will be evenly distributed among available consumers ---- 2019-04-10 07:15:16 UTC - Olivier Chicha: Great, this was really a critical point for us. FYI the doc I was referring to is: <https://pulsar.apache.org/docs/en/concepts-messaging/#failover> Is there a way for a consumer to know: - on which partitions it is the master? - that it has become a master or a slave for a partition, or that simply the "distribution" on a topic has changed? 
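To make "partitions will be evenly distributed among available consumers" concrete, here is a toy model of it. The assignment rule used here (sort consumers by name, then pick `partition % consumerCount`) is an assumption chosen for illustration; the broker's actual selection logic may differ, and `FailoverAssignment` is a hypothetical class, not a Pulsar API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of failover assignment; the broker's real logic may differ.
final class FailoverAssignment {
    // Returns, for each partition index, the name of the consumer assumed to be active.
    static List<String> activeConsumers(List<String> consumerNames, int numPartitions) {
        if (consumerNames.isEmpty()) {
            throw new IllegalArgumentException("need at least one consumer");
        }
        List<String> sorted = new ArrayList<>(consumerNames);
        Collections.sort(sorted); // assumption: consumers are ordered by name
        List<String> active = new ArrayList<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            // assumption: round-robin over the sorted consumers
            active.add(sorted.get(partition % sorted.size()));
        }
        return active;
    }

    public static void main(String[] args) {
        List<String> consumers = new ArrayList<>();
        consumers.add("consumer-b");
        consumers.add("consumer-a");
        consumers.add("consumer-c");
        List<String> active = activeConsumers(consumers, 6);
        for (int p = 0; p < active.size(); p++) {
            System.out.println("partition-" + p + " -> " + active.get(p));
        }
    }
}
```

Under these assumptions, three consumers on a six-partition topic each end up active on two partitions, which is the per-key stickiness the original question was after.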
---- 2019-04-10 07:16:51 UTC - Matteo Merli: yes, take a look at <https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/ConsumerBuilder.html#consumerEventListener-org.apache.pulsar.client.api.ConsumerEventListener-> ---- 2019-04-10 07:19:31 UTC - Matteo Merli: > FYI the doc I was referring to is: <https://pulsar.apache.org/docs/en/concepts-messaging/#failover> Yes, we really need to clarify this in the docs ---- 2019-04-10 07:32:12 UTC - Olivier Chicha: Ok, so the ConsumerEventListener will allow me to be notified of the changes. This is great and that should be enough for us. As far as I understand, there is no way to directly get the list of the partitions on which my consumer is master for a given topic, is that correct? ---- 2019-04-10 07:37:54 UTC - Olivier Chicha: the message from merlimat in the thread answered my question, thanks ---- 2019-04-10 07:40:46 UTC - Matteo Merli: No, but when you create the consumer you’ll get all the notifications ---- 2019-04-10 07:40:58 UTC - Olivier Chicha: On a totally different subject: Do you know if there is any plan / project to provide an Elixir / Erlang implementation of the pulsar client? ---- 2019-04-10 07:41:34 UTC - Matteo Merli: Also you can check the topic stats to see which one is active/inactive ---- 2019-04-10 07:42:41 UTC - Olivier Chicha: Great, thank you very much for all your answers. ---- 2019-04-10 08:07:22 UTC - Thor Sigurjonsson: I noticed this problem a few days ago and mentioned it to @Matteo Merli. I started working on a fix but got sidetracked with my regular day job :slightly_smiling_face: I should have a commit later this week. 
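On the ConsumerEventListener exchange above: since there is no direct call to list the partitions a consumer is active on, one option is to keep that list yourself from the notifications. A minimal sketch, assuming you forward the `partitionId` from the listener's `becameActive` and `becameInactive` callbacks into it; `ActivePartitionTracker` and its method names are hypothetical, and the Pulsar wiring is left out so the snippet runs without the client library.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical helper for illustration; not part of the Pulsar client API.
final class ActivePartitionTracker {
    // Thread-safe set, since listener callbacks may arrive on client threads.
    private final Set<Integer> active = new ConcurrentSkipListSet<>();

    // Intended to be called from a ConsumerEventListener's becameActive callback.
    void onBecameActive(int partitionId) {
        active.add(partitionId);
    }

    // Intended to be called from a ConsumerEventListener's becameInactive callback.
    void onBecameInactive(int partitionId) {
        active.remove(partitionId);
    }

    // Snapshot of the partitions this consumer is currently active on, ascending.
    Set<Integer> activePartitions() {
        return new TreeSet<>(active);
    }

    public static void main(String[] args) {
        ActivePartitionTracker tracker = new ActivePartitionTracker();
        tracker.onBecameActive(2);
        tracker.onBecameActive(0);
        tracker.onBecameInactive(2);
        System.out.println("active on: " + tracker.activePartitions());
    }
}
```

Because notifications are delivered from the moment the consumer is created, a tracker registered at build time should see every change; the topic stats remain available as a cross-check.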
---- 2019-04-10 08:08:51 UTC - Sijie Guo: cool :+1: ---- 2019-04-10 08:32:39 UTC - Kev Jackson: morning - so I'm moving towards building a pulsar cluster after testing with a single instance ---- 2019-04-10 08:33:26 UTC - Kev Jackson: and I'm starting with the docs that suggest building the Zookeeper cluster first - reading the zookeeper docs suggests something interesting ---- 2019-04-10 08:34:23 UTC - Kev Jackson: "ZooKeeper runs in Java, release 1.8 or greater (JDK 8 or greater, FreeBSD support requires openjdk8). It runs as an ensemble of ZooKeeper servers. Three ZooKeeper servers is the minimum recommended size for an ensemble, and we also recommend that they run on separate machines. At Yahoo!, ZooKeeper is usually deployed on dedicated RHEL boxes, with dual-core processors, 2GB of RAM, and 80GB IDE hard drives" - is this still the suggested sizing for zookeeper for a pulsar cluster? ---- 2019-04-10 08:45:03 UTC - Ivan Kelly: @Kev Jackson it depends on the load on the cluster, but that should be enough for most cases ---- 2019-04-10 08:47:18 UTC - Ali Ahmed: @Olivier Chicha there is current plans a wrapper over c++ would be the way to go, it just depends on the community demand ----
