Slack digest for #general - 2019-04-10

Apache Pulsar Slack Wed, 10 Apr 2019 02:19:53 -0700

2019-04-09 09:38:00 UTC - Laurent Chriqui: Hello. Any updates on that matter 
@Sijie Guo? Did we get it right? Would there be easier means to implement this ?
----
2019-04-09 12:32:15 UTC - Alexandre DUVAL: Hi, I see only getters in the Record
and Context classes for pulsar functions. Is there a way to add a property to a
message during function processing?
----
2019-04-09 12:34:37 UTC - John Crawford: just fyi, we ended up just downgrading 
to 2.2.1
----
2019-04-09 12:37:23 UTC - Sijie Guo: you mean adding properties to the result 
messages?
----
2019-04-09 12:37:46 UTC - Alexandre DUVAL: Yes
----
2019-04-09 12:39:10 UTC - Sijie Guo: currently it doesn’t support this yet. but
it should be straightforward to add this feature. do you mind creating an issue
for pulsar? so the community can pick it up.
----
2019-04-09 12:43:52 UTC - Alexandre DUVAL: 
<https://github.com/apache/pulsar/issues/4009>
----
2019-04-09 13:10:11 UTC - young: with the pulsar-client tools, if I run the
producer first and then run the consumer, the first message is lost, but the
second message is not.
----
2019-04-09 13:14:18 UTC - Sijie Guo: by default, a brand new subscription 
starts from latest messages. so if you want to consume all the messages in a 
topic, we have two options:

a) create the subscription before messages are published.

b) create the consumer by specifying
`.subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)`.
----
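The difference between the two starting positions can be illustrated with a
toy model. This is a minimal sketch and not the Pulsar client API: `ToyTopic`,
`InitialPosition`, and `producerFirstScenario` are hypothetical names, and the
topic is just an in-memory list. It replays the producer-first sequence
described above, assuming a new subscription only places a cursor in the log.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of a topic; NOT the Pulsar client API.
final class ToyTopic {
    enum InitialPosition { LATEST, EARLIEST }

    private final List<String> log = new ArrayList<>();

    void publish(String message) {
        log.add(message);
    }

    // A brand new subscription gets a cursor: the end of the log for LATEST,
    // the start of the log for EARLIEST.
    int newSubscriptionCursor(InitialPosition position) {
        return position == InitialPosition.EARLIEST ? 0 : log.size();
    }

    // Messages visible to a subscription whose cursor sits at the given offset.
    List<String> readFrom(int cursor) {
        return new ArrayList<>(log.subList(cursor, log.size()));
    }

    // Publish one message, create the subscription, then publish a second one.
    static List<String> producerFirstScenario(InitialPosition position) {
        ToyTopic topic = new ToyTopic();
        topic.publish("first");
        int cursor = topic.newSubscriptionCursor(position);
        topic.publish("second");
        return topic.readFrom(cursor);
    }
}
```

With `LATEST` the scenario yields only the second message; with `EARLIEST` it
yields both, which mirrors options a) and b).
----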
2019-04-09 14:42:01 UTC - Chris DiGiovanni: Does the
managedLedgerOffloadMaxThreads apply to reads and writes?
----
2019-04-09 15:54:21 UTC - Emma Pollum: Having trouble with the prometheus
metrics, continually getting this
error while linting: text format parsing error in line 189: second TYPE line
for metric name "pulsar_subscriptions_count", or TYPE reported after samples
----
2019-04-09 15:54:26 UTC - Emma Pollum: is there a setting I have to change?
----
2019-04-09 15:59:27 UTC - Sijie Guo: what version of prometheus are you using?
if you are using a 1.x version, it might have problems handling
duplicated metrics. try upgrading to version 2.4.x or above.
----
2019-04-09 16:00:01 UTC - Sijie Guo: or a temp workaround at broker side by
setting `exposeTopicLevelMetricsInPrometheus=false`
----
2019-04-09 16:17:38 UTC - Emma Pollum: :thumbsup: I will try to upgrade.
I did turn off topic level metrics and I'm still getting the error on
pulsar_topics_count
looks like there is one for the cluster, and one for namespace level
----
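To see which metric names trigger the "second TYPE line" complaint, the
scraped text can be scanned for repeated `# TYPE` lines. The following is a
rough sketch under assumptions: `DuplicateTypeLines` is a hypothetical helper,
it only covers the duplicate-TYPE half of the error (not "TYPE reported after
samples"), and it assumes one `# TYPE <name> <type>` declaration per line.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Pulsar or Prometheus.
final class DuplicateTypeLines {
    // Returns metric names declared by more than one "# TYPE <name> <type>"
    // line, in the order their first repeat is encountered.
    static List<String> find(String exposition) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : exposition.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            boolean isTypeLine = parts.length >= 4
                    && parts[0].equals("#")
                    && parts[1].equals("TYPE");
            if (!isTypeLine) {
                continue;
            }
            String name = parts[2];
            // Set.add returns false when the name was already declared.
            if (!seen.add(name) && !duplicates.contains(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }
}
```

Feeding it the body of the broker's metrics endpoint should list names such as
`pulsar_topics_count` if they are declared at both cluster and namespace level.
----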
2019-04-09 16:22:42 UTC - Emma Pollum: I'm running promtool with v. 2.7.1 and
still getting the issue.
----
2019-04-09 17:44:19 UTC - Emma Pollum: @Sijie Guo after updating prometheus to
2.7.1, prometheus cannot scrape if I have topic level metrics on. I need those
metrics though to see subscription backlog counts... do you know of a
workaround or a fix that is coming soon?
----
2019-04-09 19:35:09 UTC - Devin G. Bost: @Grant Wu Do you have links to any of
the documentation for the k8s tooling that you're referring to (that overlaps
with my proposal for the manifest approach)?
----
2019-04-09 19:35:43 UTC - Grant Wu: So the tool that we use is
<https://helm.sh/docs/helm/>
----
2019-04-09 19:35:56 UTC - Devin G. Bost: Thanks!
----
2019-04-09 19:37:55 UTC - Devin G. Bost: Is there a YAML file somewhere for
deploying Pulsar components via Helm?
----
2019-04-09 19:38:08 UTC - Kenan Dalley: Is there an example of a kafka source
ingestion? I've been working to consume from an existing SSL-based Kafka
cluster, and am running into issues. It looks like I may have finally got the
source to connect to my cluster correctly, but no pulsar function was created.
I see no errors in either the pulsar log or the function log, and actually see
the, but when I do a pulsar-admin functions list, nothing shows up.
pulsar-admin source list shows my source.
----
2019-04-09 19:38:53 UTC - Grant Wu:
<https://github.com/apache/pulsar/tree/master/deployment/kubernetes/helm>
Pulsar supplies helm charts for Pulsar here
----
2019-04-09 19:39:10 UTC - Devin G. Bost: I see YAML files for the broker,
bookkeeper, grafana, prometheus, and related services, but I don't see anything
for sinks or sources (or functions).
----
2019-04-09 19:39:40 UTC - Grant Wu: I don’t know anything about sinks or
sources. Functions being missing is what I was referring to when I said
“Pulsar Functions are the exception here because they’re not materialized as
k8s resources anywhere”
----
2019-04-09 19:40:12 UTC - Devin G. Bost: Gotcha. What about tenants or
namespaces?
----
2019-04-09 19:40:26 UTC - Grant Wu: Those aren’t materialized as k8s resources
either :confused:
----
2019-04-09 19:40:38 UTC - Devin G. Bost: Okay. It sounds like there's no
overlap then.
----
2019-04-09 19:40:38 UTC - Grant Wu: Probably want to talk to @Matteo Merli to
see what sort of roadmap/plans they have
----
2019-04-09 19:40:45 UTC - Devin G. Bost: Good point.
----
2019-04-09 19:41:00 UTC - Grant Wu: So when you said “components” you weren’t
referring to the broker/bookkeeper/etc.?
----
2019-04-09 19:41:13 UTC - Devin G. Bost: Correct.
----
2019-04-09 20:10:06 UTC - Ryan Samo: Has anyone attempted to calculate the time
difference between when messages are produced to when they are consumed? Trying
to figure out if there are any built-in metrics for this or if I need to roll my
own. All I can think of is starting a reader and measuring the difference from
the message produced time stamp to the time of read. Any suggestions or other
ways this might be calculated internally? Even if it’s the produce to consume
ratio every minute that would be enough for my use case.
----
2019-04-09 20:11:19 UTC - Kenan Dalley: I'm basing this on the Cassandra sink
tutorial from the website which says that a function should be created.
----
2019-04-09 20:13:03 UTC - Ali Ahmed: are you using the kafka source ?
----
2019-04-09 20:15:12 UTC - Kenan Dalley: Yes, I'm using the
pulsar-io-kafka.2.3.0.nar that I pulled from the website. It's the only
connector that I have in the "connectors" folder currently.
----
2019-04-09 20:17:32 UTC - Ali Ahmed: I don’t think there is an example. it is
used in the integration tests, you can take a look at that
----
2019-04-09 20:18:08 UTC - Kenan Dalley: I assume that's in the source?
----
2019-04-09 20:19:56 UTC - Ali Ahmed: yes
----
2019-04-09 20:20:28 UTC - Kenan Dalley: Ok, I'll take a look there. Thanks.
----
2019-04-09 20:20:30 UTC - Ali Ahmed: it’s using docker to model e2e scenarios
----
2019-04-09 20:32:50 UTC - Ryan Samo: Settled on using a reader to compare times
but please feel free if anyone has better ideas! :+1:
----
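The reader-based measurement boils down to subtracting the publish timestamp
from the time of read. A minimal sketch of that arithmetic follows, assuming
both timestamps are epoch milliseconds (in the Java client the publish time is
available from `Message.getPublishTime()`, and the read time is assumed here to
come from `System.currentTimeMillis()`). `ConsumeLatency` is a hypothetical
helper, and the result is only as good as the clock sync between the producing
and reading hosts.

```java
// Hypothetical helper for produce-to-consume latency; stdlib only.
final class ConsumeLatency {
    // Latency of one message: time it was read minus time it was published.
    static long latencyMillis(long publishTimeMillis, long readTimeMillis) {
        return readTimeMillis - publishTimeMillis;
    }

    // Average latency over paired publish/read timestamps; 0.0 for no samples.
    static double averageMillis(long[] publishTimes, long[] readTimes) {
        if (publishTimes.length != readTimes.length) {
            throw new IllegalArgumentException("timestamp arrays must have equal length");
        }
        if (publishTimes.length == 0) {
            return 0.0;
        }
        long total = 0;
        for (int i = 0; i < publishTimes.length; i++) {
            total += latencyMillis(publishTimes[i], readTimes[i]);
        }
        return (double) total / publishTimes.length;
    }
}
```

Collecting the samples per minute and calling `averageMillis` on each batch
would give a per-minute figure like the one asked about.
----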
2019-04-10 00:14:47 UTC - Matteo Merli: Functions and IO connectors (which are
based on functions) are not instantiated through the Helm chart. Rather the
worker service (or broker) will just create a K8S deployment when the function
is created
----
2019-04-10 00:15:48 UTC - Grant Wu: Oh, I didn't realize it created a k8s
deployment
----
2019-04-10 00:17:04 UTC - Matteo Merli: Yes, it depends on the deployment mode
(thread, process or K8S)
----
2019-04-10 02:39:26 UTC - Steve Kim: Does anyone have guidance on configuring
pulsar to improve read performance when reading from segments that have been
offloaded to object storage (e.g. S3)? Of course reading from object storage
will be slower than reading from bookies. However, I am surprised by how slow
it is, when I compare to reading data directly from cloud storage without going
through pulsar.
----
2019-04-10 02:40:56 UTC - Steve Kim: I see that there is a configuration
parameter `s3ManagedLedgerOffloadReadBufferSizeInBytes`. Are there other
relevant configuration parameters? Should I be adjusting the size of the
offloaded segments?
----
2019-04-10 02:57:42 UTC - Sanjeev Kulkarni: @Ivan Kelly @jia zhai might be able
to help you @Steve Kim
----
2019-04-10 03:14:01 UTC - jia zhai: @Steve Kim, you are right,
s3ManagedLedgerOffloadReadBufferSizeInBytes is the parameter.
----
2019-04-10 06:17:58 UTC - Olivier Chicha: Hello,
Is there a way to have a sticky distribution with pulsar?
The idea is to have a shared subscription, but instead of having a round robin
distribution, I would like each message to be distributed based on a hash
function (on one of the properties of the message)
I think that it is feasible with Kafka through partitions
but I don't see how this can be achieved via Pulsar, we initially thought
that we could use the Pulsar partitioned topic as well, but after re reading
the doc we realized that it is not the case (from our understanding)
----
2019-04-10 06:23:25 UTC - Ali Ahmed: @Olivier Chicha you want a specific
message to go into a specific partition ?
----
2019-04-10 06:30:59 UTC - Matteo Merli: @Olivier Chicha A failover subscription
on a partitioned topic will achieve the same
----
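The sticky part of this setup comes from mapping each message key to a fixed
partition, so that the single active consumer of that partition sees every
message with that key. Below is a minimal sketch of such a mapping, assuming
the key is a string and using Java's `String.hashCode()`; `StickyRouting` and
`partitionFor` are hypothetical names, and the hash the Pulsar client really
applies to keys may differ.

```java
// Hypothetical illustration of key-based partition routing; stdlib only.
final class StickyRouting {
    // Maps a routing key to a partition index in [0, numPartitions).
    // Math.floorMod keeps the result non-negative for negative hash codes.
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return Math.floorMod(key.hashCode(), numPartitions);
    }
}
```

The same key always lands on the same partition as long as the partition count
does not change.
----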
2019-04-10 06:56:14 UTC - Olivier Chicha: @Matteo Merli Thanks a lot for your
answer.
So if I create a failover subscription on a partitioned topic, it will not be
the same consumer that will be the master on each partition?
I thought that it would be the same for each partition based on what is written
in the documentation of "failover subscription"
"In failover mode, multiple consumers can attach to the same subscription. The
consumers will be lexically sorted by the consumer's name and the first
consumer will initially be the only one receiving messages. This consumer is
called the master consumer"
How are masters distributed over the partitions ?
Is there a documentation about it?
----
2019-04-10 07:02:41 UTC - Matteo Merli: Uhm.. apparently the javadocs
publishing on website got stuck some time back.

Take a look at:
<https://github.com/apache/pulsar/blob/c79fd728cf27417ca117ca220dd07dc4319d4c46/pulsar-client-api/src/main/java/org/apache/pulsar/client/api/SubscriptionType.java#L46>
----
2019-04-10 07:04:11 UTC - Matteo Merli: > How are masters distributed over
the partitions ?

Brokers will pick active consumers such that partitions will be evenly
distributed among available consumers
----
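One way to picture an even spread of active consumers is a modulo assignment
over the name-sorted consumer list. This is only a sketch under that
assumption: `ActiveConsumerAssignment` is a hypothetical name, and the
broker's real selection logic may differ from what is shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of spreading partitions over failover consumers.
final class ActiveConsumerAssignment {
    // Returns partition index -> name of the consumer assumed active for it.
    static Map<Integer, String> assign(List<String> consumerNames, int numPartitions) {
        if (consumerNames.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        // Sort by name so every caller derives the same order.
        List<String> sorted = new ArrayList<>(consumerNames);
        Collections.sort(sorted);
        Map<Integer, String> active = new TreeMap<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            active.put(partition, sorted.get(partition % sorted.size()));
        }
        return active;
    }
}
```

With two consumers and four partitions, each consumer ends up active on two
partitions in this model.
----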
2019-04-10 07:15:16 UTC - Olivier Chicha: Great this was really a critical
point for us.
FYI the doc I was referring to is :
<https://pulsar.apache.org/docs/en/concepts-messaging/#failover>
Is there a way for a consumer to know :
- on which partition he is the master ?
- that he has become a master or a slave for a partition, or that simply the
"distribution" on a topic has changed?
----
2019-04-10 07:16:51 UTC - Matteo Merli: yes, take a look at
<https://pulsar.apache.org/api/client/org/apache/pulsar/client/api/ConsumerBuilder.html#consumerEventListener-org.apache.pulsar.client.api.ConsumerEventListener->
----
2019-04-10 07:19:31 UTC - Matteo Merli: > FYI the doc I was referring to is :
<https://pulsar.apache.org/docs/en/concepts-messaging/#failover>

Yes, we really need to clarify this in the docs
----
2019-04-10 07:32:12 UTC - Olivier Chicha: Ok, so the ConsumerEventListener will
allow me to be notified of the changes : This is great and that should be
enough for us
As far as I understand, there is no way to get directly the list of the
partitions on which my consumer is master for a given topic, is that correct?
----
2019-04-10 07:37:54 UTC - Olivier Chicha: the message from merlimat in the
thread answered my question thanks
----
2019-04-10 07:40:46 UTC - Matteo Merli: No, but when you create the consumer
you’ll get all the notifications
----
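Since there is no direct call for the list, it can be rebuilt from the
notifications. Here is a minimal sketch, assuming the listener callbacks carry
a partition id (as `ConsumerEventListener.becameActive` and `becameInactive`
do in the Java client, to my understanding). `ActivePartitionTracker` is a
hypothetical class whose two methods would be invoked from those callbacks.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical tracker fed by active/inactive notifications; stdlib only.
final class ActivePartitionTracker {
    // Thread-safe set, since notifications may arrive on client threads.
    private final Set<Integer> active = new ConcurrentSkipListSet<>();

    // Call when notified that this consumer became active on a partition.
    void becameActive(int partitionId) {
        active.add(partitionId);
    }

    // Call when notified that this consumer became inactive on a partition.
    void becameInactive(int partitionId) {
        active.remove(partitionId);
    }

    // Sorted snapshot of the partitions this consumer believes it is active on.
    SortedSet<Integer> activePartitions() {
        return new TreeSet<>(active);
    }
}
```

The snapshot reflects only the notifications received so far, so it should be
treated as the consumer's local view and not as the broker's state.
----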
2019-04-10 07:40:58 UTC - Olivier Chicha: On a totally different subject:
Do you know if there is any plan / project to provide an Elixir / Erlang
implementation of the pulsar client ?
----
2019-04-10 07:41:34 UTC - Matteo Merli: Also you can check the topic stats to
see which one is active/inactive
----
2019-04-10 07:42:41 UTC - Olivier Chicha: Great, thank you very much for all
your answers.
----
2019-04-10 08:07:22 UTC - Thor Sigurjonsson: I noticed this problem a few days
ago and mentioned it to @Matteo Merli. I started working on a fix but got side
tracked with my regular day job :slightly_smiling_face:
I should have a commit later this week.
----
2019-04-10 08:08:51 UTC - Sijie Guo: cool :+1:
----
2019-04-10 08:32:39 UTC - Kev Jackson: morning - so I'm moving towards building
a pulsar cluster after testing with a single instance
----
2019-04-10 08:33:26 UTC - Kev Jackson: and I'm starting with the docs that
suggest building the Zookeeper cluster first - reading the zookeeper docs
suggests something interesting
----
2019-04-10 08:34:23 UTC - Kev Jackson: "ZooKeeper runs in Java, release 1.8 or
greater (JDK 8 or greater, FreeBSD support requires openjdk8). It runs as an
ensemble of ZooKeeper servers. Three ZooKeeper servers is the minimum
recommended size for an ensemble, and we also recommend that they run on
separate machines. At Yahoo!, ZooKeeper is usually deployed on dedicated RHEL
boxes, with dual-core processors, 2GB of RAM, and 80GB IDE hard drives" - is
this still suggested sizing for zookeeper for a pulsar cluster
----
2019-04-10 08:45:03 UTC - Ivan Kelly: @Kev Jackson it depends on the load on
the cluster, but that should be enough for most cases
----
2019-04-10 08:47:18 UTC - Ali Ahmed: @Olivier Chicha there is current plans a
wrapper over c++ would be the way to go, it just depends on the community demand
----
