2018-10-18 09:27:50 UTC - Wenfeng Wang: @Wenfeng Wang has joined the channel
----
2018-10-18 10:31:55 UTC - Nicolas Ha: I am trying to understand the 
recommendation you made last time to use a Daemonset instead of a StatefulSet 
(previous msg 
<https://apache-pulsar.slack.com/archives/C5Z4T36F7/p1538779207000100> )

My understanding is that in the pulsar helm chart:
- currently BK and ZK use a StatefulSet (and a volumeClaim) - they do not care 
where the storage is provided, but the kubernetes cluster has to have a way to 
provide storage (not sure where this bit is specified?)
- DaemonSet would make ZK and BK pods “stick” to one physical node, and use the 
local storage there (so would work on bare metal clusters too, the operator 
would have to keep number of machines >= number of ZK/BK DaemonSet replicas)

Also I see Volume claims, but I was expecting `PersistentVolumeClaim` in these:
- 
<https://github.com/apache/pulsar/blob/master/deployment/kubernetes/helm/pulsar/templates/bookkeeper-statefulset.yaml>
- 
<https://github.com/apache/pulsar/blob/master/deployment/kubernetes/helm/pulsar/templates/zookeeper-statefulset.yaml>

Or even something like what is described there: 
<https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/> and 
<https://kubernetes.io/docs/concepts/storage/volumes/#local> - it looks like 
this should allow not changing the statefulset and still use local volumes?

Am I missing something? Was there a specific reason to use a DaemonSet in 
"kubernetes/generic" and Statefulset in "helm"?
----
2018-10-18 12:54:31 UTC - Sijie Guo: > it looks like this should allow not 
changing the statefulset and still use local volumes?

yes, local volumes should be a better solution than a DaemonSet.

> Was there a specific reason to use a DaemonSet in “kubernetes/generic” and 
Statefulset in “helm”?

kubernetes/generic was added before k8s introduced local volumes, so we used a 
daemonset there. but it can probably be changed to use a stateful set with 
local volumes.

“helm” was added for deploying in cloud environments, hence a statefulset with 
persistent volumes is more reasonable there.
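
For reference, the local-volume approach from the linked Kubernetes docs looks roughly like the sketch below. All names, sizes, paths, and the node name are illustrative, not taken from the Pulsar chart; the idea is that the StatefulSet's `volumeClaimTemplates` would then reference `storageClassName: local-storage`:

```yaml
# A StorageClass that delays binding until a pod is scheduled,
# as recommended for local volumes
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# A PersistentVolume backed by a local disk on one specific node
# (node name and path are placeholders)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: bookie-journal-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node-1
```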
----
2018-10-18 13:24:20 UTC - Martin Svensson: yeah, that’s right
----
2018-10-18 13:48:47 UTC - Nicolas Ha: ok that makes sense.
So if I understand correctly, there is nothing wrong with having a "generic" 
that would use local volumes, and even a "helm" version of the "generic" 
deployment?
----
2018-10-18 13:54:40 UTC - Sijie Guo: YES correct
----
2018-10-18 13:56:05 UTC - Nicolas Ha: thanks a lot - very helpful answers as 
usual :smile:
----
2018-10-18 16:44:10 UTC - Jerry Moore: @Jerry Moore has joined the channel
----
2018-10-18 18:33:57 UTC - Zuyu Zhang: Hi guys, I have a simple q regarding the 
multi-topic consumer in C++ client. When the consumer receives a message, how 
to know which topic it belongs to?
----
2018-10-18 18:40:25 UTC - Jerry Peng: MessageId::getTopicName()
----
2018-10-18 18:40:34 UTC - Jerry Peng: ```
/**
 * Get the topic name
 */
const std::string& getTopicName() const;
```
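
To illustrate, an untested sketch of what this could look like with a multi-topic consumer (the broker URL, topic, and subscription names are made up, and it needs the Pulsar C++ client built from master, as noted below):

```cpp
#include <pulsar/Client.h>
#include <iostream>
#include <string>
#include <vector>

int main() {
    pulsar::Client client("pulsar://localhost:6650");

    // Subscribe to several topics with a single consumer
    std::vector<std::string> topics = {"topic-a", "topic-b"};
    pulsar::Consumer consumer;
    client.subscribe(topics, "my-subscription", consumer);

    pulsar::Message msg;
    while (consumer.receive(msg, 3000) == pulsar::ResultOk) {
        // getTopicName() tells us which topic the message came from
        std::cout << msg.getMessageId().getTopicName()
                  << ": " << msg.getDataAsString() << std::endl;
        consumer.acknowledge(msg);
    }

    client.close();
    return 0;
}
```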
----
2018-10-18 18:40:52 UTC - Zuyu Zhang: ok. Thanks!
----
2018-10-18 18:42:00 UTC - Jerry Peng: FYI This is a feature not in an official 
release yet.  You will have to compile from master
----
2018-10-18 18:53:10 UTC - Zuyu Zhang: Yes, I did.
----
2018-10-18 22:03:51 UTC - Ryan Samo: Hey guys, I’m trying to load test Pulsar 
but keep receiving “Connection reset by peer” messages. It only happens when I 
start multiple producers and consumers on a topic. If the topic is not 
partitioned then it does not seem to happen as often. I have gone through 
broker.conf and set many of the limits to 0 so that I can push the boundaries 
of the hardware. Any ideas on what may be throwing out all of my connections? 
All of them drop at once, consumers and producers, and then immediately 
reconnect.

Thanks!
----
2018-10-18 22:05:34 UTC - Ali Ahmed: @Ryan Samo what’s your setup and how much 
traffic are you pushing through
----
2018-10-18 22:09:14 UTC - Ryan Samo: 3 brokers, 3 zookeepers each on 40 core 
servers with plenty of RAM. If I do a topic that’s not partitioned, I can spin 
up 25 producers with Pulsar-perf and generate around 1 million msg/s. The 
consumer side lags in that case so I decided to partition and that’s when the 
connection issues popped up. I want to sustain 1 million msg/s without lag if I 
can get there.
----
2018-10-18 22:09:55 UTC - Ali Ahmed: how many bookies ?
----
2018-10-18 22:13:02 UTC - Ryan Samo: 3, they are shared on the same servers as 
the brokers. 3 brokers, 3 bookies, 3 zookeepers
----
2018-10-18 22:13:26 UTC - Ryan Samo: Will reply back soon, need to drive :)
----
2018-10-18 23:23:34 UTC - Rodrigo Malacarne: @Ryan Samo, were you able to start 
a cluster with the bookies? Are you using v2.1.1-incubating?
----
2018-10-18 23:31:53 UTC - Ryan Samo: Yes I was able to get it all running. 
V2.1.1-incubating yup. Yeah so the cluster works fine until I try to have the 
multiple connections and then it just keeps dropping them with timeouts. The 
servers themselves are hardly being taxed with the load so I thought maybe a 
setting needs adjustment or the JVM is running into issues maybe.
----
2018-10-18 23:32:30 UTC - Matteo Merli: @Ryan Samo Are you reaching any 
bandwidth limit ?
----
2018-10-18 23:33:32 UTC - Matteo Merli: There are certainly some settings that 
can be used when testing for high throughput. The defaults are conservative, 
meant to work out of the box in all scenarios
----
2018-10-18 23:33:44 UTC - Ryan Samo: Well I was able to do 25 producers and a 
couple consumers with a non-partitioned topic so I don’t think so.
----
2018-10-18 23:34:29 UTC - Ryan Samo: If you have any recommended settings for 
beating the crap out of the cluster I’m all for it. Need to prove out how far 
we can go
----
2018-10-18 23:34:58 UTC - Matteo Merli: One of the reasons for “connection 
reset by peer” is that one end appears to be unresponsive, so the broker cuts 
off the TCP connection after 30-60 sec
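
For context, the broker-side knob for this liveness check lives in broker.conf (default value shown; a peer that doesn't answer the probes gets its connection closed):

```
# broker.conf -- how often the broker pings connected clients to
# check liveness; unresponsive peers have their connection dropped
keepAliveIntervalSeconds=30
```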
----
2018-10-18 23:36:16 UTC - Ryan Samo: Hmm yeah I saw a setting for that in the 
broker config. I might try the Prometheus and grafana monitoring too to see if 
I see anything network wise
----
2018-10-18 23:36:31 UTC - Ryan Samo: CPU, RAM, etc all good
----
2018-10-18 23:36:58 UTC - Matteo Merli: For settings, take a look at 
<https://github.com/openmessaging/openmessaging-benchmark/blob/master/driver-pulsar/deploy/templates>
----
2018-10-18 23:37:21 UTC - Ryan Samo: I am using Pulsar-perf produce and consume 
to perform the tests. Cool thanks, I’ll check that out!
----
2018-10-18 23:37:56 UTC - Matteo Merli: Are the machines with 10 Gbps NICs ?
----
2018-10-18 23:40:41 UTC - Ryan Samo: Yes they sure are. In the same rack too 
actually 
----
2018-10-18 23:41:39 UTC - Matteo Merli: Ok, then, apart from metrics, take a 
look at broker logs for any additional clue
----
2018-10-18 23:43:04 UTC - Ryan Samo: Ok, I’ll keep searching. It feels like a 
broker.conf issue because when I started the initial testing it was 
conservative like you said. I tweaked a few values to unlimited and that helped 
a ton. Thanks for all the support!
----
2018-10-19 07:15:43 UTC - Matti-Pekka Laaksonen: Hi! I tried running my cluster 
for the first time with production-level workloads, and I ended up running out 
of memory with the brokers. I have a few questions regarding this:
1. Is the sum of maxDirectMemory and maxHeapMemory the maximum memory the JVM 
can use? Or is the heap memory a subset of the direct memory?
2. What is the recommended total memory (I assume the sum of the two memory 
sizes) for brokers?
3. What is the recommended share of an instance's total memory used by Pulsar? 
The sample deployment scripts in the Git repo show that c5.2xlarge instances 
are used for the brokers. The instance has 16 GB of memory, but the direct and 
heap memory are both set to 12 GB, which sums to 24 GB, more than the available 
memory
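
For what it's worth, heap and direct memory are separate pools, so the JVM's footprint is roughly their sum (plus metaspace and other native overhead). Both are set through the `PULSAR_MEM` variable in conf/pulsar_env.sh; the sizes below are illustrative for a 16 GB machine, not an official recommendation:

```shell
# conf/pulsar_env.sh -- heap (-Xmx) plus direct memory must fit in
# physical RAM with room left over for the OS and page cache
PULSAR_MEM="-Xms6g -Xmx6g -XX:MaxDirectMemorySize=6g"
```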
----
2018-10-19 07:27:38 UTC - Matti-Pekka Laaksonen: Oh, and I could swear that 
I've seen a guide to upgrading a running Pulsar cluster, but I can't seem to 
find it anymore. Anyone know about this?
----
