2019-02-01 09:11:21 UTC - David Tinker: Tx a stack. I have posted this on
serverfault to create some Google history.
<https://serverfault.com/questions/951846/is-it-safe-to-remove-one-node-from-a-3-node-apache-pulsar-cluster-for-maintenanc>
+1 : jia zhai, Ali Ahmed, Karthik Ramasamy
----
2019-02-01 09:35:27 UTC - David Tinker: Hmm. I just rebooted one of the machines
(it came up again, phew!) and one of my consumer services got a bunch of these
in the logs:
2019-02-01 09:28:52.149 WARN [,,] 4396 --- [r-client-io-1-1]
org.apache.pulsar.client.impl.ClientCnx : [id: 0x9301b5d4, L:/10.0.0.18:34712
- R:10.0.0.1/10.0.0.1:6650] Received error from server:
org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty
bookies available
2019-02-01 09:28:52.149 WARN [,,] 4396 --- [r-client-io-1-1]
o.a.pulsar.client.impl.ConsumerImpl :
[persistent://public/brand-mentions/b00092665][gammon] Failed to subscribe to
topic on 10.0.0.1/10.0.0.1:6650
2019-02-01 09:28:52.149 WARN [,,] 4396 --- [r-client-io-1-1]
o.a.p.client.impl.ConnectionHandler :
[persistent://public/brand-mentions/b00092665] [gammon] Could not get
connection to broker: org.apache.bookkeeper.mledger.ManagedLedgerException: Not
enough non-faulty bookies available -- Will try again in 23.946 s
----
2019-02-01 09:36:12 UTC - David Tinker: I thought clients would still be able
to operate with one machine down?
----
2019-02-01 09:40:14 UTC - Sijie Guo: ```
managedLedgerDefaultEnsembleSize=3
managedLedgerDefaultWriteQuorum=3
managedLedgerDefaultAckQuorum=2
```
your ensemble size and write quorum are 3, and you only have 3 machines. so you
can’t tolerate any bookie going down.
----
2019-02-01 09:40:32 UTC - Sijie Guo: are you setting ensemble size and write
quorum to 3 intentionally?
----
2019-02-01 09:43:18 UTC - David Tinker: I probably just misunderstood. I
thought this would give me 3 copies of each message but still be able to lose
one machine from the cluster.
----
2019-02-01 09:47:54 UTC - Sijie Guo: @David Tinker -
when a machine is going down, the pulsar broker (via the bookkeeper client) will
have to do an ensemble change. both an ensemble change and creating a new segment
(aka ledger) require at least ensemble-size bookies available in the
cluster. so technically you need at least ensembleSize machines available when
this operation happens.
----
2019-02-01 09:51:31 UTC - Sijie Guo: since you have zookeeper, bookkeeper and
broker on the same host, a couple of things happen when you reboot a
machine:
- zookeeper is fine, since there are still 2 zookeeper servers alive
- the rebooted broker transfers ownership of its topics to other brokers. during
the ownership transfer, the old segment has to be sealed and recovered and a new
segment has to be created. both operations require `ensembleSize` bookies alive.
----
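To make the arithmetic above concrete, here is a minimal Python sketch of the rule as stated in the thread: a new segment can only be created while at least `ensembleSize` bookies are alive. This is a simplified model, not BookKeeper's actual placement logic, and the function name `can_form_ensemble` is hypothetical.

```python
def can_form_ensemble(available_bookies: int, ensemble_size: int) -> bool:
    """Return True if a new ledger (segment) could be created.

    Simplified model of the rule described above: creating a ledger or
    changing an ensemble needs at least `ensemble_size` bookies alive.
    """
    return available_bookies >= ensemble_size


# A 3-node cluster with one machine rebooting leaves 2 bookies.
bookies_alive = 3 - 1

for ensemble, write_quorum, ack_quorum in [(3, 3, 2), (2, 2, 2)]:
    ok = can_form_ensemble(bookies_alive, ensemble)
    verdict = "can" if ok else "cannot"
    print(f"E={ensemble} Qw={write_quorum} Qa={ack_quorum}: "
          f"{verdict} create new segments with {bookies_alive} bookies")
```

With (3,3,2) the check fails as soon as one of three bookies is down, while (2,2,2) still passes.
----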
2019-02-01 09:52:04 UTC - jia zhai: Oh, I missed that. the default quorum
(2,2,2) is fine for this situation.
----
2019-02-01 09:52:49 UTC - David Tinker: So I should just change to 2,2,2? We
can live with only 2 copies of each message.
----
2019-02-01 09:53:25 UTC - Sijie Guo: if you only have 3 machines, I would
recommend you switch back to 2,2,2
----
2019-02-01 09:56:02 UTC - David Tinker: Is it ok to change that on each node
and then restart each broker?
----
2019-02-01 09:56:24 UTC - Sijie Guo: @David Tinker yes
----
2019-02-01 10:13:21 UTC - David Tinker: Ok. Did that. It seems that this
automatically applies to namespaces with default configuration? I checked one
of mine: pulsar$ ./bin/pulsar-admin namespaces get-persistence
public/brand-mentions
{
"bookkeeperEnsemble" : 2,
"bookkeeperWriteQuorum" : 2,
"bookkeeperAckQuorum" : 2,
"managedLedgerMaxMarkDeleteRate" : 0.0
}
----
2019-02-01 10:13:39 UTC - Sijie Guo: correct
----
2019-02-01 10:13:55 UTC - David Tinker: So I don't need to wait for this to
take effect or anything?
----
2019-02-01 10:18:52 UTC - Sijie Guo: so it only affects the new segments. the
old segments will still be 3,3,2
----
2019-02-01 16:49:17 UTC - Steve Kim: @Steve Kim has joined the channel
----
2019-02-01 16:50:39 UTC - Steve Kim: Hello! I am getting started with Pulsar.
+1 : David Kjerrumgaard
----
2019-02-01 17:06:02 UTC - Steve Kim: I have a question that I have not found an
answer for in the documentation. I see that the c++ `MessageIdImpl` has ledger
id, entry id, partition, batch index. The Java `MessageIdImpl` has ledger id,
entry id, partition index.
The documentation explains what ledger id and entry id are. What are partition
and batch? Why are the C++ and Java message IDs slightly different?
----
2019-02-01 17:10:29 UTC - Matteo Merli: The Impl classes in both cases are
meant to be opaque and hidden behind the respective interfaces
:slightly_smiling_face:
In any case, in Java there’s also a `BatchMessageIdImpl` that inherits from
`MessageIdImpl` and adds the batch id info. It’s just an impl difference between
C++ and Java.
Regarding what they mean:
* Partition index: If the message is being consumed from a partitioned topic,
it will have this index set
* Batch index : If the message is part of a batch, the (ledgerId, entryId)
will be actually referring to the batch and the `batchIndex` is the index of
the message within the batch
----
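As an illustration of the fields described above, here is a small Python sketch. `MessageIdFields` is a hypothetical stand-in, not a class from any Pulsar client, and the use of -1 for "unset" is an assumption made for this example.

```python
from typing import NamedTuple


class MessageIdFields(NamedTuple):
    """Hypothetical stand-in for the fields discussed above."""
    ledger_id: int
    entry_id: int
    partition: int = -1    # -1 means "not from a partitioned topic" (assumption)
    batch_index: int = -1  # -1 means "not part of a batch" (assumption)


# Two messages stored in the same batch share (ledger_id, entry_id)
# and differ only in batch_index.
first = MessageIdFields(ledger_id=12, entry_id=7, batch_index=0)
second = MessageIdFields(ledger_id=12, entry_id=7, batch_index=1)

same_entry = (first.ledger_id, first.entry_id) == (second.ledger_id, second.entry_id)
print(same_entry)      # True
print(first < second)  # True: tuples compare field by field
```
----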
2019-02-01 17:13:27 UTC - Joe Francis: @Steve Kim Message-id is opaque to
applications. Are you implementing a Pulsar client?
----
2019-02-01 17:14:58 UTC - Steve Kim: I want to record message IDs in a database
----
2019-02-01 17:15:10 UTC - Steve Kim: It seems like I ought to use the provided
methods to serialize to bytes
----
2019-02-01 17:15:18 UTC - Steve Kim: Do the bytes order correctly?
----
2019-02-01 17:15:43 UTC - Matteo Merli: Yes: `toByteArray()` and
`fromByteArray()`
----
2019-02-01 17:15:49 UTC - Steve Kim: For example, if I sort by the serialized
bytes, do they match the order of message IDs?
----
2019-02-01 17:16:26 UTC - Matteo Merli: uhm, the serialized form is written in
protobuf
----
2019-02-01 17:16:31 UTC - Steve Kim: So no
----
2019-02-01 17:16:32 UTC - Matteo Merli: basically they’re varint
----
2019-02-01 17:16:41 UTC - Joe Francis: Use the comparators from the Pulsar
library
----
2019-02-01 17:17:08 UTC - Matteo Merli: I don’t think it’s safe to assume order
in serialized form
----
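A short Python sketch of why varint-encoded bytes do not sort in numeric order. `encode_varint` below is a hand-written base-128 varint encoder for illustration only; it is not Pulsar's serialization code, and a real serialized message ID also contains field tags that are omitted here.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


small, large = 129, 256
print(encode_varint(small).hex())  # 8101
print(encode_varint(large).hex())  # 8002
# Numerically 129 < 256, but bytewise b"\x81\x01" > b"\x80\x02".
print(encode_varint(small) < encode_varint(large))  # False
```

Varints put the least significant 7 bits first, so the leading byte says little about magnitude.
----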
2019-02-01 17:17:22 UTC - Steve Kim: It's not
----
2019-02-01 17:17:50 UTC - Grant Wu: It’s not what?
----
2019-02-01 17:17:59 UTC - Matteo Merli: > I don’t think it’s safe to assume
order in serialized form (edited)
----
2019-02-01 17:18:16 UTC - Steve Kim: It's not safe to make assumptions about
protobuf serialization
+1 : Grant Wu
----
2019-02-01 17:18:59 UTC - Steve Kim: I need to query messages by ranges. Even
if the interface does not expose the individual fields, can I reasonably expect
the message ID impl to be stable enough that I can treat it as a tuple of
integers for sorting purposes?
----
2019-02-01 17:19:46 UTC - Grant Wu:
<https://pulsar.apache.org/docs/en/concepts-clients/#reader-interface> does
this work for you?
----
2019-02-01 17:20:08 UTC - Grant Wu: not 100% sure what you mean
----
2019-02-01 17:20:09 UTC - Matteo Merli: Sure, it’s kind of stable enough..
though it’s not “guaranteed” that the impl class will not be changed in future
(it’s not likely, though it can’t be excluded)
----
2019-02-01 17:20:10 UTC - Steve Kim: No. As I said before, I am storing message
IDs in a database.
----
2019-02-01 17:20:23 UTC - Steve Kim: I want to query them in a database
----
2019-02-01 17:21:12 UTC - Matteo Merli: Another option is to use the
`toString()` method and do lexicographical comparison
----
2019-02-01 17:21:59 UTC - Matteo Merli: eg: it would be like `(1:3:5:7)`
----
2019-02-01 17:22:09 UTC - Matteo Merli: (or commas, I don’t remember)
----
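One thing worth checking before relying on plain string comparison: fields with different digit counts do not compare numerically. The sketch below assumes a colon-separated layout like the `1:3:5:7` example above and non-negative fields; `padded_key` is a hypothetical helper, not part of any Pulsar client.

```python
def padded_key(message_id_str: str, width: int = 20) -> str:
    """Zero-pad each non-negative numeric field so string order matches numeric order.

    width=20 is enough digits for any unsigned 64-bit value.
    """
    return ":".join(field.zfill(width) for field in message_id_str.split(":"))


a, b = "9:0:5", "10:0:5"  # ledger 9 comes before ledger 10

print(a < b)                          # False: "9" sorts after "1"
print(padded_key(a) < padded_key(b))  # True
```
----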
2019-02-01 17:22:49 UTC - Grant Wu: It might be worth evaluating doing this
differently
----
2019-02-01 17:23:03 UTC - Grant Wu: Like, storing the order in a separate manner
----
2019-02-01 17:23:05 UTC - Joe Francis: I think you should use a sequence-id
rather than message-id - @Matteo Merli will event time work?
----
2019-02-01 17:23:20 UTC - Grant Wu: Or that
----
2019-02-01 17:23:28 UTC - Steve Kim: How does a reader get sequence ID?
----
2019-02-01 17:24:34 UTC - Matteo Merli: Uhm, the sequenceId is tied to
particular producer and cannot be used to position a reader/consumer
----
2019-02-01 17:25:13 UTC - Steve Kim: Okay. So if I want to strictly order
messages that a reader sees, my only option is message ID?
----
2019-02-01 17:25:25 UTC - Matteo Merli: Yes
----
2019-02-01 17:25:48 UTC - Joe Francis: Not THE sequence ID, but a sequence id.
----
2019-02-01 17:27:12 UTC - Steve Kim: But... the sequence that I care about is
the sequence that is assigned by pulsar. I want to know the official order of
events as recorded by pulsar.
----
2019-02-01 17:29:51 UTC - Steve Kim: It seems like message ID is the right
thing. I just need to work around the fact that it doesn't have a built-in
order-preserving serialization that can be persisted in a database.
----
2019-02-01 17:31:21 UTC - Matteo Merli: That’s correct
----
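One possible workaround, sketched in Python under the assumption that the four fields are available as integers: pack them as fixed-width big-endian unsigned values so that bytewise order equals tuple order. The function names, the field layout, and the +1 shift for -1 ("unset") values are all choices made for this sketch, not an existing Pulsar format.

```python
import struct

# Big-endian, fixed width: two unsigned 64-bit and two unsigned 32-bit fields.
_FORMAT = ">QQII"


def to_sortable_bytes(ledger_id: int, entry_id: int,
                      partition: int = -1, batch_index: int = -1) -> bytes:
    """Pack the fields so bytewise order equals tuple order.

    partition and batch_index are shifted by +1 so that -1 ("unset" in this
    sketch) becomes 0 and stays representable as an unsigned value.
    """
    return struct.pack(_FORMAT, ledger_id, entry_id, partition + 1, batch_index + 1)


def from_sortable_bytes(blob: bytes) -> tuple:
    """Inverse of to_sortable_bytes."""
    ledger_id, entry_id, partition, batch_index = struct.unpack(_FORMAT, blob)
    return ledger_id, entry_id, partition - 1, batch_index - 1


ids = [(256, 0, -1, -1), (129, 4, -1, 2), (129, 4, -1, -1), (9, 300, 0, 0)]
by_bytes = sorted(ids, key=lambda fields: to_sortable_bytes(*fields))
print(by_bytes == sorted(ids))  # True
```

Each key is 24 bytes, so range queries on a binary column compare the same way the tuples do. The cost is size: the varint form is usually much smaller.
----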
2019-02-01 17:31:27 UTC - Steve Kim: Thanks for your help
----
2019-02-01 17:32:44 UTC - Steve Kim: How would you feel about a small
contribution to make the Python `MessageId` class expose its fields as
properties, the way that the Java and c++ impl classes do?
----
2019-02-01 17:33:45 UTC - Joe Francis: I think you should open a PR to provide
a serialization with order
----
2019-02-01 17:34:22 UTC - Steve Kim: Ohhh. Wouldn't that be risky for existing
users of serialization?
----
2019-02-01 17:34:39 UTC - Grant Wu: Could be a new function
----
2019-02-01 17:34:58 UTC - Joe Francis: A new ser method
----
2019-02-01 17:36:10 UTC - Matteo Merli: Sure, that might be a good option. We
just need a marker to understand which format it is when deserializing
----
2019-02-01 17:37:51 UTC - Matteo Merli: Regarding python, I think it should be
easy to expose these details in the C++ wrapper object, in an “unofficial” way
----
2019-02-01 17:38:11 UTC - Joe Francis: That is a can of worms
----
2019-02-01 17:38:24 UTC - Joe Francis: Or would become one sometime
----
2019-02-01 17:38:43 UTC - Matteo Merli: I mean.. it’s the same thing we have in
Java/C++
----
2019-02-01 17:39:11 UTC - Steve Kim: Yes, that is my argument. The other impls
already expose these fields
----
2019-02-01 17:39:28 UTC - Joe Francis: Yeah, but there are no guarantees about
them
----
2019-02-01 17:40:04 UTC - Joe Francis: It was a mistake. Msgid already has
changed so many times
----
2019-02-01 17:40:28 UTC - Matteo Merli: not in the past 4years
:slightly_smiling_face:
----
2019-02-01 17:40:35 UTC - Steve Kim: My worry about adding another
serialization is that existing serialized blobs do not have any marker to
indicate which serialization version they use. So someone who encounters a blob
that is labeled "this is a pulsar messageId" would have to know/guess which
deser method to use.
----
2019-02-01 17:40:55 UTC - Joe Francis: We should nip it in the bud before it
breeds
----
2019-02-01 17:41:29 UTC - Steve Kim: Anyway, I am the newbie. I will let more
experienced people decide. Thanks for answering my questions and considering my
suggestion
----
2019-02-01 17:41:33 UTC - Matteo Merli: Marking the format of the serialized
form is easy. protobuf has its own marker where the object starts
----
2019-02-01 17:42:02 UTC - Matteo Merli: 0x08 or similar. We just need to pick a
1 byte value that is different from that
----
2019-02-01 17:42:22 UTC - Steve Kim: Aha, and our new deser method can switch
after sniffing the first byte
----
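The first-byte switch could look roughly like this in Python. The 0x08 value comes from the message above ("0x08 or similar"); `ORDERED_FORMAT_MARKER` and `detect_format` are hypothetical names and values picked for this sketch, not an agreed format.

```python
PROTOBUF_FIRST_BYTE = 0x08    # value mentioned above ("0x08 or similar")
ORDERED_FORMAT_MARKER = 0xFF  # hypothetical marker chosen for this sketch


def detect_format(blob: bytes) -> str:
    """Decide which deserializer to use by looking at the first byte."""
    if not blob:
        raise ValueError("empty message ID blob")
    first = blob[0]
    if first == ORDERED_FORMAT_MARKER:
        return "ordered"
    if first == PROTOBUF_FIRST_BYTE:
        return "protobuf"
    raise ValueError(f"unrecognized leading byte 0x{first:02x}")


print(detect_format(bytes([0x08, 0x81, 0x01])))                 # protobuf
print(detect_format(bytes([0xFF]) + (129).to_bytes(8, "big")))  # ordered
```

Because every blob in the new format would start with the same marker byte, the prefix does not change how those blobs sort relative to each other.
----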
2019-02-01 17:49:51 UTC - Matteo Merli: In any case it might be worthwhile to
have both forms of serialization, as the protobuf varints will most likely
be much more compact than any bytewise comparable form
----
2019-02-01 18:13:33 UTC - Emma Pollum: Does anyone have any tips or resources
on jvm memory tuning for bookkeeper? My pulsar cluster is racking up about 20G
in memory per bookie, with no sign of it going down. which seems excessive to
me.
----
2019-02-01 18:20:47 UTC - Matteo Merli: There are several tunables for that. In
BK 4.9, which will ship with Pulsar 2.3, we have simplified this so that the
defaults adapt to the memory available to the JVM
----
2019-02-01 18:22:05 UTC - Matteo Merli: For bookies: the main sources of memory
use are the cache regions:
```
# Size of Write Cache. Memory is allocated from JVM direct memory.
# Write cache is used to buffer entries before flushing into the entry log
# For good performance, it should be big enough to hold a sub
dbStorage_writeCacheMaxSizeMb=512
# Size of Read cache. Memory is allocated from JVM direct memory.
# This read cache is pre-filled doing read-ahead whenever a cache miss happens
dbStorage_readAheadCacheMaxSizeMb=256
```
----
2019-02-01 19:30:23 UTC - Emma Pollum: My bookies failed with a Java Heap OOM
error in their log but did not quit or restart the service. How can i get it to
fail fast and loud when it ooms?
----
2019-02-01 19:36:49 UTC - Matteo Merli: Pass `-XX:+ExitOnOutOfMemoryError` to
JVM
----
2019-02-01 19:38:29 UTC - Emma Pollum: Thank you!
----
2019-02-01 23:30:09 UTC - Grant Wu: @Jerry Peng
```
Creating pulsar function adminNotifications
Invalid FunctionDetails
Reason: Invalid FunctionDetails
```
getting this from a `pulsar-admin functions create`
----
2019-02-01 23:30:19 UTC - Grant Wu: Do you have any idea what sort of thing
might trigger this?
----
2019-02-01 23:35:55 UTC - Sanjeev Kulkarni: what’s your pulsar-admin command?
----
2019-02-01 23:37:18 UTC - Grant Wu: It’s part of a script
----
2019-02-01 23:37:23 UTC - Grant Wu: It looks like this:
```
if ! "$PULSAR_DIR"/bin/pulsar-admin functions "$cmd" --functionConfigFile \
    "$(realpath function-config.yaml)"; then
    echo "Failed to put $pf_name"
    exit 3
fi
```
----
2019-02-01 23:37:59 UTC - Grant Wu: er so the first line there - `Creating
pulsar function adminNotifications` - is part of my script
----
2019-02-01 23:38:10 UTC - Grant Wu: So I know that `cmd='create'`
----
2019-02-01 23:38:52 UTC - Grant Wu: And this script seems to work most of the
time…
----
2019-02-01 23:39:01 UTC - Grant Wu: It is strange and curious that it has
failed this time
----
2019-02-01 23:43:51 UTC - Sanjeev Kulkarni: do you have the function-config-file
----
2019-02-01 23:43:51 UTC - Grant Wu: I can provide a sample of my
`function-config.yaml` too
----
2019-02-01 23:43:53 UTC - Jerry Peng: @Grant Wu so this doesn’t happen every
time? how often have you observed this? The error seems to indicate there is
something wrong/corrupted in your function-config.yaml file
----
2019-02-01 23:44:00 UTC - Grant Wu: ```
tenant: "PULSAR_TENANT"
namespace: "K8S_NAMESPACE"
inputs:
- "persistent://PULSAR_TENANT/K8S_NAMESPACE/public-adminNotifications"
name: "adminNotifications"
py: "/pulsar_functions/adminNotifications/adminNotifications.py"
className: "adminNotifications.AdminNotifications"
autoAck: true
parallelism: 1
processing-guarantees: AT_LEAST_ONCE
```
----
2019-02-01 23:44:12 UTC - Grant Wu: we use sed to fix PULSAR_TENANT and
K8S_NAMESPACE
----
2019-02-01 23:44:38 UTC - Grant Wu: hrm. maybe the user ran the script in an
environment where the env vars needed to replace those properly were not set
----
2019-02-01 23:45:31 UTC - Grant Wu: If those ended up being replaced with empty
strings, would that cause the error?
----
2019-02-01 23:58:33 UTC - Jerry Peng: yes
----
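A minimal Python sketch of guarding against the empty-substitution case discussed above: refuse to render the config if either variable is unset or empty. The thread's script uses sed; `render_config` is a hypothetical replacement for that step, and the placeholder names are taken from the sample YAML.

```python
import os

PLACEHOLDERS = ("PULSAR_TENANT", "K8S_NAMESPACE")


def render_config(template: str, env=os.environ) -> str:
    """Replace placeholders with environment values, refusing empty ones."""
    rendered = template
    for name in PLACEHOLDERS:
        value = env.get(name, "")
        if not value:
            raise ValueError(f"environment variable {name} is unset or empty")
        rendered = rendered.replace(name, value)
    return rendered


template = 'tenant: "PULSAR_TENANT"\nnamespace: "K8S_NAMESPACE"\n'
print(render_config(template, {"PULSAR_TENANT": "acme", "K8S_NAMESPACE": "staging"}))
```

Failing before `pulsar-admin functions create` runs gives a clearer message than `Invalid FunctionDetails`.
----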