2019-12-17 09:17:37 UTC - Martin Kunev: Hi,
I am running Pulsar 2.3.2 in standalone mode. I have the following issue:
I subscribe to a topic with pulsar-client on region2 (shared subscription).
Then I use the pulsar-client to publish on region1. Both regions have the
replication clusters correctly set, but I never receive the message on region2.
What can cause this problem?
----
2019-12-17 10:23:23 UTC - rmb: @rmb has joined the channel
----
2019-12-17 10:39:28 UTC - Fernando: Hi guys, I’d appreciate any feedback on a
problem we’re having with the current implementation of the redelivery count.
Here’s the issue <https://github.com/apache/pulsar/issues/5881>
----
2019-12-17 10:44:37 UTC - rmb: Hi, I'm trying to get pulsar standalone running,
and I'm having trouble with some basic python scripts. I downloaded
pulsar-2.4.1 and unpacked the tarball; `bin/pulsar standalone` seems to start
fine, but when I try to use the python libraries to produce or consume
messages, I get errors like
```ERROR ClientImpl:182 | Error Checking/Getting Partition Metadata while
creating producer on <persistent://public/default/my-topic> -- 5```
(and sometimes pulsar then shuts down). the bin/pulsar-client script isn't any
better; I get a long stream of java exceptions and then
```INFO org.apache.pulsar.client.cli.PulsarClientTool - 0 messages
successfully produced```
(or consumed, respectively). any suggestions?
fwiw, I can produce and consume messages using the docker image (but I'm having
other issues there, so I thought I'd give `bin/pulsar standalone` a try)
----
2019-12-17 10:47:41 UTC - tihomir: @rmb check whether the public/default namespace has
been created
----
2019-12-17 10:48:08 UTC - rmb: I thought that was automatically created on
startup?
----
2019-12-17 10:50:04 UTC - tihomir: yes but it takes time for that. My guess is
that you are executing your call too early and the namespace does not exist yet
----
2019-12-17 10:51:01 UTC - rmb: I don't think so, I've been waiting substantial
amounts of time
----
2019-12-17 10:52:12 UTC - rmb: anyway, pulsar just exited before I had a chance
to run any admin commands
----
2019-12-17 10:55:48 UTC - rmb: ```$ bin/pulsar-admin tenants list
null
Reason: javax.ws.rs.ProcessingException: Connection refused:
localhost/127.0.0.1:8080```
----
2019-12-17 11:05:18 UTC - rmb: anyway, since the docker image is running and
letting me produce and consume, I'll post the questions I have about that
----
2019-12-17 11:26:26 UTC - rmb: the command I'm running is `docker run -it -p
6650:6650 -p 8080:8080 --mount source=pulsardata,target=/pulsar/data --mount
source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:2.4.1 bin/pulsar
standalone`
...ok, I'm having trouble replicating the weird deduplication behavior I was
seeing, so here's a different question, about ordering. I had thought that for
an un-partitioned topic, message delivery ordering was guaranteed per-producer
(for messages sent without a key). so I'm using the node client to send some
messages:
```const producer = await client.createProducer({
  topic: `${topic}`,
  producerName: 'my-producer',
  messageRoutingMode: 'UseSinglePartition',
  sendTimeoutMs: -1,
});

// Send messages
for (let i = 0; i < 10; i += 1) {
  const msg = `my-message-${i}`;
  const message = {
    data: Buffer.from(msg),
  };
  // send() returns a Promise, which is not awaited here
  producer.send(message);
}```
but I'm receiving them in a different order:
```Received message: 'b'my-message-0''
Received message: 'b'my-message-1''
Received message: 'b'my-message-3''
Received message: 'b'my-message-2''
Received message: 'b'my-message-4''
Received message: 'b'my-message-5''
Received message: 'b'my-message-6''
Received message: 'b'my-message-7''
Received message: 'b'my-message-8''
Received message: 'b'my-message-9''```
----
2019-12-17 13:37:00 UTC - rmb: some further questions about deduplication
(based on
<https://github.com/apache/pulsar/wiki/PIP-6:-Guaranteed-Message-Deduplication>):
• if a producer is setting sequenceId on a per-message basis, are there any
constraints imposed by the broker? for example, does the sequence have to be
monotonic? the doc says that the broker keeps track of the highest sequenceId
received from a particular producer; does that mean that if a producer sent
messages 1, 2, 4, 3, the last one would be rejected because its sequenceId is
less than 4? how would the broker distinguish that situation from the messages
simply arriving out-of-order?
• if a producer uses a custom sequence of sequenceIds, with 'holes', what does
that mean for acknowledgements? is the consumer's view of the sequence of
messages entirely separate from the producer's sequenceIds?
----
2019-12-17 15:00:34 UTC - Greg Hoover: Is this a known issue or is this unique
to me?
----
2019-12-17 15:05:26 UTC - Greg Hoover: I should have searched the slack channel
for grafana. There are a couple of things mentioned I have not tried yet. Will
report back.
----
2019-12-17 18:43:50 UTC - Daniel Ferreira Jorge: @rmb Let me try to go point by
point:
• The sequenceId does not need to be contiguous, just increasing.
• If you send messages 1,2,4,3 the message 3 will be discarded.
• The deduplication is a simple mechanism that maps a *producer name to a
number.* You should not have 2 producers with the same name if you are using
deduplication.
• The consumer has absolutely nothing to do with sequence numbers of producers.
When a broker receives a message from producer named "X", it looks up the last
sequenceId recorded for producer X. If the message being produced now by
producer X has an equal or lower sequence number, the message will be
discarded. The deduplication mechanism has absolutely no other implications.
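A minimal sketch of the check described above, written in JavaScript to match the Node client code earlier in this thread. It is a toy model, not the broker's implementation: it assumes the broker keeps, per producer name, the highest sequenceId it has accepted, and drops any message whose sequenceId is equal or lower. `DedupTracker` and `accept()` are hypothetical names.

```javascript
// Toy model of broker-side deduplication (hypothetical names, not Pulsar code).
// Assumption: one entry per producer name, holding the highest sequenceId seen.
class DedupTracker {
  constructor() {
    this.lastSequenceId = new Map(); // producerName -> highest accepted sequenceId
  }

  // Returns true if the message is kept, false if it is dropped as a duplicate.
  accept(producerName, sequenceId) {
    const last = this.lastSequenceId.get(producerName);
    if (last !== undefined && sequenceId <= last) {
      return false; // equal or lower than the highest seen: treated as a duplicate
    }
    this.lastSequenceId.set(producerName, sequenceId);
    return true;
  }
}

const tracker = new DedupTracker();
// Producer "X" sends 1, 2, 4, 3: message 3 arrives after 4 and is dropped.
console.log([1, 2, 4, 3].filter((id) => tracker.accept("X", id))); // [ 1, 2, 4 ]
// Producer "Y" uses a sparse sequence with holes: every message is kept.
console.log([1, 5, 9].filter((id) => tracker.accept("Y", id))); // [ 1, 5, 9 ]
```

Holes in the sequence are harmless in this model; only a sequenceId that fails to increase is dropped. Each producer name is tracked independently, which is why two producers must not share a name.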
----
2019-12-17 18:55:39 UTC - rmb: thanks, @Daniel Ferreira Jorge. what if a
producer sent messages 1, 2, 3, 4, but due to network issues they arrived at
the broker in the order 1, 2, 4, 3? would the broker assume message 3 was a
duplicate and discard it?
----
2019-12-17 18:58:39 UTC - Daniel Ferreira Jorge: I do not believe that this can
happen in this context because for the producer to send message 4 the broker
must have sent an ack of message 3... maybe I'm wrong and someone more
knowledgeable, like @Sijie Guo or @Matteo Merli can give you a certainty about
that...
----
2019-12-17 18:58:59 UTC - Joe Francis: ^^, you are right
----
2019-12-17 18:59:36 UTC - rmb: really? the diagrams at
<https://pulsar.apache.org/docs/en/develop-binary-protocol/> seem to suggest
otherwise
----
2019-12-17 18:59:46 UTC - rmb: and what about batch mode?
----
2019-12-17 19:00:32 UTC - Joe Francis: Messages are published in the order they
are queued by the producer.
----
2019-12-17 19:02:39 UTC - rmb: and is that queue order determined by the
sequenceIds?
----
2019-12-17 19:06:42 UTC - Joe Francis: No. It's the order in which the Producer
invokes send().
----
2019-12-17 19:14:39 UTC - rmb: ok, so the diagrams are misleading and every
send() has to be ack'ed before the next one is called. what about
deduplication in batch mode? does the broker update the highest sequenceId for
a given producer after it reads the entire batch of messages, or as it reads it?
----
2019-12-17 19:15:21 UTC - Joe Francis: The diagrams are correct
----
2019-12-17 19:16:44 UTC - rmb: the diagram under "producers" shows send() being
called twice before sendReceipt is called
----
2019-12-17 19:19:26 UTC - Joe Francis: ??
----
2019-12-17 19:19:44 UTC - rmb:
<https://pulsar.apache.org/docs/en/develop-binary-protocol/#producer>
----
2019-12-17 19:21:16 UTC - Joe Francis: That is a sequence diagram. It shows
messages are acked in the order they are published. Nothing else is implied
----
2019-12-17 19:27:26 UTC - rmb: it would be considerably clearer if the second
send() and the first sendReceipt() were interchanged, but ok
----
2019-12-17 19:27:49 UTC - rmb: any idea why I was receiving messages
out-of-order above?
----
2019-12-17 19:28:42 UTC - Jason Fisher: it might be an async logging artifact
----
2019-12-17 19:30:08 UTC - Joe Francis: Actually, it's meant to imply the
opposite (contrary to what you understood). There are no send1-ack1-send2-ack2
waits. It's send1, send2, with ack1 and ack2 arriving independently. The
diagram actually shows pipelining, and shows exactly what it's meant to do.
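The pipelining described here can be sketched as a toy model in JavaScript (not the real client code). Assumptions: `send()` appends the message to a FIFO pending queue and returns a Promise right away, and the broker's receipts come back in publish order, so each receipt must match the oldest pending entry. `PipelinedProducer` and `onReceipt()` are hypothetical names.

```javascript
// Toy model of pipelined publishing (hypothetical names, not Pulsar code).
class PipelinedProducer {
  constructor() {
    this.pending = []; // FIFO of { sequenceId, payload, resolve } awaiting a receipt
    this.nextSequenceId = 1;
    this.log = []; // records the interleaving of sends and receipts
  }

  // Does not wait for earlier receipts: the message is queued immediately.
  send(payload) {
    const sequenceId = this.nextSequenceId++;
    this.log.push(`send${sequenceId}`);
    return new Promise((resolve) => {
      this.pending.push({ sequenceId, payload, resolve });
    });
  }

  // Called when a receipt arrives. Receipts come back in publish order,
  // so the receipt must belong to the oldest pending message.
  onReceipt(sequenceId) {
    const head = this.pending.shift();
    if (!head || head.sequenceId !== sequenceId) {
      throw new Error(`unexpected receipt for ${sequenceId}`);
    }
    this.log.push(`ack${sequenceId}`);
    head.resolve(sequenceId);
  }
}

const producer = new PipelinedProducer();
const p1 = producer.send("a");
const p2 = producer.send("b"); // issued before any receipt: no lock-step
producer.onReceipt(1);
producer.onReceipt(2);
Promise.all([p1, p2]).then(() => console.log(producer.log.join(" ")));
// prints: send1 send2 ack1 ack2
```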
----
2019-12-17 19:30:18 UTC - Nick Nezis: @Nick Nezis has joined the channel
----
2019-12-17 19:31:29 UTC - rmb: ok. so then my question remains: how is the
broker determining order in the producer queue? your initial answer was that
the producer was waiting for each message to be ack-ed
----
2019-12-17 19:32:18 UTC - Joe Francis: No, my answer was that it's the order
in which you invoke send() (or sendAsync()).
----
2019-12-17 19:34:02 UTC - Joe Francis: If you use send(), of course it will
block for the ack. That's an artifact of using a blocking API, but it's not
required for dedup. You can use sendAsync().
----
2019-12-17 19:35:55 UTC - Joshua Dunham: Hi Everyone, I'm getting `"python
pulsar-producer.py" terminated by signal SIGILL (Illegal instruction)` with
the 2.4.2 client. Anyone else see this?
----
2019-12-17 19:36:18 UTC - Joshua Dunham: I have an MWE that worked with 2.4.1p1
but no longer works with 2.4.2.
----
2019-12-17 19:36:50 UTC - Joe Francis: As for what you observed, I don't have
enough details. But messages will be published in the same order they are sent.
Be aware that if you send seq-ids 1,2,4,3 in that order, Pulsar will attempt to
publish in that order, and the broker will reject 3 if dedup is on. In other
words, Pulsar will not sort out-of-order seq-ids.
----
2019-12-17 19:37:23 UTC - Joshua Dunham: Once I upgraded to 2.4.2 the pulsar
python module complained about also wanting [email protected] which I had to install
from the brew retired repository.
----
2019-12-17 19:38:38 UTC - rmb: thanks. but how does the broker distinguish 1,
2, 4, 3 from 1, 2, 3, 4 arriving out of order?
----
2019-12-17 19:40:34 UTC - Joe Francis: Whatever order you invoke send() in on the
client is the order the broker will see.
----
2019-12-17 19:41:46 UTC - rmb: yes, you've said, but I've been asking you how
the broker determines that order. is it looking at a timestamp?
----
2019-12-17 19:43:14 UTC - Joe Francis: Perhaps you are leaving something out
that you know but I don't... this is a very simple concept. If you do
send(1), send(2), send(3) in that order (or sendAsync()), the broker will see
1, 2, 3. There are no timestamps; it's the order in which you send.
----
2019-12-17 19:44:53 UTC - rmb: producers are talking to brokers over a network,
which means that if there's a network issue, messages can arrive in a different
order than they were sent. I assume there must be some way for brokers to deal
with that
----
2019-12-17 19:45:03 UTC - Joshua Dunham: Hi @rmb: I can possibly help. For each
client cxn pulsar only acks when it's reached a write quorum. If it acks, the
record has been noted. If you are connecting async then one thread could write
ahead of another. If you write bulk the ack is on the bulk payload and the
client needs to understand if there is order in the individual records.
----
2019-12-17 19:47:21 UTC - Joe Francis: Batching and quorum are entirely
orthogonal to message ordering
----
2019-12-17 19:47:29 UTC - jmogden: Hello, I'm trying to understand multi-topic
subscription using regex patterns. From what I've found, when a consumer
subscribes to topics in a namespace using a regex, it will subscribe
to everything that matches the regex. I noticed that if a new topic is created
that the consumer wasn't initially subscribed to, the consumer won't actually
subscribe to and consume from the new topic, even if it matches the regex.
Is there a way to have the consumer subscribe to the new topic without it being
closed and re-created?
----
2019-12-17 19:49:31 UTC - Joe Francis: Ordering is determined at the client,
based on the order in which you invoke send()/sendAsync(). Batching is a
transport and I/O optimization that is entirely immaterial to ordering. Quorum
is not visible to the client.
----
2019-12-17 19:50:01 UTC - Joe Francis: For example:
<https://github.com/streamlio/pulsar-java-tutorial/blob/master/src/main/java/tutorial/async/AsyncProducerTutorial.java>
----
2019-12-17 19:51:22 UTC - Joe Francis: Messages will be published in the order
that the sends are invoked in the loop, so they will be published in
loop-counter order.
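For the Node client used earlier in this thread, the equivalent of that Java async loop might look like the sketch below. It assumes only that the producer's `send()` returns a Promise, as the Node client's does; `sendInOrder` and `fakeProducer` are hypothetical names, and the fake stands in for a real producer so the snippet runs without a broker.

```javascript
// Sketch of an async-send loop for a Node producer (hypothetical helper names).
async function sendInOrder(producer, count) {
  const receipts = [];
  for (let i = 0; i < count; i += 1) {
    // The order of these send() calls is the publish order. Each Promise is
    // collected rather than awaited, so sends are pipelined.
    receipts.push(producer.send({ data: Buffer.from(`my-message-${i}`) }));
  }
  // Wait for every receipt (e.g. before closing the producer or exiting).
  return Promise.all(receipts);
}

// Hypothetical stand-in for a real producer so the sketch runs without a broker:
// it records payloads in call order and resolves with the call index.
const fakeProducer = {
  calls: [],
  send(message) {
    this.calls.push(message.data.toString());
    return Promise.resolve(this.calls.length - 1);
  },
};

sendInOrder(fakeProducer, 3).then((ids) => {
  console.log(fakeProducer.calls); // [ 'my-message-0', 'my-message-1', 'my-message-2' ]
  console.log(ids); // [ 0, 1, 2 ]
});
```

The publish order is fixed by the order of the `send()` calls in the loop; awaiting `Promise.all` at the end only waits for the receipts and, unlike awaiting inside the loop, does not force a send/receipt lock-step.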
----
2019-12-17 19:53:59 UTC - rmb: thanks, @Joshua Dunham, I'm trying to understand
how that interacts with the deduplication feature. if a producer is connecting
asynchronously and one thread writes ahead of another (so that the producer
calls sendAsync() for 1, 2, 3, 4 but the broker sees messages in the order
1, 2, 4, 3), will the broker delete messages?
----
2019-12-17 19:54:37 UTC - Joe Francis: Ha, so you are using multi-threading in
the Producer?
----
2019-12-17 19:54:53 UTC - rmb: no, I'm just worried about an unreliable network
----
2019-12-17 19:54:57 UTC - rmb: packets can arrive out of order
----
2019-12-17 19:55:05 UTC - Joe Francis: That's not something you have to worry
about
----
2019-12-17 19:55:10 UTC - rmb: I assume there must be some way to deal with this
----
2019-12-17 19:55:12 UTC - rmb: why not?
----
2019-12-17 19:55:52 UTC - Joe Francis: Because Pulsar guarantees that the order
in which you send is the order in which it gets published.
----
2019-12-17 19:56:01 UTC - Jason Fisher: Switch to sync and not async on the
consumer
----
2019-12-17 19:56:18 UTC - Joshua Dunham: @rmb Definitely create an app-scoped
timestamp or id in this case. You cannot use any queue as ordered if you are
writing fifo.
----
2019-12-17 19:56:22 UTC - Jason Fisher: You can’t judge the logging output to
be the actual order things arrive
----
2019-12-17 19:56:34 UTC - Jason Fisher: Add a received timestamp to the log
----
2019-12-17 19:56:59 UTC - rmb: how does pulsar guarantee that the order in
which you send is the order in which it gets published?
----
2019-12-17 19:57:02 UTC - Jason Fisher: Your console output is not async safe
----
2019-12-17 19:57:11 UTC - Jason Fisher: In terms of keeping things in order
----
2019-12-17 19:57:46 UTC - Joshua Dunham: @rmb, not the order you send, the
order that pulsar acks.
----
2019-12-17 19:58:23 UTC - Roman Popenov: The only order you can guarantee is
within Pulsar cluster
----
2019-12-17 19:58:49 UTC - Joshua Dunham: If you send synchronously, you wait for
each ack. If async, then some other threads can get acked depending on how slow
the backend is to achieve write quorum etc.
----
2019-12-17 19:59:22 UTC - rmb: ok, great. so if the broker has deduplication
enabled and sees 1, 2, 4, 3, it will assume that 3 is a duplicate and delete
it, even if the producer sent it async before 4?
----
2019-12-17 19:59:43 UTC - Roman Popenov: `Java client components are
thread-safe: a consumer can acknowledge messages from different threads.`
----
2019-12-17 19:59:50 UTC - Roman Popenov: But there are no such guarantees for
producers
----
2019-12-17 20:00:20 UTC - Joshua Dunham: It doesn't see 1,2,4,3, it sees
1,2,3,4 and your app sees the contents out of order.
----
2019-12-17 20:00:50 UTC - rmb: how do you guarantee that it sees 1,2,3,4?
----
2019-12-17 20:01:22 UTC - Joshua Dunham: I mean, pulsar orders linearly what it
sees.
----
2019-12-17 20:01:53 UTC - Roman Popenov: Well, I would assume if you have one
producer that produces to one topic, it will be in order
----
2019-12-17 20:01:54 UTC - Joshua Dunham: 1,2,3,4 being pulsar derived IDs.
----
2019-12-17 20:02:51 UTC - rmb: this thread of conversation started with me
trying to understand custom sequenceIds and deduplication
----
2019-12-17 20:02:52 UTC - Joshua Dunham: Like if you have two small python
loops filling in a spreadsheet with contents. The index of the spreadsheet is
always 1->N in order. but the contents have no guarantee to make sense to
the app.
----
2019-12-17 20:05:05 UTC - Joe Francis: @rmb There is a Q on the client, which
is where ordering is imposed. The only way to ensure this order is to invoke
send()/sendAsync() in the order you desire (which you cannot ensure if you use a
multi-threaded Producer and invoke send from multiple threads). This Q is what
gets transported to the broker. Batching/compression etc. are transport
mechanisms, which only affect how the Q is moved, not the order.
Underneath, TCP is used, which guarantees network order on a given connection.
Ordering is enforced, in that if the transfer fails on a msg in the middle of
the Q, everything after that msg will also be failed.
----
2019-12-17 20:08:12 UTC - Joe Francis: Receipts/acks will also come in a similar
order. Everything from a producer will be acked in the order in the Q. All
these acks will be delivered into a consumer Q in the client. If you read it
out one by one, you will get the publish order. (If you read it with a
multi-threaded consumer, you will lose ack ordering.)
----
2019-12-17 20:08:53 UTC - Roman Popenov: Although consumers ARE thread safe
----
2019-12-17 20:10:17 UTC - Joe Francis: Thread safe =/= ordering. It means that
they can dequeue without stepping on each other. It does not guarantee they
execute in the same order they dequeued.
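The point that safe dequeuing does not imply ordered execution can be illustrated with a toy JavaScript example, with no Pulsar involved. It assumes messages are taken off the consumer queue in publish order and handed to an async handler whose processing time varies; `handle`, `processSequentially`, and `processConcurrently` are hypothetical names, and the `costMs` delays are made up.

```javascript
// Toy illustration: dequeue order is fixed, completion order is not.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical handler: "processing" takes a different time per message.
async function handle(msg, done) {
  await sleep(msg.costMs);
  done.push(msg.id);
}

// One at a time: the next message starts only after the previous one finishes.
async function processSequentially(queue) {
  const done = [];
  for (const msg of queue) {
    await handle(msg, done);
  }
  return done;
}

// All in flight at once: completion order follows processing time instead.
async function processConcurrently(queue) {
  const done = [];
  await Promise.all(queue.map((msg) => handle(msg, done)));
  return done;
}

// Messages in publish order, with made-up processing costs.
const queue = [
  { id: 0, costMs: 30 },
  { id: 1, costMs: 5 },
  { id: 2, costMs: 15 },
];

(async () => {
  console.log("sequential:", await processSequentially(queue)); // [ 0, 1, 2 ]
  console.log("concurrent:", await processConcurrently(queue)); // [ 1, 2, 0 ]
})();
```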
----
2019-12-17 20:10:21 UTC - Roman Popenov: It isn’t
----
2019-12-17 20:10:46 UTC - Roman Popenov: The other question I have, how to
handle a chunked message
----
2019-12-17 20:11:08 UTC - rmb: ok, thanks. why does one message getting lost
mean that all subsequent messages will get lost?
----
2019-12-17 20:11:17 UTC - Roman Popenov: Is it possible to know that chunks are
part of the same message and skip them until next message with multiple
consumers?
----
2019-12-17 20:15:32 UTC - Joe Francis: That's the guarantee given by Pulsar.
If you publish 1,2,3,4,5,6,7,8,9,10 and, due to some error on the server side,
Pulsar could not store 5, then the integrity of the Q order is lost. Pulsar
will not ack the rest. It will ack 1..4, and then fail 5..10.
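That guarantee can be sketched as a toy model in JavaScript (not broker code): messages from one producer's queue are persisted strictly in order, and after the first failure everything queued behind it is failed rather than stored out of order. `publishQueue` and the failing `tryPersist` callback are hypothetical.

```javascript
// Toy model of in-order persistence with fail-the-rest semantics
// (hypothetical names, not Pulsar code).
function publishQueue(messages, tryPersist) {
  const acked = [];
  const failed = [];
  for (const msg of messages) {
    if (failed.length > 0) {
      failed.push(msg); // queue order already broken: fail everything after
    } else if (tryPersist(msg)) {
      acked.push(msg);
    } else {
      failed.push(msg); // first failure
    }
  }
  return { acked, failed };
}

const messages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
// Hypothetical storage error that only affects message 5.
const result = publishQueue(messages, (msg) => msg !== 5);
console.log(result.acked); // [ 1, 2, 3, 4 ]
console.log(result.failed); // [ 5, 6, 7, 8, 9, 10 ]
```

Note that 6..10 are failed even though storing them would have succeeded; acking them would leave a gap where 5 should be.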
----
2019-12-17 20:18:30 UTC - rmb: ok, but you're specifically allowed to send
messages with a sparse sequence of sequenceIds. if a broker receives
1,2,3,4,6,7,8,9,10, how does it distinguish the producer sending that sequence
from the producer sending 1,2,3,4,5,6,7,8,9,10 and 5 getting lost?
----
2019-12-17 20:27:44 UTC - rmb: Thanks for the answers to my questions! I'm
afraid it's dinner time for me and I have to drop offline
----
2019-12-17 20:37:01 UTC - Joe Francis: This has nothing to do with seq-id. The
numbers I used indicate message order, not seq-id.
----
2019-12-17 20:46:53 UTC - Joshua Dunham: Anyone see issues with the pulsar
python client?
----
2019-12-17 20:47:25 UTC - Joshua Dunham: I'm getting a hard error (think it's
in the C components)
----
2019-12-17 20:47:27 UTC - Joshua Dunham: `"python pulsar-producer.py" terminated
by signal SIGILL (Illegal instruction)` with the 2.4.2 client. Anyone else see
this?
----
2019-12-17 21:00:54 UTC - ec: Is it safe to connect to and use the BookKeeper that
Pulsar uses, for, you know, possibilities?
----
2019-12-17 21:04:21 UTC - tihomir: guys we are using pulsar 2.3.2 and we have
the following strange problem
I subscribe to a topic with pulsar-client on region2 (shared subscription).
Then I use the pulsar-client to publish on region1. Both regions have the
replication clusters correctly set, but I never receive the message on region2.
----
2019-12-17 21:16:59 UTC - Addison Higham: @tihomir struggling to remember the
call at the moment, but there is a status call for replication that will let
you know what is happening, also, the broker logs are pretty useful for
debugging replication
----
2019-12-17 22:15:57 UTC - Roman Popenov: So I ran `kubectl apply -f
zookeeper.yaml` and my kubectl config is pointing to an EKS cluster in AWS
----
2019-12-17 22:16:16 UTC - Roman Popenov: It doesn’t seem to be in the default
namespace
----
2019-12-17 22:16:22 UTC - Roman Popenov: Is that to be expected?
----
2019-12-17 22:53:41 UTC - Greg Hoover: Got it working. I had not setup
Prometheus. It was included with some other containers I was using previously
so overlooked that part. Downloaded a Prometheus container and configured it
with the other containers and now it is working fine. Interestingly, the
grafana dashboards in the streamnative and apache containers are different. I
like different aspects of both, so I’ll prob use them both for a while.
----
2019-12-17 23:33:23 UTC - Greg Hoover: Looks like the streamnative one may be a
superset of the Apache. So will use the streamnative one for now.
----
2019-12-18 02:41:26 UTC - LaxChan: Does Pulsar geo-replication have to use the same
ZK cluster?
----
2019-12-18 03:29:18 UTC - jia zhai: It is not necessary
----
2019-12-18 03:29:33 UTC - jia zhai: <https://gist.github.com/jiazhai>
----
2019-12-18 03:30:06 UTC - jia zhai: These contain 2 examples: one uses a global ZK,
the other does not.
----
2019-12-18 06:11:49 UTC - LaxChan: :+1:
----
2019-12-18 09:06:14 UTC - Jasper Li: Hello all, I want to ask a question about
Pulsar SQL. Does Pulsar SQL actually consume data through a subscription, or
does it just scan the data from storage directly?
----