2019-12-17 09:17:37 UTC - Martin Kunev: Hi,
I am running pulsar 2.3.2 in standalone mode. I have the following issue:
I subscribe to a topic with pulsar-client on region2 (shared subscription). 
Then I use the pulsar-client to publish on region1. Both regions have the 
replication clusters correctly set, but I never receive the message on region2.
What can cause this problem?
----
2019-12-17 10:39:28 UTC - Fernando: Hi guys, I’d appreciate any feedback on a 
problem we’re having with the current implementation of the redelivery count. 
Here’s the issue <https://github.com/apache/pulsar/issues/5881>
----
2019-12-17 10:44:37 UTC - rmb: Hi, I'm trying to get pulsar standalone running, 
and I'm having trouble with some basic python scripts.  I downloaded 
pulsar-2.4.1 and unpacked the tarball; `bin/pulsar standalone` seems to start 
fine, but when I try to use the python libraries to produce or consume 
messages, I get errors like
```ERROR ClientImpl:182 | Error Checking/Getting Partition Metadata while 
creating producer on <persistent://public/default/my-topic> -- 5```
(and sometimes pulsar then shuts down).  the bin/pulsar-client script isn't any 
better; I get a long stream of java exceptions and then
```INFO  org.apache.pulsar.client.cli.PulsarClientTool - 0 messages 
successfully produced```
(or consumed, respectively).  any suggestions?

fwiw, I can produce and consume messages using the docker image (but I'm having 
other issues there, so I thought I'd give `bin/pulsar standalone` a try)
----
2019-12-17 10:47:41 UTC - tihomir: @rmb check if public/default namespace is 
created
----
2019-12-17 10:48:08 UTC - rmb: I thought that was automatically created on 
startup?
----
2019-12-17 10:50:04 UTC - tihomir: yes but it takes time for that. My guess is 
that you are executing your call too early and the namespace does not exist yet
----
2019-12-17 10:51:01 UTC - rmb: I don't think so, I've been waiting substantial 
amounts of time
----
2019-12-17 10:52:12 UTC - rmb: anyway, pulsar just exited before I had a chance 
to run any admin commands
----
2019-12-17 10:55:48 UTC - rmb: ```$ bin/pulsar-admin tenants list
null

Reason: javax.ws.rs.ProcessingException: Connection refused: 
localhost/127.0.0.1:8080```
----
2019-12-17 11:05:18 UTC - rmb: anyway, since the docker image is running and 
letting me produce and consume, I'll post the questions I have about that
----
2019-12-17 11:26:26 UTC - rmb: the command I'm running is `docker run -it -p 
6650:6650 -p 8080:8080  --mount source=pulsardata,target=/pulsar/data --mount 
source=pulsarconf,target=/pulsar/conf apachepulsar/pulsar:2.4.1 bin/pulsar 
standalone`
...ok, I'm having trouble replicating the weird deduplication behavior I was 
seeing, so here's a different question, about ordering.  I had thought that for 
an un-partitioned topic, message delivery ordering was guaranteed per-producer 
(for messages sent without a key).  so I'm using the node client to send some 
messages:
```const producer = await client.createProducer({
  topic: `${topic}`,
  producerName: 'my-producer',
  messageRoutingMode: 'UseSinglePartition',
  sendTimeoutMs: -1,
});

// Send messages
for (let i = 0; i < 10; i += 1) {
  const msg = `my-message-${i}`;
  const message = {
    data: Buffer.from(msg),
  }
  producer.send(message);
}```
but I'm receiving them in a different order:
```Received message: 'b'my-message-0''
Received message: 'b'my-message-1''
Received message: 'b'my-message-3''
Received message: 'b'my-message-2''
Received message: 'b'my-message-4''
Received message: 'b'my-message-5''
Received message: 'b'my-message-6''
Received message: 'b'my-message-7''
Received message: 'b'my-message-8''
Received message: 'b'my-message-9''```
----
2019-12-17 13:37:00 UTC - rmb: some further questions about deduplication 
(based on 
<https://github.com/apache/pulsar/wiki/PIP-6:-Guaranteed-Message-Deduplication>):
• if a producer is setting sequenceId on a per-message basis, are there any 
constraints imposed by the broker? for example, does the sequence have to be 
monotonic?  the doc says that the broker keeps track of the highest sequenceId 
received from a particular producer; does that mean that if a producer sent 
messages 1, 2, 4, 3, the last one would be rejected because its sequenceId is 
less than 4? how would the broker distinguish that situation from the messages 
simply arriving out-of-order?
• if a producer uses a custom sequence of sequenceIds, with 'holes', what does 
that mean for acknowledgements? is the consumer's view of the sequence of 
messages entirely separate from the producer's sequenceIds?
----
2019-12-17 15:00:34 UTC - Greg Hoover: Is this a known issue or is this unique 
to me?
----
2019-12-17 15:05:26 UTC - Greg Hoover: I should have searched the slack channel 
for grafana. There are a couple of things mentioned I have not tried yet. Will 
report back. 
----
2019-12-17 18:43:50 UTC - Daniel Ferreira Jorge: @rmb Let me try to go point by 
point:

• The sequenceId does not need to be consecutive, just increasing.
• If you send messages 1,2,4,3 the message 3 will be discarded.
• The deduplication is a simple mechanism that maps a *producer name to a 
number*. You should not have 2 producers with the same name if you are using 
deduplication. 
• The consumer has absolutely nothing to do with sequence numbers of producers. 
When a broker receives a message from producer named "X", it will check in the 
map for the last sequenceId of producer X. If the message being 
produced now by producer X has an equal or lower sequence number, the message 
will be discarded. The deduplication mechanism has absolutely no other 
implications. 
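
The check described above can be modeled in a few lines. This is a toy sketch, not the broker's actual code: `ToyDedupBroker` and `publish` are hypothetical names, and the rule assumed (matching the 1,2,4,3 example) is that a message is dropped when its sequenceId is less than or equal to the highest one already recorded for that producer name.

```python
class ToyDedupBroker:
    """Toy model of per-producer deduplication (not Pulsar's real code)."""

    def __init__(self):
        # Highest sequenceId accepted so far, keyed by producer name.
        self.last_seq = {}
        # Accepted messages, in arrival order.
        self.log = []

    def publish(self, producer_name, sequence_id, payload):
        """Return True if the message is stored, False if dropped as a duplicate."""
        # Assumption: sequenceIds are non-negative, so -1 means "nothing seen yet".
        highest = self.last_seq.get(producer_name, -1)
        if sequence_id <= highest:
            return False
        self.last_seq[producer_name] = sequence_id
        self.log.append((producer_name, sequence_id, payload))
        return True


broker = ToyDedupBroker()
# Arrival order 1, 2, 4, 3: the late 3 is dropped.
results = [broker.publish("X", s, f"m{s}") for s in (1, 2, 4, 3)]
# A sparse but increasing sequence is accepted in full.
sparse = [broker.publish("Y", s, f"m{s}") for s in (10, 20, 35)]
print(results, sparse)  # [True, True, True, False] [True, True, True]
```

In this model the check only compares against the stored maximum, so gaps in the sequence are invisible to it; only going backwards (or repeating) triggers a drop.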
----
2019-12-17 18:55:39 UTC - rmb: thanks, @Daniel Ferreira Jorge.  what if a 
producer sent messages 1, 2, 3, 4, but due to network issues they arrived at 
the broker in the order 1, 2, 4, 3? would the broker assume message 3 was a 
duplicate and discard it?
----
2019-12-17 18:58:39 UTC - Daniel Ferreira Jorge: I do not believe that this can 
happen in this context because for the producer to send message 4 the broker 
must have sent an ack of message 3... maybe I'm wrong and someone more 
knowledgeable, like @Sijie Guo or @Matteo Merli can give you a certainty about 
that...
----
2019-12-17 18:58:59 UTC - Joe Francis: ^^,  you are right
----
2019-12-17 18:59:36 UTC - rmb: really? the diagrams at 
<https://pulsar.apache.org/docs/en/develop-binary-protocol/> seem to suggest 
otherwise
----
2019-12-17 18:59:46 UTC - rmb: and what about batch mode?
----
2019-12-17 19:00:32 UTC - Joe Francis: Messages are published in the order they 
are queued by the producer..
----
2019-12-17 19:02:39 UTC - rmb: and is that queue order determined by the 
sequenceIds?
----
2019-12-17 19:06:42 UTC - Joe Francis: No. It's the order in which send() is 
invoked by the Producer.
----
2019-12-17 19:14:39 UTC - rmb: ok, so the diagrams are misleading and every 
send() has to be ack'ed before the next one is called.  what about 
deduplication in batch mode?  does the broker update the highest sequenceId for 
a given producer after it reads the entire batch of messages, or as it reads it?
----
2019-12-17 19:15:21 UTC - Joe Francis: The diagrams are correct
----
2019-12-17 19:16:44 UTC - rmb: the diagram under "producers" shows send() being 
called twice before sendReceipt is called
----
2019-12-17 19:19:26 UTC - Joe Francis: ??
----
2019-12-17 19:19:44 UTC - rmb: 
<https://pulsar.apache.org/docs/en/develop-binary-protocol/#producer>
----
2019-12-17 19:21:16 UTC - Joe Francis: That is a sequence diagram. It shows 
messages are acked in the order they are published. Nothing else is implied
----
2019-12-17 19:27:26 UTC - rmb: it would be considerably clearer if the second 
send() and the first sendReceipt() were interchanged, but ok
----
2019-12-17 19:27:49 UTC - rmb: any idea why I was receiving messages 
out-of-order above?
----
2019-12-17 19:28:42 UTC - Jason Fisher: it might be an async logging artifact
----
2019-12-17 19:30:08 UTC - Joe Francis: Actually, it's meant to imply the 
opposite (contrary to what you understood). There are no 
send1-ack1-send2-ack2 waits. It's send1-send2, and ack1, ack2 are independent. 
The diagram actually shows pipelining, and shows exactly what it's meant to show
----
2019-12-17 19:31:29 UTC - rmb: ok.  so then my question remains: how is the 
broker determining order in the producer queue?  your initial answer was that 
the producer was waiting for each message to be ack-ed
----
2019-12-17 19:32:18 UTC - Joe Francis: No, my answer was that it was the order 
in which you invoke send() (or sendAsync()).
----
2019-12-17 19:34:02 UTC - Joe Francis: If you use send(), of course it will 
block for the ack. That's an artifact of using a blocking API. But that's not 
required for dedup. You can use sendAsync().
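
The difference between blocking per message and pipelining can be sketched with asyncio. This is a toy model under stated assumptions, not the Pulsar client: `toy_broker` and `send_async` are hypothetical names, and a single FIFO queue stands in for one ordered connection between producer and broker.

```python
import asyncio


async def toy_broker(inbox, seen):
    """Consume messages in FIFO order, as one ordered connection would deliver them."""
    while True:
        seq, ack = await inbox.get()
        await asyncio.sleep(0.01)  # pretend to persist the message
        seen.append(seq)
        ack.set_result(seq)        # send the receipt back


async def main():
    inbox, seen, events = asyncio.Queue(), [], []
    broker = asyncio.create_task(toy_broker(inbox, seen))
    loop = asyncio.get_running_loop()

    def send_async(seq):
        ack = loop.create_future()
        inbox.put_nowait((seq, ack))  # the order is fixed here, at call time
        events.append(f"send{seq}")
        ack.add_done_callback(lambda f: events.append(f"ack{f.result()}"))
        return ack

    # Pipelined: all three sends are queued before any receipt comes back.
    acks = [send_async(i) for i in range(1, 4)]
    await asyncio.gather(*acks)
    broker.cancel()
    return seen, events


seen, events = asyncio.run(main())
print(seen)    # [1, 2, 3]
print(events)  # ['send1', 'send2', 'send3', 'ack1', 'ack2', 'ack3']
```

The sends do not wait for receipts, yet the toy broker still sees the messages in call order, because ordering is decided when each message enters the queue.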
----
2019-12-17 19:35:55 UTC - Joshua Dunham: Hi Everyone, I'm getting `"python 
pulsar-producer.py" terminated by signal SIGILL (Illegal instruction)"` with 
the 2.4.2 client. Anyone else see this?
----
2019-12-17 19:36:18 UTC - Joshua Dunham: I have a MWE that worked with 2.4.1p1 
and no longer in 2.4.2
----
2019-12-17 19:36:50 UTC - Joe Francis: As for what you observed, I don't have 
enough details. But messages will be published in the same order they are sent. 
Be aware that if you send seq-ids 1,2,4,3 in that order, Pulsar will attempt to 
publish in that order, and the broker will reject 3 if dedup is on. In other 
words, Pulsar will not sort disordered seq-ids
----
2019-12-17 19:37:23 UTC - Joshua Dunham: Once I upgraded to 2.4.2 the pulsar 
python module complained about also wanting [email protected] which I had to install 
from the brew retired repository.
----
2019-12-17 19:38:38 UTC - rmb: thanks.  but how does the broker distinguish 1, 
2, 4, 3 from 1, 2, 3, 4 arriving out of order?
----
2019-12-17 19:40:34 UTC - Joe Francis: Whatever order you invoke send in on the 
client is the order the broker will see
----
2019-12-17 19:41:46 UTC - rmb: yes, you've said, but I've been asking you how 
the broker determines that order.  is it looking at a timestamp?
----
2019-12-17 19:43:14 UTC - Joe Francis: Perhaps you are leaving something out 
that you know, but I don't. This is a very simple concept. If you do 
send(1), send(2), send(3) in that order (or sendAsync()), the broker will see 
1, 2, 3. There are no timestamps; it's the order in which you send
----
2019-12-17 19:44:53 UTC - rmb: producers are talking to brokers over a network, 
which means that if there's a network issue, messages can arrive in a different 
order than they were sent.  I assume there must be some way for brokers to deal 
with that
----
2019-12-17 19:45:03 UTC - Joshua Dunham: Hi @rmb: I can possibly help. For each 
client cxn pulsar only acks when it's reached a write quorum. If it acks, the 
record has been noted. If you are connecting async then one thread could write 
ahead of another. If you write bulk the ack is on the bulk payload and the 
client needs to understand if there is order in the individual records.
----
2019-12-17 19:47:21 UTC - Joe Francis: Batching and quorum are entirely 
orthogonal to message ordering
----
2019-12-17 19:47:29 UTC - jmogden: Hello, I'm trying to understand multi-topic 
subscription using regex patterns. From what I've found, when a consumer 
subscribes to topics in a namespace using a regex, it will subscribe 
to everything that matches the regex. I noticed that if a new topic is made 
that the consumer wasn't initially subscribed to, then it won't actually 
subscribe to and consume from the new topic, even if it would match the regex. 
Is there a way to have the consumer subscribe to the new topic without being 
closed and re-made?
----
2019-12-17 19:49:31 UTC - Joe Francis: Ordering is determined at the client, 
based on the order in which you invoke send()/sendAsync(). Batching is a 
transport and I/O optimization that is entirely immaterial to ordering. Quorum 
is not visible to the client.
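
A small sketch of why batching does not affect order, assuming batches are cut from the send queue as consecutive slices and unpacked in the same sequence. `batch` and `unbatch` are hypothetical helpers for illustration, not client APIs.

```python
def batch(messages, max_batch):
    """Group messages into consecutive batches of at most `max_batch` items."""
    return [messages[i:i + max_batch] for i in range(0, len(messages), max_batch)]


def unbatch(batches):
    """Flatten batches back into individual messages, batch by batch."""
    return [msg for b in batches for msg in b]


sent = [f"my-message-{i}" for i in range(10)]
batches = batch(sent, max_batch=4)
print([len(b) for b in batches])  # [4, 4, 2]
print(unbatch(batches) == sent)   # True
```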
----
2019-12-17 19:50:01 UTC - Joe Francis: For example: 
<https://github.com/streamlio/pulsar-java-tutorial/blob/master/src/main/java/tutorial/async/AsyncProducerTutorial.java>
----
2019-12-17 19:51:22 UTC - Joe Francis: Messages will be published in the order 
that sends are invoked in the loop, so they will be published in the loop 
counter order
----
2019-12-17 19:53:59 UTC - rmb: thanks, @Joshua Dunham, I'm trying to understand 
how that interacts with the deduplication feature.  if a producer is connecting 
asynchronously and one thread writes ahead of another (so that the producer 
calls sendAsync for 1, 2, 3, 4 but the broker sees messages in the order 1, 2, 4, 3), 
will the broker delete messages?
----
2019-12-17 19:54:37 UTC - Joe Francis: Ha - so you are using multi-threading in 
the Producer?
----
2019-12-17 19:54:53 UTC - rmb: no, I'm just worried about an unreliable network
----
2019-12-17 19:54:57 UTC - rmb: packets can arrive out of order
----
2019-12-17 19:55:05 UTC - Joe Francis: That's not something you have to worry 
about
----
2019-12-17 19:55:10 UTC - rmb: I assume there must be some way to deal with this
----
2019-12-17 19:55:12 UTC - rmb: why not?
----
2019-12-17 19:55:52 UTC - Joe Francis: Because Pulsar guarantees that the order 
in which you send is the order in which it gets published.
----
2019-12-17 19:56:01 UTC - Jason Fisher: Switch to sync and not async on the 
consumer
----
2019-12-17 19:56:18 UTC - Joshua Dunham: @rmb Definitely create an app-scoped 
timestamp or id in this case. You cannot use any queue as ordered if you are 
writing fifo.
----
2019-12-17 19:56:22 UTC - Jason Fisher: You can’t judge the logging output to 
be the actual order things arrive 
----
2019-12-17 19:56:34 UTC - Jason Fisher: Add a received timestamp to the log 
----
2019-12-17 19:56:59 UTC - rmb: how does pulsar guarantee that the order in 
which you send is the order in which it gets published?
----
2019-12-17 19:57:02 UTC - Jason Fisher: Your console output is not async safe 
----
2019-12-17 19:57:11 UTC - Jason Fisher: In terms of keeping things in order 
----
2019-12-17 19:57:46 UTC - Joshua Dunham: @rmb, not the order you send, the 
order that pulsar acks.
----
2019-12-17 19:58:23 UTC - Roman Popenov: The only order you can guarantee is 
within Pulsar cluster
----
2019-12-17 19:58:49 UTC - Joshua Dunham: If you send synchronously you wait for 
each ack. If async then some other threads can get ackd depending on how slow 
the backend is to achieve write quorum etc.
----
2019-12-17 19:59:22 UTC - rmb: ok, great.  so if the broker has deduplication 
enabled and sees 1, 2, 4, 3, it will assume that 3 is a duplicate and delete 
it, even if the producers sent it async before 4?
----
2019-12-17 19:59:43 UTC - Roman Popenov: `Java client components are 
thread-safe: a consumer can acknowledge messages from different threads.`
----
2019-12-17 19:59:50 UTC - Roman Popenov: But there are no such guarantees with 
producers
----
2019-12-17 20:00:20 UTC - Joshua Dunham: It doesn't see 1,2,4,3, it sees 
1,2,3,4 and your app sees the contents out of order.
----
2019-12-17 20:00:50 UTC - rmb: how do you guarantee that it sees 1,2,3,4?
----
2019-12-17 20:01:22 UTC - Joshua Dunham: I mean, pulsar orders linearly what it 
sees.
----
2019-12-17 20:01:53 UTC - Roman Popenov: Well, I would assume if you have one 
producer that produces to one topic, it will be in order
----
2019-12-17 20:01:54 UTC - Joshua Dunham: 1,2,3,4 being pulsar derived IDs.
----
2019-12-17 20:02:51 UTC - rmb: this thread of conversation started with me 
trying to understand custom sequenceIds and deduplication
----
2019-12-17 20:02:52 UTC - Joshua Dunham: Like if you have two small python 
loops filling in a spreadsheet with contents. The index of the spreadsheet is 
always 1->N in order, but the contents have no guarantee to make sense to 
the app.
----
2019-12-17 20:05:05 UTC - Joe Francis: @rmb There is a queue (Q) on the client, 
which is where ordering is imposed. The only way to ensure this order is to invoke 
send()/sendAsync() in the order you desire (which you cannot ensure if you use a 
multi-threaded Producer and invoke send from multiple threads). This Q is what 
gets transported to the broker. Batching, compression, etc. are transport 
mechanisms, which only affect how the Q is moved, and not the order. 
Underneath, TCP is used, which guarantees network order on a given connection. 
Ordering is enforced, in that if the transfer fails on a msg in the middle of 
the Q, everything after that msg will also be failed.
----
2019-12-17 20:08:12 UTC - Joe Francis: Receipts/acks will also come in similar 
order. Everything from a producer will be acked in the order in the Q. All 
these acks will be delivered into a consumer Q in the client.  If you read it 
out one by one, you will get the publish order. (If you read it with 
multi-threaded consumer, you will lose ack ordering)
----
2019-12-17 20:08:53 UTC - Roman Popenov: Although consumers ARE thread safe
----
2019-12-17 20:10:17 UTC - Joe Francis: Thread safe =/= ordering. It means that 
they can dequeue without stepping on each other. It does not guarantee they 
execute in the same order they dequeued.
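
The distinction can be illustrated with a thread-safe queue from the Python standard library. This is a generic sketch, not Pulsar consumer code: the sleep durations are made up so that earlier messages take longer to process.

```python
import queue
import threading
import time

inbox = queue.Queue()
for i in range(4):
    inbox.put(i)

lock = threading.Lock()
dequeued, finished = [], []


def worker():
    while True:
        with lock:  # take a message and record the dequeue order atomically
            try:
                msg = inbox.get_nowait()
            except queue.Empty:
                return
            dequeued.append(msg)
        time.sleep(0.05 * (4 - msg))  # earlier messages take longer to process
        with lock:
            finished.append(msg)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(dequeued)  # dequeue order follows the queue: [0, 1, 2, 3]
print(finished)  # completion order depends on timing, typically reversed here
```

No message is lost or handed out twice, which is what thread safety gives you. The completion order is a separate matter and depends on how long each worker takes.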
----
2019-12-17 20:10:21 UTC - Roman Popenov: It isn’t
----
2019-12-17 20:10:46 UTC - Roman Popenov: The other question I have, how to 
handle a chunked message
----
2019-12-17 20:11:08 UTC - rmb: ok, thanks.  why does one message getting lost 
mean that all subsequent messages will get lost?
----
2019-12-17 20:11:17 UTC - Roman Popenov: Is it possible to know that chunks are 
part of the same message and skip them until next message with multiple 
consumers?
----
2019-12-17 20:15:32 UTC - Joe Francis: That's the guarantee given by Pulsar. 
If you publish 1,2,3,4,5,6,7,8,9,10 and, due to some error on the server side, 
Pulsar could not store 5, then the integrity of the Q order is lost. Pulsar 
will not ack the rest. It will ack 1..4, and then fail 5-10.
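
The behavior in this example can be written down as a toy model. `drain` and `fails_on` are hypothetical names for illustration; the numbers are message positions in the queue, not sequenceIds.

```python
from collections import deque


def drain(pending, fails_on):
    """Ack messages in queue order until one fails, then fail everything after it.

    `pending` is the producer-side queue in send() order; `fails_on` is a
    hypothetical set of messages the server side cannot store.
    """
    q = deque(pending)
    acked, failed = [], []
    while q:
        msg = q.popleft()
        if msg in fails_on:
            # Acking anything later would break the queue order, so this
            # message and everything still behind it are failed.
            failed.append(msg)
            failed.extend(q)
            break
        acked.append(msg)
    return acked, failed


acked, failed = drain(range(1, 11), fails_on={5})
print(acked)   # [1, 2, 3, 4]
print(failed)  # [5, 6, 7, 8, 9, 10]
```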
----
2019-12-17 20:18:30 UTC - rmb: ok, but you're specifically allowed to send 
messages with a sparse sequence of sequenceIds.  if a broker receives 
1,2,3,4,6,7,8,9,10, how does it distinguish the producer sending that sequence 
from the producer sending 1,2,3,4,5,6,7,8,9,10 and 5 getting lost?
----
2019-12-17 20:27:44 UTC - rmb: Thanks for the answers to my questions!  I'm 
afraid it's dinner time for me and I have to drop offline
----
2019-12-17 20:37:01 UTC - Joe Francis: This has nothing to do with seq-id. The 
numbers I used indicate message order, not seq-id
----
2019-12-17 20:46:53 UTC - Joshua Dunham: Anyone see issues with the pulsar 
python client?
----
2019-12-17 20:47:25 UTC - Joshua Dunham: I'm getting a hard error (think it's 
in the C components)
----
2019-12-17 20:47:27 UTC - Joshua Dunham: "python pulsar-producer.py" terminated 
by signal SIGILL (Illegal instruction)" with the 2.4.2 client. Anyone else see 
this?
----
2019-12-17 21:00:54 UTC - ec: Is it safe to connect and use the Bookkeeper that 
Pulsar uses, for you know possibilities?
----
2019-12-17 21:04:21 UTC - tihomir: guys we are using pulsar 2.3.2 and we have 
the following strange problem
I subscribe to a topic with pulsar-client on region2 (shared subscription). 
Then I use the pulsar-client to publish on region1. Both regions have the 
replication clusters correctly set, but I never receive the message on region2.
----
2019-12-17 21:16:59 UTC - Addison Higham: @tihomir struggling to remember the 
call at the moment, but there is a status call for replication that will let 
you know what is happening, also, the broker logs are pretty useful for 
debugging replication
----
2019-12-17 22:15:57 UTC - Roman Popenov: So I ran `kubectl apply -f 
zookeeper.yaml` and my kubectl config is pointing to an EKS cluster in AWS
----
2019-12-17 22:16:16 UTC - Roman Popenov: It doesn’t seem to be in the default 
namespace
----
2019-12-17 22:16:22 UTC - Roman Popenov: Is that to be expected?
----
2019-12-17 22:53:41 UTC - Greg Hoover: Got it working. I had not set up 
Prometheus. It was included with some other containers I was using previously 
so overlooked that part. Downloaded a Prometheus container and configured it 
with the other containers and now it is working fine. Interestingly, the 
grafana dashboards in the streamnative and apache containers are different. I 
like different aspects of both, so I’ll prob use them both for a while. 
----
2019-12-17 23:33:23 UTC - Greg Hoover: Looks like the streamnative one may be a 
superset of the Apache. So will use the streamnative one for now.
----
2019-12-18 02:41:26 UTC - LaxChan: Must pulsar geo-replication use the same zk 
cluster?
----
2019-12-18 03:29:18 UTC - jia zhai: It is not necessary
----
2019-12-18 03:29:33 UTC - jia zhai: <https://gist.github.com/jiazhai>
----
2019-12-18 03:30:06 UTC - jia zhai: Here are 2 examples: one uses global zk, 
the other does not
----
2019-12-18 06:11:49 UTC - LaxChan: :+1:
----
2019-12-18 09:06:14 UTC - Jasper Li: Hello all, I want to ask a question about 
Pulsar SQL. Does Pulsar SQL actually consume data through a subscription, or does it 
just scan the data from storage directly?
----
