2019-04-19 13:18:44 UTC - stefan: Hi. I am having trouble reinitializing the
cluster metadata. I end up with: Exception in thread "main"
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
NodeExists for /namespace
----
2019-04-19 13:33:55 UTC - stefan: Hi guys. When running locally on my laptop, I
end up with a connection refused error: Caused by:
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused:
localhost/127.0.0.1:6650. Any help appreciated.
----
2019-04-19 13:34:47 UTC - Ruud Kamphuis: There seems to be a typo in your
address: it reads `localhost/127.0.0.1:6650`, and that's not good
----
2019-04-19 13:35:03 UTC - Ruud Kamphuis: it should be `localhost:6650` or
`127.0.0.1:6650`
----
2019-04-19 13:36:06 UTC - Ruud Kamphuis: Hello everyone. I read the whole FAQ
(<https://github.com/apache/pulsar/blob/master/faq.md>) but couldn't find the
answer to this question:
Is it possible to have multiple consumers listening to 1 topic that have their
own subscription type? For example, I have an ETL consumer that wants to make
sure it gets all the messages. And I have a Stats consumer that keeps track of
stats. I want to make sure there is only 1 ETL consumer, and only 1 Stats
consumer.
----
2019-04-19 13:36:38 UTC - Ruud Kamphuis: As far as I know, 1 topic can only
have 1 subscription? Or is there a way to somehow group consumers by
consumerName ?
----
2019-04-19 13:37:28 UTC - stefan: Agreed. I just downloaded it with wget,
launched standalone with `bin/pulsar standalone`, and called: `./bin/pulsar-client
produce my-topic --messages "hello-pulsar"`
----
2019-04-19 13:37:35 UTC - stefan: I did not even touch the conf
----
2019-04-19 13:38:15 UTC - Sijie Guo: 1 topic can have as many subscriptions as
you need
----
2019-04-19 13:38:25 UTC - Sijie Guo: each subscription can choose its own
subscription type.
----
2019-04-19 13:38:38 UTC - Sijie Guo: consumers that use the same subscription
name are in the same consumer group.
----
2019-04-19 13:44:34 UTC - Ruud Kamphuis: Is this documented somewhere? Because
I read through the whole docs and FAQ but couldn't find it.
----
2019-04-19 13:44:40 UTC - Ruud Kamphuis: (thanks for your answer btw!)
----
2019-04-19 13:46:35 UTC - Ruud Kamphuis: Ah, I know what I was doing wrong.
I saw
`<ws://broker-service-url:8080/ws/v2/consumer/persistent/:tenant/:namespace/:topic/:subscription>`
And thought that `:subscription` was the type, so I entered `shared` there..
But that's just the name of the subscription, nice!
----
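A minimal Python sketch of the point above: the last path segment of the
WebSocket consumer URL is the subscription name, and each subscription picks
its own type. It assumes the `subscriptionType` query parameter from the Pulsar
WebSocket API docs; the helper `consumer_url` and the names `public`,
`default`, `events`, `etl`, and `stats` are hypothetical.

```python
from urllib.parse import quote, urlencode


def consumer_url(base, tenant, namespace, topic, subscription, subscription_type):
    """Build a WebSocket consumer URL; the last path segment is the subscription NAME."""
    parts = (tenant, namespace, topic, subscription)
    path = "/".join(quote(part, safe="") for part in parts)
    # The subscription TYPE travels as a query parameter, not in the path.
    query = urlencode({"subscriptionType": subscription_type})
    return f"{base}/ws/v2/consumer/persistent/{path}?{query}"


base = "ws://broker-service-url:8080"  # placeholder host from the message above
# Two independent subscriptions on the same topic; each one sees every message.
etl_url = consumer_url(base, "public", "default", "events", "etl", "Exclusive")
stats_url = consumer_url(base, "public", "default", "events", "stats", "Exclusive")
```

An Exclusive subscription allows only one attached consumer, which fits the
"only 1 ETL consumer and only 1 Stats consumer" requirement from the question.
----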
2019-04-19 13:49:39 UTC - Sijie Guo: :+1:
----
2019-04-19 13:57:35 UTC - Kai Levy: I understand ZK's general role, I am just
hoping to get into the specifics. For example, does using pulsar's reader
interface cause writes on ZK, like creating a subscription does?
----
2019-04-19 14:34:50 UTC - Ruud Kamphuis: Another question: when using
websockets with a schema, is it still required to base64 encode the `payload`?
Or can you send a message like this:
```
{ "payload": { "id": 1, "event": "some-event" } }
```
Maybe I misunderstand the schemas thing.
----
2019-04-19 14:35:26 UTC - Ruud Kamphuis: So if I change my schema from `None` to
`JSON`
----
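A minimal Python sketch of the base64 step being asked about. It assumes the
producer message format from the Pulsar WebSocket API docs, where `payload` is
a base64-encoded string. Whether a JSON schema lifts that requirement is not
answered in this thread, so the sketch keeps the encoding. The helper
`build_producer_message` is hypothetical.

```python
import base64
import json


def build_producer_message(event):
    """Serialize `event` to JSON and base64-encode it into the `payload` field."""
    raw = json.dumps(event).encode("utf-8")
    return json.dumps({"payload": base64.b64encode(raw).decode("ascii")})


message = build_producer_message({"id": 1, "event": "some-event"})

# Round trip: what a consumer would do to get the original object back.
decoded = json.loads(base64.b64decode(json.loads(message)["payload"]))
```
----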
2019-04-19 15:17:38 UTC - Joe Francis: Readers have no persistent state, so no.
----
2019-04-19 15:29:43 UTC - Kai Levy: So generally speaking, is there a list of
operations that do use zookeeper, and whether they are reads or writes?
----
2019-04-19 15:31:15 UTC - Kai Levy: Or a straightforward way I can analyze the
source code to find operations that use zookeeper?
----
2019-04-19 15:44:06 UTC - Joe Francis: Topics and Subscriptions have state and
metadata, so they will have ZK entries, and this metadata gets updated if
you create/delete or set properties on them. Then there are BookKeeper ledgers
associated with the topics and cursors, which get updated when data files get
rolled over. You can look at ManagedLedgerInfo.java to see what metadata is kept
----
2019-04-19 16:05:54 UTC - Kai Levy: Does creating consumers on existing
subscriptions ever write to zk? Or just read?
----
2019-04-19 16:14:42 UTC - Sébastien de Melo: Hi guys!
We encounter a very weird error with our Pulsar function. It has 2 input
topics and when we make a load test on 1 topic, the function eventually stops
listening to this topic at some point and never recovers. The messages sent to
the other topic are still processed though (confirmed by the stats subcommand).
Then we have to delete it and recreate it so that it works again.
----
2019-04-19 16:56:08 UTC - Sanjeev Kulkarni: @Sébastien de Melo huh, that's
weird. Any errors in the function log? How long after the function starts do you
see this happening?
----
2019-04-19 16:56:37 UTC - Sanjeev Kulkarni: and what's the message rate on each
of the topics?
----
2019-04-19 17:01:38 UTC - Ruud Kamphuis: Why is the Pulsar Docker image 1 GB in size?
Isn't there a Docker image available that only contains Pulsar itself?
----
2019-04-19 17:13:55 UTC - Joe Francis: In general no.
----
2019-04-19 18:03:49 UTC - Sam Leung: I have a question about phased rollout of
a service that is a consumer. Our current system’s paradigm allows us to
specify a percentage of traffic to route to a new deployment, e.g. 99% of
traffic goes to service A v1, 1% goes to service A v2. Eventually we tweak
those until all requests go to v2
In Pulsar, messages are pushed to the clients according to the subscription, so
that means v1 and v2 will both process messages as fast as they can. Has
precise throttling of a certain group of consumers been considered?
I see some potential solutions as:
- use consumer priority and permits to get rough distribution, but that does
not actually give me control
- have consumers nack a % of received messages, but a lot of busy work and
again not very precise
- create pulsar function to route messages in a distribution into v1's topic
and v2's topic, but that could end up with a lot of duplication
- add something to `AbstractDispatcherMultipleConsumers` to support groups of
consumers with a % of messages routed to them
Any thoughts?
----
2019-04-19 18:13:49 UTC - David Kjerrumgaard: @Ruud Kamphuis The Pulsar docker
image currently includes bookkeeper, zookeeper, and other components that
contribute to the size of the image. We could create a standalone "pulsar"
only docker image, but it would be incumbent upon the user to also spin up a
ZK and a BK image and to configure the networking between them via docker-compose
or similar. So far, nobody has elected to go down that route.
----
2019-04-19 18:16:59 UTC - David Kjerrumgaard: @Sam Leung If you are looking for
a short term "hack" to simulate the behavior you described, you could write a
simple pulsar function that processes the message, generates a random number
between 1 and 100, if it is less than 100 then route it to service A v1,
otherwise route it to service A v2.
----
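A minimal Python sketch of the routing rule described above: draw a number
between 1 and 100 and send the message to v1 unless the draw lands in the top
slice reserved for v2. Only the decision logic is shown, not a Pulsar function;
the names `pick_route`, `service-a-v1`, and `service-a-v2` are hypothetical.

```python
import random


def pick_route(rng, v2_percent=1):
    """Return the destination for one message.

    Draws an integer in [1, 100]; the top `v2_percent` values go to v2.
    """
    roll = rng.randint(1, 100)
    return "service-a-v1" if roll <= 100 - v2_percent else "service-a-v2"


rng = random.Random(42)  # seeded only to make the sketch repeatable
counts = {"service-a-v1": 0, "service-a-v2": 0}
for _ in range(10_000):
    counts[pick_route(rng)] += 1
```

With the default `v2_percent=1`, a draw of 1 to 99 goes to v1 and a draw of 100
goes to v2, matching the 99% / 1% split from the question. The split is only
approximate over any finite number of messages.
----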
2019-04-19 18:18:14 UTC - Sam Leung: @David Kjerrumgaard I understand that
“hack” could work. I am trying to figure out the long term solution
----
2019-04-19 18:19:21 UTC - David Kjerrumgaard: @Sam Leung Sure, I am curious as
to how the long term solution would be different from a routing perspective,
i.e. how would you determine which messages go to which consumers?
----
2019-04-19 18:20:57 UTC - David Kjerrumgaard: and how would you handle slow
consumers, i.e. one consumer takes longer to process messages than others?
Would you adapt to the back-pressure, etc.? What if one of the consumers
fails? Should the remaining one get 100% of the traffic?
----
2019-04-19 18:22:39 UTC - Sam Leung: Ah we have a microservice that could serve
those percentage numbers. If we use pulsar functions to do the routing, I am
thinking we would need to cache the numbers in redis or zookeeper.
We generally have a GA version, which all traffic is routed to by default, but
divert 1% (or whatever) to the new deployments.
----
2019-04-19 18:22:54 UTC - David Kjerrumgaard: Just things to consider if you
want to submit a PIP, etc.
----
2019-04-19 18:23:08 UTC - Sam Leung: Each service also has multiple instances,
so it should be resilient enough that the GA has at least one consumer running.
----
2019-04-19 18:23:11 UTC - Matteo Merli: There are several optimizations that
could be done on the Docker image
----
2019-04-19 18:23:41 UTC - Matteo Merli: Basically that image just needs the
pulsar-bin.tar.gz plus JVM
----
2019-04-19 18:23:53 UTC - Sam Leung: Definitely good things to think about in a
more general scenario though.
----
2019-04-19 18:24:13 UTC - Matteo Merli: There was some discussion here:
<https://github.com/apache/pulsar/pull/3602>
----
2019-04-19 18:24:39 UTC - David Kjerrumgaard: Since this use case is geared
towards A/B testing (in my mind anyway), I was thinking of the case where v2 of
the service has a bug in it that causes ALL instances to fail.
----
2019-04-19 18:25:57 UTC - David Kjerrumgaard: users would think that some of
the messages aren't getting processed by the system. A lot of messages would
go un-acked which can cause issues, etc.
----
2019-04-19 18:26:57 UTC - Sam Leung: I see.. if v2 did not have an ack timeout
and doesn’t disconnect, the messages would be stuck.
----
2019-04-19 18:27:14 UTC - David Kjerrumgaard: yep
----
2019-04-19 18:28:03 UTC - Sam Leung: Okay, alternatively, if we didn’t need the
precision of exact percentages, what do you think would be a good canary test
to ensure v2 works?
----
2019-04-19 18:30:44 UTC - David Kjerrumgaard: Assuming that v2 would in turn
distribute messages to downstream services, etc?
----
2019-04-19 18:31:09 UTC - Sam Leung: sure
----
2019-04-19 18:31:24 UTC - Ruud Kamphuis: Thanks. I get that having a standalone
image is great for everybody that just wants to test Pulsar out.
However, I do find the naming of the current docker files super confusing.
pulsar
pulsar-standalone
pulsar-all
They all seem to have ZK, BK and more installed.
I expected `pulsar` to be the single pulsar package. And `pulsar-standalone` to
be P,ZK,BK,Dashboard etc
Why are they the same(ish)?
If you want to go to production, then you need / want to have these services
split right?
----
2019-04-19 18:33:31 UTC - Ruud Kamphuis: Thanks I will subscribe to the issue.
----
2019-04-19 18:35:21 UTC - David Kjerrumgaard: That's a good question. I'd have
to think about it a bit. Can your downstream services handle duplicate
messages? If so, you can have v2 create its own subscription on the incoming
topic
----
2019-04-19 18:36:49 UTC - David Kjerrumgaard: yes, in a production environment
these services are typically spread out.
----
2019-04-19 18:37:40 UTC - Sam Leung: That would be nice for the cases where
the downstream services can handle that. It would add a bit of duplicated
effort, but well worth it. But there are some that cannot.
----
2019-04-19 18:37:48 UTC - David Kjerrumgaard: We deploy the services separately
as pods in K8s and use the configs to control which services are running in
each pod
----
2019-04-19 18:38:09 UTC - Sébastien de Melo: Approximately 120 000 messages in
1 minute. The function processes between 50k and 85k and stops working. It
takes a few minutes. There are some 500 errors from the API we call in the logs.
We had 9 instances of the function distributed across 3 brokers. Interestingly
the problem does not occur if we create 20 instances instead of 9
----
2019-04-19 18:38:48 UTC - David Kjerrumgaard: Yea, the answer is going to be
very specific to your environment
----
2019-04-19 18:39:19 UTC - Ruud Kamphuis: I created an issue on Github,
<https://github.com/apache/pulsar/issues/4086> . I think it's better to have it
there as others can also search for it.
----
2019-04-19 18:40:08 UTC - David Kjerrumgaard: FWIW, my "hack" would segregate
the messages into different topics, and if v2 is having issues, you will be able
to see that in the topic backlog, ack count, etc.
----
2019-04-19 18:40:34 UTC - David Kjerrumgaard: and it wouldn't impact the v1 flow
----
2019-04-19 18:41:22 UTC - David Kjerrumgaard: topics are cheap in Pulsar as
well :smiley:
----
2019-04-19 18:44:54 UTC - Sam Leung: Makes sense. Yeah they’re cheap, but I’m
thinking about the scale where we’re running at say 50% capacity, if we have 3
services that consume from the same topic on different subscriptions, and they
each run their own A/B test, we suddenly are duplicating the messages into 6
topics, with 3x the number of messages.
----
2019-04-19 18:45:42 UTC - Sam Leung: But that’s relatively unlikely
:slightly_smiling_face:
----
2019-04-19 18:46:05 UTC - David Kjerrumgaard: From a design perspective, I
think it is best NOT to embed this behavior into the core classes, and instead
use functions or similar tools to implement this and other unique behaviors,
such as filtering, replicating, etc. Adding this into the base class makes the
topic configuration that much more complicated.
----
2019-04-19 18:46:43 UTC - Sam Leung: I agree
----
2019-04-19 18:46:52 UTC - David Kjerrumgaard: I wouldn't worry about the
scalability of Pulsar too much :smiley:
----
2019-04-19 18:47:29 UTC - David Kjerrumgaard: with proper message retention and
expiration policies in place you will be fine
----
2019-04-19 18:55:04 UTC - Sam Leung: Thanks for all your help!
----
2019-04-19 19:34:06 UTC - Matteo Merli: Having ZK and BK in same image is not
the reason for the big size :slightly_smiling_face:
----