2019-05-20 09:33:58 UTC - pradeep: Can someone help here?
With active-active replication, if two producers are writing data to a single
topic, how does Pulsar guarantee message ordering at the consumer end?
I mean, let's take two DCs where producers and consumers are present in both,
and both are producing and consuming. During replication, how does Pulsar
guarantee that messages are in the same order, considering a possible delay in
replication?
----
2019-05-20 13:47:19 UTC - Shivji Kumar Jha: In AuthenticationToken.java
```
@Override
public void configure(Map<String, String> authParams) {
    // noop
}
```
Why is this empty? Is it something that is intentionally left blank (which
seems the case with noops) or just not done at that time in the interest of
time?
This is a problem with <https://github.com/apache/pulsar/pull/4284>
We try to construct AuthenticationToken from .class and a `Map<K, V>`, but
it returns null and the Flink connector crashes with an NPE…
----
2019-05-20 14:53:43 UTC - David Kjerrumgaard: @vvy So, you are saying that you
have the messageID of the message you want to read, but when you create a
reader and set the start message, e.g. `Reader reader =
readerBuilder.startMessageId(startMessageId).create()`, the call to
`reader.readNext()` reads the message AFTER the desired messageID, and NOT the
one you are trying to read?
----
2019-05-20 14:57:40 UTC - Byron: Quick question for those running Pulsar with
Docker.. I want to use the “tiered storage” feature which is available in the
`apachepulsar/pulsar-all` image. Can I use that image just for the broker nodes
(and not for proxies, zookeeper, bookies, etc.)? I ask because the image is
twice the size (2GB) so I wanted to minimize its use if I didn’t have to.
----
2019-05-20 15:00:42 UTC - Byron: based on the docs
(<https://pulsar.apache.org/docs/en/standalone/#installing-tiered-storage-offloaders-optional>),
it notes only “every broker node”, but just wanted to confirm there wasn’t
going to be any strange behavior if the proxy nodes didn’t use this image as
well
----
2019-05-20 15:26:29 UTC - Alexandre DUVAL: Hi, is there any limitation on how
message properties can evolve? Like backward compatibility?
----
2019-05-20 15:31:44 UTC - David Kjerrumgaard: @Byron In theory yes, just make
sure the Pulsar client version is compatible with the Broker version.
----
2019-05-20 15:40:17 UTC - Byron: @David Kjerrumgaard thanks
----
2019-05-20 15:48:47 UTC - David Kjerrumgaard: @pradeep When messages are
produced on a Pulsar topic, they are first persisted in the local cluster and
then forwarded asynchronously to the remote clusters. In normal cases, when
there are no connectivity issues, messages are replicated immediately, at the
same time as they are dispatched to local consumers. However, in the scenario
you mention, there is a replication delay between the two clusters, which in
CAP theorem terms represents a network partition.
In the presence of a partition, one is then left with two options: consistency
or availability. When choosing consistency over availability, the system will
return an error or a time-out if particular information cannot be guaranteed to
be up to date due to network partitioning. When choosing availability over
consistency, the system will always process the query and try to return the
most recent available version of the information, even if it cannot guarantee
it is up to date due to network partitioning.
Pulsar chooses to provide availability over consistency in this scenario.
----
2019-05-20 15:49:32 UTC - David Kjerrumgaard: so your messages will not be
guaranteed to be "in order".
----
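A minimal sketch of why a replication delay breaks cross-cluster ordering. It
is a toy in-memory model, not Pulsar code: the class name
`ReplicationOrderSketch`, the cluster names, and the message labels are all
hypothetical. Each cluster records its local messages first and the other
cluster's messages only once they arrive.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: no Pulsar classes are used. Each "cluster" is a list that
// records messages in the order they become visible to local consumers.
final class ReplicationOrderSketch {

    // Builds the order seen in one cluster: local messages are persisted
    // immediately, replicated messages are appended when they arrive later.
    static List<String> observedOrder(List<String> local, List<String> replicated) {
        List<String> log = new ArrayList<>(local);
        log.addAll(replicated);
        return log;
    }

    public static void main(String[] args) {
        List<String> fromWest = List.of("west-1", "west-2");
        List<String> fromEast = List.of("east-1", "east-2");

        List<String> seenInWest = observedOrder(fromWest, fromEast);
        List<String> seenInEast = observedOrder(fromEast, fromWest);

        System.out.println("west consumers see: " + seenInWest);
        System.out.println("east consumers see: " + seenInEast);
        System.out.println("same order in both clusters: " + seenInWest.equals(seenInEast));
    }
}
```

In this toy model each producer's own messages keep their relative order, but
the interleaving differs between the two clusters.
----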
2019-05-20 16:00:16 UTC - Addison Higham: hrm... can anyone provide a bit of
insight? I am running my first cluster on k8s using a slightly modified version
of the helm chart. I was doing some load testing and restarted my brokers, and
I have gotten myself into a weird state. Specifically, the pulsar-dashboard is
completely blank because the collector is failing to fetch metrics: it is
hitting the wrong IP, which it got from the `admin/clusters/<cluster>`
endpoint, which is returning the wrong serviceUrl and brokerServiceUrl. Mostly
trying to understand where that comes from (is it somehow getting stored in
ZK?) and what I did wrong
----
2019-05-20 16:01:48 UTC - David Kjerrumgaard: @Addison Higham When you say you
"restarted your brokers" I am assuming you spun up completely new pods with new
IP addresses, etc?
----
2019-05-20 16:02:03 UTC - Addison Higham: no, just scaled the replica set down
and back up
----
2019-05-20 16:02:12 UTC - Addison Higham: so it got rescheduled
----
2019-05-20 16:03:46 UTC - Addison Higham: ah found the controller, looks to be
coming from ZK
----
2019-05-20 16:05:05 UTC - David Kjerrumgaard: yes
----
2019-05-20 16:05:28 UTC - David Kjerrumgaard: Did you deploy a Pulsar-Proxy as
part of your cluster?
----
2019-05-20 16:05:32 UTC - Addison Higham: yes
----
2019-05-20 16:10:01 UTC - Addison Higham: okay, so I see that I can manually
change it via the `pulsar-admin` CLI, but is that expected? my cluster is still
functional as the brokers can accept messages, but the cluster as stored in ZK
now has the wrong URL (which is an IP address, rather than the k8s service
endpoint). I imagine setting that to the service endpoint would solve that, but
not sure how it got set to that value in the first place
----
2019-05-20 16:12:50 UTC - David Kjerrumgaard: Yes, manually setting that to the
service endpoint should resolve the immediate issue. As for the longer term, it
looks like we need to do some investigation on that.
----
2019-05-20 16:13:09 UTC - Addison Higham: ah okay, so that key is the sentinel
key used to indicate that BK and brokers can start so it must get set somewhere
earlier on in setting up the clusters
----
2019-05-20 16:13:32 UTC - David Kjerrumgaard: correct.
----
2019-05-20 16:14:21 UTC - David Kjerrumgaard: Do you have service discovery
enabled on your Pulsar proxy?
----
2019-05-20 16:16:53 UTC - Addison Higham: I imagine that is just the config map
from the proxy? it just has "zookeeperServers" and "configurationStoreServers"
as keys, so I am guessing yes?
----
2019-05-20 16:18:54 UTC - David Kjerrumgaard: Yes,
<https://pulsar.apache.org/docs/en/administration-proxy/#option-1-using-service-discovery>
----
2019-05-20 16:18:57 UTC - David Kjerrumgaard: zookeeperServers=zk-0,zk-1,zk-2
configurationStoreServers=zk-0:2184,zk-remote:2184
----
2019-05-20 16:20:43 UTC - Addison Higham: so should the
serviceUrl/brokerServiceUrl be the proxy? or the brokers directly?
----
2019-05-20 16:22:36 UTC - David Kjerrumgaard: Your configuration is correct. As
it stands it goes to ZK to get the serviceURL, which is what you want. The
issue, as you discovered, is that ZK has the wrong value (currently)
----
2019-05-20 16:23:33 UTC - David Kjerrumgaard: Sorry for the confusion, I am
just trying to get a complete picture of the situation :smiley:
----
2019-05-20 16:24:25 UTC - Addison Higham: correct, and from my (limited)
perspective, it seems the mis-configuration here is primarily that it sets the
ZK value to an IP of one of the brokers, rather than the k8s service endpoint
name, which would be stable
----
2019-05-20 16:24:54 UTC - Addison Higham: not sure if other processes will
update that value automatically?
----
2019-05-20 16:26:09 UTC - David Kjerrumgaard: right, and does the
mis-configuration come from your Helm chart/ConfigMap ? Just trying to rule
things out
----
2019-05-20 16:27:20 UTC - Addison Higham: AFAICT, no, I didn't set that value
explicitly, my changes are primarily about just adding support for exposing the
proxies via an AWS NLB and adding some nodeSelectors to isolate onto a pool of
compute
----
2019-05-20 16:28:25 UTC - Addison Higham: so I would imagine it is somewhere in
the automation
----
2019-05-20 16:28:34 UTC - David Kjerrumgaard: yep.
----
2019-05-20 16:29:41 UTC - David Kjerrumgaard: For now, let's use the admin CLI
to correct that value and get you unstuck, and we can re-visit this issue since
it is easily reproducible.
----
2019-05-20 16:29:47 UTC - Addison Higham: yeah, already done
+1 : David Kjerrumgaard
----
2019-05-20 16:30:19 UTC - Addison Higham: I am just going to tear down my
cluster, re-provision, and double check it didn't get changed somewhere along
the line
+1 : David Kjerrumgaard
----
2019-05-20 16:35:22 UTC - Addison Higham: but just for my own clarification and
understanding, @David Kjerrumgaard I imagine that you *could* set those values
to the proxy and the dashboard (at least) would then fetch via the proxy, but
how else is that used? For example, when setting up replication, does it go
through the endpoints defined there? I imagine that in most cases of
replication, you are probably going to need to go through your proxy
----
2019-05-20 16:37:02 UTC - Addison Higham: also confirmed, that yes, on a
freshly created cluster I get this:
```
pulsar-admin --admin-url http://host:8080/ clusters get pulsar-dev
{
  "serviceUrl" : "http://10.164.20.188:8080",
  "brokerServiceUrl" : "pulsar://10.164.20.188:6650"
}
```
----
2019-05-20 16:38:51 UTC - Mathieu Holl: Hello,
when using Athenz for authentication & authorization, do I need to deploy
the Athenz ZMS and ZTS separately from my pulsar cluster or is it included in
the pulsar deployment?
----
2019-05-20 16:44:27 UTC - David Kjerrumgaard: @Mathieu Holl You will need to
deploy those separately.
----
2019-05-20 16:47:17 UTC - Mathieu Holl: got it thx. On the other end, when
using TLS, authentication and authorization are self-contained and no separate
deployment is needed correct ?
----
2019-05-20 16:47:32 UTC - Addison Higham: weird, tracking down the
`initialize-cluster-metadata` call, it is properly setting URLs: `bin/pulsar
initialize-cluster-metadata \ --cluster pulsar-dev \ --zookeeper
pulsar-dev-zookeeper \ --configuration-store pulsar-dev-zookeeper \
--web-service-url http://pulsar-dev-broker.pulsar.svc.cluster.local:8080/ \
--broker-service-url
pulsar://pulsar-dev-broker.pulsar.svc.cluster.local:6650/ || true;`
----
2019-05-20 16:48:05 UTC - Addison Higham: and looking at the cluster metadata
setup class, I don't see anything that would be doing name resolution
----
2019-05-20 16:48:17 UTC - Matteo Merli: Correct. Same as for token based
authentication
----
2019-05-20 17:01:53 UTC - Matteo Merli: @Shivji Kumar Jha This was left blank
because `configure(Map<String, String> authParams)` had already been
deprecated long ago in favor of `configure(String encodedAuthParamString)`
----
2019-05-20 17:04:47 UTC - Shivji Kumar Jha: @Matteo Merli for now I have done
this and I hope it works for our internal release:
@Override
public void configure(Map<String, String> authParams) {
- // noop
+ configure(new Gson().toJson(authParams));
}
----
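A minimal sketch of the delegation pattern discussed above, where a deprecated
map-based overload forwards to the string-based one instead of doing nothing.
This is not the Pulsar `AuthenticationToken` class: `TokenAuthSketch`, the
`token` map key, and the `token:<value>` encoding handled here are assumptions
for illustration, so check what `configure(String)` in your Pulsar version
actually parses before copying the idea.

```java
import java.util.Map;

// Hypothetical stand-in for an authentication plugin; not the Pulsar class.
final class TokenAuthSketch {
    private String token;

    // Replacement entry point: parses an encoded parameter string.
    // The "token:" prefix handled here is an assumption for this sketch.
    void configure(String encodedAuthParamString) {
        if (encodedAuthParamString == null || encodedAuthParamString.isEmpty()) {
            throw new IllegalArgumentException("no auth params provided");
        }
        String prefix = "token:";
        token = encodedAuthParamString.startsWith(prefix)
                ? encodedAuthParamString.substring(prefix.length())
                : encodedAuthParamString;
    }

    // Deprecated overload: delegate instead of silently doing nothing, so
    // callers that still pass a map end up with a configured instance.
    @Deprecated
    void configure(Map<String, String> authParams) {
        String value = authParams.get("token"); // hypothetical key name
        if (value == null) {
            throw new IllegalArgumentException("missing 'token' entry");
        }
        configure("token:" + value);
    }

    String token() {
        return token;
    }

    public static void main(String[] args) {
        TokenAuthSketch auth = new TokenAuthSketch();
        auth.configure(Map.of("token", "abc123"));
        System.out.println(auth.token());
    }
}
```

Failing loudly on a missing entry avoids the silent no-op that led to the
null value and NPE described earlier.
----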
2019-05-20 17:05:44 UTC - Shivji Kumar Jha: Basically what AuthenticationBasic
does already, but start() is empty too and I am not sure if that will cause an
issue. Thoughts?
----
2019-05-20 17:05:58 UTC - Matteo Merli: :+1:
----
2019-05-20 17:06:41 UTC - Matteo Merli: I don’t remember your change in Flink
connector, though, it would also be possible to just pass the
encodedAuthParamString
----
2019-05-20 17:07:21 UTC - Shivji Kumar Jha: That could work too! I will try
that out thanks :slightly_smiling_face:
----
2019-05-20 17:09:12 UTC - Matteo Merli: Yes, the `pulsar-all` image is just the
`pulsar` base image plus the pulsar-io connectors and tiered storage providers.
proxy, zk, bk won’t attempt to use that stuff anyway
----
2019-05-20 17:09:35 UTC - Matteo Merli: (also, from 2.4 the images will be much
smaller…)
----
2019-05-20 17:10:20 UTC - Matteo Merli: properties are just `Map<String,
String>`, the application defines the meaning and how to evolve the formats
----
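Since properties are a plain string map, backward compatibility is an
application-level convention. One possible convention, sketched below under
assumptions: consumers default any key that older producers did not set, and an
explicit version property marks format changes. The keys `region` and
`format-version` and the class name are hypothetical, not Pulsar-defined.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the property keys below are hypothetical, not Pulsar-defined.
final class PropertiesEvolutionSketch {

    // Added by newer producers; older messages simply lack the key.
    static String region(Map<String, String> properties) {
        return properties.getOrDefault("region", "unknown");
    }

    // An explicit version property lets consumers branch on format changes.
    static int formatVersion(Map<String, String> properties) {
        return Integer.parseInt(properties.getOrDefault("format-version", "1"));
    }

    public static void main(String[] args) {
        Map<String, String> oldMessage = new HashMap<>();
        oldMessage.put("source", "billing");

        Map<String, String> newMessage = new HashMap<>(oldMessage);
        newMessage.put("format-version", "2");
        newMessage.put("region", "us-west");

        System.out.println(formatVersion(oldMessage) + " " + region(oldMessage));
        System.out.println(formatVersion(newMessage) + " " + region(newMessage));
    }
}
```
----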
2019-05-20 17:14:41 UTC - Matteo Merli: @Addison Higham The URL set in the
initialize cluster metadata is typically used when there are multiple clusters
and geo-replication. The purpose is for clusters to be able to translate a
cluster name into a URL (e.g. us-west ->
`pulsar://xxxx.us-west-1.example.com`).
The pulsar dashboard is designed to work for multiple clusters, so it will do
the following:
* Connect to the configured service URL
* Get the list of clusters and their URLs
* For each cluster:
  - Get the list of brokers
  - For each broker, get the stats for all topics it’s running
----
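The traversal listed above can be sketched roughly as follows. This shows only
the loop structure: `AdminApi` is a hypothetical interface rather than the
Pulsar admin client, and the cluster, broker, and topic names in the fake are
made up so the sketch runs without a cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the traversal only. AdminApi is a hypothetical interface, not the
// Pulsar admin client; a real collector would make HTTP calls instead.
final class CollectorSketch {

    interface AdminApi {
        Map<String, String> clusterServiceUrls(String configuredServiceUrl);
        List<String> brokers(String clusterServiceUrl);
        Map<String, Long> topicStats(String broker);
    }

    // Walks clusters -> brokers -> topics, returning "cluster/broker/topic=value" rows.
    static List<String> collect(AdminApi api, String configuredServiceUrl) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> cluster : api.clusterServiceUrls(configuredServiceUrl).entrySet()) {
            for (String broker : api.brokers(cluster.getValue())) {
                for (Map.Entry<String, Long> stat : api.topicStats(broker).entrySet()) {
                    rows.add(cluster.getKey() + "/" + broker + "/" + stat.getKey() + "=" + stat.getValue());
                }
            }
        }
        return rows;
    }

    // In-memory fake with made-up names so the sketch runs without a cluster.
    static AdminApi fakeApi() {
        return new AdminApi() {
            public Map<String, String> clusterServiceUrls(String url) {
                Map<String, String> m = new LinkedHashMap<>();
                m.put("us-west", "http://west.example.com:8080");
                return m;
            }
            public List<String> brokers(String clusterServiceUrl) {
                return List.of("broker-0", "broker-1");
            }
            public Map<String, Long> topicStats(String broker) {
                Map<String, Long> m = new LinkedHashMap<>();
                m.put("my-topic", broker.equals("broker-0") ? 10L : 20L);
                return m;
            }
        };
    }

    public static void main(String[] args) {
        collect(fakeApi(), "http://west.example.com:8080").forEach(System.out::println);
    }
}
```

Because every later call follows the URL returned for a cluster, a stale IP in
the cluster metadata makes the whole walk for that cluster fail.
----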
2019-05-20 17:15:25 UTC - Matteo Merli: you can update the cluster URLs with
`bin/pulsar-admin clusters update`
----
2019-05-20 17:18:51 UTC - Addison Higham: okay cool, get that, but just trying
to understand, if I am using proxies and geo-replication (where the only way
the regional clusters can talk to each other is via the proxies), would it be a
mis-configuration for that value to be the private broker address? or should it
be the public proxy address?
----
2019-05-20 17:19:37 UTC - Matteo Merli: Yes, that should point to the “public”
address of a cluster
----
2019-05-20 17:20:03 UTC - Matteo Merli: if there’s a proxy, that should be
pointing to the proxy, rather than brokers
----
2019-05-20 17:23:44 UTC - Addison Higham: okay, got it, I am using the existing
helm chart as a guide to make sure I understand things, the helm chart does not
take that into account when initializing the cluster (not that I expect it to,
but it might be nice to document in general, none of the deployment guides
mention it AFAICT). Thanks for clarifying for me!
+1 : Matteo Merli
----
2019-05-20 17:25:15 UTC - Addison Higham: but yes, there is still a mystery to
me of where in writing those URLs to ZK they are getting resolved to IPs,
instead of the DNS name, not sure if you want a ticket for that?
----
2019-05-20 17:26:38 UTC - Matteo Merli: I’m not sure I got that completely
----
2019-05-20 17:26:59 UTC - Matteo Merli: Do you mean how
`pulsar-dev-broker.pulsar.svc.cluster.local` gets resolved into an IP?
----
2019-05-20 17:27:05 UTC - Addison Higham: correct
----
2019-05-20 17:27:19 UTC - Addison Higham: and then stored in ZK
----
2019-05-20 17:27:28 UTC - Matteo Merli: oh, that’s done because Kubernetes
attaches DNS names to “services”
----
2019-05-20 17:27:50 UTC - Matteo Merli: that name will get you the IP of any
broker pods
----
2019-05-20 17:27:56 UTC - Addison Higham: oh no, I get that
----
2019-05-20 17:27:57 UTC - Matteo Merli: (currently running)
----
2019-05-20 17:29:37 UTC - Addison Higham: what I mean is that when using the
current helm chart, it does an `initialize-cluster-metadata` with the correct
DNS name (`bin/pulsar initialize-cluster-metadata \ --cluster pulsar-dev \
--zookeeper pulsar-dev-zookeeper \ --configuration-store pulsar-dev-zookeeper \
--web-service-url http://pulsar-dev-broker.pulsar.svc.cluster.local:8080/ \
--broker-service-url
pulsar://pulsar-dev-broker.pulsar.svc.cluster.local:6650/ || true;`)
----
2019-05-20 17:30:35 UTC - Addison Higham: but when fetching that value back out
after everything is up, those DNS names are now IP addresses:
```
pulsar-admin --admin-url http://host:8080/ clusters get pulsar-dev
{
  "serviceUrl" : "http://10.164.20.188:8080",
  "brokerServiceUrl" : "pulsar://10.164.20.188:6650"
}
```
----
2019-05-20 17:31:00 UTC - Addison Higham: so either:
A. initialize-cluster-metadata is doing name resolution to an IP
B. something else is coming along and updating it
----
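One way to spot this condition after a deploy is to check whether the host in
the stored URL is an IP literal rather than a DNS name. The helper below is a
small sketch using only `java.net.URI`; the class and method names are
hypothetical, and it only classifies a URL string (it does not query Pulsar or
ZooKeeper).

```java
import java.net.URI;

// Hypothetical helper: classifies the host part of a service URL.
final class ServiceUrlCheck {

    // True when the host is a dotted-quad IPv4 literal or a bracketed IPv6
    // literal, false when it looks like a DNS name.
    static boolean hostIsIpLiteral(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host in " + url);
        }
        return host.matches("\\d{1,3}(\\.\\d{1,3}){3}") || host.startsWith("[");
    }

    public static void main(String[] args) {
        // Values taken from the messages above.
        String stored = "pulsar://10.164.20.188:6650";
        String intended = "pulsar://pulsar-dev-broker.pulsar.svc.cluster.local:6650/";
        System.out.println(stored + " -> IP literal: " + hostIsIpLiteral(stored));
        System.out.println(intended + " -> IP literal: " + hostIsIpLiteral(intended));
    }
}
```
----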
2019-05-20 17:31:21 UTC - Matteo Merli: Oh, I missed that in the chat before…
that looks weird indeed
----
2019-05-20 17:33:39 UTC - Matteo Merli: `initialize-cluster-metadata` treats
these as strings only, and they get serialized into JSON
----
2019-05-20 17:34:35 UTC - Addison Higham: agree, I looked at the code and
couldn't see anything, and the bash wrapper looks to just be passing it on
immediately as well, but I couldn't find any other places that call update
:shrug:
----
2019-05-20 17:34:40 UTC - Matteo Merli: I can't see anywhere that they get
converted into an InetSocketAddress of any sort
----
2019-05-20 17:38:24 UTC - Addison Higham: so seems likely something else is
doing it, but I don't see any logging that would make it easy to confirm that,
anyways, let me know if I can help provide more info on that! But all things
told... this is about the only issue I have faced after a few days of poking at
it with a pretty much default helm chart, which is pretty nice!
beers : Matteo Merli
----
2019-05-20 17:49:08 UTC - Chris Bartholomew: In my experience using helm to
bring up Pulsar in k8s, it is important that first Zookeeper is started, then
the initialize metadata job runs, then everything else can be started. Not sure
that is strictly necessary, but that is what works reliably for me. The default
helm charts try to control the bring up order using initContainers that poll on
a condition. I have found that I've had to tweak those conditions depending
where I am running.
+1 : David Kjerrumgaard
----
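The "poll on a condition" gating described above can be sketched as a small
retry loop. This is a generic illustration, not what the helm chart's
initContainers run: the class name, the attempt count, and the stand-in
condition are all hypothetical.

```java
import java.util.function.BooleanSupplier;

// Generic sketch of gating a startup step on a polled condition.
final class WaitForSketch {

    // Checks the condition up to maxAttempts times, sleeping between checks.
    // Returns true as soon as the condition holds, false if it never does.
    static boolean waitFor(BooleanSupplier condition, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // Stand-in for "ZooKeeper is reachable": becomes true on the third check.
        boolean ready = waitFor(() -> ++checks[0] >= 3, 10, 10);
        System.out.println("ready=" + ready + " after " + checks[0] + " checks");
    }
}
```

Chaining two such waits (ZooKeeper first, then the metadata job) gives the
startup order described in the message above.
----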
2019-05-20 18:28:08 UTC - Sree Vaddi: my first medium post:
<https://medium.com/@sree_at_work/pulsar-io-connectors-4aad1e213764>
+1 : Chris Bartholomew, David Kjerrumgaard, Ali Ahmed, Matteo Merli, Karthik
Ramasamy, Jerry Peng, Ezequiel Lovelle, Thor Sigurjonsson, Devin G. Bost
----
2019-05-20 19:30:32 UTC - Byron: @Matteo Merli awesome thanks
----
2019-05-20 21:30:06 UTC - Devin G. Bost: We're having issues finding log
information that we're expecting to be produced when we call
`sinkContext.getLogger().info("...")`.
How does the logger know where to write to? Is there a default value that's set
somewhere? In our current use case, we're implementing `Sink<String>` in
a custom sink.
----
2019-05-20 21:30:45 UTC - Devin G. Bost: We also don't see a `logTopic`
parameter available for sinks when we deploy, so we're not sure where to get
the log information.
----
2019-05-20 21:40:49 UTC - Devin G. Bost: It looks like the logger is getting
constructed in `JavaInstanceRunnable`
----
2019-05-20 21:41:55 UTC - Jerry Peng: @Devin G. Bost logs are found under
`<PULSAR_DIR>/logs/functions/<tenant>/<namespace>/<name>`
----
2019-05-20 21:42:04 UTC - Devin G. Bost: The logs are empty for some reason.
----
2019-05-20 21:42:21 UTC - Devin G. Bost: We're trying to figure out why the log
files are all 0 bytes.
----
2019-05-20 21:42:28 UTC - Devin G. Bost: At least for this particular sink.
----
2019-05-20 21:47:32 UTC - Ali Ahmed: what is the sink status ?
----
2019-05-20 21:49:32 UTC - Victor Siu: I’m working with Devin. The status looks
like this when I run a pulsar-admin sink getstatus command.
----
2019-05-20 21:51:06 UTC - Victor Siu: We had another instance which had
messages passed to it already and it didn’t have logs. And then there should
also be some logs written when the open() method on our custom sink is called
----
2019-05-20 21:52:03 UTC - Ali Ahmed: sink startup does generate logs so this
could be a config issue
----
2019-05-20 21:53:21 UTC - Victor Siu: Are you thinking it’s a lack of a config?
What config should we be setting instead? I tried adding a `logTopic` and the
parameter wasn’t recognized.
----
2019-05-20 21:56:48 UTC - Jerry Peng: @Devin G. Bost @Victor Siu you guys are
running with ThreadRuntime correct?
----
2019-05-20 21:57:04 UTC - Devin G. Bost: I think we still are.
----
2019-05-20 21:57:06 UTC - Devin G. Bost: Is that an issue?
----
2019-05-20 21:57:19 UTC - Jerry Peng: no just checking
----
2019-05-20 21:57:25 UTC - Devin G. Bost: Okay. Thanks.
----
2019-05-20 22:05:12 UTC - Jerry Peng: @Devin G. Bost do you guys have
log4j2.yaml in conf/ ?
----
2019-05-20 22:05:22 UTC - Devin G. Bost: I'll check.
----
2019-05-20 22:08:02 UTC - Devin G. Bost: @Jerry Peng Yes.
----
2019-05-20 22:08:58 UTC - Jerry Peng: its the same of
<https://github.com/apache/pulsar/blob/master/conf/log4j2.yaml>
----
2019-05-20 22:09:32 UTC - Devin G. Bost: Thanks. It might be related to the log
level. We're investigating.
----
2019-05-20 22:14:04 UTC - Devin G. Bost: @Ali Ahmed Do you know what log level
those logs are generated at?
----
2019-05-20 22:14:40 UTC - Ali Ahmed: info should be fine
@Sanjeev Kulkarni can you verify ?
----
2019-05-20 22:23:50 UTC - Sanjeev Kulkarni: I believe info
----
2019-05-20 23:04:44 UTC - Devin G. Bost: BTW, this link on the Pulsar site is
broken: <https://pulsar.apache.org/docs/latest/functions/quickstart/>
----
2019-05-20 23:05:05 UTC - Matteo Merli: Yes, it’s pointing to an older version
of the site
----
2019-05-20 23:05:16 UTC - Matteo Merli: it was causing a lot of confusion
----
2019-05-20 23:05:20 UTC - Devin G. Bost: Might want to configure a redirect for
SEO :wink:
----
2019-05-20 23:06:22 UTC - Matteo Merli: that’s what we did initially, though
the redirect was rewritten with actual html.. and the page was back online
:confused:
----
2019-05-20 23:06:37 UTC - Matteo Merli: hopefully, it should go out of the
indexes quickly
----
2019-05-20 23:06:38 UTC - Matteo Merli:
<https://pulsar.apache.org/docs/en/functions-quickstart/>
----
2019-05-20 23:07:37 UTC - Devin G. Bost: Sorry about that.
----
2019-05-21 06:02:48 UTC - vvy: Yes. How can I read the corresponding message?
----