2019-04-25 09:11:20 UTC - Romain Castagnet: Hi, did you try "namespaces create"
without a cluster and then "namespaces set-clusters TENANT/NS -c
cluster1,cluster2"?
----
2019-04-25 09:14:15 UTC - stefan: hi, yes this is what i did
----
2019-04-25 09:16:04 UTC - stefan: i tried both ways
----
2019-04-25 09:23:24 UTC - Romain Castagnet: hum strange
----
2019-04-25 09:32:28 UTC - Matti-Pekka Laaksonen: Today I noticed one of our
client applications had died, seemingly due to a lost connection. The last log
message is:
{"timestamp":"2019-04-25T06:33:19.900Z","level":"WARN","thread":"pulsar-client-io-1-1","logger":"org.apache.pulsar.client.impl.ClientCnx","message":"[10.223.2.164/10.223.2.164:6650] Got exception NativeIoException : syscall:read(..) failed: Connection reset by peer","context":"default"}
----
2019-04-25 09:32:38 UTC - Matti-Pekka Laaksonen: This leads me to
<https://github.com/apache/pulsar/blob/branch-2.2/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ClientCnx.java#L219>
----
2019-04-25 09:37:15 UTC - Matti-Pekka Laaksonen: I don't quite understand this
case. Normally when the Pulsar connection is lost we catch the exception, close
down the application gracefully, and the orchestration service restarts the
container after a delay. In this case, however, there is no error or caught
exception, just a WARN-level log message. I'm not familiar with the execution
path of the ClientCnx; should the connection die after the state is set to
State.Failed?
----
2019-04-25 10:15:13 UTC - Yuvaraj Loganathan: Because the client will retry and
establish the connection :thinking_face:
----
2019-04-25 11:39:58 UTC - Matti-Pekka Laaksonen: Hmm, might be that the Pulsar
client was able to reconnect, but the non-Pulsar parts of the client failed.
I'll look into it
----
2019-04-25 12:28:59 UTC - Chris Bartholomew: I wanted to let everyone know that
I've built a service based on Pulsar. You can see it here:
<https://kafkaesque.io> I am really hoping it helps people get started with
Pulsar, testing their client code, etc. A basic account is free and includes an
integrated dashboard for admin and monitoring of topics, namespaces, clusters,
geo-replication. Would love it if everyone could try it out and give me some
feedback. Thanks.
+1 : Sijie Guo, Ezequiel Lovelle, Guy Feldman, Karthik Ramasamy, DT, Ruud
Kamphuis
----
2019-04-25 16:00:14 UTC - Grant Wu: What’s the current status of the Docker/k8s
runtime for PFs?
----
2019-04-25 16:07:45 UTC - Sijie Guo: @Grant Wu k8s runtime is supported since
2.3.0. The documentation is still missing though :disappointed:
----
2019-04-25 16:08:29 UTC - Grant Wu: :disappointed:
----
2019-04-25 17:43:08 UTC - Devin G. Bost: We increased parallelism for a
high-traffic Pulsar function (from 3 to 5), but the data shows that the new
function instances aren't getting any traffic. How would we figure out why
these instances aren't getting any of the load?
----
2019-04-25 17:44:13 UTC - Matteo Merli: Can you share the topics stats for the
topic these are consuming from?
`pulsar-admin topics stats $TOPIC`
----
2019-04-25 17:54:10 UTC - Thor Sigurjonsson: Devin is getting the topic stats
ready...
----
2019-04-25 17:55:59 UTC - Thor Sigurjonsson: I guess to add a little color to
the conversation, we noticed that the metrics in grafana showed higher latency
on the 0.999 quantile and wanted to see if we could bring that down. When we
deployed parallelism 5 (from 3) we noticed 2 new functions share hosts with 2
"older ones", and no metrics are shown for those in grafana either, and the
pulsar-admin functions stats call returns 0 metrics.
----
2019-04-25 17:56:34 UTC - Devin G. Bost: I noticed that the instance with
instance_id: "2" is missing from the list of subscriptions.
----
2019-04-25 17:56:49 UTC - Thor Sigurjonsson: those .999 quantile ones are
around 100-125ms.
----
2019-04-25 17:57:23 UTC - Matteo Merli: It seems all 5 consumers (1 per
function instance) are consuming at ~32 msg/s
----
2019-04-25 17:59:03 UTC - Matteo Merli: In the stats JSON, you have the
`msgRateOut` for each consumer and the overall for the subscription
----
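A minimal sketch of pulling those per-consumer rates out of the stats JSON,
assuming the output of `pulsar-admin topics stats $TOPIC` contains a
`subscriptions` map whose entries carry `msgRateOut` and a `consumers` list.
The sample document, the subscription name `example-sub`, the consumer names
and the helper `consumer_rates` are hypothetical, not real output:

```python
import json

# Hypothetical, trimmed-down stats document; the field layout is an
# assumption based on the `msgRateOut` fields mentioned in the thread,
# and the values are made up.
SAMPLE_STATS = """
{
  "subscriptions": {
    "example-sub": {
      "msgRateOut": 64.0,
      "consumers": [
        {"consumerName": "a", "msgRateOut": 32.0},
        {"consumerName": "b", "msgRateOut": 32.0}
      ]
    }
  }
}
"""

def consumer_rates(stats):
    """Return {subscription: (overall_rate, [per-consumer rates])}."""
    result = {}
    for name, sub in stats.get("subscriptions", {}).items():
        per_consumer = [c.get("msgRateOut", 0.0) for c in sub.get("consumers", [])]
        result[name] = (sub.get("msgRateOut", 0.0), per_consumer)
    return result

rates = consumer_rates(json.loads(SAMPLE_STATS))
for sub, (overall, per_consumer) in rates.items():
    # The number of per-consumer entries is also the consumer count.
    print(sub, "overall:", overall, "consumers:", len(per_consumer), per_consumer)
```

Counting the per-consumer entries is a quick way to check whether every
function instance actually shows up as a consumer.
----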
2019-04-25 18:02:36 UTC - Thor Sigurjonsson: when I do pulsar-admin functions
status on that function I get this for instance 2:
```
{
  "instanceId" : 2,
  "status" : {
    "running" : true,
    "error" : "",
    "numRestarts" : 0,
    "numReceived" : 0,
    "numSuccessfullyProcessed" : 0,
    "numUserExceptions" : 0,
    "latestUserExceptions" : [ ],
    "numSystemExceptions" : 0,
    "latestSystemExceptions" : [ ],
    "averageLatency" : 0.0,
    "lastInvocationTime" : 0,
    "workerId" : "REDACTED-8080"
  }
}
```
----
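A rough sketch of spotting instances like this one, which report
`"running" : true` but have `"numReceived" : 0`. It assumes the per-instance
entries have been collected into a list shaped like the JSON above; the list
itself, the numbers for instance 3 and the helper `idle_instances` are
hypothetical:

```python
import json

# Hypothetical per-instance entries shaped like the status JSON above;
# the values for instance 3 are invented for illustration.
SAMPLE_INSTANCES = json.loads("""
[
  {"instanceId": 2, "status": {"running": true, "numReceived": 0}},
  {"instanceId": 3, "status": {"running": true, "numReceived": 1200}}
]
""")

def idle_instances(instances):
    """Return IDs of instances that report running but have received nothing."""
    return [
        entry["instanceId"]
        for entry in instances
        if entry["status"].get("running")
        and entry["status"].get("numReceived", 0) == 0
    ]

idle = idle_instances(SAMPLE_INSTANCES)
print(idle)
```

An instance that has only just started would also match, so this is a hint
about where to look rather than proof that the instance is stuck.
----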
2019-04-25 18:04:36 UTC - Devin G. Bost: I also only count 4 consumers.
----
2019-04-25 18:05:32 UTC - Thor Sigurjonsson: Also instance 3 and instance 2 are
on the same host, and instance 2 shows no metrics from prometheus-grafana and
instance 4 has ~100ms .999 quantile latency (and no data showing for instance
2). It's roughly twice what other functions report... Made me guess maybe they
were being rolled up for the host or something...
----
2019-04-25 18:07:45 UTC - Devin G. Bost: In the JSON output from `pulsar-admin
topics stats $TOPIC`, I only see consumers with these instance_id values: 0, 3,
1, 4. (2 is missing.)
----
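The same check can be sketched in a few lines, assuming each consumer entry
in the stats JSON carries a `metadata` map with an `instance_id` string (the
exact layout is an assumption here). The sample data, the subscription name
`example-sub` and the helper `missing_instance_ids` are hypothetical:

```python
def missing_instance_ids(stats, subscription, parallelism):
    """Return instance IDs in [0, parallelism) that have no consumer."""
    consumers = stats["subscriptions"][subscription]["consumers"]
    seen = {
        int(c["metadata"]["instance_id"])
        for c in consumers
        if "instance_id" in c.get("metadata", {})
    }
    return sorted(set(range(parallelism)) - seen)

# Hypothetical stats with the instance_id values listed above: 0, 3, 1, 4.
stats = {"subscriptions": {"example-sub": {"consumers": [
    {"metadata": {"instance_id": "0"}},
    {"metadata": {"instance_id": "3"}},
    {"metadata": {"instance_id": "1"}},
    {"metadata": {"instance_id": "4"}},
]}}}

missing = missing_instance_ids(stats, "example-sub", 5)
print(missing)
```

With parallelism 5, the instance that never subscribed is the one reported as
missing.
----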
2019-04-25 18:12:51 UTC - David Kjerrumgaard: Are there any errors in the log
for instance 2?
----
2019-04-25 18:16:07 UTC - Thor Sigurjonsson: Full contents of log from instance
2 at about the time of the parallelism update.
----
2019-04-25 18:20:33 UTC - Ruud Kamphuis: Interesting! Question: why does the
name include kafka?
----
2019-04-25 18:21:18 UTC - Jerry Peng: @Thor Sigurjonsson there are no more
logs for instance-2? If so, it seems to be getting stuck.
----
2019-04-25 18:21:37 UTC - Jerry Peng: @Thor Sigurjonsson do you guys have
function state enabled?
----
2019-04-25 18:22:38 UTC - Thor Sigurjonsson: Hmm, we do have a log topic set...
----
2019-04-25 18:23:01 UTC - Thor Sigurjonsson: working to see about function
state being enabled..
----
2019-04-25 18:25:06 UTC - Thor Sigurjonsson: would that be
`stateStorageServiceUrl` in functions_worker.yml? (It's commented out).
----
2019-04-25 18:25:19 UTC - Jerry Peng: yes and gotcha
----
2019-04-25 18:25:47 UTC - Jerry Peng: you guys are running functions via Thread
Runtime?
----
2019-04-25 18:26:09 UTC - Thor Sigurjonsson: Yes
----
2019-04-25 18:26:59 UTC - Thor Sigurjonsson: (kerberos jvm params plumbing the
kafka connector made us go there for now)
----
2019-04-25 18:29:25 UTC - Jerry Peng: Give me a second to investigate
+1 : Thor Sigurjonsson, Devin G. Bost
----
2019-04-25 18:33:31 UTC - Thor Sigurjonsson: That's like OpenOffice calling
their thing Microsofty. :slightly_smiling_face:
----
2019-04-25 18:38:23 UTC - Thor Sigurjonsson: Sorry about being flippant.
:slightly_smiling_face: I get that there is a good search marketing angle to
get streaming customers.. I'll give it a spin this week and see if I can give
useful feedback.
----
2019-04-25 18:48:02 UTC - Ruud Kamphuis: Yeah to me the name is only gonna
confuse people. People looking for pulsar will see the name and think:
nope, this is not what I want. People that want hosted kafka find this and
think nope, this is not what I want. Just my 2 cents
----
2019-04-25 18:48:26 UTC - Ruud Kamphuis: Great to have a hosted pulsar offering
tho! :raised_hands:
----
2019-04-25 19:04:50 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson I have
reproduced the issue. There was a PR that went in earlier this year that might
be causing race conditions when using the same pulsar client to create
consumers, as is the case for running functions via ThreadRuntime. I am looking
for a fix for the issue.
In the meantime, do you guys want to try running with process runtime? We
have added the ability to add runtime flags so that your kerberos configs can
get passed in. The functionality is not in an official release, so you guys can
either 1) try to build your own pulsar release from master or 2) try a
streamlio pulsar release that contains the functionality, since we create
releases more often than apache does.
+1 : Thor Sigurjonsson
----
2019-04-25 19:05:21 UTC - Devin G. Bost: > I have reproduced the issue.
There was a PR that went in earlier this year that might be causing race
conditions when using the same pulsar client to create consumers, as is the case
for running functions via ThreadRuntime. I am looking for a fix for the issue.
Very impressive.
----
2019-04-25 19:06:51 UTC - Jerry Peng: Thanks! Can’t take all the credit.
@Matteo Merli also helped
+1 : Devin G. Bost
----
2019-04-25 19:13:11 UTC - Devin G. Bost: Is there a temporary workaround?
We did have some concerns about the memory utilization that might be associated
with the process runtime (with parallelization). Would we increase our memory
requirements if we parallelized with the process runtime instead of
parallelizing with the threading runtime?
----
2019-04-25 19:15:46 UTC - Thor Sigurjonsson: I think there are a few things
that go into the decision for us: 1) bug fixes we need, 2) what runtime to
"settle on" (and in which cluster maybe), 3) functions support for publishing
properties, and then the timing and how we roll out to the prod env that's in
use right now. We can roll faster in lower environments, but it would be good
to pick a release soon that gets the most bang for the buck.
----
2019-04-25 19:16:22 UTC - Thor Sigurjonsson: This parallelism issue is not
critical just yet, but it is part of 1) above I think.
----
2019-04-25 19:16:57 UTC - Thor Sigurjonsson: I guess we should consider the
streamlio build also going forward.
----
2019-04-25 19:17:51 UTC - Thor Sigurjonsson: I'm guessing much of what we'd
need would be in 2.3.2 (I may be wrong).
----
2019-04-25 19:20:08 UTC - Thor Sigurjonsson: Would we be getting those from
here <https://hub.docker.com/r/streamlio/pulsar/tags> ? if we were rolling with
docker?
----
2019-04-25 19:27:29 UTC - Jerry Peng: @Thor Sigurjonsson yes, but it currently
does not have the latest image. We are actually in the process of doing
another release. A new image should be up in the next half an hour
+1 : Thor Sigurjonsson
----
2019-04-25 19:28:22 UTC - Jerry Peng: @Devin G. Bost I am not sure of a
temporary workaround at this moment, but this issue doesn’t happen every time.
It is a race condition. I was only able to reproduce it once out of the many
times I have tried.
----
2019-04-25 19:34:46 UTC - Chethan UK: Has anyone used MongoDB Source connector?
----
2019-04-25 19:45:57 UTC - David Kjerrumgaard: Not yet, are you having issues?
----
2019-04-25 19:47:22 UTC - Chethan UK:
<https://pulsar.apache.org/docs/en/io-cdc/>
is there a good tutorial on MongoDB *Source*?
----
2019-04-25 19:47:57 UTC - Ali Ahmed:
<https://github.com/bbonnin/pulsar-io-mongo>
----
2019-04-25 19:48:11 UTC - Chethan UK: It's a sink, I want a source
----
2019-04-25 19:48:49 UTC - Ali Ahmed: sorry, the source is just debezium, you can
try the debezium docs
----
2019-04-25 19:54:51 UTC - Chethan UK: Where is the helm chart
<https://pulsar.apache.org/docs/en/deploy-kubernetes/#deploying-pulsar-components-helm>
?
----
2019-04-25 19:59:16 UTC - David Kjerrumgaard: @Chethan UK It is bundled with
the code. If you clone the pulsar repo, then go to
/apache/pulsar/deployment/kubernetes/helm/pulsar
----
2019-04-25 19:59:52 UTC - Devin G. Bost: Gotcha.
----
2019-04-25 20:32:34 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson a new
docker release is now available:
<https://hub.docker.com/r/streamlio/pulsa>
----
2019-04-25 20:32:46 UTC - Jerry Peng: sorry:
<https://hub.docker.com/r/streamlio/pulsar/tags>
----
2019-04-25 20:43:03 UTC - Devin G. Bost: Thanks!
----
2019-04-25 22:28:54 UTC - Steven Le Roux: Hi, I've deployed a local instance of
pulsar, but with separated components (zk, bk)
----
2019-04-25 22:29:44 UTC - Steven Le Roux: Bk seems ok so far (bk shell
listbookies is listing bookies), they're registered in zk properly under
/ledgers/available
----
2019-04-25 22:30:16 UTC - Steven Le Roux: but when starting pulsar, it connects
to zk, then :
22:03:43.753 [main] ERROR org.apache.bookkeeper.client.BookieWatcherImpl -
Failed to get bookie list :
----
2019-04-25 22:30:55 UTC - Steven Le Roux: I can't find where to configure the
ledger zk path, but anyway, it defaults to /ledgers which should be fine :
----
2019-04-25 22:30:56 UTC - Steven Le Roux: 22:03:43.583 [main] INFO
org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase - Initialize zookeeper
metadata driver with external zookeeper client : ledgersRootPath = /ledgers.
----
2019-04-25 22:31:13 UTC - Steven Le Roux: any idea what I'm missing ?
----
2019-04-25 22:31:53 UTC - Matteo Merli: What’s the `zookeeperServers` setting
in `broker.conf`?
----
2019-04-25 22:33:24 UTC - Steven Le Roux:
zookeeperServers=10.0.0.2:2181/pulsar-local
----
2019-04-25 22:33:54 UTC - Steven Le Roux: I've reduced to one for testing but
there are three of them
----
2019-04-25 22:33:55 UTC - Matteo Merli: I see, you’re using a chroot for ZK
----
2019-04-25 22:34:05 UTC - Steven Le Roux: yes
----
2019-04-25 22:34:14 UTC - Matteo Merli: is BK also using the same chroot?
----
2019-04-25 22:34:36 UTC - Steven Le Roux: also, I'm testing a zk chroot so
that I can co-locate local zk and global zk for testing purposes
----
2019-04-25 22:35:25 UTC - Steven Le Roux: ok from what you're saying, pulsar is
expecting to read /ledgers at /pulsar-local/ledgers then ?
----
2019-04-25 22:35:31 UTC - Matteo Merli: You can co-locate them without needing
the chroot
----
2019-04-25 22:35:47 UTC - Matteo Merli: the “global” zk is only using `/admin/`
prefix
----
2019-04-25 22:36:02 UTC - Steven Le Roux: ok perfect
----
2019-04-25 22:36:23 UTC - Matteo Merli: > ok from what you’re saying, pulsar
is expecting to read /ledgers at /pulsar-local/ledgers then ?
Yes, both Pulsar and BK should share the same chroot
----
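A small sketch of why the chroot matters, assuming the usual ZooKeeper
connect-string form `host:port,host:port/chroot`: every path a client uses is
resolved under its chroot, so a broker and a bookie with different chroots
end up looking at different `/ledgers` znodes. The helpers
`split_connect_string` and `effective_path`, and the bookie connect string
without a chroot, are hypothetical:

```python
def split_connect_string(connect):
    """Split 'host:port,host:port/chroot' into (hosts, chroot)."""
    hosts, _, suffix = connect.partition("/")
    suffix = suffix.strip("/")
    chroot = "/" + suffix if suffix else ""
    return hosts.split(","), chroot

def effective_path(connect, path):
    """Absolute znode path that a client using this connect string reaches."""
    _, chroot = split_connect_string(connect)
    return chroot + path

# Broker setting from the thread: zookeeperServers=10.0.0.2:2181/pulsar-local
broker_view = effective_path("10.0.0.2:2181/pulsar-local", "/ledgers")
# Hypothetical bookie setting without the chroot, to show the mismatch.
bookie_view = effective_path("10.0.0.2:2181", "/ledgers")

print(broker_view)
print(bookie_view)
print("same znode:", broker_view == bookie_view)
```

Pointing both components at the same connect string, chroot included, makes
the two paths agree.
----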
2019-04-25 22:36:36 UTC - Steven Le Roux: ok that's why
----
2019-04-25 22:36:39 UTC - Steven Le Roux: thx, testing ;:)
----
2019-04-25 22:36:42 UTC - Matteo Merli: :slightly_smiling_face:
----
2019-04-25 22:43:08 UTC - Steven Le Roux: Far better :wink: thx @Matteo Merli!
+1 : Matteo Merli
----
2019-04-26 00:10:48 UTC - Grant Wu: @Sijie Guo did you figure anything out
about <https://github.com/apache/bookkeeper/issues/1970> ?
----
2019-04-26 01:05:37 UTC - Jerry Peng: @Grant Wu I talked with a few users that
saw this problem, they all had errors in their bookies when this error was
occurring. There weren’t enough non-faulty bookies in the cluster and that is
what is causing this exception.
----
2019-04-26 01:06:58 UTC - Grant Wu: Interesting
----
2019-04-26 01:09:37 UTC - durga: Ok. Thanks @Matteo Merli
----
2019-04-26 02:26:48 UTC - Sijie Guo: @Grant Wu I was looking into that issue
before but I didn’t get to the root cause yet. It is still on my backlog. Even
if, as @Jerry Peng said, there weren’t enough non-faulty bookies in the cluster,
bookkeeper should handle that. The ArrayIndexOutOfBoundsException doesn’t
sound right to me.
But anyway, I will look into it whenever I have time.
----