Slack digest for #general - 2019-04-26

Apache Pulsar Slack Fri, 26 Apr 2019 02:11:58 -0700

2019-04-25 09:11:20 UTC - Romain Castagnet: Hi, did you try "namespaces create" 
without cluster and next "namespaces set-clusters TENANT/NS -c 
cluster1,cluster2" ?
----
2019-04-25 09:14:15 UTC - stefan: hi, yes this is what i did
----
2019-04-25 09:16:04 UTC - stefan: i tried both ways
----
2019-04-25 09:23:24 UTC - Romain Castagnet: hum strange
----
2019-04-25 09:32:28 UTC - Matti-Pekka Laaksonen: Today I noticed one of our 
client applications had died, seemingly due to a lost connection. The last log 
message is:
{"timestamp":"2019-04-25T06:33:19.900Z","level":"WARN","thread":"pulsar-client-io-1-1","logger":"org.apache.pulsar.client.impl.ClientCnx","message":"[10.223.2.164/10.223.2.164:6650] Got exception NativeIoException : syscall:read(..) failed: Connection reset by peer","context":"default"}
----
2019-04-25 09:32:38 UTC - Matti-Pekka Laaksonen: This leads me to 
<https://github.com/apache/pulsar/blob/branch-2.2/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ClientCnx.java#L219>
----
2019-04-25 09:37:15 UTC - Matti-Pekka Laaksonen: I don't quite understand this 
case. Normally when the Pulsar connection is lost we catch the exception, close 
down the application gracefully, and the orchestration service restarts the 
container after a delay. In this case, however, there is no error or a caught 
exception, simply a WARN level log message. I'm not familiar with the execution 
path of the ClientCnx, should the connection die after the state is set to 
State.Failed?
----
2019-04-25 10:15:13 UTC - Yuvaraj Loganathan: Because the client will retry and 
establish the connection :thinking_face:
----
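Editor's note: the retry behaviour mentioned above can be pictured with a generic retry-with-backoff loop. This is only a minimal sketch and not the Pulsar client's actual reconnect code; the function names, delays, and attempt count are assumptions chosen for illustration.

```python
def backoff_delays(initial=1.0, maximum=30.0, attempts=8):
    """Yield delays that double on each attempt, capped at `maximum`."""
    delay = initial
    for _ in range(attempts):
        yield delay
        delay = min(delay * 2, maximum)


def connect_with_retry(connect, delays, sleep):
    """Call `connect` until it succeeds or the delays run out.

    `connect` and `sleep` are passed in (hypothetical hooks) so the
    loop can be exercised without a real broker or real waiting.
    """
    last_error = ConnectionError("no connection attempts were made")
    for delay in delays:
        try:
            return connect()
        except ConnectionError as err:
            # The failure is only recorded here, which mirrors why the
            # application may see a WARN log line and no exception.
            last_error = err
            sleep(delay)
    raise last_error
```

With a loop like this, a single "Connection reset by peer" surfaces to the application only if every attempt fails.
----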
2019-04-25 10:35:49 UTC - songxinlei: @songxinlei has joined the channel
----
2019-04-25 11:39:58 UTC - Matti-Pekka Laaksonen: Hmm, might be that the Pulsar 
client  was able to reconnect, but the non-Pulsar parts of the client failed. 
I'll look into it
----
2019-04-25 12:28:59 UTC - Chris Bartholomew: I wanted to let everyone know that 
I've built a service based on Pulsar. You can see it here: 
<https://kafkaesque.io> I am really hoping it helps people get started with 
Pulsar, testing their client code, etc. A basic account is free and includes an 
integrated dashboard for admin and monitoring of topics, namespaces, clusters, 
geo-replication. Would love it if everyone could try it out and give me some 
feedback. Thanks.
+1 : Sijie Guo, Ezequiel Lovelle, Guy Feldman, Karthik Ramasamy, DT, Ruud 
Kamphuis
----
2019-04-25 16:00:14 UTC - Grant Wu: What’s the current status of the Docker/k8s 
runtime for PFs?
----
2019-04-25 16:07:45 UTC - Sijie Guo: @Grant Wu k8s runtime is supported since 
2.3.0. The documentation is still missing though :disappointed:
----
2019-04-25 16:08:29 UTC - Grant Wu: :disappointed:
----
2019-04-25 17:43:08 UTC - Devin G. Bost: We increased parallelism for a 
high-traffic Pulsar function (from 3 to 5), but the data shows that the new 
function instances aren't getting any traffic. How would we figure out why 
these instances aren't getting any of the load?
----
2019-04-25 17:44:13 UTC - Matteo Merli: Can you share the topics stats for the 
topic these are consuming from?


`pulsar-admin topics stats $TOPIC`
----
2019-04-25 17:54:10 UTC - Thor Sigurjonsson: Devin is getting the topic stats 
ready...
----
2019-04-25 17:55:59 UTC - Thor Sigurjonsson: I guess to add a little color to 
the conversation, we noticed that the metrics in grafana showed higher latency 
on the 0.999 quantile and wanted to see if we could bring that down. When we 
deployed parallelism 5 (from 3) we noticed 2 new functions share hosts with 2 
"older ones", and there are no metrics being shown in grafana for those 
and 0 metrics from the pulsar-admin functions stats call.
----
2019-04-25 17:56:34 UTC - Devin G. Bost: I noticed that the instance with 
instance_id: "2" is missing from the list of subscriptions.
----
2019-04-25 17:56:49 UTC - Thor Sigurjonsson: those .999 quantile ones are 
around 100-125ms.
----
2019-04-25 17:57:23 UTC - Matteo Merli: It seems all 5 consumers (1 per 
function instance) are consuming at ~32 msg/s
----
2019-04-25 17:59:03 UTC - Matteo Merli: In the stats JSON, you have the 
`msgRateOut` for each consumer and the overall for the subscription
----
2019-04-25 18:02:36 UTC - Thor Sigurjonsson: when I do pulsar-admin functions 
status on that function I get this for instance 2:
```
{
    "instanceId" : 2,
    "status" : {
      "running" : true,
      "error" : "",
      "numRestarts" : 0,
      "numReceived" : 0,
      "numSuccessfullyProcessed" : 0,
      "numUserExceptions" : 0,
      "latestUserExceptions" : [ ],
      "numSystemExceptions" : 0,
      "latestSystemExceptions" : [ ],
      "averageLatency" : 0.0,
      "lastInvocationTime" : 0,
      "workerId" : "REDACTED-8080"
    }
}
```
----
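Editor's note: a status entry like the one above (running, but zero messages received) can be picked out mechanically. The following is a minimal sketch; it assumes the per-instance entries have been collected into a list, and the second sample entry (instance 3 and its counter) is hypothetical.

```python
import json

# First entry mirrors the fields pasted above; the second entry is a
# hypothetical healthy instance added for contrast.
SAMPLE = json.loads("""
[
  {"instanceId": 2, "status": {"running": true, "numReceived": 0}},
  {"instanceId": 3, "status": {"running": true, "numReceived": 1200}}
]
""")


def idle_instances(instances):
    """Return IDs of instances that are running but have received nothing."""
    idle = []
    for inst in instances:
        status = inst.get("status", {})
        if status.get("running") and status.get("numReceived", 0) == 0:
            idle.append(inst["instanceId"])
    return idle


print(idle_instances(SAMPLE))  # prints [2]
```
----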
2019-04-25 18:04:36 UTC - Devin G. Bost: I also only count 4 consumers.
----
2019-04-25 18:05:32 UTC - Thor Sigurjonsson: Also instance 3 and instance 2 are 
on the same host, and instance 2 shows no metrics from prometheus-grafana and 
instance 4 has ~100ms .999 quantile latency (and no data showing for instance 
2). It's roughly twice what other functions report... Made me guess maybe they 
were being rolled up for the host or something...
----
2019-04-25 18:07:45 UTC - Devin G. Bost: In the JSON output from `pulsar-admin 
topics stats $TOPIC`, I only see consumers with these instance_id values: 0, 3, 
1, 4. (2 is missing.)
----
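Editor's note: the check described above (comparing the instance_id values on the subscription's consumers against the configured parallelism) could be scripted roughly as follows. This is a sketch under assumptions: the JSON layout (a `subscriptions` map whose entries carry a `consumers` list with a `metadata.instance_id` string), the subscription name, and the rates are illustrative and not copied from real output.

```python
import json

# Hypothetical excerpt of `pulsar-admin topics stats $TOPIC` output.
# Field layout, subscription name, and rates are assumptions.
STATS_JSON = """
{
  "subscriptions": {
    "my-function-sub": {
      "msgRateOut": 128.0,
      "consumers": [
        {"msgRateOut": 32.0, "metadata": {"instance_id": "0"}},
        {"msgRateOut": 32.0, "metadata": {"instance_id": "3"}},
        {"msgRateOut": 32.0, "metadata": {"instance_id": "1"}},
        {"msgRateOut": 32.0, "metadata": {"instance_id": "4"}}
      ]
    }
  }
}
"""


def missing_instance_ids(stats, subscription, parallelism):
    """Return instance IDs in 0..parallelism-1 that have no consumer."""
    consumers = stats["subscriptions"][subscription]["consumers"]
    seen = set()
    for consumer in consumers:
        instance_id = consumer.get("metadata", {}).get("instance_id")
        if instance_id is not None:
            seen.add(int(instance_id))
    return sorted(set(range(parallelism)) - seen)


stats = json.loads(STATS_JSON)
print(missing_instance_ids(stats, "my-function-sub", parallelism=5))  # prints [2]
```
----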
2019-04-25 18:12:51 UTC - David Kjerrumgaard: Are there any errors in the log 
for instance 2?
----
2019-04-25 18:16:07 UTC - Thor Sigurjonsson: Full contents of log from instance 
2 at about the time of the parallelism update.
----
2019-04-25 18:20:33 UTC - Ruud Kamphuis: Interesting! Question: why does the 
name include kafka?
----
2019-04-25 18:21:18 UTC - Jerry Peng: @Thor Sigurjonsson there are no more 
logs for instance-2?  If so it seems to be getting stuck.
----
2019-04-25 18:21:37 UTC - Jerry Peng: @Thor Sigurjonsson do you guys have 
function state enabled?
----
2019-04-25 18:22:38 UTC - Thor Sigurjonsson: Hmm, we do have a log topic set...
----
2019-04-25 18:23:01 UTC - Thor Sigurjonsson: working to see about function 
state being enabled..
----
2019-04-25 18:25:06 UTC - Thor Sigurjonsson: would that be 
`stateStorageServiceUrl` in functions_worker.yml? (It's commented out).
----
2019-04-25 18:25:19 UTC - Jerry Peng: yes and gotcha
----
2019-04-25 18:25:47 UTC - Jerry Peng: you guys are running functions via Thread 
Runtime?
----
2019-04-25 18:26:09 UTC - Thor Sigurjonsson: Yes
----
2019-04-25 18:26:59 UTC - Thor Sigurjonsson: (kerberos jvm params plumbing the 
kafka connector made us go there for now)
----
2019-04-25 18:29:25 UTC - Jerry Peng: Give me a second to investigate
+1 : Thor Sigurjonsson, Devin G. Bost
----
2019-04-25 18:33:31 UTC - Thor Sigurjonsson: That's like OpenOffice calling 
their thing Microsofty. :slightly_smiling_face:
----
2019-04-25 18:38:23 UTC - Thor Sigurjonsson: Sorry about being flippant. 
:slightly_smiling_face: I get that there is a good search marketing angle to 
get streaming customers.. I'll give it a spin this week and see if I can give 
useful feedback.
----
2019-04-25 18:48:02 UTC - Ruud Kamphuis: Yeah to me the name is only gonna 
confuse people. People who are looking for pulsar will see the name and think: 
nope, this is not what I want. People who want hosted kafka will find this and 
think: nope, this is not what I want. Just my 2 cents 
----
2019-04-25 18:48:26 UTC - Ruud Kamphuis: Great to have a hosted pulsar offering 
tho! :raised_hands:
----
2019-04-25 18:53:50 UTC - Chethan UK: @Chethan UK has joined the channel
----
2019-04-25 19:04:50 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson I have 
reproduced the issue.  There was a PR that went in earlier this year that might 
be causing race conditions when using the same pulsar client to create 
consumers, as is the case for running functions via ThreadRuntime.  I am looking 
for a fix for the issue.

  In the meantime, do you guys want to try running with process runtime? We 
have added the ability to add runtime flags so that your kerberos configs can 
get passed in.  The functionality is not in an official release so you guys can 
either 1) try to build your own pulsar release from master or 2) you can try a 
streamlio pulsar release that contains the functionality since we create 
releases more often than apache does.
+1 : Thor Sigurjonsson
----
2019-04-25 19:05:21 UTC - Devin G. Bost: > I have reproduced the issue.  
There was a PR that went in earlier this year that might be causing race 
conditions when using the same pulsar client to create consumers, as is the case 
for running functions via ThreadRuntime.  I am looking for a fix for the issue.

Very impressive.
----
2019-04-25 19:06:51 UTC - Jerry Peng: Thanks! Can’t take all the credit.  
@Matteo Merli also helped
+1 : Devin G. Bost
----
2019-04-25 19:13:11 UTC - Devin G. Bost: Is there a temporary workaround?

We did have some concerns about the memory utilization that might be associated 
with the process runtime (with parallelization). Would we increase our memory 
requirements if we parallelized with the process runtime instead of 
parallelizing with the threading runtime?
----
2019-04-25 19:15:46 UTC - Thor Sigurjonsson: I think there are a few things 
that go into the decision for us: 1) bug fixes we need, 2) what runtime to 
"settle on" (and in which cluster maybe) 3) functions support for publishing 
properties and then the timing and how we roll out the prod env right now being 
used. We can roll faster in lower environment, but it would be good to pick a 
release soon that gets the most bang for the buck.
----
2019-04-25 19:16:22 UTC - Thor Sigurjonsson: This parallelism issue is not 
critical just yet, but it is part of 1) above I think.
----
2019-04-25 19:16:57 UTC - Thor Sigurjonsson: I guess we should consider the 
streamlio build also going forward.
----
2019-04-25 19:17:51 UTC - Thor Sigurjonsson: I'm guessing much of what we'd 
need would be in 2.3.2 (I may be wrong).
----
2019-04-25 19:20:08 UTC - Thor Sigurjonsson: Would we be getting those from 
here <https://hub.docker.com/r/streamlio/pulsar/tags> ? if we were rolling with 
docker?
----
2019-04-25 19:27:29 UTC - Jerry Peng: @Thor Sigurjonsson yes but it currently 
does not have the latest image.  We are actually in the process of doing 
another release.  A new image should be up in the next half an hour
+1 : Thor Sigurjonsson
----
2019-04-25 19:28:22 UTC - Jerry Peng: @Devin G. Bost I am not sure of a 
temporary workaround at this moment, but this issue doesn’t happen everytime.  
It is a race condition.  I am only able to reproduce it once out of the many 
times I have tried.
----
2019-04-25 19:34:46 UTC - Chethan UK: Has anyone used MongoDB Source connector?
----
2019-04-25 19:45:57 UTC - David Kjerrumgaard: Not yet, are you having issues?
----
2019-04-25 19:47:22 UTC - Chethan UK: 
<https://pulsar.apache.org/docs/en/io-cdc/>

is there a good tutorial on MongoDB *Source*?
----
2019-04-25 19:47:57 UTC - Ali Ahmed: 
<https://github.com/bbonnin/pulsar-io-mongo>
----
2019-04-25 19:48:11 UTC - Chethan UK: Its sink, I want source
----
2019-04-25 19:48:49 UTC - Ali Ahmed: sorry the source is just debezium you can 
try debezium docs
----
2019-04-25 19:54:51 UTC - Chethan UK: Where is the helm chart 
<https://pulsar.apache.org/docs/en/deploy-kubernetes/#deploying-pulsar-components-helm>
 ?
----
2019-04-25 19:59:16 UTC - David Kjerrumgaard: @Chethan UK It is bundled with 
the code. If you clone the pulsar repo, then go to 
/apache/pulsar/deployment/kubernetes/helm/pulsar
----
2019-04-25 19:59:52 UTC - Devin G. Bost: Gotcha.
----
2019-04-25 20:32:34 UTC - Jerry Peng: @Devin G. Bost @Thor Sigurjonsson a new 
docker release is now available:
<https://hub.docker.com/r/streamlio/pulsa>
----
2019-04-25 20:32:46 UTC - Jerry Peng: sorry: 
<https://hub.docker.com/r/streamlio/pulsar/tags>
----
2019-04-25 20:43:03 UTC - Devin G. Bost: Thanks!
----
2019-04-25 22:28:54 UTC - Steven Le Roux: Hi, I've deployed a local instance of 
pulsar, but with separated components (zk, bk)
----
2019-04-25 22:29:44 UTC - Steven Le Roux: Bk seems ok so far (bk shell 
listbookies is listing bookies), they're registered into zk properly under 
/ledgers/available
----
2019-04-25 22:30:16 UTC - Steven Le Roux: but when starting pulsar, it connects 
to zk, then :
22:03:43.753 [main] ERROR org.apache.bookkeeper.client.BookieWatcherImpl - 
Failed to get bookie list :
----
2019-04-25 22:30:55 UTC - Steven Le Roux: I can't find where to configure the 
ledger zk path, but anyway, it defaults to /ledgers which should be fine :
----
2019-04-25 22:30:56 UTC - Steven Le Roux: 22:03:43.583 [main] INFO  
org.apache.bookkeeper.meta.zk.ZKMetadataDriverBase - Initialize zookeeper 
metadata driver with external zookeeper client : ledgersRootPath = /ledgers.
----
2019-04-25 22:31:13 UTC - Steven Le Roux: any idea what I'm missing ?
----
2019-04-25 22:31:53 UTC - Matteo Merli: What’s the `zookeeperServers` settings 
in `broker.conf`?
----
2019-04-25 22:33:24 UTC - Steven Le Roux: 
zookeeperServers=10.0.0.2:2181/pulsar-local
----
2019-04-25 22:33:54 UTC - Steven Le Roux: I've reduced to one for testing but 
there are three of them
----
2019-04-25 22:33:55 UTC - Matteo Merli: I see, you’re using a chroot for ZK
----
2019-04-25 22:34:05 UTC - Steven Le Roux: yes
----
2019-04-25 22:34:14 UTC - Matteo Merli: is BK also using the same chroot?
----
2019-04-25 22:34:36 UTC - Steven Le Roux: also, I'm testing to chroot zk so 
that I can co-locate local zk and global zk for testing purposes
----
2019-04-25 22:35:25 UTC - Steven Le Roux: ok from what you're saying, pulsar is 
expecting to read /ledgers at /pulsar-local/ledgers then ?
----
2019-04-25 22:35:31 UTC - Matteo Merli: You can co-locate them without needing 
the chroot
----
2019-04-25 22:35:47 UTC - Matteo Merli: the “global” zk is only using `/admin/` 
prefix
----
2019-04-25 22:36:02 UTC - Steven Le Roux: ok perfect
----
2019-04-25 22:36:23 UTC - Matteo Merli: > ok from what you’re saying, pulsar 
is expecting to read /ledgers at /pulsar-local/ledgers then ?

Yes, both Pulsar and BK should share the same chroot
----
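Editor's note: the chroot mismatch above can be illustrated with a small sketch that resolves the ledgers path each side ends up using. The broker value is the `zookeeperServers` setting quoted in this thread; the bookie connect string without a chroot is an assumption about the original setup, and the helper names are hypothetical.

```python
def split_chroot(connect_string):
    """Split 'host1:2181,host2:2181/chroot' into (hosts, chroot)."""
    hosts, _, chroot = connect_string.partition("/")
    chroot = chroot.strip("/")
    return hosts.split(","), ("/" + chroot if chroot else "")


def effective_ledgers_path(connect_string, ledgers_root="/ledgers"):
    """Absolute ZooKeeper path where the ledgers root resolves."""
    _, chroot = split_chroot(connect_string)
    return chroot + ledgers_root


# Broker setting quoted in the thread.
broker_zk = "10.0.0.2:2181/pulsar-local"
# Assumed bookie setting without the chroot (not shown in the thread).
bookie_zk = "10.0.0.2:2181"

print(effective_ledgers_path(broker_zk))  # prints /pulsar-local/ledgers
print(effective_ledgers_path(bookie_zk))  # prints /ledgers
```

When the two paths differ, the broker looks for bookies under a node the bookies never registered in, which is consistent with the "Failed to get bookie list" error above.
----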
2019-04-25 22:36:36 UTC - Steven Le Roux: ok that's why
----
2019-04-25 22:36:39 UTC - Steven Le Roux: thx, testing ;:)
----
2019-04-25 22:36:42 UTC - Matteo Merli: :slightly_smiling_face:
----
2019-04-25 22:43:08 UTC - Steven Le Roux: Far better :wink: thx @Matteo Merli!
+1 : Matteo Merli
----
2019-04-26 00:10:48 UTC - Grant Wu: @Sijie Guo did you figure anything out 
about <https://github.com/apache/bookkeeper/issues/1970> ?
----
2019-04-26 01:05:37 UTC - Jerry Peng: @Grant Wu I talked with a few users that 
saw this problem, they all had errors in their bookies when this error was 
occurring.  There weren’t enough non-faulty bookies in the cluster and that is 
what is causing this exception.
----
2019-04-26 01:06:58 UTC - Grant Wu: Interesting
----
2019-04-26 01:09:37 UTC - durga: Ok. Thanks @Matteo Merli
----
2019-04-26 02:26:48 UTC - Sijie Guo: @Grant Wu I was looking into that issue 
before but I didn’t get to the root cause yet. It is still on my backlog. Even 
if, as @Jerry Peng said, there weren’t enough non-faulty bookies in the cluster, 
bookkeeper should handle that. The ArrayIndexOutOfBoundsException doesn’t 
sound right to me.

but anyway I will look into it whenever I have time.
----
